Re: Setting some clocks back to DUMMY fixes spdif output on imx6q wandboard rev B1

2016-08-30 Thread Nicolin Chen
On Tue, Aug 30, 2016 at 01:14:14PM +0200, Xavi Drudis Ferran wrote:

> linux-libre-4.7 without my patch, i.e. clocks defined like this :
> arch/arm/boot/dts/imx6qdl.dtsi:
> aips-bus@0200 { /* AIPS1 */
> [...]
>spba-bus@0200 {
> [...]
>spdif: spdif@02004000 {
>   clocks = <&clks IMX6QDL_CLK_SPDIF_GCLK>, <&clks IMX6QDL_CLK_OSC>,
>            <&clks IMX6QDL_CLK_SPDIF>, <&clks IMX6QDL_CLK_ASRC>,
>            <&clks IMX6QDL_CLK_DUMMY>, <&clks IMX6QDL_CLK_ESAI_EXTAL>,
>            <&clks IMX6QDL_CLK_IPG>, <&clks IMX6QDL_CLK_MLB>,
>            <&clks IMX6QDL_CLK_DUMMY>, <&clks IMX6QDL_CLK_SPBA>;
>   clock-names = "core",  "rxtx0",
> "rxtx1", "rxtx2",
> "rxtx3", "rxtx4",
> "rxtx5", "rxtx6",
> "rxtx7", "spba";
> [...]

> [9.376398] fsl-spdif-dai 2004000.spdif: use rxtx6 as tx clock source for 
> 44100Hz sample rate
> [9.376404] fsl-spdif-dai 2004000.spdif: use txclk df 94 for 44100Hz 
> sample rate
> [9.376409] fsl-spdif-dai 2004000.spdif: the best rate for 44100Hz sample 
> rate is 43882Hz

Without your patch, it chose rxtx6 (MLB) as the source for 44.1KHz.

> linux-libre-4.7 with my patch, i.e. clocks defined like this :
> arch/arm/boot/dts/imx6qdl.dtsi:
> aips-bus@0200 { /* AIPS1 */
> [...]
>spba-bus@0200 {
> [...]
>   spdif: spdif@02004000 {
>   clocks = <&clks IMX6QDL_CLK_SPDIF_GCLK>, <&clks IMX6QDL_CLK_OSC>,
>            <&clks IMX6QDL_CLK_SPDIF>, <&clks IMX6QDL_CLK_DUMMY>,
>            <&clks IMX6QDL_CLK_DUMMY>, <&clks IMX6QDL_CLK_DUMMY>,
>            <&clks IMX6QDL_CLK_DUMMY>, <&clks IMX6QDL_CLK_DUMMY>,
>            <&clks IMX6QDL_CLK_DUMMY>, <&clks IMX6QDL_CLK_SPBA>;
>   clock-names = "core",  "rxtx0",
> "rxtx1", "rxtx2",
> "rxtx3", "rxtx4",
> "rxtx5", "rxtx6",
> "rxtx7", "spba";
> [...]

> [6.662922] fsl-spdif-dai 2004000.spdif: use rxtx1 as tx clock source for 
> 44100Hz sample rate
> [6.662927] fsl-spdif-dai 2004000.spdif: use txclk df 9 for 44100Hz sample 
> rate
> [6.662932] fsl-spdif-dai 2004000.spdif: the best rate for 44100Hz sample 
> rate is 43859Hz

With your patch, it selects rxtx1 (the dedicated SPDIF baud clock).

> Does it mean that a 43859Hz clock is close enough to theoretical 44100Hz
> but  43882Hz is not ? 

No, the problem is not the rate but the source. Although the MLB
clock appears in the clock tree as a better rate provider, it might
not be correctly enabled, or might not be running at the rate it claims.
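
To illustrate why the rate itself is not the issue: the log values are
consistent with the achieved sample rate being the source clock divided
by 64 and by the reported dividers. The sketch below is a standalone
userspace calculation, not driver code. The function names are
hypothetical, and the 264000000Hz and 66000000Hz source rates are
assumptions inferred from the log values, not measurements.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sample rate produced when a source clock of clk_hz is divided by
 * 64 * txclk_df * sysclk_df (integer division, as in the log output).
 * Pass sysclk_df = 1 when no system clock divider is involved.
 */
static uint32_t spdif_achieved_rate(uint64_t clk_hz, uint32_t txclk_df,
                                    uint32_t sysclk_df)
{
    assert(txclk_df > 0 && sysclk_df > 0);
    return (uint32_t)(clk_hz / (64ULL * txclk_df * sysclk_df));
}

/* Absolute deviation of achieved_hz from target_hz in parts per million. */
static uint32_t rate_error_ppm(uint32_t target_hz, uint32_t achieved_hz)
{
    uint64_t diff;

    assert(target_hz > 0);
    diff = target_hz > achieved_hz ? (uint64_t)target_hz - achieved_hz
                                   : (uint64_t)achieved_hz - target_hz;
    return (uint32_t)(diff * 1000000ULL / target_hz);
}
```

With those assumed source rates, spdif_achieved_rate(264000000, 94, 1)
gives 43882 and spdif_achieved_rate(66000000, 16, 2) gives 32226, which
matches the log. Both 43882Hz and 43859Hz are roughly 0.5% below
44100Hz, so neither is meaningfully closer than the other.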

> Maybe there's something wrong with rxtx6 (IMX6QDL_CLK_MLB). This clock

Yes.

> does not seem to be used elsewhere (I mean in files, it's used in any
> board that includes imx6qdl.dtsi)
> 
> 
> include/dt-bindings/clock/imx6qdl-clock.h: 
> #define IMX6QDL_CLK_MLB   139
 
> Might it have to do with the fact I'm using (still trying in fact) to use 
> etnaviv ?
> 
> drivers/clk/imx/clk-imx6q.c:
> 
>if (clk_on_imx6dl())
>   /*
>* The multiplexer and divider of the imx6q clock gpu2d get
>* redefined/reused as mlb_sys_sel and mlb_sys_clk_podf on imx6dl.
>*/
>clk[IMX6QDL_CLK_MLB] = imx_clk_gate2("mlb",
> "gpu2d_core_podf",   base + 0x74, 18);
>else
>clk[IMX6QDL_CLK_MLB] = imx_clk_gate2("mlb","axi",  
>  base + 0x74, 18);
> 
> 
> But I'm on a imx6q not imx6dl .

According to the CCM chapter of the i.MX6Q Reference Manual, five
MLB clocks share the same clock gate. Those five clocks come from
three different parent clocks, though, so I am wondering whether
the MLB clock that's connected to the S/PDIF module is really
derived from this AXI clock.

Hopefully Fabio can help with the clock tree issue here :)

> --- linux-4.7-no-spdif-out/arch/arm/boot/dts/imx6qdl.dtsi 2016-07-25 
> 00:19:43.0 +0200
> +++ linux-4.7/arch/arm/boot/dts/imx6qdl.dtsi  2016-08-30 12:51:37.369431791 
> +0200
> @@ -242,7 +242,7 @@
>                       clocks = <&clks IMX6QDL_CLK_SPDIF_GCLK>, <&clks IMX6QDL_CLK_OSC>,
>                                <&clks IMX6QDL_CLK_SPDIF>, <&clks IMX6QDL_CLK_ASRC>,
>                                <&clks IMX6QDL_CLK_DUMMY>, <&clks IMX6QDL_CLK_ESAI_EXTAL>,
> -                              <&clks IMX6QDL_CLK_IPG>, <&clks IMX6QDL_CLK_MLB>,
> +                              <&clks IMX6QDL_CLK_IPG>, <&clks IMX6QDL_CLK_DUMMY>,

As MLB might be gated or not available at all, disabling it is a
quick workaround.
 
> AFAICS it just uses rxtx5 (IMX6QDL_CLK_IPG) for 32KHz and gets a little 
> closer to that. 
> But I haven't tried to play at 32KHz
> 
> Is there anything else I can try ? 

Another solution for you could be to change the rates of two of
those existing clocks to the perfect rates for 44.1KHz and 48KHz
respectively, 22579200Hz and 24576000Hz for example. (If you
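
As a quick check of the arithmetic behind those two rates, here is a
small standalone sketch. It assumes the same relation the driver's
debug output suggests (source clock = 64 x sample rate x divider) and
assumes a divider range of 1 to 128. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the integer divider that maps clk_hz onto rate_hz exactly,
 * or 0 when no exact divider exists in the assumed 1..128 range.
 */
static uint32_t spdif_exact_txclk_df(uint64_t clk_hz, uint32_t rate_hz)
{
    uint64_t frame_clk;
    uint64_t df;

    assert(rate_hz > 0);
    frame_clk = 64ULL * rate_hz;   /* clock needed for a divider of 1 */
    if (clk_hz % frame_clk != 0)
        return 0;                  /* not an exact multiple */
    df = clk_hz / frame_clk;
    return (df >= 1 && df <= 128) ? (uint32_t)df : 0;
}
```

spdif_exact_txclk_df(22579200, 44100) and
spdif_exact_txclk_df(24576000, 48000) both return 8, whereas a
264000000Hz source has no exact divider for 44100Hz.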

Re: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of CMA

2016-08-30 Thread Alexey Kardashevskiy
On 14/07/16 14:25, Balbir Singh wrote:
> 
> From: Balbir Singh 
> Subject: [RESEND][v2][PATCH] KVM: PPC: Book3S HV: Migrate pinned pages out of 
> CMA
> 
> When PCI Device pass-through is enabled via VFIO, KVM-PPC will
> pin pages using get_user_pages_fast(). One of the downsides of
> the pinning is that the page could be in CMA region. The CMA
> region is used for other allocations like the hash page table.
> Ideally we want the pinned pages to be from non CMA region.
> 
> This patch (currently only for KVM PPC with VFIO) forcefully
> migrates the pages out (huge pages are omitted for the moment).
> There are more efficient ways of doing this, but that might
> be elaborate and might impact a larger audience beyond just
> the kvm ppc implementation.
> 
> The magic is in new_iommu_non_cma_page() which allocates the
> new page from a non CMA region.
> 
> I've tested the patches lightly at my end, but there might be bugs.
> For example, if the page is not isolated after lru_add_drain(),
> is this a BUG?
> 
> Previous discussion was at
> http://permalink.gmane.org/gmane.linux.kernel.mm/136738
> 
> Cc: Benjamin Herrenschmidt 
> Cc: Michael Ellerman 
> Cc: Paul Mackerras 
> Cc: Alexey Kardashevskiy 
> 
> Signed-off-by: Balbir Singh 



Acked-by: Alexey Kardashevskiy 



> ---
>  arch/powerpc/include/asm/mmu_context.h |  1 +
>  arch/powerpc/mm/mmu_context_iommu.c| 80 
> --
>  2 files changed, 77 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/mmu_context.h 
> b/arch/powerpc/include/asm/mmu_context.h
> index 9d2cd0c..475d1be 100644
> --- a/arch/powerpc/include/asm/mmu_context.h
> +++ b/arch/powerpc/include/asm/mmu_context.h
> @@ -18,6 +18,7 @@ extern void destroy_context(struct mm_struct *mm);
>  #ifdef CONFIG_SPAPR_TCE_IOMMU
>  struct mm_iommu_table_group_mem_t;
>  
> +extern int isolate_lru_page(struct page *page);  /* from internal.h */
>  extern bool mm_iommu_preregistered(void);
>  extern long mm_iommu_get(unsigned long ua, unsigned long entries,
>   struct mm_iommu_table_group_mem_t **pmem);
> diff --git a/arch/powerpc/mm/mmu_context_iommu.c 
> b/arch/powerpc/mm/mmu_context_iommu.c
> index da6a216..c18f742 100644
> --- a/arch/powerpc/mm/mmu_context_iommu.c
> +++ b/arch/powerpc/mm/mmu_context_iommu.c
> @@ -15,6 +15,9 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
> +#include 
>  #include 
>  
>  static DEFINE_MUTEX(mem_list_mutex);
> @@ -72,6 +75,54 @@ bool mm_iommu_preregistered(void)
>  }
>  EXPORT_SYMBOL_GPL(mm_iommu_preregistered);
>  
> +/*
> + * Taken from alloc_migrate_target with changes to remove CMA allocations
> + */
> +struct page *new_iommu_non_cma_page(struct page *page, unsigned long private,
> + int **resultp)
> +{
> + gfp_t gfp_mask = GFP_USER;
> + struct page *new_page;
> +
> + if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
> + return NULL;
> +
> + if (PageHighMem(page))
> + gfp_mask |= __GFP_HIGHMEM;
> +
> + /*
> +  * We don't want the allocation to force an OOM if possible
> +  */
> + new_page = alloc_page(gfp_mask | __GFP_NORETRY | __GFP_NOWARN);
> + return new_page;
> +}
> +
> +static int mm_iommu_move_page_from_cma(struct page *page)
> +{
> + int ret;
> + LIST_HEAD(cma_migrate_pages);
> +
> + /* Ignore huge pages for now */
> + if (PageHuge(page) || PageTransHuge(page) || PageCompound(page))
> + return -EBUSY;
> +
> + lru_add_drain();
> + ret = isolate_lru_page(page);
> + if (ret)
> + get_page(page); /* Potential BUG? */
> +
> + list_add(&page->lru, &cma_migrate_pages);
> + put_page(page); /* Drop the gup reference */
> +
> + ret = migrate_pages(&cma_migrate_pages, new_iommu_non_cma_page,
> + NULL, 0, MIGRATE_SYNC, MR_CMA);
> + if (ret) {
> + if (!list_empty(&cma_migrate_pages))
> + putback_movable_pages(&cma_migrate_pages);
> + }
> + return 0;
> +}
> +
>  long mm_iommu_get(unsigned long ua, unsigned long entries,
>   struct mm_iommu_table_group_mem_t **pmem)
>  {
> @@ -124,15 +175,36 @@ long mm_iommu_get(unsigned long ua, unsigned long 
> entries,
>   for (i = 0; i < entries; ++i) {
>   if (1 != get_user_pages_fast(ua + (i << PAGE_SHIFT),
>   1/* pages */, 1/* iswrite */, &page)) {
> + ret = -EFAULT;
>   for (j = 0; j < i; ++j)
> - put_page(pfn_to_page(
> - mem->hpas[j] >> PAGE_SHIFT));
> + put_page(pfn_to_page(mem->hpas[j] >>
> + PAGE_SHIFT));
>   vfree(mem->hpas);
>  

Re: [PATCH v2 2/2] kexec: extend kexec_file_load system call

2016-08-30 Thread Thiago Jung Bauermann
Hello Dave,

Sorry for the delay, I was trying to get the other patch series ready as I 
mentioned in the other email.

On Thursday, 18 August 2016, 16:19:46, Dave Young wrote:
> Since Eric was objecting to the extension, I think you should convince him,
> but I will review from a code point of view.

Thank you very much for your review!
It looks like this series went as far as it can go, though...

-- 
[]'s
Thiago Jung Bauermann
IBM Linux Technology Center



Re: [PATCH v2 0/2] extend kexec_file_load system call

2016-08-30 Thread Thiago Jung Bauermann
Hello Mark,

Sorry for taking this long to respond. I've been focusing on getting my 
kexec_file_load and kexec buffer hand-over series in shape.

On Thursday, 18 August 2016, 11:21:13, Mark Rutland wrote:
> On Thu, Aug 11, 2016 at 08:03:56PM -0300, Thiago Jung Bauermann wrote:
> > Device tree blob must be passed to a second kernel on DTB-capable
> > archs, like powerpc and arm64, but the current kernel interface
> > lacks this support.
> > 
> > This patch extends kexec_file_load system call by adding an extra
> > argument to this syscall so that an arbitrary number of file descriptors
> > can be handed out from user space to the kernel.
> > 
> > See the background [1].
> > 
> > Please note that the new interface looks quite similar to the current
> > system call, but that it won't always mean that it provides the "binary
> > compatibility."
> > 
> > [1] http://lists.infradead.org/pipermail/kexec/2016-June/016276.html
> 
> As with the original posting, I have a number of concerns, and I'm
> really not keen on this.

Thanks for laying out the reasons for your objection. That's very
helpful.

> * For typical usecases, I do not believe that this is necessary (at
>   least for arm64), and generally do not believe that it should be
>   necessary for a user to manipulate the DTB (much like the user need
>   not manipulate ACPI tables or other FW data structures).
> 
>   Other than (potentially) the case of Linux as a flashed-in bootloader,
>   I don't see a compelling case for modifying the DTB that could not be
>   accomplished in-kernel. For that case, if truly necessary, I think
>   that we can get away with something simpler.

Yes, this is the case I am aiming for. I'll experiment with a couple of 
different approaches and see how well they perform.

-- 
[]'s
Thiago Jung Bauermann
IBM Linux Technology Center



[PATCH] ps3: Remove deprecated create_singlethread_workqueue

2016-08-30 Thread Bhaktipriya Shridhar
The workqueue "ps3av->wq" queues a single work item &ps3av->work and hence
doesn't require ordering. It is involved in waking up ps3avd to do the
video mode setting and hence it's not being used on a memory reclaim
path. Hence, it has been converted to use system_wq.

System workqueues have been able to handle a high level of concurrency
for a long time now and hence it's not required to have a singlethreaded
workqueue just to gain concurrency. Unlike a dedicated per-cpu workqueue
created with create_singlethread_workqueue(), system_wq allows multiple
work items to overlap executions even on the same CPU; however, a
per-cpu workqueue doesn't have any CPU locality or global ordering
guarantee unless the target CPU is explicitly specified and thus the
increase of local concurrency shouldn't make any difference.

The work item has been flushed in ps3av_remove to ensure that
there are no pending tasks while disconnecting the driver.

Signed-off-by: Bhaktipriya Shridhar 
---
 drivers/ps3/ps3av.c | 11 ++-
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/ps3/ps3av.c b/drivers/ps3/ps3av.c
index 437fc35..e293606 100644
--- a/drivers/ps3/ps3av.c
+++ b/drivers/ps3/ps3av.c
@@ -44,7 +44,6 @@ static struct ps3av {
struct mutex mutex;
struct work_struct work;
struct completion done;
-   struct workqueue_struct *wq;
int open_count;
struct ps3_system_bus_device *dev;

@@ -485,7 +484,7 @@ static int ps3av_set_videomode(void)
ps3av_set_av_video_mute(PS3AV_CMD_MUTE_ON);

/* wake up ps3avd to do the actual video mode setting */
-   queue_work(ps3av->wq, &ps3av->work);
+   schedule_work(&ps3av->work);

return 0;
 }
@@ -956,11 +955,6 @@ static int ps3av_probe(struct ps3_system_bus_device *dev)
INIT_WORK(&ps3av->work, ps3avd);
init_completion(&ps3av->done);
complete(&ps3av->done);
-   ps3av->wq = create_singlethread_workqueue("ps3avd");
-   if (!ps3av->wq) {
-   res = -ENOMEM;
-   goto fail;
-   }

switch (ps3_os_area_get_av_multi_out()) {
case PS3_PARAM_AV_MULTI_OUT_NTSC:
@@ -1018,8 +1012,7 @@ static int ps3av_remove(struct ps3_system_bus_device *dev)
dev_dbg(&dev->core, " -> %s:%d\n", __func__, __LINE__);
if (ps3av) {
ps3av_cmd_fin();
-   if (ps3av->wq)
-   destroy_workqueue(ps3av->wq);
+   flush_work(&ps3av->work);
kfree(ps3av);
ps3av = NULL;
}
--
2.1.4



RE: [PATCH 0/2] Support non-ssi cpu-dai for sgtl5000

2016-08-30 Thread Wente Wang
> -Original Message-
> From: Fabio Estevam [mailto:feste...@gmail.com]
> Subject: Re: [PATCH 0/2] Support non-ssi cpu-dai for sgtl5000
> 
> On Mon, Aug 29, 2016 at 7:51 AM, Winter Wang  wrote:
> > These patches support sgtl5000 to be attached to other non-ssi cpu-dais
> like SAI.
> 
> Do we really need this?
> 
> We can use SAI with sgtl5000 just fine via simple card. Take a look at:
> https://git.kernel.org/cgit/linux/kernel/git/shawnguo/linux.git/commit/arch/
> arm/boot/dts/imx6ul-14x14-
> evk.dts?h=imx/fixes=bf3251e112a0139c8dec796c9f12f2e8f01a73ca

Thanks, that's another approach. 
This patch set is not needed.


Re: Setting some clocks back to DUMMY fixes spdif output on imx6q wandboard rev B1

2016-08-30 Thread Xavi Drudis Ferran
On Mon, Aug 29, 2016 at 09:54:28PM +0200, Xavi Drudis Ferran wrote:
> On Mon, Aug 29, 2016 at 12:28:21PM -0700, Nicolin Chen wrote:
> > Would you
> > please do a little debug using "#define DEBUG 1" and check printk
> > from fsl_spdif_probe_txclk() to see the difference between before
> > and after Shengjiu's commit?
> 
> Yes, but I'm compiling the kernel in the wandboard, so it'll take me some 
> time. 
> 

Done now. Sorry for the delay.

I made a mistake and had to do it twice. I added a couple of sanity-check
messages:
"enter fsl_spdif_probe"
"enter fsl_spdif_probe_txclk"


linux-libre-4.7 without my patch, i.e. clocks defined like this :
arch/arm/boot/dts/imx6qdl.dtsi:
aips-bus@0200 { /* AIPS1 */
[...]
   spba-bus@0200 {
[...]
   spdif: spdif@02004000 {
  clocks = <&clks IMX6QDL_CLK_SPDIF_GCLK>, <&clks IMX6QDL_CLK_OSC>,
           <&clks IMX6QDL_CLK_SPDIF>, <&clks IMX6QDL_CLK_ASRC>,
           <&clks IMX6QDL_CLK_DUMMY>, <&clks IMX6QDL_CLK_ESAI_EXTAL>,
           <&clks IMX6QDL_CLK_IPG>, <&clks IMX6QDL_CLK_MLB>,
           <&clks IMX6QDL_CLK_DUMMY>, <&clks IMX6QDL_CLK_SPBA>;
  clock-names = "core",  "rxtx0",
"rxtx1", "rxtx2",
"rxtx3", "rxtx4",
"rxtx5", "rxtx6",
"rxtx7", "spba";
[...]

The messages at boot: 

[8.803603] 20ec000.sdma: Missing Free firmware (non-Free firmware loading 
is disabled)
[8.813737] imx-sdma 20ec000.sdma: failed to get firmware from device tree
[8.870764] imx-sdma 20ec000.sdma: Direct firmware load for /*(DEBLOBBED)*/ 
failed with error -2
[...]
[9.083301] fsl-asrc 2034000.asrc: driver registered
[9.087050] sgtl5000 1-000a: sgtl5000 revision 0x11
[9.107141] fsl-spdif-dai 2004000.spdif: enter fsl_spdif_probe
[9.144839] fsl-spdif-dai 2004000.spdif: enter fsl_spdif_probe_txclk
[9.225228] fsl-spdif-dai 2004000.spdif: use rxtx5 as tx clock source for 
32000Hz sample rate
[9.234058] fsl-spdif-dai 2004000.spdif: use txclk df 16 for 32000Hz sample 
rate
[9.235565] imx-spdif sound-spdif: ASoC: CPU DAI (null) not registered
[9.236007] imx-spdif sound-spdif: snd_soc_register_card failed: -517
[...]
[9.262713] fsl-spdif-dai 2004000.spdif: use sysclk df 2 for 32000Hz sample 
rate
[9.269349] fsl-spdif-dai 2004000.spdif: the best rate for 32000Hz sample 
rate is 32226Hz
[9.276431] fsl-asoc-card sound: ASoC: CPU DAI (null) not registered
[9.276438] fsl-asoc-card sound: snd_soc_register_card failed (-517)
[9.289559] fsl-spdif-dai 2004000.spdif: enter fsl_spdif_probe_txclk
[9.306991] imx-sgtl5000 sound: ASoC: CPU DAI (null) not registered
[9.346253] imx-sgtl5000 sound: snd_soc_register_card failed (-517)
[9.376398] fsl-spdif-dai 2004000.spdif: use rxtx6 as tx clock source for 
44100Hz sample rate
[9.376404] fsl-spdif-dai 2004000.spdif: use txclk df 94 for 44100Hz sample 
rate
[9.376409] fsl-spdif-dai 2004000.spdif: the best rate for 44100Hz sample 
rate is 43882Hz
[9.376415] fsl-spdif-dai 2004000.spdif: enter fsl_spdif_probe_txclk
[9.403159] fsl-spdif-dai 2004000.spdif: use rxtx6 as tx clock source for 
48000Hz sample rate
[9.403165] fsl-spdif-dai 2004000.spdif: use txclk df 86 for 48000Hz sample 
rate
[9.403170] fsl-spdif-dai 2004000.spdif: the best rate for 48000Hz sample 
rate is 47965Hz
[9.403174] fsl-spdif-dai 2004000.spdif: enter fsl_spdif_probe_txclk
[9.424007] fsl-spdif-dai 2004000.spdif: use rxtx6 as tx clock source for 
96000Hz sample rate
[9.424013] fsl-spdif-dai 2004000.spdif: use txclk df 43 for 96000Hz sample 
rate
[9.424021] fsl-spdif-dai 2004000.spdif: the best rate for 96000Hz sample 
rate is 95930Hz
[9.424025] fsl-spdif-dai 2004000.spdif: enter fsl_spdif_probe_txclk
[9.450424] fsl-spdif-dai 2004000.spdif: use rxtx6 as tx clock source for 
192000Hz sample rate
[9.481561] fsl-spdif-dai 2004000.spdif: use txclk df 21 for 192000Hz sample 
rate
[9.488400] fsl-spdif-dai 2004000.spdif: the best rate for 192000Hz sample 
rate is 196428Hz
[9.536785] fsl-ssi-dai 2028000.ssi: No cache defaults, reading back from HW
[9.582106] imx-spdif sound-spdif: snd-soc-dummy-dai <-> 2004000.spdif 
mapping ok
[9.612159] fsl-asoc-card sound: ASoC: CPU DAI (null) not registered
[9.621942] fsl-asoc-card sound: snd_soc_register_card failed (-517)
[9.638227] imx-sgtl5000 sound: ASoC: CPU DAI (null) not registered
[9.647247] imx-sgtl5000 sound: snd_soc_register_card failed (-517)
[9.682092] sgtl5000 1-000a: Using internal LDO instead of VDDD
[9.776989] fsl-asoc-card sound: sgtl5000 <-> 2028000.ssi mapping ok
[...]
[   70.407805] fsl-spdif-dai 2004000.spdif: expected clock rate = 265305600
[   70.407829] fsl-spdif-dai 2004000.spdif: actual clock rate = 26400
[   70.407857] fsl-spdif-dai 2004000.spdif: set sample rate to 43882Hz for 
44100Hz playback
[   70.407867] fsl-spdif-dai 2004000.spdif: STCSCH: 0x304000
[   70.407875] fsl-spdif-dai 

[PATCH v2 9/9] ima: platform-independent hash value

2016-08-30 Thread Mimi Zohar
From: Andreas Steffen 

For remote attestation it is important for the IMA measurement values
to be platform-independent. Therefore, integer fields to be hashed
must be converted to a canonical format.

Changelog:
- Define canonical format as little endian (Mimi)

Signed-off-by: Andreas Steffen 
Signed-off-by: Mimi Zohar 
---
 security/integrity/ima/ima_crypto.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/security/integrity/ima/ima_crypto.c 
b/security/integrity/ima/ima_crypto.c
index 38f2ed8..38d6f5d 100644
--- a/security/integrity/ima/ima_crypto.c
+++ b/security/integrity/ima/ima_crypto.c
@@ -477,11 +477,13 @@ static int ima_calc_field_array_hash_tfm(struct 
ima_field_data *field_data,
u8 buffer[IMA_EVENT_NAME_LEN_MAX + 1] = { 0 };
u8 *data_to_hash = field_data[i].data;
u32 datalen = field_data[i].len;
+   u32 datalen_to_hash = !ima_canonical_fmt ? datalen :
+   cpu_to_le32(datalen);
 
if (strcmp(td->name, IMA_TEMPLATE_IMA_NAME) != 0) {
rc = crypto_shash_update(shash,
-   (const u8 *) &field_data[i].len,
-   sizeof(field_data[i].len));
+   (const u8 *) &datalen_to_hash,
+   sizeof(datalen_to_hash));
if (rc)
break;
} else if (strcmp(td->fields[i]->field_id, "n") == 0) {
-- 
2.1.0



[PATCH v2 8/9] ima: define a canonical binary_runtime_measurements list format

2016-08-30 Thread Mimi Zohar
The IMA binary_runtime_measurements list is currently in platform native
format.

To allow restoring a measurement list carried across kexec with a
different endianness than the targeted kernel, this patch defines
little-endian as the canonical format.  For big endian systems wanting
to save/restore the measurement list from a system with a different
endianness, a new boot command line parameter named "ima_canonical_fmt"
is defined.
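
As a minimal userspace sketch of what a fixed little-endian layout
means for a 32-bit field such as a length: the bytes are written least
significant first, whatever the host byte order. The helper names
put_le32() and get_le32() are hypothetical and are not part of this
patch, which relies on the kernel's cpu_to_le32() instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v in out[0..3], least significant byte first, on any host. */
static void put_le32(uint8_t *out, uint32_t v)
{
    assert(out != NULL);
    out[0] = (uint8_t)(v & 0xff);
    out[1] = (uint8_t)((v >> 8) & 0xff);
    out[2] = (uint8_t)((v >> 16) & 0xff);
    out[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Read a 32-bit value back from a little-endian byte sequence. */
static uint32_t get_le32(const uint8_t *in)
{
    assert(in != NULL);
    return (uint32_t)in[0] |
           ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) |
           ((uint32_t)in[3] << 24);
}
```

Because the helpers work on individual bytes rather than on the
in-memory representation of the integer, a list written on a big
endian system reads back identically on a little endian one.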

Considerations: use of the "ima_canonical_fmt" boot command line
option will break existing userspace applications on big endian systems
expecting the binary_runtime_measurements list to be in platform native
format.

Signed-off-by: Mimi Zohar 
---
 Documentation/kernel-parameters.txt   |  4 
 security/integrity/ima/ima.h  |  6 ++
 security/integrity/ima/ima_fs.c   | 28 +---
 security/integrity/ima/ima_kexec.c| 11 +--
 security/integrity/ima/ima_template.c | 24 ++--
 security/integrity/ima/ima_template_lib.c |  6 --
 6 files changed, 66 insertions(+), 13 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 46c030a..5e8037fc 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1580,6 +1580,10 @@ bytes respectively. Such letter suffixes can also be 
entirely omitted.
The builtin appraise policy appraises all files
owned by uid=0.
 
+   ima_canonical_fmt [IMA]
+   Use the canonical format for the binary runtime
+   measurements, instead of host native format.
+
ima_hash=   [IMA]
Format: { md5 | sha1 | rmd160 | sha256 | sha384
   | sha512 | ... }
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index e8303c9..eb0f4dd 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -112,6 +112,12 @@ struct ima_kexec_hdr {
u64 count;
 };
 
+/*
+ * The default binary_runtime_measurements list format is defined as the
+ * platform native format.  The canonical format is defined as little-endian.
+ */
+extern bool ima_canonical_fmt;
+
 /* Internal IMA function definitions */
 int ima_init(void);
 int ima_fs_init(void);
diff --git a/security/integrity/ima/ima_fs.c b/security/integrity/ima/ima_fs.c
index 66e5dd5..2bcad99 100644
--- a/security/integrity/ima/ima_fs.c
+++ b/security/integrity/ima/ima_fs.c
@@ -28,6 +28,16 @@
 
 static DEFINE_MUTEX(ima_write_mutex);
 
+bool ima_canonical_fmt;
+static int __init default_canonical_fmt_setup(char *str)
+{
+#ifdef __BIG_ENDIAN
+   ima_canonical_fmt = 1;
+#endif
+   return 1;
+}
+__setup("ima_canonical_fmt", default_canonical_fmt_setup);
+
 static int valid_policy = 1;
 #define TMPBUFLEN 12
 static ssize_t ima_show_htable_value(char __user *buf, size_t count,
@@ -122,7 +132,7 @@ int ima_measurements_show(struct seq_file *m, void *v)
struct ima_queue_entry *qe = v;
struct ima_template_entry *e;
char *template_name;
-   int namelen;
+   u32 pcr, namelen, template_data_len; /* temporary fields */
bool is_ima_template = false;
int i;
 
@@ -139,25 +149,29 @@ int ima_measurements_show(struct seq_file *m, void *v)
 * PCR used defaults to the same (config option) in
 * little-endian format, unless set in policy
 */
-   ima_putc(m, &e->pcr, sizeof(e->pcr));
+   pcr = !ima_canonical_fmt ? e->pcr : cpu_to_le32(e->pcr);
+   ima_putc(m, &pcr, sizeof(e->pcr));
 
/* 2nd: template digest */
ima_putc(m, e->digest, TPM_DIGEST_SIZE);
 
/* 3rd: template name size */
-   namelen = strlen(template_name);
+   namelen = !ima_canonical_fmt ? strlen(template_name) :
+   cpu_to_le32(strlen(template_name));
ima_putc(m, &namelen, sizeof(namelen));
 
/* 4th:  template name */
-   ima_putc(m, template_name, namelen);
+   ima_putc(m, template_name, strlen(template_name));
 
/* 5th:  template length (except for 'ima' template) */
if (strcmp(template_name, IMA_TEMPLATE_IMA_NAME) == 0)
is_ima_template = true;
 
-   if (!is_ima_template)
-   ima_putc(m, &e->template_data_len,
-sizeof(e->template_data_len));
+   if (!is_ima_template) {
+   template_data_len = !ima_canonical_fmt ? e->template_data_len :
+   cpu_to_le32(e->template_data_len);
+   ima_putc(m, &template_data_len, sizeof(e->template_data_len));
+   }
 
/* 6th:  template specific data */
for (i = 0; i < e->template_desc->num_fields; i++) {
diff --git a/security/integrity/ima/ima_kexec.c 
b/security/integrity/ima/ima_kexec.c
index 0e4d0db..cf38ccc 100644
--- a/security/integrity/ima/ima_kexec.c
+++ b/security/integrity/ima/ima_kexec.c
@@ 

[PATCH v2 7/9] ima: support restoring multiple template formats

2016-08-30 Thread Mimi Zohar
The configured IMA measurement list template format can be replaced at
runtime on the boot command line, including a custom template format.
This patch adds support for restoring a measurement list containing
multiple builtin/custom template formats.

Signed-off-by: Mimi Zohar 
---
 security/integrity/ima/ima_template.c | 58 +--
 1 file changed, 55 insertions(+), 3 deletions(-)

diff --git a/security/integrity/ima/ima_template.c 
b/security/integrity/ima/ima_template.c
index 92df055..fd46c65 100644
--- a/security/integrity/ima/ima_template.c
+++ b/security/integrity/ima/ima_template.c
@@ -57,6 +57,8 @@ static int __init ima_template_setup(char *str)
if (ima_template)
return 1;
 
+   ima_init_template_list();
+
/*
 * Verify that a template with the supplied name exists.
 * If not, use CONFIG_IMA_DEFAULT_TEMPLATE.
@@ -153,9 +155,14 @@ static int template_desc_init_fields(const char 
*template_fmt,
 {
const char *template_fmt_ptr;
struct ima_template_field *found_fields[IMA_TEMPLATE_NUM_FIELDS_MAX];
-   int template_num_fields = template_fmt_size(template_fmt);
+   int template_num_fields;
int i, len;
 
+   if (num_fields && *num_fields > 0) /* already initialized? */
+   return 0;
+
+   template_num_fields = template_fmt_size(template_fmt);
+
if (template_num_fields > IMA_TEMPLATE_NUM_FIELDS_MAX) {
pr_err("format string '%s' contains too many fields\n",
   template_fmt);
@@ -205,6 +212,9 @@ void __init ima_init_template_list(void)
 {
int i;
 
+   if (!list_empty(&defined_templates))
+   return;
+
spin_lock(&template_list);
for (i = 0; i < ARRAY_SIZE(builtin_templates); i++) {
list_add_tail_rcu(&builtin_templates[i].list,
@@ -230,6 +240,35 @@ int __init ima_init_template(void)
return result;
 }
 
+static struct ima_template_desc *restore_template_fmt(char *template_name)
+{
+   struct ima_template_desc *template_desc = NULL;
+   int ret;
+
+   ret = template_desc_init_fields(template_name, NULL, NULL);
+   if (ret < 0) {
+   pr_err("attempting to initialize the template \"%s\" failed\n",
+   template_name);
+   goto out;
+   }
+
+   template_desc = kzalloc(sizeof(*template_desc), GFP_KERNEL);
+   if (!template_desc)
+   goto out;
+
+   template_desc->name = "";
+   template_desc->fmt = kstrdup(template_name, GFP_KERNEL);
+   if (!template_desc->fmt)
+   goto out;
+
+   spin_lock(&template_list);
+   list_add_tail_rcu(&template_desc->list, &defined_templates);
+   spin_unlock(&template_list);
+   synchronize_rcu();
+out:
+   return template_desc;
+}
+
 static int ima_restore_template_data(struct ima_template_desc *template_desc,
 void *template_data,
 int template_data_size,
@@ -360,10 +399,23 @@ int ima_restore_measurement_list(loff_t size, void *buf)
}
data_v1 = bufp += (u_int8_t)hdr_v1->template_name_len;
 
-   /* get template format */
template_desc = lookup_template_desc(template_name);
if (!template_desc) {
-   pr_err("template \"%s\" not found\n", template_name);
+   template_desc = restore_template_fmt(template_name);
+   if (!template_desc)
+   break;
+   }
+
+   /*
+* Only the running system's template format is initialized
+* on boot.  As needed, initialize the other template formats.
+*/
+   ret = template_desc_init_fields(template_desc->fmt,
+   &(template_desc->fields),
+   &(template_desc->num_fields));
+   if (ret < 0) {
+   pr_err("attempting to restore the template fmt \"%s\" \
+   failed\n", template_desc->fmt);
ret = -EINVAL;
break;
}
-- 
2.1.0



[PATCH v2 6/9] ima: store the builtin/custom template definitions in a list

2016-08-30 Thread Mimi Zohar
The builtin and single custom templates are currently stored in an
array.  In preparation for being able to restore a measurement list
containing multiple builtin/custom templates, this patch stores the
builtin and custom templates as a linked list.  This will permit
defining more than one custom template per boot.
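
The idea can be sketched in plain userspace C as follows. This is a
simplified illustration, not the code of this patch: it uses a
hand-rolled singly linked list with no RCU or locking, and
struct tmpl_desc, tmpl_add() and tmpl_lookup() are hypothetical names.
Like the patched lookup_template_desc(), the lookup matches on either
the template name or its format string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a template descriptor kept on a list. */
struct tmpl_desc {
    const char *name;
    const char *fmt;
    struct tmpl_desc *next;
};

/* Append desc to the end of the list whose head pointer is *head. */
static void tmpl_add(struct tmpl_desc **head, struct tmpl_desc *desc)
{
    assert(head != NULL && desc != NULL);
    desc->next = NULL;
    while (*head)
        head = &(*head)->next;
    *head = desc;
}

/* Return the first entry whose name or format equals key, else NULL. */
static struct tmpl_desc *tmpl_lookup(struct tmpl_desc *head, const char *key)
{
    for (; head; head = head->next) {
        if (strcmp(head->name, key) == 0 || strcmp(head->fmt, key) == 0)
            return head;
    }
    return NULL;
}
```

Since entries are appended at run time, nothing in this layout limits
the list to a fixed number of custom formats, which is the property the
array-based storage lacked.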

Changelog v2:
- fix lookup_template_desc() preemption imbalance (kernel test robot)

Signed-off-by: Mimi Zohar 
---
 security/integrity/ima/ima.h  |  2 ++
 security/integrity/ima/ima_main.c |  1 +
 security/integrity/ima/ima_template.c | 43 +++
 3 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index 634d140..e8303c9 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -81,6 +81,7 @@ struct ima_template_field {
 
 /* IMA template descriptor definition */
 struct ima_template_desc {
+   struct list_head list;
char *name;
char *fmt;
int num_fields;
@@ -136,6 +137,7 @@ int ima_restore_measurement_list(loff_t bufsize, void *buf);
 int ima_measurements_show(struct seq_file *m, void *v);
 unsigned long ima_get_binary_runtime_size(void);
 int ima_init_template(void);
+void ima_init_template_list(void);
 
 #ifdef CONFIG_KEXEC_FILE
 void ima_load_kexec_buffer(void);
diff --git a/security/integrity/ima/ima_main.c 
b/security/integrity/ima/ima_main.c
index 596ef61..592f318 100644
--- a/security/integrity/ima/ima_main.c
+++ b/security/integrity/ima/ima_main.c
@@ -418,6 +418,7 @@ static int __init init_ima(void)
 {
int error;
 
+   ima_init_template_list();
hash_setup(CONFIG_IMA_DEFAULT_HASH);
error = ima_init();
if (!error) {
diff --git a/security/integrity/ima/ima_template.c 
b/security/integrity/ima/ima_template.c
index 7c90075..92df055 100644
--- a/security/integrity/ima/ima_template.c
+++ b/security/integrity/ima/ima_template.c
@@ -15,16 +15,20 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include 
 #include "ima.h"
 #include "ima_template_lib.h"
 
-static struct ima_template_desc defined_templates[] = {
+static struct ima_template_desc builtin_templates[] = {
{.name = IMA_TEMPLATE_IMA_NAME, .fmt = IMA_TEMPLATE_IMA_FMT},
{.name = "ima-ng", .fmt = "d-ng|n-ng"},
{.name = "ima-sig", .fmt = "d-ng|n-ng|sig"},
{.name = "", .fmt = ""},/* placeholder for a custom format */
 };
 
+static LIST_HEAD(defined_templates);
+spinlock_t template_list;
+
 static struct ima_template_field supported_fields[] = {
{.field_id = "d", .field_init = ima_eventdigest_init,
 .field_show = ima_show_template_digest},
@@ -81,7 +85,7 @@ __setup("ima_template=", ima_template_setup);
 
 static int __init ima_template_fmt_setup(char *str)
 {
-   int num_templates = ARRAY_SIZE(defined_templates);
+   int num_templates = ARRAY_SIZE(builtin_templates);
 
if (ima_template)
return 1;
@@ -92,22 +96,28 @@ static int __init ima_template_fmt_setup(char *str)
return 1;
}
 
-   defined_templates[num_templates - 1].fmt = str;
-   ima_template = defined_templates + num_templates - 1;
+   builtin_templates[num_templates - 1].fmt = str;
+   ima_template = builtin_templates + num_templates - 1;
+
return 1;
 }
 __setup("ima_template_fmt=", ima_template_fmt_setup);
 
 static struct ima_template_desc *lookup_template_desc(const char *name)
 {
-   int i;
+   struct ima_template_desc *template_desc;
+   int found = 0;
 
-   for (i = 0; i < ARRAY_SIZE(defined_templates); i++) {
-   if (strcmp(defined_templates[i].name, name) == 0)
-   return defined_templates + i;
+   rcu_read_lock();
+   list_for_each_entry_rcu(template_desc, &defined_templates, list) {
+   if ((strcmp(template_desc->name, name) == 0) ||
+   (strcmp(template_desc->fmt, name) == 0)) {
+   found = 1;
+   break;
+   }
}
-
-   return NULL;
+   rcu_read_unlock();
+   return found ? template_desc : NULL;
 }
 
 static struct ima_template_field *lookup_template_field(const char *field_id)
@@ -191,6 +201,19 @@ struct ima_template_desc *ima_template_desc_current(void)
return ima_template;
 }
 
+void __init ima_init_template_list(void)
+{
+   int i;
+
+   spin_lock(&template_list);
+   for (i = 0; i < ARRAY_SIZE(builtin_templates); i++) {
+   list_add_tail_rcu(&builtin_templates[i].list,
+ &defined_templates);
+   }
+   spin_unlock(&template_list);
+   synchronize_rcu();
+}
+
 int __init ima_init_template(void)
 {
struct ima_template_desc *template = ima_template_desc_current();
-- 
2.1.0



[PATCH v2 5/9] ima: on soft reboot, save the measurement list

2016-08-30 Thread Mimi Zohar
From: Thiago Jung Bauermann 

This patch uses the kexec buffer passing mechanism to pass the
serialized IMA binary_runtime_measurements to the next kernel.
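
One detail worth spelling out: the segment is added at kexec load time,
but the list is dumped later, at kexec execute, and may have grown in
between. A minimal userspace sketch of that fit check follows. The names
demo_fits and DEMO_HDR_SIZE are illustrative, and the 24-byte header size
is an assumption made for the example, not something taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_HDR_SIZE 24u	/* assumed header size, for illustration only */

/* True if the payload plus header fits the segment reserved at load time. */
static bool demo_fits(size_t payload, size_t segment)
{
	if (segment < DEMO_HDR_SIZE)
		return false;	/* avoid unsigned underflow in the subtraction */
	return payload <= segment - DEMO_HDR_SIZE;
}
```

The explicit guard for a segment smaller than the header is a choice made
in this sketch; it keeps the unsigned subtraction from wrapping.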

Changelog v2:
- Fix build issue by defining a stub ima_add_kexec_buffer and stub
  struct kimage when CONFIG_IMA=n and CONFIG_IMA_KEXEC=n. (Fengguang Wu)
- removed kexec_add_handover_buffer() checksum argument.
- added skip_checksum member to kexec_buf
- only register reboot notifier once

Changelog v1:
- updated to call IMA functions  (Mimi)
- move code from ima_template.c to ima_kexec.c (Mimi)

Signed-off-by: Thiago Jung Bauermann 
Signed-off-by: Mimi Zohar 
---
 include/linux/ima.h| 12 ++
 kernel/kexec_file.c|  4 ++
 security/integrity/ima/ima_kexec.c | 88 ++
 3 files changed, 104 insertions(+)

diff --git a/include/linux/ima.h b/include/linux/ima.h
index 0eb7c2e..7f6952f 100644
--- a/include/linux/ima.h
+++ b/include/linux/ima.h
@@ -11,6 +11,7 @@
 #define _LINUX_IMA_H
 
 #include 
+#include 
 struct linux_binprm;
 
 #ifdef CONFIG_IMA
@@ -23,6 +24,10 @@ extern int ima_post_read_file(struct file *file, void *buf, 
loff_t size,
  enum kernel_read_file_id id);
 extern void ima_post_path_mknod(struct dentry *dentry);
 
+#ifdef CONFIG_IMA_KEXEC
+extern void ima_add_kexec_buffer(struct kimage *image);
+#endif
+
 #else
 static inline int ima_bprm_check(struct linux_binprm *bprm)
 {
@@ -62,6 +67,13 @@ static inline void ima_post_path_mknod(struct dentry *dentry)
 
 #endif /* CONFIG_IMA */
 
+#ifndef CONFIG_IMA_KEXEC
+struct kimage;
+
+static inline void ima_add_kexec_buffer(struct kimage *image)
+{}
+#endif
+
 #ifdef CONFIG_IMA_APPRAISE
 extern void ima_inode_post_setattr(struct dentry *dentry);
 extern int ima_inode_setxattr(struct dentry *dentry, const char *xattr_name,
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 0e90d14..9585861 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -200,6 +201,9 @@ kimage_file_prepare_segments(struct kimage *image, int 
kernel_fd, int initrd_fd,
return ret;
image->kernel_buf_len = size;
 
+   /* IMA needs to pass the measurement list to the next kernel. */
+   ima_add_kexec_buffer(image);
+
/* Call arch image probe handlers */
ret = arch_kexec_kernel_image_probe(image, image->kernel_buf,
image->kernel_buf_len);
diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c
index e77ca9d..0e4d0db 100644
--- a/security/integrity/ima/ima_kexec.c
+++ b/security/integrity/ima/ima_kexec.c
@@ -23,6 +23,11 @@
 
 #include "ima.h"
 
+#ifdef CONFIG_IMA_KEXEC
+/* Physical address of the measurement buffer in the next kernel. */
+static unsigned long kexec_buffer_load_addr;
+static size_t kexec_segment_size;
+
 static int ima_dump_measurement_list(unsigned long *buffer_size, void **buffer,
 unsigned long segment_size)
 {
@@ -75,6 +80,89 @@ out:
 }
 
 /*
+ * Called during kexec execute so that IMA can save the measurement list.
+ */
+static int ima_update_kexec_buffer(struct notifier_block *self,
+  unsigned long action, void *data)
+{
+   void *kexec_buffer = NULL;
+   size_t kexec_buffer_size;
+   int ret;
+
+   if (!kexec_in_progress)
+   return NOTIFY_OK;
+
+   kexec_buffer_size = ima_get_binary_runtime_size();
+   if (kexec_buffer_size >
+   (kexec_segment_size - sizeof(struct ima_kexec_hdr))) {
+   pr_err("Binary measurement list grew too large.\n");
+   goto out;
+   }
+
+   ima_dump_measurement_list(&kexec_buffer_size, &kexec_buffer,
+ kexec_segment_size);
+   if (!kexec_buffer) {
+   pr_err("Not enough memory for the kexec measurement buffer.\n");
+   goto out;
+   }
+   ret = kexec_update_segment(kexec_buffer, kexec_buffer_size,
+  kexec_buffer_load_addr, kexec_segment_size);
+   if (ret)
+   pr_err("Error updating kexec buffer: %d\n", ret);
+out:
+   return NOTIFY_OK;
+}
+
+struct notifier_block update_buffer_nb = {
+   .notifier_call = ima_update_kexec_buffer,
+};
+
+/*
+ * Called during kexec_file_load so that IMA can add a segment to the kexec
+ * image for the measurement list for the next kernel.
+ */
+void ima_add_kexec_buffer(struct kimage *image)
+{
+   static int registered = 0;
+   struct kexec_buf kbuf = { .image = image, .buf_align = PAGE_SIZE,
+ .buf_min = 0, .buf_max = ULONG_MAX,
+ .top_down = true, .skip_checksum = true };
+   int ret;
+
+   if 

[PATCH v2 4/9] ima: serialize the binary_runtime_measurements

2016-08-30 Thread Mimi Zohar
The TPM PCRs are only reset on a hard reboot.  In order to validate a
TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement list
of the running kernel must be saved and restored on boot.  This patch
serializes the IMA measurement list in the binary_runtime_measurements
format.
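
As a rough illustration of the buffer layout this produces: space for a
header is reserved at the start, the records are appended after it, and
the header is filled in last, once the total size and count are known.
The sketch below is userspace C under simplifying assumptions. The records
are plain 32-bit values rather than real template entries, demo_hdr only
mirrors the fields of ima_kexec_hdr from patch 1/9, and demo_serialize is
a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the field layout of ima_kexec_hdr from patch 1/9. */
struct demo_hdr {
	uint16_t version;
	uint16_t reserved0;
	uint32_t reserved1;
	uint64_t buffer_size;
	uint64_t count;
};

/*
 * Copy n fixed-size records into buf after a reserved header slot, then
 * fill in the header. Returns 0 on success, -1 if the records do not fit.
 */
static int demo_serialize(uint8_t *buf, size_t buf_size,
			  const uint32_t *records, size_t n, size_t *used)
{
	struct demo_hdr hdr = { .version = 1 };
	size_t pos = sizeof(hdr);	/* reserved space for the header */
	size_t i;

	if (buf_size < sizeof(hdr))
		return -1;

	for (i = 0; i < n; i++) {
		if (buf_size - pos < sizeof(records[i]))
			return -1;
		memcpy(buf + pos, &records[i], sizeof(records[i]));
		pos += sizeof(records[i]);
		hdr.count++;
	}

	/* Header goes in last, once size and count are final. */
	hdr.buffer_size = pos;
	memcpy(buf, &hdr, sizeof(hdr));
	*used = pos;
	return 0;
}
```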

Signed-off-by: Mimi Zohar 
---
 security/integrity/ima/ima.h   |  1 +
 security/integrity/ima/ima_fs.c|  2 +-
 security/integrity/ima/ima_kexec.c | 51 ++
 3 files changed, 53 insertions(+), 1 deletion(-)

diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index f9cd08e..634d140 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -133,6 +133,7 @@ void ima_print_digest(struct seq_file *m, u8 *digest, u32 
size);
 struct ima_template_desc *ima_template_desc_current(void);
 int ima_restore_measurement_entry(struct ima_template_entry *entry);
 int ima_restore_measurement_list(loff_t bufsize, void *buf);
+int ima_measurements_show(struct seq_file *m, void *v);
 unsigned long ima_get_binary_runtime_size(void);
 int ima_init_template(void);
 
diff --git a/security/integrity/ima/ima_fs.c b/security/integrity/ima/ima_fs.c
index c07a384..66e5dd5 100644
--- a/security/integrity/ima/ima_fs.c
+++ b/security/integrity/ima/ima_fs.c
@@ -116,7 +116,7 @@ void ima_putc(struct seq_file *m, void *data, int datalen)
  *   [eventdata length]
  *   eventdata[n]=template specific data
  */
-static int ima_measurements_show(struct seq_file *m, void *v)
+int ima_measurements_show(struct seq_file *m, void *v)
 {
/* the list never shrinks, so we don't need a lock here */
struct ima_queue_entry *qe = v;
diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c
index 6a046ad..e77ca9d 100644
--- a/security/integrity/ima/ima_kexec.c
+++ b/security/integrity/ima/ima_kexec.c
@@ -23,6 +23,57 @@
 
 #include "ima.h"
 
+static int ima_dump_measurement_list(unsigned long *buffer_size, void **buffer,
+unsigned long segment_size)
+{
+   struct ima_queue_entry *qe;
+   struct seq_file file;
+   struct ima_kexec_hdr khdr = {
+   .version = 1, .buffer_size = 0, .count = 0};
+   int ret = 0;
+
+   /* segment size can't change between kexec load and execute */
+   file.buf = vmalloc(segment_size);
+   if (!file.buf) {
+   ret = -ENOMEM;
+   goto out;
+   }
+
+   file.size = segment_size;
+   file.read_pos = 0;
+   file.count = sizeof(khdr);  /* reserved space */
+
+   list_for_each_entry_rcu(qe, &ima_measurements, later) {
+   if (file.count < file.size) {
+   khdr.count++;
+   ima_measurements_show(&file, qe);
+   } else {
+   ret = -EINVAL;
+   break;
+   }
+   }
+
+   if (ret < 0)
+   goto out;
+
+   /*
+* fill in reserved space with some buffer details
+* (eg. version, buffer size, number of measurements)
+*/
+   khdr.buffer_size = file.count;
+   memcpy(file.buf, &khdr, sizeof(khdr));
+   print_hex_dump(KERN_DEBUG, "ima dump: ", DUMP_PREFIX_NONE,
+   16, 1, file.buf,
+   file.count < 100 ? file.count : 100, true);
+
+   *buffer_size = file.count;
+   *buffer = file.buf;
+out:
+   if (ret == -EINVAL)
+   vfree(file.buf);
+   return ret;
+}
+
 /*
  * Restore the measurement list from the previous kernel.
  */
-- 
2.1.0



[PATCH v2 3/9] ima: maintain memory size needed for serializing the measurement list

2016-08-30 Thread Mimi Zohar
In preparation for serializing the binary_runtime_measurements, this patch
maintains the amount of memory required.
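
The bookkeeping can be sketched in userspace C as follows: sum the fixed
and variable-length fields of one entry, then add that to a running total
that sticks at ULONG_MAX instead of wrapping. The struct and function
names are hypothetical stand-ins, and the 20-byte digest is an assumption
made for the example.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the fields of an IMA template entry. */
struct demo_entry {
	uint8_t digest[20];		/* assumed SHA-1 sized digest */
	const char *template_name;	/* e.g. "ima-ng" */
	uint32_t template_data_len;	/* length of the variable-size data */
};

/* Bytes needed to serialize one entry. */
static unsigned long demo_entry_size(const struct demo_entry *e)
{
	unsigned long size = 0;

	size += sizeof(uint32_t);		/* pcr */
	size += sizeof(e->digest);
	size += sizeof(uint32_t);		/* template name length field */
	size += strlen(e->template_name);
	size += sizeof(e->template_data_len);
	size += e->template_data_len;
	return size;
}

/* Add size to total, saturating at ULONG_MAX instead of wrapping. */
static unsigned long demo_saturating_add(unsigned long total,
					 unsigned long size)
{
	return (total < ULONG_MAX - size) ? total + size : ULONG_MAX;
}
```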

Signed-off-by: Mimi Zohar 
---
 security/integrity/ima/Kconfig | 12 ++
 security/integrity/ima/ima.h   |  1 +
 security/integrity/ima/ima_queue.c | 49 --
 3 files changed, 60 insertions(+), 2 deletions(-)

diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
index 5487827..1c5a1c2 100644
--- a/security/integrity/ima/Kconfig
+++ b/security/integrity/ima/Kconfig
@@ -27,6 +27,18 @@ config IMA
  to learn more about IMA.
  If unsure, say N.
 
+config IMA_KEXEC
+   bool "Enable carrying the IMA measurement list across a soft boot"
+   depends on IMA && TCG_TPM && KEXEC_FILE
+   default n
+   help
+  TPM PCRs are only reset on a hard reboot.  In order to validate
+  a TPM's quote after a soft boot, the IMA measurement list of the
+  running kernel must be saved and restored on boot.
+
+  Depending on the IMA policy, the measurement list can grow to
+  be very large.
+
 config IMA_MEASURE_PCR_IDX
int
depends on IMA
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index e7b3755..f9cd08e 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -133,6 +133,7 @@ void ima_print_digest(struct seq_file *m, u8 *digest, u32 
size);
 struct ima_template_desc *ima_template_desc_current(void);
 int ima_restore_measurement_entry(struct ima_template_entry *entry);
 int ima_restore_measurement_list(loff_t bufsize, void *buf);
+unsigned long ima_get_binary_runtime_size(void);
 int ima_init_template(void);
 
 #ifdef CONFIG_KEXEC_FILE
diff --git a/security/integrity/ima/ima_queue.c b/security/integrity/ima/ima_queue.c
index 12d1b04..8f0661b 100644
--- a/security/integrity/ima/ima_queue.c
+++ b/security/integrity/ima/ima_queue.c
@@ -29,6 +29,11 @@
 #define AUDIT_CAUSE_LEN_MAX 32
 
 LIST_HEAD(ima_measurements);   /* list of all measurements */
+#ifdef CONFIG_IMA_KEXEC
+static unsigned long binary_runtime_size;
+#else
+static unsigned long binary_runtime_size = ULONG_MAX;
+#endif
 
 /* key: inode (before secure-hashing a file) */
 struct ima_h_table ima_htable = {
@@ -64,6 +69,24 @@ static struct ima_queue_entry *ima_lookup_digest_entry(u8 
*digest_value,
return ret;
 }
 
+/*
+ * Calculate the memory required for serializing a single
+ * binary_runtime_measurement list entry, which contains a
+ * couple of variable length fields (e.g. template name and data).
+ */
+static int get_binary_runtime_size(struct ima_template_entry *entry)
+{
+   int size = 0;
+
+   size += sizeof(u32);/* pcr */
+   size += sizeof(entry->digest);
+   size += sizeof(int);/* template name size field */
+   size += strlen(entry->template_desc->name);
+   size += sizeof(entry->template_data_len);
+   size += entry->template_data_len;
+   return size;
+}
+
 /* ima_add_template_entry helper function:
  * - Add template entry to the measurement list and hash table, for
  *   all entries except those carried across kexec.
@@ -90,9 +113,26 @@ static int ima_add_digest_entry(struct ima_template_entry 
*entry, int flags)
key = ima_hash_key(entry->digest);
hlist_add_head_rcu(&qe->hnext, &ima_htable.queue[key]);
}
+
+   if (binary_runtime_size != ULONG_MAX) {
+   int size;
+
+   size = get_binary_runtime_size(entry);
+   binary_runtime_size = (binary_runtime_size < ULONG_MAX - size) ?
+binary_runtime_size + size : ULONG_MAX;
+   }
return 0;
 }
 
+/*
+ * Return the amount of memory required for serializing the
+ * entire binary_runtime_measurement list.
+ */
+unsigned long ima_get_binary_runtime_size(void)
+{
+   return binary_runtime_size;
+};
+
 static int ima_pcr_extend(const u8 *hash, int pcr)
 {
int result = 0;
@@ -106,8 +146,13 @@ static int ima_pcr_extend(const u8 *hash, int pcr)
return result;
 }
 
-/* Add template entry to the measurement list and hash table,
- * and extend the pcr.
+/*
+ * Add template entry to the measurement list and hash table, and
+ * extend the pcr.
+ *
+ * On systems which support carrying the IMA measurement list across
+ * kexec, maintain the total memory size required for serializing the
+ * binary_runtime_measurements.
  */
 int ima_add_template_entry(struct ima_template_entry *entry, int violation,
   const char *op, struct inode *inode,
-- 
2.1.0



[PATCH v2 2/9] ima: permit duplicate measurement list entries

2016-08-30 Thread Mimi Zohar
Measurements carried across kexec need to be added to the IMA
measurement list, but should not prevent measurements of the newly
booted kernel from being added to the measurement list. This patch
adds support for allowing duplicate measurements.

The "boot_aggregate" measurement entry is the delimiter between soft
boots.
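
The idea can be sketched in userspace C: every entry goes on the ordered
list, but only entries measured by the running kernel are added to the
lookup index that suppresses duplicates. The array-based log and all
demo_* names are hypothetical simplifications of the kernel's list and
hash table.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_MAX 16

/* Hypothetical measurement log: an ordered list plus a lookup index. */
struct demo_log {
	int list[DEMO_MAX];	/* every entry, in order, duplicates allowed */
	size_t list_len;
	int index[DEMO_MAX];	/* digests that block later duplicates */
	size_t index_len;
};

static bool demo_indexed(const struct demo_log *mlog, int digest)
{
	size_t i;

	for (i = 0; i < mlog->index_len; i++)
		if (mlog->index[i] == digest)
			return true;
	return false;
}

/* Restored entries (update_index == false) go on the list only. */
static int demo_add(struct demo_log *mlog, int digest, bool update_index)
{
	if (mlog->list_len == DEMO_MAX)
		return -1;
	mlog->list[mlog->list_len++] = digest;
	if (update_index)
		mlog->index[mlog->index_len++] = digest;
	return 0;
}

/* New measurement: skipped only if the digest is already indexed. */
static int demo_measure(struct demo_log *mlog, int digest)
{
	if (demo_indexed(mlog, digest))
		return 1;	/* duplicate of a measurement from this boot */
	return demo_add(mlog, digest, true);
}
```

With this split, a digest restored from the previous kernel does not stop
the new kernel from recording the same digest once.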

Signed-off-by: Mimi Zohar 
---
 security/integrity/ima/ima_queue.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/security/integrity/ima/ima_queue.c b/security/integrity/ima/ima_queue.c
index 4b1bb77..12d1b04 100644
--- a/security/integrity/ima/ima_queue.c
+++ b/security/integrity/ima/ima_queue.c
@@ -65,11 +65,12 @@ static struct ima_queue_entry *ima_lookup_digest_entry(u8 
*digest_value,
 }
 
 /* ima_add_template_entry helper function:
- * - Add template entry to measurement list and hash table.
+ * - Add template entry to the measurement list and hash table, for
+ *   all entries except those carried across kexec.
  *
  * (Called with ima_extend_list_mutex held.)
  */
-static int ima_add_digest_entry(struct ima_template_entry *entry)
+static int ima_add_digest_entry(struct ima_template_entry *entry, int flags)
 {
struct ima_queue_entry *qe;
unsigned int key;
@@ -85,8 +86,10 @@ static int ima_add_digest_entry(struct ima_template_entry 
*entry)
list_add_tail_rcu(&qe->later, &ima_measurements);
 
atomic_long_inc(&ima_htable.len);
-   key = ima_hash_key(entry->digest);
-   hlist_add_head_rcu(&qe->hnext, &ima_htable.queue[key]);
+   if (flags) {
+   key = ima_hash_key(entry->digest);
+   hlist_add_head_rcu(&qe->hnext, &ima_htable.queue[key]);
+   }
return 0;
 }
 
@@ -126,7 +129,7 @@ int ima_add_template_entry(struct ima_template_entry 
*entry, int violation,
}
}
 
-   result = ima_add_digest_entry(entry);
+   result = ima_add_digest_entry(entry, 1);
if (result < 0) {
audit_cause = "ENOMEM";
audit_info = 0;
@@ -155,7 +158,7 @@ int ima_restore_measurement_entry(struct ima_template_entry 
*entry)
int result = 0;
 
mutex_lock(&ima_extend_list_mutex);
-   result = ima_add_digest_entry(entry);
+   result = ima_add_digest_entry(entry, 0);
mutex_unlock(&ima_extend_list_mutex);
return result;
 }
-- 
2.1.0



[PATCH v2 1/9] ima: on soft reboot, restore the measurement list

2016-08-30 Thread Mimi Zohar
The TPM PCRs are only reset on a hard reboot.  In order to validate a
TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement list
of the running kernel must be saved and restored on boot.  This patch
restores the measurement list.

Changelog v2:
- redefined ima_kexec_hdr to use types with well defined sizes (M. Ellerman)
- defined missing ima_load_kexec_buffer() stub function

Changelog v1:
- call ima_load_kexec_buffer() (Thiago)

Signed-off-by: Mimi Zohar 
---
 security/integrity/ima/Makefile   |   1 +
 security/integrity/ima/ima.h  |  18 
 security/integrity/ima/ima_init.c |   2 +
 security/integrity/ima/ima_kexec.c|  55 +++
 security/integrity/ima/ima_queue.c|  10 ++
 security/integrity/ima/ima_template.c | 170 ++
 6 files changed, 256 insertions(+)
 create mode 100644 security/integrity/ima/ima_kexec.c

diff --git a/security/integrity/ima/Makefile b/security/integrity/ima/Makefile
index 9aeaeda..56093be 100644
--- a/security/integrity/ima/Makefile
+++ b/security/integrity/ima/Makefile
@@ -8,4 +8,5 @@ obj-$(CONFIG_IMA) += ima.o
 ima-y := ima_fs.o ima_queue.o ima_init.o ima_main.o ima_crypto.o ima_api.o \
 ima_policy.o ima_template.o ima_template_lib.o
 ima-$(CONFIG_IMA_APPRAISE) += ima_appraise.o
+ima-$(CONFIG_KEXEC_FILE) += ima_kexec.o
 obj-$(CONFIG_IMA_BLACKLIST_KEYRING) += ima_mok.o
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index db25f54..e7b3755 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -102,6 +102,15 @@ struct ima_queue_entry {
 };
 extern struct list_head ima_measurements;  /* list of all measurements */
 
+/* Some details preceding the binary serialized measurement list */
+struct ima_kexec_hdr {
+   u16 version;
+   u16 _reserved0;
+   u32 _reserved1;
+   u64 buffer_size;
+   u64 count;
+};
+
 /* Internal IMA function definitions */
 int ima_init(void);
 int ima_fs_init(void);
@@ -122,8 +131,17 @@ int ima_init_crypto(void);
 void ima_putc(struct seq_file *m, void *data, int datalen);
 void ima_print_digest(struct seq_file *m, u8 *digest, u32 size);
 struct ima_template_desc *ima_template_desc_current(void);
+int ima_restore_measurement_entry(struct ima_template_entry *entry);
+int ima_restore_measurement_list(loff_t bufsize, void *buf);
 int ima_init_template(void);
 
+#ifdef CONFIG_KEXEC_FILE
+void ima_load_kexec_buffer(void);
+#else
+static inline void ima_load_kexec_buffer(void)
+{}
+#endif
+
 /*
  * used to protect h_table and sha_table
  */
diff --git a/security/integrity/ima/ima_init.c b/security/integrity/ima/ima_init.c
index 32912bd..3ba0ca4 100644
--- a/security/integrity/ima/ima_init.c
+++ b/security/integrity/ima/ima_init.c
@@ -128,6 +128,8 @@ int __init ima_init(void)
if (rc != 0)
return rc;
 
+   ima_load_kexec_buffer();
+
rc = ima_add_boot_aggregate();  /* boot aggregate must be first entry */
if (rc != 0)
return rc;
diff --git a/security/integrity/ima/ima_kexec.c b/security/integrity/ima/ima_kexec.c
new file mode 100644
index 000..6a046ad
--- /dev/null
+++ b/security/integrity/ima/ima_kexec.c
@@ -0,0 +1,55 @@
+/*
+ * Copyright (C) 2016 IBM Corporation
+ *
+ * Authors:
+ * Thiago Jung Bauermann 
+ * Mimi Zohar 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "ima.h"
+
+/*
+ * Restore the measurement list from the previous kernel.
+ */
+void ima_load_kexec_buffer(void)
+{
+   void *kexec_buffer = NULL;
+   size_t kexec_buffer_size = 0;
+   int rc;
+
+   rc = kexec_get_handover_buffer(&kexec_buffer, &kexec_buffer_size);
+   switch (rc) {
+   case 0:
+   rc = ima_restore_measurement_list(kexec_buffer_size,
+ kexec_buffer);
+   if (rc != 0)
+   pr_err("Failed to restore the measurement list: %d\n",
+   rc);
+
+   kexec_free_handover_buffer();
+   break;
+   case -ENOTSUPP:
+   pr_debug("Restoring the measurement list not supported\n");
+   break;
+   case -ENOENT:
+   pr_debug("No measurement list to restore\n");
+   break;
+   default:
+   pr_debug("Error restoring the measurement list: %d\n", rc);
+   }
+}
diff --git a/security/integrity/ima/ima_queue.c b/security/integrity/ima/ima_queue.c
index 32f6ac0..4b1bb77 100644
--- a/security/integrity/ima/ima_queue.c
+++ b/security/integrity/ima/ima_queue.c

[PATCH v2 0/9] ima: carry the measurement list across kexec

2016-08-30 Thread Mimi Zohar
The TPM PCRs are only reset on a hard reboot.  In order to validate a
TPM's quote after a soft reboot (eg. kexec -e), the IMA measurement list
of the running kernel must be saved and then restored on the subsequent
boot, possibly of a different architecture.

The existing securityfs binary_runtime_measurements file conveniently
provides a serialized format of the IMA measurement list. This patch
set serializes the measurement list in this format and restores it.

Up to now, binary_runtime_measurements was defined to be in the
architecture's native format, on the assumption that userspace could
and would handle any architecture conversions.  With the ability to
carry the measurement list across kexec, possibly from one architecture
to a different one, the per-boot architecture information is lost, and
with it the ability to recalculate the template digest hash.  To resolve
this problem without breaking the existing ABI, this patch set
introduces the boot command line option "ima_canonical_fmt", which
selects a canonical format, arbitrarily defined as little endian.
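
To make the byte-order point concrete, here is a small userspace sketch
of writing and reading a 32-bit field as little endian regardless of the
host's byte order. The demo_* helpers are illustrative only. The kernel
has its own helpers for this, such as cpu_to_le32(), and this excerpt
does not show which ones the patch set actually uses.

```c
#include <stdint.h>

/* Store a 32-bit value as little endian regardless of host byte order. */
static void demo_put_le32(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v & 0xff);
	out[1] = (uint8_t)((v >> 8) & 0xff);
	out[2] = (uint8_t)((v >> 16) & 0xff);
	out[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Read back a little-endian 32-bit value on any host. */
static uint32_t demo_get_le32(const uint8_t in[4])
{
	return (uint32_t)in[0] | ((uint32_t)in[1] << 8) |
	       ((uint32_t)in[2] << 16) | ((uint32_t)in[3] << 24);
}
```

Because the encoding is fixed, a list written on a big endian kernel and
read on a little endian one yields the same field values.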

The need for this boot command line option will be limited to the
existing version 1 format of the binary_runtime_measurements.
Subsequent formats will be defined as canonical format (eg. TPM 2.0
support for larger digests).

This patch set depends on Thiago Bauermann's "kexec_file: Add buffer
hand-over for the next kernel" patch set.

These patches can also be found in the next-kexec-restore branch of:
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity.git

Mimi

Andreas Steffen (1):
  ima: platform-independent hash value

Mimi Zohar (7):
  ima: on soft reboot, restore the measurement list
  ima: permit duplicate measurement list entries
  ima: maintain memory size needed for serializing the measurement list
  ima: serialize the binary_runtime_measurements
  ima: store the builtin/custom template definitions in a list
  ima: support restoring multiple template formats
  ima: define a canonical binary_runtime_measurements list format

Thiago Jung Bauermann (1):
  ima: on soft reboot, save the measurement list

 Documentation/kernel-parameters.txt   |   4 +
 include/linux/ima.h   |  12 ++
 kernel/kexec_file.c   |   4 +
 security/integrity/ima/Kconfig|  12 ++
 security/integrity/ima/Makefile   |   1 +
 security/integrity/ima/ima.h  |  28 +++
 security/integrity/ima/ima_crypto.c   |   6 +-
 security/integrity/ima/ima_fs.c   |  30 +++-
 security/integrity/ima/ima_init.c |   2 +
 security/integrity/ima/ima_kexec.c| 201 +
 security/integrity/ima/ima_main.c |   1 +
 security/integrity/ima/ima_queue.c|  72 +++-
 security/integrity/ima/ima_template.c | 289 --
 security/integrity/ima/ima_template_lib.c |   6 +-
 14 files changed, 637 insertions(+), 31 deletions(-)
 create mode 100644 security/integrity/ima/ima_kexec.c

-- 
2.1.0



[PATCH 2/2] arm64: don't select PERF_USE_VMALLOC by default

2016-08-30 Thread Kim Phillips
Any arm64-based parts that have cache aliasing issues can set it
manually.  The select was apparently carried over from the ARM (32-bit)
defaults in commit 8c2c3df "arm64: Build infrastructure".

Signed-off-by: Kim Phillips 
Cc: Catalin Marinas 
---
 arch/arm64/Kconfig | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index bc3f00f..2e6874f 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -105,7 +105,6 @@ config ARM64
select OF_NUMA if NUMA && OF
select OF_RESERVED_MEM
select PCI_ECAM if ACPI
-   select PERF_USE_VMALLOC
select POWER_RESET
select POWER_SUPPLY
select SPARSE_IRQ
-- 
2.9.3



[PATCH 1/2] perf_event: remove unused DEBUG_PERF_USE_VMALLOC

2016-08-30 Thread Kim Phillips
This 'DEBUG'-prefixed version of PERF_USE_VMALLOC is not used anywhere.
It appears to be a leftover from commit 906010b "perf_event: Provide
vmalloc() based mmap() backing", which introduced it.

Not sure what commit cb30711 "perf_event: Don't allow vmalloc() backed
perf on powerpc" was trying to do with it either.

Signed-off-by: Kim Phillips 
Cc: Peter Zijlstra 
Cc: Michael Ellerman 
---
 init/Kconfig | 13 -
 1 file changed, 13 deletions(-)

diff --git a/init/Kconfig b/init/Kconfig
index cac3f09..934a61f 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1707,19 +1707,6 @@ config PERF_EVENTS
 
  Say Y if unsure.
 
-config DEBUG_PERF_USE_VMALLOC
-   default n
-   bool "Debug: use vmalloc to back perf mmap() buffers"
-   depends on PERF_EVENTS && DEBUG_KERNEL && !PPC
-   select PERF_USE_VMALLOC
-   help
-Use vmalloc memory to back perf mmap() buffers.
-
-Mostly useful for debugging the vmalloc code on platforms
-that don't require it.
-
-Say N if unsure.
-
 endmenu
 
 config VM_EVENT_COUNTERS
-- 
2.9.3



[PATCH v7 06/13] powerpc: Generalize elf64_apply_relocate_add.

2016-08-30 Thread Thiago Jung Bauermann
When apply_relocate_add is called, modules are already loaded at their
final location in memory so Elf64_Shdr.sh_addr can be used for accessing
the section contents as well as the base address for relocations.

This is not the case for kexec's purgatory, because it will only be
copied to its final location right before being executed. Therefore,
it needs to be relocated while it is still in a temporary buffer. In
this case, Elf64_Shdr.sh_addr can't be used to access the sections'
contents.

This patch allows elf64_apply_relocate_add to be used when the ELF
binary is not yet at its final location by adding an addr_base argument
to specify the address at which the section will be loaded, and rela,
loc_base and syms_base to point to the sections' contents.
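
The distinction between where the bytes are now and where they will live
can be sketched in userspace C. The demo_* functions are hypothetical and
handle only a plain 64-bit store and a simple relative value. They are
not the powerpc relocation types handled by the patch.

```c
#include <stdint.h>
#include <string.h>

/*
 * Apply one absolute 64-bit fix-up to a section still in a temporary
 * buffer.
 *   loc_base  - where the section bytes are right now
 *   addr_base - address the section will have once loaded
 *   r_offset  - byte offset of the field to patch
 * Returns the final address of the patched field.
 */
static uint64_t demo_relocate(uint8_t *loc_base, uint64_t addr_base,
			      uint64_t r_offset, uint64_t value)
{
	uint8_t *location = loc_base + r_offset;	/* write here */
	uint64_t address = addr_base + r_offset;	/* lives here later */

	memcpy(location, &value, sizeof(value));
	return address;
}

/* A relative fix-up must use the final address, not the buffer address. */
static int64_t demo_rel_value(uint64_t target, uint64_t addr_base,
			      uint64_t r_offset)
{
	return (int64_t)(target - (addr_base + r_offset));
}
```

When the binary is already at its final location, loc_base and addr_base
refer to the same place and the two computations coincide.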

Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/include/asm/elf_util.h |  6 ++--
 arch/powerpc/kernel/elf_util_64.c   | 63 +
 arch/powerpc/kernel/module_64.c | 17 --
 3 files changed, 61 insertions(+), 25 deletions(-)

diff --git a/arch/powerpc/include/asm/elf_util.h b/arch/powerpc/include/asm/elf_util.h
index 37372559fe62..a012ba03282d 100644
--- a/arch/powerpc/include/asm/elf_util.h
+++ b/arch/powerpc/include/asm/elf_util.h
@@ -64,7 +64,9 @@ static inline unsigned long my_r2(const struct elf_info 
*elf_info)
 }
 
 int elf64_apply_relocate_add(const struct elf_info *elf_info,
-const char *strtab, unsigned int symindex,
-unsigned int relsec, const char *obj_name);
+const char *strtab, const Elf64_Rela *rela,
+unsigned int num_rela, void *syms_base,
+void *loc_base, Elf64_Addr addr_base,
+const char *obj_name);
 
 #endif /* _ASM_POWERPC_ELF_UTIL_H */
diff --git a/arch/powerpc/kernel/elf_util_64.c b/arch/powerpc/kernel/elf_util_64.c
index decad2c34f38..8e5d400ac9f2 100644
--- a/arch/powerpc/kernel/elf_util_64.c
+++ b/arch/powerpc/kernel/elf_util_64.c
@@ -69,33 +69,56 @@ static void squash_toc_save_inst(const char *name, unsigned 
long addr) { }
  * elf64_apply_relocate_add - apply 64 bit RELA relocations
  * @elf_info:  Support information for the ELF binary being relocated.
  * @strtab:String table for the associated symbol table.
- * @symindex:  Section header index for the associated symbol table.
- * @relsec:Section header index for the relocations to apply.
+ * @rela:  Contents of the section with the relocations to apply.
+ * @num_rela:  Number of relocation entries in the section.
+ * @syms_base: Contents of the associated symbol table.
+ * @loc_base:  Contents of the section to which relocations apply.
+ * @addr_base: The address where the section will be loaded in memory.
  * @obj_name:  The name of the ELF binary, for information messages.
+ *
+ * Applies RELA relocations to an ELF file already at its final location
+ * in memory (in which case loc_base == addr_base), or still in a temporary
+ * buffer.
  */
 int elf64_apply_relocate_add(const struct elf_info *elf_info,
-const char *strtab, unsigned int symindex,
-unsigned int relsec, const char *obj_name)
+const char *strtab, const Elf64_Rela *rela,
+unsigned int num_rela, void *syms_base,
+void *loc_base, Elf64_Addr addr_base,
+const char *obj_name)
 {
unsigned int i;
-   Elf64_Shdr *sechdrs = elf_info->sechdrs;
-   Elf64_Rela *rela = (void *)sechdrs[relsec].sh_addr;
-   Elf64_Sym *sym;
unsigned long *location;
+   unsigned long address;
unsigned long value;
+   const char *name;
+   Elf64_Sym *sym;
+
+   for (i = 0; i < num_rela; i++) {
+   /*
+* rela[i].r_offset contains the byte offset from the beginning
+* of section to the storage unit affected.
+*
+* This is the location to update in the temporary buffer where
+* the section is currently loaded. The section will finally
+* be loaded to a different address later, pointed to by
+* addr_base.
+*/
+   location = loc_base + rela[i].r_offset;
+
+   /* Final address of the location. */
+   address = addr_base + rela[i].r_offset;
 
+   /* This is the symbol the relocation is referring to. */
+   sym = (Elf64_Sym *) syms_base + ELF64_R_SYM(rela[i].r_info);
 
-   for (i = 0; i < sechdrs[relsec].sh_size / sizeof(*rela); i++) {
-   /* This is where to make the change */
-   location = (void *)sechdrs[sechdrs[relsec].sh_info].sh_addr
-   + 

[PATCH v7 07/13] powerpc: Adapt elf64_apply_relocate_add for kexec_file_load.

2016-08-30 Thread Thiago Jung Bauermann
Extend elf64_apply_relocate_add to support relative symbols. This is
necessary because there is a difference between how the module loading
mechanism and the kexec purgatory loading code use Elf64_Sym.st_value
at relocation time: the former changes st_value to point to the absolute
memory address before relocating the module, while the latter does that
adjustment during relocation of the purgatory.
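
A minimal userspace sketch of the two conventions, assuming that a
relative st_value is an offset into the section named by st_shndx and
that absolute symbols are left untouched. demo_sym_addr and its
parameters are hypothetical. DEMO_SHN_ABS carries the same value as
SHN_ABS in the ELF specification.

```c
#include <stdint.h>

#define DEMO_SHN_ABS 0xfff1u	/* same value as SHN_ABS in the ELF spec */

/*
 * Resolve a symbol's address.
 * When relative is nonzero, st_value is treated as an offset into section
 * st_shndx, so that section's load address is added (absolute symbols
 * excepted). Returns 0 on success, -1 if st_shndx is out of range.
 */
static int demo_sym_addr(int relative, uint16_t st_shndx, uint64_t st_value,
			 const uint64_t *sec_addr, uint16_t shnum,
			 uint64_t *out)
{
	uint64_t sec_base = 0;

	if (relative && st_shndx != DEMO_SHN_ABS) {
		if (st_shndx >= shnum)
			return -1;
		sec_base = sec_addr[st_shndx];
	}
	*out = sec_base + st_value;
	return 0;
}
```

In the module-loader convention (relative == 0) the caller has already
made st_value absolute, so both calls in a matching pair resolve to the
same address.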

Also, add a check_symbols argument so that the kexec code can be stricter
about undefined symbols.

Finally, add relocation types used by the purgatory.

[a...@linux-foundation.org: coding-style fixes]
Signed-off-by: Thiago Jung Bauermann 
Signed-off-by: Andrew Morton 
---
 arch/powerpc/include/asm/elf_util.h |   2 +
 arch/powerpc/kernel/elf_util_64.c   | 100 +---
 arch/powerpc/kernel/module_64.c |   6 ++-
 3 files changed, 99 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/elf_util.h b/arch/powerpc/include/asm/elf_util.h
index a012ba03282d..5a27e8ceb88a 100644
--- a/arch/powerpc/include/asm/elf_util.h
+++ b/arch/powerpc/include/asm/elf_util.h
@@ -20,6 +20,7 @@
 #include 
 
 struct elf_info {
+   const struct elfhdr *ehdr;
struct elf_shdr *sechdrs;
 
/* Index of stubs section. */
@@ -67,6 +68,7 @@ int elf64_apply_relocate_add(const struct elf_info *elf_info,
 const char *strtab, const Elf64_Rela *rela,
 unsigned int num_rela, void *syms_base,
 void *loc_base, Elf64_Addr addr_base,
+bool relative_symbols, bool check_symbols,
 const char *obj_name);
 
 #endif /* _ASM_POWERPC_ELF_UTIL_H */
diff --git a/arch/powerpc/kernel/elf_util_64.c b/arch/powerpc/kernel/elf_util_64.c
index 8e5d400ac9f2..1b17df71fb8d 100644
--- a/arch/powerpc/kernel/elf_util_64.c
+++ b/arch/powerpc/kernel/elf_util_64.c
@@ -74,6 +74,8 @@ static void squash_toc_save_inst(const char *name, unsigned 
long addr) { }
  * @syms_base: Contents of the associated symbol table.
  * @loc_base:  Contents of the section to which relocations apply.
  * @addr_base: The address where the section will be loaded in memory.
+ * @relative_symbols:  Are the symbols' st_value members relative?
+ * @check_symbols: Fail if an unexpected symbol is found?
  * @obj_name:  The name of the ELF binary, for information messages.
  *
  * Applies RELA relocations to an ELF file already at its final location
@@ -84,12 +86,15 @@ int elf64_apply_relocate_add(const struct elf_info 
*elf_info,
 const char *strtab, const Elf64_Rela *rela,
 unsigned int num_rela, void *syms_base,
 void *loc_base, Elf64_Addr addr_base,
+bool relative_symbols, bool check_symbols,
 const char *obj_name)
 {
unsigned int i;
unsigned long *location;
unsigned long address;
+   unsigned long sec_base;
unsigned long value;
+   int reloc_type;
const char *name;
Elf64_Sym *sym;
 
@@ -116,15 +121,44 @@ int elf64_apply_relocate_add(const struct elf_info 
*elf_info,
else
name = "";
 
-   pr_debug("RELOC at %p: %li-type as %s (0x%lx) + %li\n",
-  location, (long)ELF64_R_TYPE(rela[i].r_info),
-  name, (unsigned long)sym->st_value,
+   reloc_type = ELF64_R_TYPE(rela[i].r_info);
+
+   pr_debug("RELOC at %p: %i-type as %s (0x%lx) + %li\n",
+  location, reloc_type, name, (unsigned long)sym->st_value,
   (long)rela[i].r_addend);
 
+   if (check_symbols) {
+   /*
+* TOC symbols appear as undefined but should be
+* resolved as well, so allow them to be processed.
+*/
+   if (sym->st_shndx == SHN_UNDEF &&
+   strcmp(name, ".TOC.") != 0 &&
+   reloc_type != R_PPC64_TOC) {
+   pr_err("Undefined symbol: %s\n", name);
+   return -ENOEXEC;
+   } else if (sym->st_shndx == SHN_COMMON) {
+   pr_err("Symbol '%s' in common section.\n",
+  name);
+   return -ENOEXEC;
+   }
+   }
+
+   if (relative_symbols && sym->st_shndx != SHN_ABS) {
+   if (sym->st_shndx >= elf_info->ehdr->e_shnum) {
+   pr_err("Invalid section %d for symbol %s\n",
+  sym->st_shndx, name);
+   return 

[PATCH v4 5/5] IMA: Demonstration code for kexec buffer passing.

2016-08-30 Thread Thiago Jung Bauermann
This shows how kernel code can use the kexec buffer passing mechanism
to pass information to the next kernel.

This patch is not intended to be committed.

[a...@linux-foundation.org: coding-style fixes]
Signed-off-by: Thiago Jung Bauermann 
Signed-off-by: Andrew Morton 
---
 include/linux/ima.h   | 11 +
 kernel/kexec_file.c   |  4 ++
 security/integrity/ima/ima.h  |  5 +++
 security/integrity/ima/ima_init.c | 26 +++
 security/integrity/ima/ima_template.c | 85 +++
 5 files changed, 131 insertions(+)

diff --git a/include/linux/ima.h b/include/linux/ima.h
index 0eb7c2e7f0d6..96528d007139 100644
--- a/include/linux/ima.h
+++ b/include/linux/ima.h
@@ -11,6 +11,7 @@
 #define _LINUX_IMA_H
 
 #include 
+#include 
 struct linux_binprm;
 
 #ifdef CONFIG_IMA
@@ -23,6 +24,10 @@ extern int ima_post_read_file(struct file *file, void *buf, loff_t size,
  enum kernel_read_file_id id);
 extern void ima_post_path_mknod(struct dentry *dentry);
 
+#ifdef CONFIG_KEXEC_FILE
+extern void ima_add_kexec_buffer(struct kimage *image);
+#endif
+
 #else
 static inline int ima_bprm_check(struct linux_binprm *bprm)
 {
@@ -60,6 +65,12 @@ static inline void ima_post_path_mknod(struct dentry *dentry)
return;
 }
 
+#ifdef CONFIG_KEXEC_FILE
+static inline void ima_add_kexec_buffer(struct kimage *image)
+{
+}
+#endif
+
 #endif /* CONFIG_IMA */
 
 #ifdef CONFIG_IMA_APPRAISE
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 0e90d1446cb0..75c1b8d67a72 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -248,6 +249,9 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
}
}
 
+   /* IMA needs to pass the measurement list to the next kernel. */
+   ima_add_kexec_buffer(image);
+
/* Call arch image load handlers */
ldata = arch_kexec_kernel_image_load(image);
 
diff --git a/security/integrity/ima/ima.h b/security/integrity/ima/ima.h
index db25f54a04fe..0334001055d7 100644
--- a/security/integrity/ima/ima.h
+++ b/security/integrity/ima/ima.h
@@ -102,6 +102,11 @@ struct ima_queue_entry {
 };
 extern struct list_head ima_measurements;  /* list of all measurements */
 
+#ifdef CONFIG_KEXEC_FILE
+extern void *kexec_buffer;
+extern size_t kexec_buffer_size;
+#endif
+
 /* Internal IMA function definitions */
 int ima_init(void);
 int ima_fs_init(void);
diff --git a/security/integrity/ima/ima_init.c b/security/integrity/ima/ima_init.c
index 32912bd54ead..a1924d0f3b2b 100644
--- a/security/integrity/ima/ima_init.c
+++ b/security/integrity/ima/ima_init.c
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "ima.h"
 
@@ -104,6 +105,29 @@ void __init ima_load_x509(void)
 }
 #endif
 
+#ifdef CONFIG_KEXEC_FILE
+static void ima_load_kexec_buffer(void)
+{
+   int rc;
+
+   /* Fetch the buffer from the previous kernel, if any. */
+   rc = kexec_get_handover_buffer(&kexec_buffer, &kexec_buffer_size);
+   if (rc == 0) {
+   /* Demonstrate that buffer handover works. */
+   pr_err("kexec buffer contents: %s\n", (char *) kexec_buffer);
+   pr_err("kexec buffer contents after update: %s\n",
+  (char *) kexec_buffer + 4 * PAGE_SIZE + 10);
+
+   kexec_free_handover_buffer();
+   } else if (rc == -ENOENT)
+   pr_debug("No kexec buffer from the previous kernel.\n");
+   else
+   pr_debug("Error restoring kexec buffer: %d\n", rc);
+}
+#else
+static void ima_load_kexec_buffer(void) { }
+#endif
+
 int __init ima_init(void)
 {
u8 pcr_i[TPM_DIGEST_SIZE];
@@ -134,5 +158,7 @@ int __init ima_init(void)
 
ima_init_policy();
 
+   ima_load_kexec_buffer();
+
return ima_fs_init();
 }
diff --git a/security/integrity/ima/ima_template.c b/security/integrity/ima/ima_template.c
index febd12ed9b55..e9ac260534b6 100644
--- a/security/integrity/ima/ima_template.c
+++ b/security/integrity/ima/ima_template.c
@@ -15,6 +15,8 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include 
+#include 
 #include "ima.h"
 #include "ima_template_lib.h"
 
@@ -182,6 +184,89 @@ static int template_desc_init_fields(const char *template_fmt,
return 0;
 }
 
+#ifdef CONFIG_KEXEC_FILE
+void *kexec_buffer;
+size_t kexec_buffer_size;
+
+/* Physical address of the measurement buffer in the next kernel. */
+unsigned long kexec_buffer_load_addr;
+
+/*
+ * Called during reboot. IMA can add here new events that were generated after
+ * the kexec image was loaded.
+ */
+static int ima_update_kexec_buffer(struct notifier_block *self,
+  unsigned long action, void *data)
+{
+   int ret;
+
+   if (!kexec_in_progress)
+   return 

[PATCH v4 4/5] kexec_file: Add mechanism to update kexec segments.

2016-08-30 Thread Thiago Jung Bauermann
kexec_update_segment allows a given segment in kexec_image to have
its contents updated. This is useful if the current kernel wants to
send information to the next kernel that is up-to-date at the time of
reboot.

Signed-off-by: Thiago Jung Bauermann 
---
 include/linux/kexec.h |  2 ++
 kernel/kexec_core.c   | 98 +++
 2 files changed, 100 insertions(+)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index edadff6c86ff..ff3aa93649e2 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -256,6 +256,8 @@ extern int kexec_purgatory_get_set_symbol(struct kimage *image,
  unsigned int size, bool get_value);
 extern void *kexec_purgatory_get_symbol_addr(struct kimage *image,
 const char *name);
+int kexec_update_segment(const char *buffer, size_t bufsz,
+unsigned long load_addr, size_t memsz);
 extern void __crash_kexec(struct pt_regs *);
 extern void crash_kexec(struct pt_regs *);
 int kexec_should_crash(struct task_struct *);
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 561675589511..d3f1ebf66222 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -721,6 +721,104 @@ static struct page *kimage_alloc_page(struct kimage *image,
return page;
 }
 
+/**
+ * kexec_update_segment - update the contents of a kimage segment
+ * @buffer:New contents of the segment.
+ * @bufsz: @buffer size.
+ * @load_addr: Segment's physical address in the next kernel.
+ * @memsz: Segment size.
+ *
+ * This function assumes kexec_mutex is held.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+int kexec_update_segment(const char *buffer, size_t bufsz,
+unsigned long load_addr, size_t memsz)
+{
+   int i;
+   unsigned long entry;
+   unsigned long *ptr = NULL;
+   void *dest = NULL;
+
+   if (kexec_image == NULL) {
+   pr_err("Can't update segment: no kexec image loaded.\n");
+   return -EINVAL;
+   }
+
+   /*
+* kexec_add_buffer rounds up segment sizes to PAGE_SIZE, so
+* we have to do it here as well.
+*/
+   memsz = ALIGN(memsz, PAGE_SIZE);
+
+   for (i = 0; i < kexec_image->nr_segments; i++)
+   /* We only support updating whole segments. */
+   if (load_addr == kexec_image->segment[i].mem &&
+   memsz == kexec_image->segment[i].memsz) {
+   if (!kexec_image->segment[i].skip_checksum) {
+   pr_err("Trying to update non-modifiable segment.\n");
+   return -EINVAL;
+   }
+
+   break;
+   }
+   if (i == kexec_image->nr_segments) {
+   pr_err("Couldn't find segment to update: 0x%lx, size 0x%zx\n",
+  load_addr, memsz);
+   return -EINVAL;
+   }
+
+   for (entry = kexec_image->head; !(entry & IND_DONE) && memsz;
+entry = *ptr++) {
+   void *addr = (void *) (entry & PAGE_MASK);
+
+   switch (entry & IND_FLAGS) {
+   case IND_DESTINATION:
+   dest = addr;
+   break;
+   case IND_INDIRECTION:
+   ptr = __va(entry & PAGE_MASK);
+   break;
+   case IND_SOURCE:
+   /* Shouldn't happen, but verify just to be safe. */
+   if (dest == NULL) {
+   pr_err("Invalid kexec entries list.");
+   return -EINVAL;
+   }
+
+   if (dest == (void *) load_addr) {
+   void *page_addr;
+   unsigned long offset;
+   size_t uchunk, mchunk;
+
+   page_addr = kmap_atomic(kmap_to_page(addr));
+
+   offset = load_addr & ~PAGE_MASK;
+   mchunk = min_t(size_t, memsz,
+  PAGE_SIZE - offset);
+   uchunk = min(bufsz, mchunk);
+   memcpy(page_addr + offset, buffer, uchunk);
+
+   kunmap_atomic(page_addr);
+
+   bufsz -= uchunk;
+   load_addr += mchunk;
+   buffer += mchunk;
+   memsz -= mchunk;
+   }
+   dest += PAGE_SIZE;
+   }
+
+   /* Shouldn't happen, but verify just to be safe. */
+   if (ptr == NULL) {
+   pr_err("Invalid kexec entries list.");
+   return -EINVAL;
+   }
+   

[PATCH v4 3/5] kexec_file: Allow skipping checksum calculation for some segments.

2016-08-30 Thread Thiago Jung Bauermann
Add skip_checksum member to struct kexec_buf to specify whether the
corresponding segment should be part of the checksum calculation.

The next patch will add a way to update segments after a kimage is loaded.
Segments that will be updated in this way should not be checksummed,
otherwise they will cause the purgatory checksum verification to fail
when the machine is rebooted.

As a bonus, we don't need to special-case the purgatory segment anymore
to avoid checksumming it.

Places using struct kexec_buf get false as the default value for
skip_checksum since they all use designated initializers.  Therefore,
there is no behavior change with this patch and all segments except the
purgatory are checksummed.
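
To illustrate the intent (this is not code from the patch), the sketch
below uses a plain byte sum as a stand-in for the SHA-256 digest. The
names sketch_segment and sketch_digest are made up for the example. It
only shows that a segment flagged with skip_checksum can change after the
digest was computed without invalidating it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct kexec_segment. */
struct sketch_segment {
	const unsigned char *buf;
	size_t bufsz;
	bool skip_checksum;	/* excluded from the digest when true */
};

/* Byte sum over all segments that are not flagged skip_checksum. */
uint32_t sketch_digest(const struct sketch_segment *segs, size_t nr)
{
	uint32_t sum = 0;
	size_t i, j;

	assert(segs != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		if (segs[i].skip_checksum)
			continue;
		for (j = 0; j < segs[i].bufsz; j++)
			sum += segs[i].buf[j];
	}

	return sum;
}
```

With a real digest the same property holds: only the segments that take
part in the calculation have to stay unmodified until purgatory verifies
them.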

Signed-off-by: Thiago Jung Bauermann 
---
 include/linux/kexec.h | 23 ++-
 kernel/kexec_file.c   | 15 +++
 2 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 16561e96a6d7..edadff6c86ff 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -100,6 +100,9 @@ struct kexec_segment {
size_t bufsz;
unsigned long mem;
size_t memsz;
+
+   /* Whether this segment is ignored in the checksum calculation. */
+   bool skip_checksum;
 };
 
 #ifdef CONFIG_COMPAT
@@ -151,15 +154,16 @@ struct kexec_file_ops {
 
 /**
  * struct kexec_buf - parameters for finding a place for a buffer in memory
- * @image: kexec image in which memory to search.
- * @buffer:Contents which will be copied to the allocated memory.
- * @bufsz: Size of @buffer.
- * @mem:   On return will have address of the buffer in memory.
- * @memsz: Size for the buffer in memory.
- * @buf_align: Minimum alignment needed.
- * @buf_min:   The buffer can't be placed below this address.
- * @buf_max:   The buffer can't be placed above this address.
- * @top_down:  Allocate from top of memory.
+ * @image: kexec image in which memory to search.
+ * @buffer:Contents which will be copied to the allocated memory.
+ * @bufsz: Size of @buffer.
+ * @mem:   On return will have address of the buffer in memory.
+ * @memsz: Size for the buffer in memory.
+ * @buf_align: Minimum alignment needed.
+ * @buf_min:   The buffer can't be placed below this address.
+ * @buf_max:   The buffer can't be placed above this address.
+ * @top_down:  Allocate from top of memory.
+ * @skip_checksum: Don't verify checksum for this buffer in purgatory.
  */
 struct kexec_buf {
struct kimage *image;
@@ -171,6 +175,7 @@ struct kexec_buf {
unsigned long buf_min;
unsigned long buf_max;
bool top_down;
+   bool skip_checksum;
 };
 
 int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index f5684adfad07..0e90d1446cb0 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -584,6 +584,7 @@ int kexec_add_buffer(struct kexec_buf *kbuf)
ksegment->bufsz = kbuf->bufsz;
ksegment->mem = kbuf->mem;
ksegment->memsz = kbuf->memsz;
+   ksegment->skip_checksum = kbuf->skip_checksum;
kbuf->image->nr_segments++;
return 0;
 }
@@ -598,7 +599,6 @@ static int kexec_calculate_store_digests(struct kimage *image)
char *digest;
void *zero_buf;
struct kexec_sha_region *sha_regions;
-   struct purgatory_info *pi = &image->purgatory_info;
 
zero_buf = __va(page_to_pfn(ZERO_PAGE(0)) << PAGE_SHIFT);
zero_buf_sz = PAGE_SIZE;
@@ -638,11 +638,7 @@ static int kexec_calculate_store_digests(struct kimage *image)
struct kexec_segment *ksegment;
 
ksegment = &image->segment[i];
-   /*
-* Skip purgatory as it will be modified once we put digest
-* info in purgatory.
-*/
-   if (ksegment->kbuf == pi->purgatory_buf)
+   if (ksegment->skip_checksum)
continue;
 
ret = crypto_shash_update(desc, ksegment->kbuf,
@@ -714,7 +710,7 @@ static int __kexec_load_purgatory(struct kimage *image, unsigned long min,
Elf_Shdr *sechdrs = NULL;
struct kexec_buf kbuf = { .image = image, .bufsz = 0, .buf_align = 1,
  .buf_min = min, .buf_max = max,
- .top_down = top_down };
+ .top_down = top_down, .skip_checksum = true };
 
/*
 * sechdrs_c points to section headers in purgatory and are read
@@ -819,7 +815,10 @@ static int __kexec_load_purgatory(struct kimage *image, unsigned long min,
if (kbuf.buf_align < bss_align)
kbuf.buf_align = bss_align;
 
-   /* Add buffer to segment list */
+   /*
+* Add buffer to segment list. Don't checksum the segment as
+* it will be modified once we 

[PATCH v4 2/5] powerpc: kexec_file: Add buffer hand-over support for the next kernel

2016-08-30 Thread Thiago Jung Bauermann
The buffer hand-over mechanism allows the currently running kernel to pass
data to the kernel that will be kexec'd via a kexec segment. The second kernel
can check whether the previous kernel sent data and retrieve it.

This is the architecture-specific part.

Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/include/asm/kexec.h   |  12 +-
 arch/powerpc/kernel/kexec_elf_64.c |   2 +-
 arch/powerpc/kernel/machine_kexec_64.c | 274 +++--
 3 files changed, 240 insertions(+), 48 deletions(-)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 73f88b5f9bd1..b8e32194ce63 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -92,12 +92,20 @@ static inline bool kdump_in_progress(void)
 }
 
 #ifdef CONFIG_KEXEC_FILE
+#define ARCH_HAS_KIMAGE_ARCH
+
+struct kimage_arch {
+   phys_addr_t handover_buffer_addr;
+   unsigned long handover_buffer_size;
+};
+
 int setup_purgatory(struct kimage *image, const void *slave_code,
const void *fdt, unsigned long kernel_load_addr,
unsigned long fdt_load_addr, unsigned long stack_top,
int debug);
-int setup_new_fdt(void *fdt, unsigned long initrd_load_addr,
- unsigned long initrd_len, const char *cmdline);
+int setup_new_fdt(const struct kimage *image, void *fdt,
+ unsigned long initrd_load_addr, unsigned long initrd_len,
+ const char *cmdline);
 bool find_debug_console(const void *fdt);
 #endif /* CONFIG_KEXEC_FILE */
 
diff --git a/arch/powerpc/kernel/kexec_elf_64.c b/arch/powerpc/kernel/kexec_elf_64.c
index 3cc8ebce1a86..0c576e300384 100644
--- a/arch/powerpc/kernel/kexec_elf_64.c
+++ b/arch/powerpc/kernel/kexec_elf_64.c
@@ -208,7 +208,7 @@ void *elf64_load(struct kimage *image, char *kernel_buf,
goto out;
}
 
-   ret = setup_new_fdt(fdt, initrd_load_addr, initrd_len, cmdline);
+   ret = setup_new_fdt(image, fdt, initrd_load_addr, initrd_len, cmdline);
if (ret)
goto out;
 
diff --git a/arch/powerpc/kernel/machine_kexec_64.c b/arch/powerpc/kernel/machine_kexec_64.c
index 3879b6d91c0b..d6077898200a 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -489,6 +489,77 @@ int arch_kimage_file_post_load_cleanup(struct kimage *image)
return image->fops->cleanup(image->image_loader_data);
 }
 
+bool kexec_can_hand_over_buffer(void)
+{
+   return true;
+}
+
+int arch_kexec_add_handover_buffer(struct kimage *image,
+  unsigned long load_addr, unsigned long size)
+{
+   image->arch.handover_buffer_addr = load_addr;
+   image->arch.handover_buffer_size = size;
+
+   return 0;
+}
+
+int kexec_get_handover_buffer(void **addr, unsigned long *size)
+{
+   int ret;
+   u64 start_addr, end_addr;
+
+   ret = of_property_read_u64(of_chosen,
+  "linux,kexec-handover-buffer-start",
+  &start_addr);
+   if (ret == -EINVAL)
+   return -ENOENT;
+   else if (ret)
+   return -EINVAL;
+
+   ret = of_property_read_u64(of_chosen, "linux,kexec-handover-buffer-end",
+  &end_addr);
+   if (ret == -EINVAL)
+   return -ENOENT;
+   else if (ret)
+   return -EINVAL;
+
+   *addr =  __va(start_addr);
+   /* -end is the first address after the buffer. */
+   *size = end_addr - start_addr;
+
+   return 0;
+}
+
+int kexec_free_handover_buffer(void)
+{
+   int ret;
+   void *addr;
+   unsigned long size;
+   struct property *prop;
+
+   ret = kexec_get_handover_buffer(&addr, &size);
+   if (ret)
+   return ret;
+
+   ret = memblock_free(__pa(addr), size);
+   if (ret)
+   return ret;
+
+   prop = of_find_property(of_chosen, "linux,kexec-handover-buffer-start",
+   NULL);
+   ret = of_remove_property(of_chosen, prop);
+   if (ret)
+   return ret;
+
+   prop = of_find_property(of_chosen, "linux,kexec-handover-buffer-end",
+   NULL);
+   ret = of_remove_property(of_chosen, prop);
+   if (ret)
+   return ret;
+
+   return 0;
+}
+
 /**
  * arch_kexec_walk_mem() - call func(data) for each unreserved memory block
  * @kbuf:  Context info for the search. Also passed to @func.
@@ -686,26 +757,16 @@ int setup_purgatory(struct kimage *image, const void *slave_code,
return 0;
 }
 
-/*
- * setup_new_fdt() - modify /chosen and memory reservation for the next kernel
- * @fdt:
- * @initrd_load_addr:  Address where the next initrd will be loaded.
- * @initrd_len:  Size of the next initrd, or 0 if there will be none.
- * @cmdline:   Command line for the next kernel, 

[PATCH v4 0/5] kexec_file: Add buffer hand-over for the next kernel

2016-08-30 Thread Thiago Jung Bauermann
Hello,

The purpose of this new version of the series is to fix a small issue that
I found, which is that the kernel doesn't remove the memory reservation
for the hand-over buffer it received from the previous kernel in the
device tree it sets up for the next kernel. The result is that for each
successive kexec, a stale hand-over buffer is left behind, wasting memory.

This is fixed by changes to kexec_free_handover_buffer and
setup_handover_buffer in patch 2. The other change is to fix checkpatch
warnings in the last patch.

Original cover letter:

This patch series implements a mechanism which allows the kernel to pass
on a buffer to the kernel that will be kexec'd. This buffer is passed
as a segment which is added to the kimage when it is being prepared
by kexec_file_load.

How the second kernel is informed of this buffer is architecture-specific.
On powerpc, this is done via the device tree, by checking
the properties /chosen/linux,kexec-handover-buffer-start and
/chosen/linux,kexec-handover-buffer-end, which is analogous to how the
kernel finds the initrd.
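
As a rough illustration of how the two properties describe the buffer,
here is a minimal sketch, assuming the end property holds the first
address past the buffer (as in patch 2). The struct handover_props and
the function handover_buffer_size are hypothetical names for this
example and are not part of the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two /chosen properties. */
struct handover_props {
	uint64_t start;	/* linux,kexec-handover-buffer-start */
	uint64_t end;	/* linux,kexec-handover-buffer-end (exclusive) */
};

/*
 * Compute the buffer size from the property pair.
 * Returns 0 and fills *size on success, -1 if end lies before start.
 */
int handover_buffer_size(const struct handover_props *p, uint64_t *size)
{
	assert(p != NULL && size != NULL);

	if (p->end < p->start)
		return -1;

	*size = p->end - p->start;
	return 0;
}
```

For example, start = 0x1000 and end = 0x3000 describe a buffer of 0x2000
bytes.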

This is needed because the Integrity Measurement Architecture subsystem
needs to preserve its measurement list across the kexec reboot. The
following patch series for the IMA subsystem uses this feature for that
purpose:

https://lists.infradead.org/pipermail/kexec/2016-August/016745.html

This is so that IMA can implement trusted boot support on the OpenPower
platform, because on such systems an intermediary Linux instance running
as part of the firmware is used to boot the target operating system via
kexec. Using this mechanism, IMA on this intermediary instance can
hand over to the target OS the measurements of the components that were
used to boot it.

Because there could be additional measurement events between the
kexec_file_load call and the actual reboot, IMA needs a way to update the
buffer with those additional events before rebooting. One can minimize
the interval between the kexec_file_load and the reboot syscalls, but as
small as it can be, there is always the possibility that the measurement
list will be out of date at the time of reboot.

To address this issue, this patch series also introduces
kexec_update_segment, which allows a reboot notifier to change the
contents of the image segment during the reboot process.
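
The update has to be done page by page, because the pages backing a
segment are not contiguous. The following user-space sketch shows only
that chunking idea under simplifying assumptions: a tiny made-up page
size, a segment that starts page-aligned, and hypothetical names
(SKETCH_PAGE_SIZE, sketch_update_segment). It is not the kernel
implementation, which walks the kimage entry list instead of an array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 16	/* tiny page size, for illustration only */

/*
 * Copy bufsz bytes of buffer into a segment of memsz bytes backed by the
 * given (non-contiguous) pages, one page-sized chunk at a time.
 * Returns the number of bytes copied.
 */
size_t sketch_update_segment(unsigned char **pages, size_t npages,
			     const unsigned char *buffer, size_t bufsz,
			     size_t memsz)
{
	size_t copied = 0;
	size_t i;

	assert(pages != NULL && buffer != NULL);

	for (i = 0; i < npages && memsz > 0; i++) {
		/* Bytes of the segment that live in this page. */
		size_t mchunk = memsz < SKETCH_PAGE_SIZE ? memsz : SKETCH_PAGE_SIZE;
		/* Bytes of new content left to place in this page. */
		size_t uchunk = bufsz < mchunk ? bufsz : mchunk;

		if (uchunk > 0)
			memcpy(pages[i], buffer, uchunk);

		copied += uchunk;
		buffer += uchunk;
		bufsz -= uchunk;
		memsz -= mchunk;
	}

	return copied;
}
```

When the new contents are shorter than the segment, the remaining bytes
of the segment are left untouched in this sketch.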

The last patch is not intended to be merged; it just demonstrates how
this feature can be used.

This series applies on top of v6 of the "kexec_file_load implementation
for PowerPC" patch series (which applies on top of v4.8-rc1):

https://lists.infradead.org/pipermail/kexec/2016-August/016960.html

Changes for v4:
- Rebased series on kexec_file_load patch series v7.
- Patch "powerpc: kexec_file: Add buffer hand-over support for the next kernel"
  - Convert hand-over buffer address to physical address when calling
memblock_free in kexec_free_handover_buffer.
  - Delete hand-over buffer properties from the live device tree in
kexec_free_handover_buffer.
  - Remove the memory reservation and the properties for the hand-over
buffer received from the previous kernel in setup_handover_buffer.
- Patch "IMA: Demonstration code for kexec buffer passing."
  - Fix checkpatch warnings. (Andrew Morton)

Changes for v3:
- Rebased series on kexec_file_load patch series v6.
  Both patch series apply cleanly on today's Linus master branch, except
  for a few lines of fuzz in arch/powerpc/Makefile and arch/powerpc/Kconfig.
- Patch "kexec_file: Add buffer hand-over support for the next kernel"
  - Fix compilation warning in  by adding a struct kexec_buf
forward declaration when CONFIG_KEXEC_FILE=n. (Fengguang Wu)
- Patch "kexec_file: Allow skipping checksum calculation for some segments."
  - Substitute checksum argument in kexec_add_buffer with skip_checksum
member in struct kexec_buf, as suggested by Dave Young.
- Patch "kexec_file: Add mechanism to update kexec segments."
  - Use kmap_atomic in kexec_update_segment, as suggested by Andrew Morton.
  - Fix build warning on m68k by passing unsigned long value to __va instead
of void *. (Fengguang Wu)
  - Change bufsz and memsz arguments of kexec_update_segment to size_t to fix
compilation warning. (Fengguang Wu)
- Patch "kexec: Share logic to copy segment page contents."
  - Dropped this patch.
- Patch "IMA: Demonstration code for kexec buffer passing."
  - Update to use kexec_buf.skip_checksum instead of passing it in
kexec_add_buffer.

Changes for v2:
- Rebased on v5 of kexec_file_load implementation for PowerPC patch series.
- Patch "kexec_file: Add buffer hand-over support for the next kernel"
  - Changed kexec_add_handover_buffer to receive a struct kexec_buf, as
suggested by Dave Young.
- Patch "powerpc: kexec_file: Add buffer hand-over support for the next kernel"
  - Moved setup_handover_buffer from kexec_elf_64.c to machine_kexec_64.c.
  - Call setup_handover_buffer from setup_new_fdt instead of elf64_load.
  - Changed kexec_get_handover_buffer 

[PATCH v4 1/5] kexec_file: Add buffer hand-over support for the next kernel

2016-08-30 Thread Thiago Jung Bauermann
The buffer hand-over mechanism allows the currently running kernel to pass
data to the kernel that will be kexec'd via a kexec segment. The second kernel
can check whether the previous kernel sent data and retrieve it.

This is the architecture-independent part of the feature.

Signed-off-by: Thiago Jung Bauermann 
---
 include/linux/kexec.h | 31 +++
 kernel/kexec_file.c   | 68 +++
 2 files changed, 99 insertions(+)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index d419d0e51fe5..16561e96a6d7 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -383,6 +383,37 @@ static inline void *boot_phys_to_virt(unsigned long entry)
return phys_to_virt(boot_phys_to_phys(entry));
 }
 
+#ifdef CONFIG_KEXEC_FILE
+bool __weak kexec_can_hand_over_buffer(void);
+int __weak arch_kexec_add_handover_buffer(struct kimage *image,
+ unsigned long load_addr,
+ unsigned long size);
+int kexec_add_handover_buffer(struct kexec_buf *kbuf);
+int __weak kexec_get_handover_buffer(void **addr, unsigned long *size);
+int __weak kexec_free_handover_buffer(void);
+#else
+struct kexec_buf;
+
+static inline bool kexec_can_hand_over_buffer(void)
+{
+   return false;
+}
+
+static inline int kexec_add_handover_buffer(struct kexec_buf *kbuf)
+{
+   return -ENOTSUPP;
+}
+
+static inline int kexec_get_handover_buffer(void **addr, unsigned long *size)
+{
+   return -ENOTSUPP;
+}
+
+static inline int kexec_free_handover_buffer(void)
+{
+   return -ENOTSUPP;
+}
+#endif /* CONFIG_KEXEC_FILE */
 #else /* !CONFIG_KEXEC_CORE */
 struct pt_regs;
 struct task_struct;
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 3401816700f3..f5684adfad07 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -113,6 +113,74 @@ void kimage_file_post_load_cleanup(struct kimage *image)
image->image_loader_data = NULL;
 }
 
+/**
+ * kexec_can_hand_over_buffer - can we pass data to the kexec'd kernel?
+ */
+bool __weak kexec_can_hand_over_buffer(void)
+{
+   return false;
+}
+
+/**
+ * arch_kexec_add_handover_buffer - do arch-specific steps to handover buffer
+ *
+ * Architectures should use this function to pass on the handover buffer
+ * information to the next kernel.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+int __weak arch_kexec_add_handover_buffer(struct kimage *image,
+ unsigned long load_addr,
+ unsigned long size)
+{
+   return -ENOTSUPP;
+}
+
+/**
+ * kexec_add_handover_buffer - add buffer to be used by the next kernel
+ * @kbuf:  Buffer contents and memory parameters.
+ *
+ * This function assumes that kexec_mutex is held.
+ * On successful return, @kbuf->mem will have the physical address of
+ * the buffer in the next kernel.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+int kexec_add_handover_buffer(struct kexec_buf *kbuf)
+{
+   int ret;
+
+   if (!kexec_can_hand_over_buffer())
+   return -ENOTSUPP;
+
+   ret = kexec_add_buffer(kbuf);
+   if (ret)
+   return ret;
+
+   return arch_kexec_add_handover_buffer(kbuf->image, kbuf->mem,
+ kbuf->memsz);
+}
+
+/**
+ * kexec_get_handover_buffer - get the handover buffer from the previous kernel
+ * @addr:  On successful return, set to point to the buffer contents.
+ * @size:  On successful return, set to the buffer size.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+int __weak kexec_get_handover_buffer(void **addr, unsigned long *size)
+{
+   return -ENOTSUPP;
+}
+
+/**
+ * kexec_free_handover_buffer - free memory used by the handover buffer
+ */
+int __weak kexec_free_handover_buffer(void)
+{
+   return -ENOTSUPP;
+}
+
 /*
  * In file mode list of segments is prepared by kernel. Copy relevant
  * data from user space, do error checking, prepare segment list
-- 
1.9.1



[PATCH v7 12/13] powerpc: Add purgatory for kexec_file_load implementation.

2016-08-30 Thread Thiago Jung Bauermann
This purgatory implementation comes from kexec-tools, almost unchanged.

The only changes are that the sha256_regions global variable was
renamed to sha_regions to match what kexec_file_load expects, and that
the sha256.c file from x86's purgatory is used to avoid adding yet another
SHA-256 implementation.

Also, some formatting warnings found by checkpatch.pl were fixed.

In order to use boot/string.S in ppc64 big endian mode, the functions
defined in it need to have dot symbols so that they can be called
from C code. Therefore, change the file to use a DOTSYM macro
if one is defined, so that the purgatory can add those dot symbols.

Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/Makefile |   4 +
 arch/powerpc/boot/string.S|  67 ++--
 arch/powerpc/purgatory/.gitignore |   2 +
 arch/powerpc/purgatory/Makefile   |  46 +
 arch/powerpc/purgatory/console-ppc64.c|  38 +++
 arch/powerpc/purgatory/crashdump-ppc64.h  |  42 
 arch/powerpc/purgatory/crashdump_backup.c |  36 +++
 arch/powerpc/purgatory/crtsavres.S|   5 +
 arch/powerpc/purgatory/hvCall.S   |  27 +
 arch/powerpc/purgatory/hvCall.h   |   8 ++
 arch/powerpc/purgatory/kexec-sha256.h |  11 ++
 arch/powerpc/purgatory/ppc64_asm.h|  20 
 arch/powerpc/purgatory/printf.c   | 164 ++
 arch/powerpc/purgatory/purgatory-ppc64.c  |  41 
 arch/powerpc/purgatory/purgatory-ppc64.h  |   6 ++
 arch/powerpc/purgatory/purgatory.c|  62 +++
 arch/powerpc/purgatory/purgatory.h|  11 ++
 arch/powerpc/purgatory/sha256.c   |   6 ++
 arch/powerpc/purgatory/sha256.h   |   1 +
 arch/powerpc/purgatory/string.S   |   2 +
 arch/powerpc/purgatory/v2wrap.S   | 134 
 21 files changed, 704 insertions(+), 29 deletions(-)

diff --git a/arch/powerpc/Makefile b/arch/powerpc/Makefile
index 1934707bf321..c91c496cfc64 100644
--- a/arch/powerpc/Makefile
+++ b/arch/powerpc/Makefile
@@ -256,6 +256,7 @@ core-y  += arch/powerpc/kernel/ \
 core-$(CONFIG_XMON)+= arch/powerpc/xmon/
 core-$(CONFIG_KVM) += arch/powerpc/kvm/
 core-$(CONFIG_PERF_EVENTS) += arch/powerpc/perf/
+core-$(CONFIG_KEXEC_FILE)  += arch/powerpc/purgatory/
 
 drivers-$(CONFIG_OPROFILE) += arch/powerpc/oprofile/
 
@@ -377,6 +378,9 @@ archclean:
$(Q)$(MAKE) $(clean)=$(boot)
 
 archprepare: checkbin
+ifeq ($(CONFIG_KEXEC_FILE),y)
+   $(Q)$(MAKE) $(build)=arch/powerpc/purgatory arch/powerpc/purgatory/kexec-purgatory.c
+endif
 
 # Use the file '.tmp_gas_check' for binutils tests, as gas won't output
 # to stdout and these checks are run even on install targets.
diff --git a/arch/powerpc/boot/string.S b/arch/powerpc/boot/string.S
index acc9428f2789..b54bbad5f83d 100644
--- a/arch/powerpc/boot/string.S
+++ b/arch/powerpc/boot/string.S
@@ -11,9 +11,18 @@
 
 #include "ppc_asm.h"
 
+/*
+ * The ppc64 kexec purgatory uses this file and packages it in ELF64,
+ * so it needs dot symbols for the ppc64 big endian ABI. This macro
+ * allows it to create those symbols.
+ */
+#ifndef DOTSYM
+#define DOTSYM(a)  a
+#endif
+
.text
-   .globl  strcpy
-strcpy:
+   .globl  DOTSYM(strcpy)
+DOTSYM(strcpy):
addi    r5,r3,-1
addi    r4,r4,-1
 1: lbzu    r0,1(r4)
@@ -22,8 +31,8 @@ strcpy:
bne 1b
blr
 
-   .globl  strncpy
-strncpy:
+   .globl  DOTSYM(strncpy)
+DOTSYM(strncpy):
cmpwi   0,r5,0
beqlr
mtctr   r5
@@ -35,8 +44,8 @@ strncpy:
bdnzf   2,1b    /* dec ctr, branch if ctr != 0 && !cr0.eq */
blr
 
-   .globl  strcat
-strcat:
+   .globl  DOTSYM(strcat)
+DOTSYM(strcat):
addi    r5,r3,-1
addi    r4,r4,-1
 1: lbzu    r0,1(r5)
@@ -49,8 +58,8 @@ strcat:
bne 1b
blr
 
-   .globl  strchr
-strchr:
+   .globl  DOTSYM(strchr)
+DOTSYM(strchr):
addi    r3,r3,-1
 1: lbzu    r0,1(r3)
cmpw    0,r0,r4
@@ -60,8 +69,8 @@ strchr:
li  r3,0
blr
 
-   .globl  strcmp
-strcmp:
+   .globl  DOTSYM(strcmp)
+DOTSYM(strcmp):
addi    r5,r3,-1
addi    r4,r4,-1
 1: lbzu    r3,1(r5)
@@ -72,8 +81,8 @@ strcmp:
beq 1b
blr
 
-   .globl  strncmp
-strncmp:
+   .globl  DOTSYM(strncmp)
+DOTSYM(strncmp):
mtctr   r5
addi    r5,r3,-1
addi    r4,r4,-1
@@ -85,8 +94,8 @@ strncmp:
bdnzt   eq,1b
blr
 
-   .globl  strlen
-strlen:
+   .globl  DOTSYM(strlen)
+DOTSYM(strlen):
addi    r4,r3,-1
 1: lbzu    r0,1(r4)
cmpwi   0,r0,0
@@ -94,8 +103,8 @@ strlen:
subf    r3,r3,r4
blr
 
-   .globl  memset
-memset:
+   .globl  DOTSYM(memset)
+DOTSYM(memset):
rlwimi  r4,r4,8,16,23
rlwimi  

[PATCH v7 13/13] powerpc: Enable CONFIG_KEXEC_FILE in powerpc server defconfigs.

2016-08-30 Thread Thiago Jung Bauermann
Enable CONFIG_KEXEC_FILE in powernv_defconfig, ppc64_defconfig and
pseries_defconfig.

It depends on CONFIG_CRYPTO_SHA256=y, so add that as well.

Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/configs/powernv_defconfig | 2 ++
 arch/powerpc/configs/ppc64_defconfig   | 2 ++
 arch/powerpc/configs/pseries_defconfig | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/arch/powerpc/configs/powernv_defconfig b/arch/powerpc/configs/powernv_defconfig
index dce352e9153b..319e1fb7b0c9 100644
--- a/arch/powerpc/configs/powernv_defconfig
+++ b/arch/powerpc/configs/powernv_defconfig
@@ -47,6 +47,7 @@ CONFIG_BINFMT_MISC=m
 CONFIG_PPC_TRANSACTIONAL_MEM=y
 CONFIG_HOTPLUG_CPU=y
 CONFIG_KEXEC=y
+CONFIG_KEXEC_FILE=y
 CONFIG_IRQ_ALL_CPUS=y
 CONFIG_NUMA=y
 CONFIG_MEMORY_HOTPLUG=y
@@ -298,6 +299,7 @@ CONFIG_CRYPTO_CCM=m
 CONFIG_CRYPTO_PCBC=m
 CONFIG_CRYPTO_HMAC=y
 CONFIG_CRYPTO_MICHAEL_MIC=m
+CONFIG_CRYPTO_SHA256=y
 CONFIG_CRYPTO_TGR192=m
 CONFIG_CRYPTO_WP512=m
 CONFIG_CRYPTO_ANUBIS=m
diff --git a/arch/powerpc/configs/ppc64_defconfig 
b/arch/powerpc/configs/ppc64_defconfig
index 0a8d250cb97e..a0355ccc7f55 100644
--- a/arch/powerpc/configs/ppc64_defconfig
+++ b/arch/powerpc/configs/ppc64_defconfig
@@ -44,6 +44,7 @@ CONFIG_HZ_100=y
 CONFIG_BINFMT_MISC=m
 CONFIG_PPC_TRANSACTIONAL_MEM=y
 CONFIG_KEXEC=y
+CONFIG_KEXEC_FILE=y
 CONFIG_CRASH_DUMP=y
 CONFIG_IRQ_ALL_CPUS=y
 CONFIG_MEMORY_HOTREMOVE=y
@@ -333,6 +334,7 @@ CONFIG_CRYPTO_TEST=m
 CONFIG_CRYPTO_PCBC=m
 CONFIG_CRYPTO_HMAC=y
 CONFIG_CRYPTO_MICHAEL_MIC=m
+CONFIG_CRYPTO_SHA256=y
 CONFIG_CRYPTO_TGR192=m
 CONFIG_CRYPTO_WP512=m
 CONFIG_CRYPTO_ANUBIS=m
diff --git a/arch/powerpc/configs/pseries_defconfig 
b/arch/powerpc/configs/pseries_defconfig
index 654aeffc57ef..23af4a72930e 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -50,6 +50,7 @@ CONFIG_HZ_100=y
 CONFIG_BINFMT_MISC=m
 CONFIG_PPC_TRANSACTIONAL_MEM=y
 CONFIG_KEXEC=y
+CONFIG_KEXEC_FILE=y
 CONFIG_IRQ_ALL_CPUS=y
 CONFIG_MEMORY_HOTPLUG=y
 CONFIG_MEMORY_HOTREMOVE=y
@@ -300,6 +301,7 @@ CONFIG_CRYPTO_TEST=m
 CONFIG_CRYPTO_PCBC=m
 CONFIG_CRYPTO_HMAC=y
 CONFIG_CRYPTO_MICHAEL_MIC=m
+CONFIG_CRYPTO_SHA256=y
 CONFIG_CRYPTO_TGR192=m
 CONFIG_CRYPTO_WP512=m
 CONFIG_CRYPTO_ANUBIS=m
-- 
1.9.1



[PATCH v7 11/13] powerpc: Add support for loading ELF kernels with kexec_file_load.

2016-08-30 Thread Thiago Jung Bauermann
This uses all the infrastructure built up by the previous patches
in the series to load an ELF vmlinux file and an initrd. It uses the
flattened device tree at initial_boot_params as a base and adjusts memory
reservations and its /chosen node for the next kernel.

Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/include/asm/kexec_elf_64.h |  10 ++
 arch/powerpc/kernel/Makefile|   1 +
 arch/powerpc/kernel/kexec_elf_64.c  | 282 
 arch/powerpc/kernel/machine_kexec_64.c  |   5 +-
 4 files changed, 297 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kexec_elf_64.h 
b/arch/powerpc/include/asm/kexec_elf_64.h
new file mode 100644
index 000000000000..30da6bc0ccf8
--- /dev/null
+++ b/arch/powerpc/include/asm/kexec_elf_64.h
@@ -0,0 +1,10 @@
+#ifndef __POWERPC_KEXEC_ELF_64_H__
+#define __POWERPC_KEXEC_ELF_64_H__
+
+#ifdef CONFIG_KEXEC_FILE
+
+extern struct kexec_file_ops kexec_elf64_ops;
+
+#endif /* CONFIG_KEXEC_FILE */
+
+#endif /* __POWERPC_KEXEC_ELF_64_H__ */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index 0179da5b8520..64f8dc540618 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -109,6 +109,7 @@ obj-$(CONFIG_PCI)   += pci_$(CONFIG_WORD_SIZE).o 
$(pci64-y) \
 obj-$(CONFIG_PCI_MSI)  += msi.o
 obj-$(CONFIG_KEXEC_CORE)   += machine_kexec.o crash.o \
   machine_kexec_$(CONFIG_WORD_SIZE).o
+obj-$(CONFIG_KEXEC_FILE)   += kexec_elf_$(CONFIG_WORD_SIZE).o
 obj-$(CONFIG_AUDIT)+= audit.o
 obj64-$(CONFIG_AUDIT)  += compat_audit.o
 
diff --git a/arch/powerpc/kernel/kexec_elf_64.c 
b/arch/powerpc/kernel/kexec_elf_64.c
new file mode 100644
index 000000000000..3cc8ebce1a86
--- /dev/null
+++ b/arch/powerpc/kernel/kexec_elf_64.c
@@ -0,0 +1,282 @@
+/*
+ * Load ELF vmlinux file for the kexec_file_load syscall.
+ *
+ * Copyright (C) 2004  Adam Litke (a...@us.ibm.com)
+ * Copyright (C) 2004  IBM Corp.
+ * Copyright (C) 2005  R Sharada (shar...@in.ibm.com)
+ * Copyright (C) 2006  Mohan Kumar M (mo...@in.ibm.com)
+ * Copyright (C) 2016  IBM Corporation
+ *
+ * Based on kexec-tools' kexec-elf-exec.c and kexec-elf-ppc64.c.
+ * Heavily modified for the kernel by
+ * Thiago Jung Bauermann .
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation (version 2 of the License).
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#define pr_fmt(fmt)    "kexec_elf: " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+extern size_t kexec_purgatory_size;
+
+#define PURGATORY_STACK_SIZE   (16 * 1024)
+
+/**
+ * build_elf_exec_info - read ELF executable and check that we can use it
+ */
+static int build_elf_exec_info(const char *buf, size_t len, struct elfhdr 
*ehdr,
+  struct elf_info *elf_info)
+{
+   int i;
+   int ret;
+
+   ret = elf_read_from_buffer(buf, len, ehdr, elf_info);
+   if (ret)
+   return ret;
+
+   /* Big endian vmlinux has type ET_DYN. */
+   if (ehdr->e_type != ET_EXEC && ehdr->e_type != ET_DYN) {
+   pr_err("Not an ELF executable.\n");
+   goto error;
+   } else if (!elf_info->proghdrs) {
+   pr_err("No ELF program header.\n");
+   goto error;
+   }
+
+   for (i = 0; i < ehdr->e_phnum; i++) {
+   /*
+* Kexec does not support loading interpreters.
+* In addition this check keeps us from attempting
+* to kexec ordinary executables.
+*/
+   if (elf_info->proghdrs[i].p_type == PT_INTERP) {
+   pr_err("Requires an ELF interpreter.\n");
+   goto error;
+   }
+   }
+
+   return 0;
+error:
+   elf_free_info(elf_info);
+   return -ENOEXEC;
+}
+
+static int elf64_probe(const char *buf, unsigned long len)
+{
+   struct elfhdr ehdr;
+   struct elf_info elf_info;
+   int ret;
+
+   ret = build_elf_exec_info(buf, len, &ehdr, &elf_info);
+   if (ret)
+   return ret;
+
+   elf_free_info(&elf_info);
+
+   return elf_check_arch(&ehdr) ? 0 : -ENOEXEC;
+}
+
+/**
+ * elf_exec_load - load ELF executable image
+ * @lowest_load_addr:  On return, will be the address where the first PT_LOAD
+ * section will be loaded in memory.
+ *
+ * Return:
+ * 0 on success, negative value on failure.
+ */
+static int elf_exec_load(struct kimage *image, struct elfhdr *ehdr,
+struct elf_info 

[PATCH v7 10/13] powerpc: Add code to work with device trees in kexec_file_load.

2016-08-30 Thread Thiago Jung Bauermann
kexec_file_load needs to set up the device tree that will be used
by the next kernel and check whether it provides a console
that can be used by the purgatory.

[a...@linux-foundation.org: coding-style fixes]
Signed-off-by: Thiago Jung Bauermann 
Signed-off-by: Andrew Morton 
---
 arch/powerpc/include/asm/kexec.h   |   3 +
 arch/powerpc/kernel/machine_kexec_64.c | 221 +
 2 files changed, 224 insertions(+)

diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index 0c7e020d935a..73f88b5f9bd1 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -96,6 +96,9 @@ int setup_purgatory(struct kimage *image, const void 
*slave_code,
const void *fdt, unsigned long kernel_load_addr,
unsigned long fdt_load_addr, unsigned long stack_top,
int debug);
+int setup_new_fdt(void *fdt, unsigned long initrd_load_addr,
+ unsigned long initrd_len, const char *cmdline);
+bool find_debug_console(const void *fdt);
 #endif /* CONFIG_KEXEC_FILE */
 
 #else /* !CONFIG_KEXEC_CORE */
diff --git a/arch/powerpc/kernel/machine_kexec_64.c 
b/arch/powerpc/kernel/machine_kexec_64.c
index 1e678dc5096a..31c5090705e0 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -683,4 +683,225 @@ int setup_purgatory(struct kimage *image, const void 
*slave_code,
return 0;
 }
 
+/*
+ * setup_new_fdt() - modify /chosen and memory reservation for the next kernel
+ * @fdt:
+ * @initrd_load_addr:  Address where the next initrd will be loaded.
+ * @initrd_len:Size of the next initrd, or 0 if there will be 
none.
+ * @cmdline:   Command line for the next kernel, or NULL if there will
+ * be none.
+ *
+ * Return: 0 on success, or negative errno on error.
+ */
+int setup_new_fdt(void *fdt, unsigned long initrd_load_addr,
+ unsigned long initrd_len, const char *cmdline)
+{
+   uint64_t oldfdt_addr;
+   int i, ret, chosen_node;
+   const void *prop;
+
+   /* Remove memory reservation for the current device tree. */
+   oldfdt_addr = __pa(initial_boot_params);
+   for (i = 0; i < fdt_num_mem_rsv(fdt); i++) {
+   uint64_t rsv_start, rsv_size;
+
+   ret = fdt_get_mem_rsv(fdt, i, &rsv_start, &rsv_size);
+   if (ret) {
+   pr_err("Malformed device tree.\n");
+   return -EINVAL;
+   }
+
+   if (rsv_start == oldfdt_addr &&
+   rsv_size == fdt_totalsize(initial_boot_params)) {
+   ret = fdt_del_mem_rsv(fdt, i);
+   if (ret) {
+   pr_err("Error deleting fdt reservation.\n");
+   return -EINVAL;
+   }
+
+   pr_debug("Removed old device tree reservation.\n");
+   break;
+   }
+   }
+
+   chosen_node = fdt_path_offset(fdt, "/chosen");
+   if (chosen_node == -FDT_ERR_NOTFOUND) {
+   chosen_node = fdt_add_subnode(fdt, fdt_path_offset(fdt, "/"),
+ "chosen");
+   if (chosen_node < 0) {
+   pr_err("Error creating /chosen.\n");
+   return -EINVAL;
+   }
+   } else if (chosen_node < 0) {
+   pr_err("Malformed device tree: error reading /chosen.\n");
+   return -EINVAL;
+   }
+
+   /* Did we boot using an initrd? */
+   prop = fdt_getprop(fdt, chosen_node, "linux,initrd-start", NULL);
+   if (prop) {
+   uint64_t tmp_start, tmp_end, tmp_size, tmp_sizepg;
+
+   tmp_start = fdt64_to_cpu(*((const fdt64_t *) prop));
+
+   prop = fdt_getprop(fdt, chosen_node, "linux,initrd-end", NULL);
+   if (!prop) {
+   pr_err("Malformed device tree.\n");
+   return -EINVAL;
+   }
+   tmp_end = fdt64_to_cpu(*((const fdt64_t *) prop));
+
+   /*
+* kexec reserves exact initrd size, while firmware may
+* reserve a multiple of PAGE_SIZE, so check for both.
+*/
+   tmp_size = tmp_end - tmp_start;
+   tmp_sizepg = round_up(tmp_size, PAGE_SIZE);
+
+   /* Remove memory reservation for the current initrd. */
+   for (i = 0; i < fdt_num_mem_rsv(fdt); i++) {
+   uint64_t rsv_start, rsv_size;
+
+   ret = fdt_get_mem_rsv(fdt, i, &rsv_start, &rsv_size);
+   if (ret) {
+   pr_err("Malformed device tree.\n");
+   return -EINVAL;
+   }
+
+   if 

[PATCH v7 09/13] powerpc: Implement kexec_file_load.

2016-08-30 Thread Thiago Jung Bauermann
arch_kexec_walk_mem and arch_kexec_apply_relocations_add are used by
generic kexec code, while setup_purgatory is powerpc-specific and sets
runtime variables needed by the powerpc purgatory implementation.

Signed-off-by: Josh Sklar 
Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/Kconfig   |  13 ++
 arch/powerpc/include/asm/kexec.h   |   7 +
 arch/powerpc/include/asm/systbl.h  |   1 +
 arch/powerpc/include/asm/unistd.h  |   2 +-
 arch/powerpc/include/uapi/asm/unistd.h |   1 +
 arch/powerpc/kernel/Makefile   |   4 +-
 arch/powerpc/kernel/machine_kexec_64.c | 252 +
 7 files changed, 278 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 140a1b84019a..41300c3a1bfe 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -460,6 +460,19 @@ config KEXEC
  interface is strongly in flux, so no good recommendation can be
  made.
 
+config KEXEC_FILE
+   bool "kexec file based system call"
+   select KEXEC_CORE
+   select BUILD_BIN2C
+   depends on PPC64
+   depends on CRYPTO=y
+   depends on CRYPTO_SHA256=y
+   help
+ This is a new version of the kexec system call. This call is
+ file based and takes in file descriptors as system call arguments
+ for kernel and initramfs as opposed to a list of segments as is the
+ case for the older kexec call.
+
 config RELOCATABLE
bool "Build a relocatable kernel"
depends on (PPC64 && !COMPILE_TEST) || (FLATMEM && (44x || FSL_BOOKE))
diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index eca2f975bf44..0c7e020d935a 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -91,6 +91,13 @@ static inline bool kdump_in_progress(void)
return crashing_cpu >= 0;
 }
 
+#ifdef CONFIG_KEXEC_FILE
+int setup_purgatory(struct kimage *image, const void *slave_code,
+   const void *fdt, unsigned long kernel_load_addr,
+   unsigned long fdt_load_addr, unsigned long stack_top,
+   int debug);
+#endif /* CONFIG_KEXEC_FILE */
+
 #else /* !CONFIG_KEXEC_CORE */
 static inline void crash_kexec_secondary(struct pt_regs *regs) { }
 
diff --git a/arch/powerpc/include/asm/systbl.h 
b/arch/powerpc/include/asm/systbl.h
index 2fc5d4db503c..4b369d83fe9c 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -386,3 +386,4 @@ SYSCALL(mlock2)
 SYSCALL(copy_file_range)
 COMPAT_SYS_SPU(preadv2)
 COMPAT_SYS_SPU(pwritev2)
+SYSCALL(kexec_file_load)
diff --git a/arch/powerpc/include/asm/unistd.h 
b/arch/powerpc/include/asm/unistd.h
index cf12c580f6b2..a01e97d3f305 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -12,7 +12,7 @@
 #include 
 
 
-#define NR_syscalls    382
+#define NR_syscalls    383
 
 #define __NR__exit __NR_exit
 
diff --git a/arch/powerpc/include/uapi/asm/unistd.h 
b/arch/powerpc/include/uapi/asm/unistd.h
index e9f5f41aa55a..2f26335a3c42 100644
--- a/arch/powerpc/include/uapi/asm/unistd.h
+++ b/arch/powerpc/include/uapi/asm/unistd.h
@@ -392,5 +392,6 @@
 #define __NR_copy_file_range   379
 #define __NR_preadv2   380
 #define __NR_pwritev2  381
+#define __NR_kexec_file_load   382
 
 #endif /* _UAPI_ASM_POWERPC_UNISTD_H_ */
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index fd550a65d450..0179da5b8520 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -123,9 +123,11 @@ ifneq ($(CONFIG_PPC_INDIRECT_PIO),y)
 obj-y  += iomap.o
 endif
 
-ifeq ($(CONFIG_MODULES)$(CONFIG_WORD_SIZE),y64)
+ifneq ($(CONFIG_MODULES)$(CONFIG_KEXEC_FILE),)
+ifeq ($(CONFIG_WORD_SIZE),64)
 obj-y  += elf_util.o elf_util_64.o
 endif
+endif
 
 obj64-$(CONFIG_PPC_TRANSACTIONAL_MEM)  += tm.o
 
diff --git a/arch/powerpc/kernel/machine_kexec_64.c 
b/arch/powerpc/kernel/machine_kexec_64.c
index 4c780a342282..1e678dc5096a 100644
--- a/arch/powerpc/kernel/machine_kexec_64.c
+++ b/arch/powerpc/kernel/machine_kexec_64.c
@@ -18,6 +18,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -31,6 +33,12 @@
 #include 
 #include 
 
+#define SLAVE_CODE_SIZE    256
+
+#ifdef CONFIG_KEXEC_FILE
+static struct kexec_file_ops *kexec_file_loaders[] = { };
+#endif
+
 #ifdef CONFIG_PPC_BOOK3E
 int default_machine_kexec_prepare(struct kimage *image)
 {
@@ -432,3 +440,247 @@ static int __init export_htab_values(void)
 }
 late_initcall(export_htab_values);
 #endif /* CONFIG_PPC_STD_MMU_64 */
+
+#ifdef CONFIG_KEXEC_FILE
+int arch_kexec_kernel_image_probe(struct kimage *image, void *buf,
+ unsigned long buf_len)
+{
+   int i, ret = -ENOEXEC;
+   struct kexec_file_ops *fops;
+
+   /* We 

[PATCH v7 08/13] powerpc: Add functions to read ELF files of any endianness.

2016-08-30 Thread Thiago Jung Bauermann
A little endian kernel might need to kexec a big endian kernel (the
opposite is less likely but could happen as well), so we can't just cast
the buffer with the binary to ELF structs and use them as is done
elsewhere.

This patch adds functions which do byte-swapping as necessary when
populating the ELF structs. These functions will be used in the next
patch in the series.
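
As a rough userspace illustration of the idea (not the code in this patch),
a field can be decoded according to e_ident[EI_DATA] before it is used. The
sketch_* function names and SKETCH_* constants below are made up for this
example; the constants mirror the standard ELF values, and the kernel code
relies on le16_to_cpu()/be16_to_cpu() and friends rather than manual shifts.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for EI_DATA, ELFDATA2LSB and ELFDATA2MSB from <elf.h>. */
#define SKETCH_EI_DATA      5
#define SKETCH_ELFDATA2LSB  1
#define SKETCH_ELFDATA2MSB  2

/* Decode a 16-bit field stored at p in the byte order the file declares. */
static uint16_t sketch_elf16_to_cpu(const unsigned char *e_ident,
                                    const unsigned char *p)
{
    if (e_ident[SKETCH_EI_DATA] == SKETCH_ELFDATA2MSB)
        return (uint16_t)((p[0] << 8) | p[1]);

    /* This sketch treats every other value as little endian. */
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Same idea for a 64-bit field, one byte at a time. */
static uint64_t sketch_elf64_to_cpu(const unsigned char *e_ident,
                                    const unsigned char *p)
{
    uint64_t value = 0;
    size_t i;

    if (e_ident[SKETCH_EI_DATA] == SKETCH_ELFDATA2MSB) {
        for (i = 0; i < 8; i++)
            value = (value << 8) | p[i];
    } else {
        for (i = 0; i < 8; i++)
            value |= (uint64_t)p[i] << (8 * i);
    }

    return value;
}
```

Because the helpers read the buffer byte by byte, the result does not depend
on the endianness of the machine running them, which is the property the
ELF reader needs when the running kernel and the next kernel differ.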

Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/include/asm/elf_util.h |  18 ++
 arch/powerpc/kernel/Makefile|   2 +-
 arch/powerpc/kernel/elf_util.c  | 476 
 3 files changed, 495 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/elf_util.h 
b/arch/powerpc/include/asm/elf_util.h
index 5a27e8ceb88a..18703d56eabd 100644
--- a/arch/powerpc/include/asm/elf_util.h
+++ b/arch/powerpc/include/asm/elf_util.h
@@ -20,7 +20,14 @@
 #include 
 
 struct elf_info {
+   /*
+* Where the ELF binary contents are kept.
+* Memory managed by the user of the struct.
+*/
+   const char *buffer;
+
const struct elfhdr *ehdr;
+   const struct elf_phdr *proghdrs;
struct elf_shdr *sechdrs;
 
/* Index of stubs section. */
@@ -64,6 +71,17 @@ static inline unsigned long my_r2(const struct elf_info 
*elf_info)
return elf_info->sechdrs[elf_info->toc_section].sh_addr + 0x8000;
 }
 
+static inline bool elf_is_elf_file(const struct elfhdr *ehdr)
+{
+   return memcmp(ehdr->e_ident, ELFMAG, SELFMAG) == 0;
+}
+
+int elf_read_from_buffer(const char *buf, size_t len, struct elfhdr *ehdr,
+struct elf_info *elf_info);
+void elf_init_elf_info(const struct elfhdr *ehdr, struct elf_shdr *sechdrs,
+  struct elf_info *elf_info);
+void elf_free_info(struct elf_info *elf_info);
+
 int elf64_apply_relocate_add(const struct elf_info *elf_info,
 const char *strtab, const Elf64_Rela *rela,
 unsigned int num_rela, void *syms_base,
diff --git a/arch/powerpc/kernel/Makefile b/arch/powerpc/kernel/Makefile
index fd0dd18a6605..fd550a65d450 100644
--- a/arch/powerpc/kernel/Makefile
+++ b/arch/powerpc/kernel/Makefile
@@ -124,7 +124,7 @@ obj-y   += iomap.o
 endif
 
 ifeq ($(CONFIG_MODULES)$(CONFIG_WORD_SIZE),y64)
-obj-y  += elf_util_64.o
+obj-y  += elf_util.o elf_util_64.o
 endif
 
 obj64-$(CONFIG_PPC_TRANSACTIONAL_MEM)  += tm.o
diff --git a/arch/powerpc/kernel/elf_util.c b/arch/powerpc/kernel/elf_util.c
new file mode 100644
index 000000000000..1df4a116ad90
--- /dev/null
+++ b/arch/powerpc/kernel/elf_util.c
@@ -0,0 +1,476 @@
+/*
+ * Utility functions to work with ELF files.
+ *
+ * Copyright (C) 2016, IBM Corporation
+ *
+ * Based on kexec-tools' kexec-elf.c. Heavily modified for the
+ * kernel by Thiago Jung Bauermann .
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation (version 2 of the License).
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+
+#if ELF_CLASS == ELFCLASS32
+#define elf_addr_to_cpu    elf32_to_cpu
+
+#ifndef Elf_Rel
+#define Elf_Rel    Elf32_Rel
+#endif /* Elf_Rel */
+#else /* ELF_CLASS == ELFCLASS32 */
+#define elf_addr_to_cpu    elf64_to_cpu
+
+#ifndef Elf_Rel
+#define Elf_Rel    Elf64_Rel
+#endif /* Elf_Rel */
+
+static uint64_t elf64_to_cpu(const struct elfhdr *ehdr, uint64_t value)
+{
+   if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
+   value = le64_to_cpu(value);
+   else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)
+   value = be64_to_cpu(value);
+
+   return value;
+}
+#endif /* ELF_CLASS == ELFCLASS32 */
+
+static uint16_t elf16_to_cpu(const struct elfhdr *ehdr, uint16_t value)
+{
+   if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
+   value = le16_to_cpu(value);
+   else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)
+   value = be16_to_cpu(value);
+
+   return value;
+}
+
+static uint32_t elf32_to_cpu(const struct elfhdr *ehdr, uint32_t value)
+{
+   if (ehdr->e_ident[EI_DATA] == ELFDATA2LSB)
+   value = le32_to_cpu(value);
+   else if (ehdr->e_ident[EI_DATA] == ELFDATA2MSB)
+   value = be32_to_cpu(value);
+
+   return value;
+}
+
+/**
+ * elf_is_ehdr_sane - check that it is safe to use the ELF header
+ * @buf_len:   size of the buffer in which the ELF file is loaded.
+ */
+static bool elf_is_ehdr_sane(const struct elfhdr *ehdr, size_t buf_len)
+{
+   

[PATCH v7 05/13] powerpc: Factor out relocation code from module_64.c to elf_util_64.c.

2016-08-30 Thread Thiago Jung Bauermann
The kexec_file_load system call needs to relocate the purgatory, so
factor out the module relocation code so that it can be shared.

This patch's purpose is to move the ELF relocation logic from
apply_relocate_add to elf_util_64.c with as few changes as
possible. The following changes were needed:

To avoid having module-specific code in a general purpose utility
function, struct elf_info was created to contain the information
needed for ELF binaries manipulation.

my_r2, stub_for_addr and create_stub were changed to use it instead of
having to receive a struct module, since they are called from
elf64_apply_relocate_add.

local_entry_offset and squash_toc_save_inst were only used by
apply_relocate_add, so they were moved to elf_util_64.c as well.

Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/include/asm/elf_util.h |  70 
 arch/powerpc/include/asm/module.h   |  14 +-
 arch/powerpc/kernel/Makefile|   4 +
 arch/powerpc/kernel/elf_util_64.c   | 269 +++
 arch/powerpc/kernel/module_64.c | 312 
 5 files changed, 386 insertions(+), 283 deletions(-)

diff --git a/arch/powerpc/include/asm/elf_util.h 
b/arch/powerpc/include/asm/elf_util.h
new file mode 100644
index 000000000000..37372559fe62
--- /dev/null
+++ b/arch/powerpc/include/asm/elf_util.h
@@ -0,0 +1,70 @@
+/*
+ * Utility functions to work with ELF files.
+ *
+ * Copyright (C) 2016, IBM Corporation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2, or (at your option)
+ * any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+
+#ifndef _ASM_POWERPC_ELF_UTIL_H
+#define _ASM_POWERPC_ELF_UTIL_H
+
+#include 
+
+struct elf_info {
+   struct elf_shdr *sechdrs;
+
+   /* Index of stubs section. */
+   unsigned int stubs_section;
+   /* Index of TOC section. */
+   unsigned int toc_section;
+};
+
+#ifdef __powerpc64__
+#ifdef PPC64_ELF_ABI_v2
+
+/* An address is simply the address of the function. */
+typedef unsigned long func_desc_t;
+#else
+
+/* An address is address of the OPD entry, which contains address of fn. */
+typedef struct ppc64_opd_entry func_desc_t;
+#endif /* PPC64_ELF_ABI_v2 */
+
+/* Like PPC32, we need little trampolines to do > 24-bit jumps (into
+   the kernel itself).  But on PPC64, these need to be used for every
+   jump, actually, to reset r2 (TOC+0x8000). */
+struct ppc64_stub_entry
+{
+   /* 28 byte jump instruction sequence (7 instructions). We only
+* need 6 instructions on ABIv2 but we always allocate 7
+* so we don't have to modify the trampoline load instruction. */
+   u32 jump[7];
+   /* Used by ftrace to identify stubs */
+   u32 magic;
+   /* Data for the above code */
+   func_desc_t funcdata;
+};
+#endif
+
+/* r2 is the TOC pointer: it actually points 0x8000 into the TOC (this
+   gives the value maximum span in an instruction which uses a signed
+   offset) */
+static inline unsigned long my_r2(const struct elf_info *elf_info)
+{
+   return elf_info->sechdrs[elf_info->toc_section].sh_addr + 0x8000;
+}
+
+int elf64_apply_relocate_add(const struct elf_info *elf_info,
+const char *strtab, unsigned int symindex,
+unsigned int relsec, const char *obj_name);
+
+#endif /* _ASM_POWERPC_ELF_UTIL_H */
diff --git a/arch/powerpc/include/asm/module.h 
b/arch/powerpc/include/asm/module.h
index cd4ffd86765f..f2073115d518 100644
--- a/arch/powerpc/include/asm/module.h
+++ b/arch/powerpc/include/asm/module.h
@@ -12,7 +12,14 @@
 #include 
 #include 
 #include 
+#include 
 
+/* Both low and high 16 bits are added as SIGNED additions, so if low
+   16 bits has high bit set, high 16 bits must be adjusted.  These
+   macros do that (stolen from binutils). */
+#define PPC_LO(v) ((v) & 0xffff)
+#define PPC_HI(v) (((v) >> 16) & 0xffff)
+#define PPC_HA(v) PPC_HI ((v) + 0x8000)
 
 #ifndef __powerpc64__
 /*
@@ -33,8 +40,7 @@ struct ppc_plt_entry {
 
 struct mod_arch_specific {
 #ifdef __powerpc64__
-   unsigned int stubs_section; /* Index of stubs section in module */
-   unsigned int toc_section;   /* What section is the TOC? */
+   struct elf_info elf_info;
bool toc_fixed; /* Have we fixed up .TOC.? */
 #ifdef CONFIG_DYNAMIC_FTRACE
unsigned long toc;
@@ -90,6 +96,10 @@ static inline int module_finalize_ftrace(struct module *mod, 
const Elf_Shdr *sec
 }
 #endif
 
+unsigned long stub_for_addr(const struct elf_info *elf_info, unsigned long 
addr,
+   const char *obj_name);

[PATCH v7 04/13] powerpc: Change places using CONFIG_KEXEC to use CONFIG_KEXEC_CORE instead.

2016-08-30 Thread Thiago Jung Bauermann
Commit 2965faa5e03d ("kexec: split kexec_load syscall from kexec core
code") introduced CONFIG_KEXEC_CORE so that CONFIG_KEXEC means whether
the kexec_load system call should be compiled-in and CONFIG_KEXEC_FILE
means whether the kexec_file_load system call should be compiled-in.
These options can be set independently from each other.

Since until now powerpc only supported kexec_load, CONFIG_KEXEC and
CONFIG_KEXEC_CORE were synonyms. That is not the case anymore, so we
need to make a distinction. Almost all places where CONFIG_KEXEC was
being used should be using CONFIG_KEXEC_CORE instead, since
kexec_file_load also needs that code compiled in.

Signed-off-by: Thiago Jung Bauermann 
---
 arch/powerpc/Kconfig  | 2 +-
 arch/powerpc/include/asm/debug.h  | 2 +-
 arch/powerpc/include/asm/kexec.h  | 6 +++---
 arch/powerpc/include/asm/machdep.h| 4 ++--
 arch/powerpc/include/asm/smp.h| 2 +-
 arch/powerpc/kernel/Makefile  | 4 ++--
 arch/powerpc/kernel/head_64.S | 2 +-
 arch/powerpc/kernel/misc_32.S | 2 +-
 arch/powerpc/kernel/misc_64.S | 6 +++---
 arch/powerpc/kernel/prom.c| 2 +-
 arch/powerpc/kernel/setup_64.c| 4 ++--
 arch/powerpc/kernel/smp.c | 6 +++---
 arch/powerpc/kernel/traps.c   | 2 +-
 arch/powerpc/platforms/85xx/corenet_generic.c | 2 +-
 arch/powerpc/platforms/85xx/smp.c | 8 
 arch/powerpc/platforms/cell/spu_base.c| 2 +-
 arch/powerpc/platforms/powernv/setup.c| 6 +++---
 arch/powerpc/platforms/ps3/setup.c| 4 ++--
 arch/powerpc/platforms/pseries/Makefile   | 2 +-
 arch/powerpc/platforms/pseries/setup.c| 4 ++--
 20 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 927d2ab2ce08..140a1b84019a 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -494,7 +494,7 @@ config CRASH_DUMP
 
 config FA_DUMP
bool "Firmware-assisted dump"
-   depends on PPC64 && PPC_RTAS && CRASH_DUMP && KEXEC
+   depends on PPC64 && PPC_RTAS && CRASH_DUMP && KEXEC_CORE
help
  A robust mechanism to get reliable kernel crash dump with
  assistance from firmware. This approach does not use kexec,
diff --git a/arch/powerpc/include/asm/debug.h b/arch/powerpc/include/asm/debug.h
index a954e4975049..86308f177f2d 100644
--- a/arch/powerpc/include/asm/debug.h
+++ b/arch/powerpc/include/asm/debug.h
@@ -10,7 +10,7 @@ struct pt_regs;
 
 extern struct dentry *powerpc_debugfs_root;
 
-#if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC)
+#if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC_CORE)
 
 extern int (*__debugger)(struct pt_regs *regs);
 extern int (*__debugger_ipi)(struct pt_regs *regs);
diff --git a/arch/powerpc/include/asm/kexec.h b/arch/powerpc/include/asm/kexec.h
index a46f5f45570c..eca2f975bf44 100644
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -53,7 +53,7 @@
 
 typedef void (*crash_shutdown_t)(void);
 
-#ifdef CONFIG_KEXEC
+#ifdef CONFIG_KEXEC_CORE
 
 /*
  * This function is responsible for capturing register states if coming
@@ -91,7 +91,7 @@ static inline bool kdump_in_progress(void)
return crashing_cpu >= 0;
 }
 
-#else /* !CONFIG_KEXEC */
+#else /* !CONFIG_KEXEC_CORE */
 static inline void crash_kexec_secondary(struct pt_regs *regs) { }
 
 static inline int overlaps_crashkernel(unsigned long start, unsigned long size)
@@ -116,7 +116,7 @@ static inline bool kdump_in_progress(void)
return false;
 }
 
-#endif /* CONFIG_KEXEC */
+#endif /* CONFIG_KEXEC_CORE */
 #endif /* ! __ASSEMBLY__ */
 #endif /* __KERNEL__ */
 #endif /* _ASM_POWERPC_KEXEC_H */
diff --git a/arch/powerpc/include/asm/machdep.h 
b/arch/powerpc/include/asm/machdep.h
index 0420b388dd83..3200a4403de3 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -183,7 +183,7 @@ struct machdep_calls {
 */
void (*machine_shutdown)(void);
 
-#ifdef CONFIG_KEXEC
+#ifdef CONFIG_KEXEC_CORE
void (*kexec_cpu_down)(int crash_shutdown, int secondary);
 
/* Called to do what every setup is needed on image and the
@@ -198,7 +198,7 @@ struct machdep_calls {
 * no return.
 */
void (*machine_kexec)(struct kimage *image);
-#endif /* CONFIG_KEXEC */
+#endif /* CONFIG_KEXEC_CORE */
 
 #ifdef CONFIG_SUSPEND
/* These are called to disable and enable, respectively, IRQs when
diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
index 0d02c11dc331..32db16d2e7ad 100644
--- a/arch/powerpc/include/asm/smp.h
+++ b/arch/powerpc/include/asm/smp.h
@@ -176,7 +176,7 @@ static inline void set_hard_smp_processor_id(int cpu, int 
phys)
 #endif /* !CONFIG_SMP */
 #endif /* !CONFIG_PPC64 */
 
-#if defined(CONFIG_PPC64) && 

[PATCH v7 02/13] kexec_file: Change kexec_add_buffer to take kexec_buf as argument.

2016-08-30 Thread Thiago Jung Bauermann
This is done to simplify the kexec_add_buffer argument list.
Adapt all callers to set up a kexec_buf to pass to kexec_add_buffer.

In addition, change the type of kexec_buf.buffer from char * to void *.
There is no particular reason for it to be a char *, and the change
allows us to get rid of 3 existing casts to char * in the code.
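
The calling convention this moves to can be sketched in plain C as below.
This is only an illustration under stated assumptions: struct sketch_buf and
sketch_add_buffer() are hypothetical names, the field list is a simplified
subset modelled on the kexec_buf fields visible in this patch, and the
placement logic is a toy (lowest aligned address at or above buf_min), not
the kernel's memory-hole search.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct kexec_buf. */
struct sketch_buf {
    const void *buffer;       /* data to load (void *, so no casts needed) */
    size_t bufsz;             /* size of the data */
    size_t memsz;             /* size to reserve in memory */
    unsigned long buf_align;  /* required alignment, a power of two */
    unsigned long buf_min;    /* lowest acceptable address */
    unsigned long buf_max;    /* upper limit for the reservation */
    bool top_down;            /* unused by this toy placement */
    unsigned long mem;        /* output: chosen load address */
};

/* Toy placement: the result comes back in kbuf->mem, not an out-pointer. */
static int sketch_add_buffer(struct sketch_buf *kbuf)
{
    unsigned long start;

    /* Alignment must be a non-zero power of two. */
    if (!kbuf->buf_align || (kbuf->buf_align & (kbuf->buf_align - 1)))
        return -1;
    if (kbuf->memsz < kbuf->bufsz)
        return -1;

    /* Round buf_min up to the requested alignment. */
    start = (kbuf->buf_min + kbuf->buf_align - 1) & ~(kbuf->buf_align - 1);
    if (start < kbuf->buf_min)          /* wrapped around */
        return -1;
    if (start > kbuf->buf_max || kbuf->memsz > kbuf->buf_max - start)
        return -1;

    kbuf->mem = start;
    return 0;
}
```

A caller fills in the struct once with designated initializers, adjusts only
the fields that differ between segments, and reads the load address back
from the mem field after each call, which is the pattern the converted
callers in this patch follow.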

Signed-off-by: Thiago Jung Bauermann 
Acked-by: Dave Young 
Acked-by: Balbir Singh 
---
 arch/x86/kernel/crash.c   | 37 
 arch/x86/kernel/kexec-bzimage64.c | 48 +++--
 include/linux/kexec.h |  8 +---
 kernel/kexec_file.c   | 88 ++-
 4 files changed, 87 insertions(+), 94 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 9616cf76940c..38a1cdf6aa05 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -615,9 +615,9 @@ static int determine_backup_region(u64 start, u64 end, void 
*arg)
 
 int crash_load_segments(struct kimage *image)
 {
-   unsigned long src_start, src_sz, elf_sz;
-   void *elf_addr;
int ret;
+   struct kexec_buf kbuf = { .image = image, .buf_min = 0,
+ .buf_max = ULONG_MAX, .top_down = false };
 
/*
 * Determine and load a segment for backup area. First 640K RAM
@@ -631,43 +631,44 @@ int crash_load_segments(struct kimage *image)
if (ret < 0)
return ret;
 
-   src_start = image->arch.backup_src_start;
-   src_sz = image->arch.backup_src_sz;
-
/* Add backup segment. */
-   if (src_sz) {
+   if (image->arch.backup_src_sz) {
+   kbuf.buffer = &crash_zero_bytes;
+   kbuf.bufsz = sizeof(crash_zero_bytes);
+   kbuf.memsz = image->arch.backup_src_sz;
+   kbuf.buf_align = PAGE_SIZE;
/*
 * Ideally there is no source for backup segment. This is
 * copied in purgatory after crash. Just add a zero filled
 * segment for now to make sure checksum logic works fine.
 */
-   ret = kexec_add_buffer(image, (char *)&crash_zero_bytes,
-  sizeof(crash_zero_bytes), src_sz,
-  PAGE_SIZE, 0, -1, 0,
-  &image->arch.backup_load_addr);
+   ret = kexec_add_buffer(&kbuf);
if (ret)
return ret;
+   image->arch.backup_load_addr = kbuf.mem;
pr_debug("Loaded backup region at 0x%lx backup_start=0x%lx 
memsz=0x%lx\n",
-image->arch.backup_load_addr, src_start, src_sz);
+image->arch.backup_load_addr,
+image->arch.backup_src_start, kbuf.memsz);
}
 
/* Prepare elf headers and add a segment */
-   ret = prepare_elf_headers(image, &elf_addr, &elf_sz);
+   ret = prepare_elf_headers(image, &kbuf.buffer, &kbuf.bufsz);
if (ret)
return ret;
 
-   image->arch.elf_headers = elf_addr;
-   image->arch.elf_headers_sz = elf_sz;
+   image->arch.elf_headers = kbuf.buffer;
+   image->arch.elf_headers_sz = kbuf.bufsz;
 
-   ret = kexec_add_buffer(image, (char *)elf_addr, elf_sz, elf_sz,
-   ELF_CORE_HEADER_ALIGN, 0, -1, 0,
-   &image->arch.elf_load_addr);
+   kbuf.memsz = kbuf.bufsz;
+   kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
+   ret = kexec_add_buffer(&kbuf);
if (ret) {
vfree((void *)image->arch.elf_headers);
return ret;
}
+   image->arch.elf_load_addr = kbuf.mem;
pr_debug("Loaded ELF headers at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
-image->arch.elf_load_addr, elf_sz, elf_sz);
+image->arch.elf_load_addr, kbuf.bufsz, kbuf.bufsz);
 
return ret;
 }
diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
index f2356bda2b05..4b3a75329fb6 100644
--- a/arch/x86/kernel/kexec-bzimage64.c
+++ b/arch/x86/kernel/kexec-bzimage64.c
@@ -331,17 +331,17 @@ static void *bzImage64_load(struct kimage *image, char *kernel,
 
struct setup_header *header;
int setup_sects, kern16_size, ret = 0;
-   unsigned long setup_header_size, params_cmdline_sz, params_misc_sz;
+   unsigned long setup_header_size, params_cmdline_sz;
struct boot_params *params;
unsigned long bootparam_load_addr, kernel_load_addr, initrd_load_addr;
unsigned long purgatory_load_addr;
-   unsigned long kernel_bufsz, kernel_memsz, kernel_align;
-   char *kernel_buf;
struct bzimage64_data *ldata;
struct kexec_entry64_regs regs64;
void *stack;
unsigned int setup_hdr_offset = offsetof(struct boot_params, hdr);
unsigned int efi_map_offset, efi_map_sz, efi_setup_data_offset;
+   struct 

[PATCH v7 03/13] kexec_file: Factor out kexec_locate_mem_hole from kexec_add_buffer.

2016-08-30 Thread Thiago Jung Bauermann
kexec_locate_mem_hole will be used by the PowerPC kexec_file_load
implementation to find free memory for the purgatory stack.

Signed-off-by: Thiago Jung Bauermann 
Acked-by: Dave Young 
---
 include/linux/kexec.h |  1 +
 kernel/kexec_file.c   | 25 -
 2 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index be39903edae1..d419d0e51fe5 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -176,6 +176,7 @@ struct kexec_buf {
 int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
   int (*func)(u64, u64, void *));
 extern int kexec_add_buffer(struct kexec_buf *kbuf);
+int kexec_locate_mem_hole(struct kexec_buf *kbuf);
 #endif /* CONFIG_KEXEC_FILE */
 
 struct kimage {
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 25690d64e111..3401816700f3 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -450,6 +450,23 @@ int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
 }
 
 /**
+ * kexec_locate_mem_hole - find free memory for the purgatory or the next kernel
+ * @kbuf:  Parameters for the memory search.
+ *
+ * On success, kbuf->mem will have the start address of the memory region found.
+ *
+ * Return: 0 on success, negative errno on error.
+ */
+int kexec_locate_mem_hole(struct kexec_buf *kbuf)
+{
+   int ret;
+
+   ret = arch_kexec_walk_mem(kbuf, locate_mem_hole_callback);
+
+   return ret == 1 ? 0 : -EADDRNOTAVAIL;
+}
+
+/**
  * kexec_add_buffer - place a buffer in a kexec segment
  * @kbuf:  Buffer contents and memory parameters.
  *
@@ -489,11 +506,9 @@ int kexec_add_buffer(struct kexec_buf *kbuf)
kbuf->buf_align = max(kbuf->buf_align, PAGE_SIZE);
 
/* Walk the RAM ranges and allocate a suitable range for the buffer */
-   ret = arch_kexec_walk_mem(kbuf, locate_mem_hole_callback);
-   if (ret != 1) {
-   /* A suitable memory range could not be found for buffer */
-   return -EADDRNOTAVAIL;
-   }
+   ret = kexec_locate_mem_hole(kbuf);
+   if (ret)
+   return ret;
 
/* Found a suitable memory range */
ksegment = &kbuf->image->segment[kbuf->image->nr_segments];
-- 
1.9.1
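
For readers following along outside the kernel tree, the contract this
patch relies on (the walk callback returns 1 once a region fits, and the
wrapper maps anything else to -EADDRNOTAVAIL) can be modelled in plain
userspace C. This is only a sketch under assumptions: struct hole_req,
struct region, fit_callback(), walk_regions() and locate_hole() are
hypothetical stand-ins, not the kernel's struct kexec_buf or its walkers.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct kexec_buf: only what the search needs. */
struct hole_req {
	uint64_t memsz;	/* bytes needed */
	uint64_t align;	/* power-of-two alignment */
	uint64_t mem;	/* result: start address of the hole found */
};

/* Hypothetical free-memory range with an inclusive end address. */
struct region {
	uint64_t start;
	uint64_t end;
};

/* Callback: return 1 to stop the walk once a region fits, 0 to continue. */
static int fit_callback(uint64_t start, uint64_t end, void *arg)
{
	struct hole_req *req = arg;
	uint64_t aligned = (start + req->align - 1) & ~(req->align - 1);

	if (aligned > end || end - aligned + 1 < req->memsz)
		return 0;
	req->mem = aligned;
	return 1;
}

/* Walker: stops at the first non-zero callback result and returns it. */
static int walk_regions(const struct region *r, size_t n, void *arg,
			int (*func)(uint64_t, uint64_t, void *))
{
	for (size_t i = 0; i < n; i++) {
		int ret = func(r[i].start, r[i].end, arg);

		if (ret)
			return ret;
	}
	return 0;
}

/* Wrapper in the spirit of kexec_locate_mem_hole(): 0 or negative errno. */
static int locate_hole(const struct region *r, size_t n, struct hole_req *req)
{
	int ret = walk_regions(r, n, req, fit_callback);

	return ret == 1 ? 0 : -EADDRNOTAVAIL;
}
```

With a map of { 0x1000-0x1fff, 0x10000-0x4ffff } and a request for 0x8000
bytes aligned to 0x1000, locate_hole() skips the first range and leaves
req.mem at 0x10000; a request larger than every range gets -EADDRNOTAVAIL.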



[PATCH v7 00/13] kexec_file_load implementation for PowerPC

2016-08-30 Thread Thiago Jung Bauermann
The purpose of this new version of the series is to allow building with
CONFIG_KEXEC=n and CONFIG_KEXEC_FILE=y. This is done by patch 4, which
is new in v7. The other patches have very few changes, just to fix
checkpatch warnings, as noted in the changelog.

Note that at this moment the powerpc tree doesn't build with
CONFIG_KEXEC=n even without this series applied. I posted a separate
patch fixing that issue:

https://lists.ozlabs.org/pipermail/linuxppc-dev/2016-August/147909.html

This series doesn't depend on that patch, and they don't conflict in
any way.

Original cover letter:

This patch series implements the kexec_file_load system call on PowerPC.

This system call moves the reading of the kernel, initrd and the device tree
from the userspace kexec tool to the kernel. This is needed if you want to
do one or both of the following:

1. only allow loading of signed kernels.
2. "measure" (i.e., record the hashes of) the kernel, initrd, kernel
   command line and other boot inputs for the Integrity Measurement
   Architecture subsystem.

The above are the functions kexec already has built into kexec_file_load.
Yesterday I posted a set of patches which allows a third feature:

3. have IMA pass-on its event log (where integrity measurements are
   registered) across kexec to the second kernel, so that the event
   history is preserved.

Because OpenPower uses an intermediary Linux instance as a boot loader
(skiroot), feature 1 is needed to implement secure boot for the platform,
while features 2 and 3 are needed to implement trusted boot.

This patch series starts by removing an x86 assumption from kexec_file:
kexec_add_buffer uses iomem to find reserved memory ranges, but PowerPC
uses the memblock subsystem.  A hook is added so that each arch can
specify how memory ranges can be found.

Also, the memory-walking logic in kexec_add_buffer is useful in this
implementation to find a free area for the purgatory's stack, so the
next patch moves that logic to kexec_locate_mem_hole.

The kexec_file_load system call needs to apply relocations to the
purgatory but adding code for that would duplicate functionality with
the module loading mechanism, which also needs to apply relocations to
the kernel modules.  Therefore, this patch series factors out the module
relocation code so that it can be shared.

One thing that is still missing is crashkernel support, which I intend
to submit shortly. For now, arch_kexec_kernel_image_probe rejects crash
kernels.

This code is based on kexec-tools, but with many modifications to adapt
it to the kernel environment and facilities. The exception is the
purgatory, which has only minimal changes.

Changes for v7:
- Rebased on top of v4.8-rc4.
- Patch "powerpc: Change places using CONFIG_KEXEC to use CONFIG_KEXEC_CORE
  instead."
  - New patch. Fixes build when CONFIG_KEXEC=n and CONFIG_KEXEC_FILE=y.
- Patch "powerpc: Adapt elf64_apply_relocate_add for kexec_file_load."
  - Fixed checkpatch warning "else is not generally useful after a break
or return".
  - Fixed checkpatch warnings about line length. (Andrew Morton)
- Patch "powerpc: Add code to work with device trees in kexec_file_load."
  - Remove space before tabs in doc comment for setup_new_fdt. (Andrew Morton)
  - Fixed checkpatch warnings about line length.
- Patch "powerpc: Add support for loading ELF kernels with kexec_file_load."
  - Removed duplicate #include .

Changes for v6:
- Based directly on top of v4.8-rc1.
- Patch "powerpc: Adapt elf64_apply_relocate_add for kexec_file_load."
  - Allow undefined symbols if they are relocations for the TOC in the
big endian ABI.
  - Fixed build error in this patch by adding the ehdr member to elf_info
here instead of in the next patch.
  - Initialize elf_info.ehdr in module_64.c:module_frob_arch_sections.
- Patch "powerpc: Add code to work with device trees in kexec_file_load."
  - Changed find_debug_console to look for /chosen instead of receiving
its offset as an argument.
  - setup_new_fdt: no need to find /chosen again after deleting the memory
reservation for initrd.
- Patch "powerpc: Add support for loading ELF kernels with kexec_file_load."
  - Don't pass the offset to /chosen to find_debug_console.
- Patch "powerpc: Allow userspace to set device tree properties in
  kexec_file_load"
  - Dropped patch.
- Patch "powerpc: Add purgatory for kexec_file_load implementation."
  - Make boot/string.S use the DOTSYM macro so that it can be
used by the ppc64 big endian purgatory.
  - Use -mcall-aixdesc to compile the purgatory on big endian ppc64.

Changes for v5:
- Rebased series on v4.8-rc1 + the extend kexec_file_load series.
- Patch "powerpc: Adapt elf64_apply_relocate_add for kexec_file_load."
  - New patch. These changes were previously in patch 10.
The code itself is unchanged from v4.
- Patch "powerpc: Implement kexec_file_load."
  - Moved arch_kexec_walk_mem, arch_kexec_apply_relocations_add and
setup_purgatory from patch 10 to this patch.
  - 

[PATCH v7 01/13] kexec_file: Allow arch-specific memory walking for kexec_add_buffer

2016-08-30 Thread Thiago Jung Bauermann
Allow architectures to specify a different memory walking function for
kexec_add_buffer. x86 uses iomem to track reserved memory ranges, but
PowerPC uses the memblock subsystem.

Signed-off-by: Thiago Jung Bauermann 
Acked-by: Dave Young 
Acked-by: Balbir Singh 
---
 include/linux/kexec.h   | 29 -
 kernel/kexec_file.c | 30 ++
 kernel/kexec_internal.h | 16 
 3 files changed, 50 insertions(+), 25 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index d743baaa..2d6a1ab3b630 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -148,7 +148,34 @@ struct kexec_file_ops {
kexec_verify_sig_t *verify_sig;
 #endif
 };
-#endif
+
+/**
+ * struct kexec_buf - parameters for finding a place for a buffer in memory
+ * @image: kexec image in which memory to search.
+ * @buffer:Contents which will be copied to the allocated memory.
+ * @bufsz: Size of @buffer.
+ * @mem:   On return will have address of the buffer in memory.
+ * @memsz: Size for the buffer in memory.
+ * @buf_align: Minimum alignment needed.
+ * @buf_min:   The buffer can't be placed below this address.
+ * @buf_max:   The buffer can't be placed above this address.
+ * @top_down:  Allocate from top of memory.
+ */
+struct kexec_buf {
+   struct kimage *image;
+   char *buffer;
+   unsigned long bufsz;
+   unsigned long mem;
+   unsigned long memsz;
+   unsigned long buf_align;
+   unsigned long buf_min;
+   unsigned long buf_max;
+   bool top_down;
+};
+
+int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
+  int (*func)(u64, u64, void *));
+#endif /* CONFIG_KEXEC_FILE */
 
 struct kimage {
kimage_entry_t head;
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 503bc2d348e5..4a9581acbba4 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -428,6 +428,27 @@ static int locate_mem_hole_callback(u64 start, u64 end, void *arg)
return locate_mem_hole_bottom_up(start, end, kbuf);
 }
 
+/**
+ * arch_kexec_walk_mem - call func(data) on free memory regions
+ * @kbuf:  Context info for the search. Also passed to @func.
+ * @func:  Function to call for each memory region.
+ *
+ * Return: The memory walk will stop when func returns a non-zero value
+ * and that value will be returned. If all free regions are visited without
+ * func returning non-zero, then zero will be returned.
+ */
+int __weak arch_kexec_walk_mem(struct kexec_buf *kbuf,
+  int (*func)(u64, u64, void *))
+{
+   if (kbuf->image->type == KEXEC_TYPE_CRASH)
+   return walk_iomem_res_desc(crashk_res.desc,
+  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
+  crashk_res.start, crashk_res.end,
+  kbuf, func);
+   else
+   return walk_system_ram_res(0, ULONG_MAX, kbuf, func);
+}
+
 /*
  * Helper function for placing a buffer in a kexec segment. This assumes
  * that kexec_mutex is held.
@@ -474,14 +495,7 @@ int kexec_add_buffer(struct kimage *image, char *buffer, unsigned long bufsz,
kbuf->top_down = top_down;
 
/* Walk the RAM ranges and allocate a suitable range for the buffer */
-   if (image->type == KEXEC_TYPE_CRASH)
-   ret = walk_iomem_res_desc(crashk_res.desc,
-   IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY,
-   crashk_res.start, crashk_res.end, kbuf,
-   locate_mem_hole_callback);
-   else
-   ret = walk_system_ram_res(0, -1, kbuf,
- locate_mem_hole_callback);
+   ret = arch_kexec_walk_mem(kbuf, locate_mem_hole_callback);
if (ret != 1) {
/* A suitable memory range could not be found for buffer */
return -EADDRNOTAVAIL;
diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h
index 0a52315d9c62..4cef7e4706b0 100644
--- a/kernel/kexec_internal.h
+++ b/kernel/kexec_internal.h
@@ -20,22 +20,6 @@ struct kexec_sha_region {
unsigned long len;
 };
 
-/*
- * Keeps track of buffer parameters as provided by caller for requesting
- * memory placement of buffer.
- */
-struct kexec_buf {
-   struct kimage *image;
-   char *buffer;
-   unsigned long bufsz;
-   unsigned long mem;
-   unsigned long memsz;
-   unsigned long buf_align;
-   unsigned long buf_min;
-   unsigned long buf_max;
-   bool top_down;  /* allocate from top of memory hole */
-};
-
 void kimage_file_post_load_cleanup(struct kimage *image);
 #else /* CONFIG_KEXEC_FILE */
 static inline void kimage_file_post_load_cleanup(struct kimage *image) { }
-- 
1.9.1
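
The override mechanism the commit message describes is an ordinary weak
symbol: generic code supplies a default, and an architecture that links
in a strong definition of the same name replaces it. A minimal userspace
sketch follows, assuming a GCC- or Clang-compatible compiler;
demo_walk_mem(), walk_fn and count_regions() are hypothetical names used
only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*walk_fn)(uint64_t start, uint64_t end, void *arg);

/*
 * Hypothetical generic default. If another object file defines a
 * non-weak demo_walk_mem(), the linker picks that one instead.
 */
__attribute__((weak)) int demo_walk_mem(void *ctx, walk_fn func)
{
	/* Pretend the generic code knows about a single RAM range. */
	return func(0x0, 0xffff, ctx);
}

/* Callback that counts the regions it is shown and never stops the walk. */
static int count_regions(uint64_t start, uint64_t end, void *arg)
{
	(void)start;
	(void)end;
	++*(int *)arg;
	return 0;
}
```

Calling demo_walk_mem(&n, count_regions) with no strong override linked
in runs the default body, so the counter ends up at 1 and the walk
returns 0.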



Re: [PATCH 04/13] perf/core: Extend perf_output_sample_regs() to include perf_arch_regs

2016-08-30 Thread Nilay Vaish
On 28 August 2016 at 16:00, Madhavan Srinivasan
 wrote:
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 274288819829..e16bf4d057d1 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5371,16 +5371,24 @@ u64 __attribute__((weak)) perf_arch_reg_value(struct perf_arch_regs *regs,
>
>  static void
>  perf_output_sample_regs(struct perf_output_handle *handle,
> -   struct pt_regs *regs, u64 mask)
> +   struct perf_regs *regs, u64 mask)
>  {
> int bit;
> DECLARE_BITMAP(_mask, 64);
> +   u64 arch_regs_mask = regs->arch_regs_mask;
>
> bitmap_from_u64(_mask, mask);
> for_each_set_bit(bit, _mask, sizeof(mask) * BITS_PER_BYTE) {
> u64 val;
>
> -   val = perf_reg_value(regs, bit);
> +   val = perf_reg_value(regs->regs, bit);
> +   perf_output_put(handle, val);
> +   }
> +
> +   bitmap_from_u64(_mask, arch_regs_mask);
> +   for_each_set_bit(bit, _mask, sizeof(mask) * BITS_PER_BYTE) {
> +   u64 val;
> +   val = perf_arch_reg_value(regs->arch_regs, bit);
> perf_output_put(handle, val);
> }
>  }
> @@ -5792,7 +5800,7 @@ void perf_output_sample(struct perf_output_handle *handle,
> if (abi) {
> u64 mask = event->attr.sample_regs_user;
> perf_output_sample_regs(handle,
> -   data->regs_user.regs,
> +   &data->regs_user,
> mask);
> }
> }
> @@ -5827,7 +5835,7 @@ void perf_output_sample(struct perf_output_handle *handle,
> u64 mask = event->attr.sample_regs_intr;
>
> perf_output_sample_regs(handle,
> -   data->regs_intr.regs,
> +   &data->regs_intr,
> mask);
> }
> }
> --
> 2.7.4
>

I would like to suggest a slightly different version.  Would it make
more sense to have something like the following:

@@ -5792,7 +5800,7 @@ void perf_output_sample(struct perf_output_handle *handle,
 if (abi) {
u64 mask = event->attr.sample_regs_user;
perf_output_sample_regs(handle,
data->regs_user.regs,
mask);
}
+
+  if (arch_regs_mask) {
+   perf_output_pmu_regs(handle, data->regs_user.arch_regs, arch_regs_mask);
+  }
}


Somehow I don't like outputting the two sets of registers through the
same function call.

--
Nilay
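
To make the suggested split concrete, here is a rough userspace sketch of
the shape: one helper walks a mask, and two thin entry points keep the
general-purpose and PMU register sets on separate call paths. Everything
here is an assumption for illustration (struct sink, put_masked(),
output_sample_regs(), output_pmu_regs(), output_sample()); none of it is
the perf core API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output sink standing in for struct perf_output_handle. */
struct sink {
	uint64_t vals[128];
	size_t n;
};

static void sink_put(struct sink *s, uint64_t v)
{
	if (s->n < 128)
		s->vals[s->n++] = v;
}

/* Emit regs[bit] for every bit set in mask, lowest bit first. */
static void put_masked(struct sink *s, const uint64_t *regs, uint64_t mask)
{
	for (unsigned int bit = 0; bit < 64; bit++)
		if (mask & (UINT64_C(1) << bit))
			sink_put(s, regs[bit]);
}

/* General-purpose registers: one entry point. */
static void output_sample_regs(struct sink *s, const uint64_t *regs,
			       uint64_t mask)
{
	put_masked(s, regs, mask);
}

/* PMU registers: a separate entry point, as suggested in the review. */
static void output_pmu_regs(struct sink *s, const uint64_t *pmu_regs,
			    uint64_t pmu_mask)
{
	put_masked(s, pmu_regs, pmu_mask);
}

/* Caller decides per register set, instead of one function doing both. */
static void output_sample(struct sink *s,
			  const uint64_t *regs, uint64_t mask,
			  const uint64_t *pmu_regs, uint64_t pmu_mask)
{
	if (mask)
		output_sample_regs(s, regs, mask);
	if (pmu_mask)
		output_pmu_regs(s, pmu_regs, pmu_mask);
}
```

Both register arrays are assumed to hold 64 entries so that any mask bit
is a valid index. The sample layout stays the same as in the single
function version (general registers first, PMU registers after); only
the call structure differs.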


Re: [PATCH 00/13] Add support for perf_arch_regs

2016-08-30 Thread Nilay Vaish
On 28 August 2016 at 16:00, Madhavan Srinivasan
 wrote:
> Patchset to extend PERF_SAMPLE_REGS_INTR to include
> platform specific PMU registers.
>
> Patchset applies cleanly on tip:perf/core branch
>
> It's a perennial request from hardware folks to be able to
> see the raw values of the pmu registers. Partly it's so that
> they can verify perf is doing what they want, and some
> of it is that they're interested in some of the more obscure
> info that isn't plumbed out through other perf interfaces.
>
> Over the years internally we have used various hacks to get
> the requested data out but this is an attempt to use a
> somewhat standard mechanism (using PERF_SAMPLE_REGS_INTR).
>
> This would also be helpful for those of us working on the perf
> hardware backends, to be able to verify that we're programming
> things correctly, without resorting to debug printks etc.
>
> Mechanism proposed:
>
> 1)perf_regs structure is extended with a perf_arch_regs structure
> which each arch/ can populate with their specific platform
> registers to sample on each perf interrupt and an arch_regs_mask
> variable, which is for perf tool to know about the perf_arch_regs
> that are supported.
>
> 2)perf/core func perf_sample_regs_intr() extended to update
> the perf_arch_regs structure and the perf_arch_reg_mask. Set of new
> support functions added perf_get_arch_regs_mask() and
> perf_get_arch_reg() to aid the updates from arch/ side.
>
> 3) perf/core funcs perf_prepare_sample() and perf_output_sample()
> are extended to support the update for the perf_arch_regs_mask and
> perf_arch_regs in the sample
>
> 4)perf/core func perf_output_sample_regs() extended to dump
> the arch_regs to the output sample.
>
> 5)Finally, perf tool side is updated to include a new element
> "arch_regs_mask" in the "struct regs_dump", event sample funcs
> and print functions are updated to support perf_arch_regs.
>

I read the patch series and I have one suggestion to make.  I think we
should not use 'arch regs' to refer to these pmu registers.  I think
architectural registers typically refer to the ones that hold the
state of the process.  Can we replace arch_regs by pmu_regs, or some
other choice?

Thanks
Nilay


Re: [PATCH 07/34] mm, vmscan: make kswapd reclaim in terms of nodes

2016-08-30 Thread Mel Gorman
On Tue, Aug 30, 2016 at 07:55:08PM +0530, Srikar Dronamraju wrote:
> > > 
> > > This patch seems to hurt FA_DUMP functionality. This behaviour is not
> > > seen on v4.7 but only after this patch.
> > > 
> > > So when a kernel on a multinode machine with memblock_reserve() such
> > > that most of the nodes have zero available memory, kswapd seems to be
> > > consuming 100% of the time.
> > > 
> > 
> > Why is FA_DUMP specifically the trigger? If the nodes have zero available
> > memory then is the zone_populated() check failing when FA_DUMP is enabled? If
> > so, that would both allow kswapd to wake and stay awake.
> > 
> 
> The trigger is memblock_reserve() for the complete node memory.  And
> this is exactly what FA_DUMP does.  Here again the node has memory but
> it's all reserved so there is no free memory in the node.
> 
> Did you mean populated_zone() when you said zone_populated, or am I
> mistaken? populated_zone() does return 1 since it checks for
> zone->present_pages.
> 

Yes, I meant populated_zone(). Using present pages may have hidden
a long-lived corner case as it was unexpected that an entire node
would be reserved. The old code happened to survive *probably* because
pgdat_reclaimable would look false and kswapd checks for pgdat being
balanced would happen to do the right thing in this case.

Can you check if something like this works?

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d572b78b65e1..cf64a5456cf6 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -830,7 +830,7 @@ unsigned long __init node_memmap_size_bytes(int, unsigned long, unsigned long);
 
 static inline int populated_zone(struct zone *zone)
 {
-   return (!!zone->present_pages);
+   return (!!zone->managed_pages);
 }
 
 extern int movable_zone;

-- 
Mel Gorman
SUSE Labs
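
For anyone trying to picture why the one-line change matters, here is a
toy model of the two counters. It assumes, as discussed in this thread,
that a fully memblock_reserve()d node still reports present pages but
hands none of them to the page allocator. struct demo_zone and both
helpers are hypothetical, not the kernel's struct zone.

```c
#include <stdbool.h>

/* Hypothetical, trimmed-down zone: just the two counters discussed. */
struct demo_zone {
	unsigned long present_pages;	/* pages physically present */
	unsigned long managed_pages;	/* pages given to the page allocator */
};

/* Old check: any present page makes the zone look populated. */
static bool populated_by_present(const struct demo_zone *z)
{
	return z->present_pages != 0;
}

/* Proposed check: only pages the allocator manages count. */
static bool populated_by_managed(const struct demo_zone *z)
{
	return z->managed_pages != 0;
}
```

For a zone with present_pages = 4096 and managed_pages = 0, the first
helper says "populated" and the second says "not populated", which is
the difference that would let kswapd skip a fully reserved node.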


Re: [PATCH 07/34] mm, vmscan: make kswapd reclaim in terms of nodes

2016-08-30 Thread Srikar Dronamraju
> > 
> > This patch seems to hurt FA_DUMP functionality. This behaviour is not
> > seen on v4.7 but only after this patch.
> > 
> > So when a kernel on a multinode machine with memblock_reserve() such
> > that most of the nodes have zero available memory, kswapd seems to be
> > consuming 100% of the time.
> > 
> 
> Why is FA_DUMP specifically the trigger? If the nodes have zero available
> memory then is the zone_populated() check failing when FA_DUMP is enabled? If
> so, that would both allow kswapd to wake and stay awake.
> 

The trigger is memblock_reserve() for the complete node memory.  And
this is exactly what FA_DUMP does.  Here again the node has memory but
it's all reserved so there is no free memory in the node.

Did you mean populated_zone() when you said zone_populated, or am I
mistaken? populated_zone() does return 1 since it checks for
zone->present_pages.

Here is the relevant part of the dmesg log at boot:

ppc64_pft_size= 0x26
phys_mem_size = 0x1e46
dcache_bsize  = 0x80
icache_bsize  = 0x80
cpu_features  = 0x27fc7aec18500249
  possible= 0x3fff18500649
  always  = 0x18100040
cpu_user_features = 0xdc0065c2 0xef00
mmu_features  = 0x7c01
firmware_features = 0x0003c45bfc57
htab_hash_mask= 0x7fff
-
Node 0 Memory: 0x0-0x1fb5000
Node 1 Memory: 0x1fb5000-0x3fa9000
Node 2 Memory: 0x3fa9000-0x5f9b000
Node 3 Memory: 0x5f9b000-0x7685000
Node 4 Memory: 0x7685000-0x9502000
Node 5 Memory: 0x9502000-0xb37f000
Node 6 Memory: 0xb37f000-0xd1fc000
Node 7 Memory: 0xd1fc000-0xf079000
Node 8 Memory: 0xf079000-0x10ef6000
Node 9 Memory: 0x10ef6000-0x12d73000
Node 10 Memory: 0x12d73000-0x14bf
Node 11 Memory: 0x14bf-0x16a6d000
Node 12 Memory: 0x16a6d000-0x188ea000
Node 13 Memory: 0x188ea000-0x1a766000
Node 14 Memory: 0x1a766000-0x1c5e3000
Node 15 Memory: 0x1c5e3000-0x1e46
numa: Initmem setup node 0 [mem 0x-0x1fb4fff]
numa:   NODE_DATA [mem 0x1837fe23680-0x1837fe2d37f]
numa: Initmem setup node 1 [mem 0x1fb5000-0x3fa8fff]
numa:   NODE_DATA [mem 0x1837fa19980-0x1837fa2367f]
numa: NODE_DATA(1) on node 0
numa: Initmem setup node 2 [mem 0x3fa9000-0x5f9afff]
numa:   NODE_DATA [mem 0x1837f60fc80-0x1837f61997f]
numa: NODE_DATA(2) on node 0
numa: Initmem setup node 3 [mem 0x5f9b000-0x7684fff]
numa:   NODE_DATA [mem 0x1837f205f80-0x1837f20fc7f]
numa: NODE_DATA(3) on node 0
numa: Initmem setup node 4 [mem 0x7685000-0x9501fff]
numa:   NODE_DATA [mem 0x1837ef1c280-0x1837ef25f7f]
numa: NODE_DATA(4) on node 0
numa: Initmem setup node 5 [mem 0x9502000-0xb37efff]
numa:   NODE_DATA [mem 0x1837eb42580-0x1837eb4c27f]
numa: NODE_DATA(5) on node 0
numa: Initmem setup node 6 [mem 0xb37f000-0xd1fbfff]
numa:   NODE_DATA [mem 0x1837e778880-0x1837e78257f]
numa: NODE_DATA(6) on node 0
numa: Initmem setup node 7 [mem 0xd1fc000-0xf078fff]
numa:   NODE_DATA [mem 0x1837e39eb80-0x1837e3a887f]
numa: NODE_DATA(7) on node 0
numa: Initmem setup node 8 [mem 0xf079000-0x10ef5fff]
numa:   NODE_DATA [mem 0x1837dfc4e80-0x1837dfceb7f]
numa: NODE_DATA(8) on node 0
numa: Initmem setup node 9 [mem 0x10ef6000-0x12d72fff]
numa:   NODE_DATA [mem 0x1837dbeb180-0x1837dbf4e7f]
numa: NODE_DATA(9) on node 0
numa: Initmem setup node 10 [mem 0x12d73000-0x14be]
numa:   NODE_DATA [mem 0x1837d811480-0x1837d81b17f]
numa: NODE_DATA(10) on node 0
numa: Initmem setup node 11 [mem 0x14bf-0x16a6cfff]
numa:   NODE_DATA [mem 0x1837d437780-0x1837d44147f]
numa: NODE_DATA(11) on node 0
numa: Initmem setup node 12 [mem 0x16a6d000-0x188e9fff]
numa:   NODE_DATA [mem 0x1837d05da80-0x1837d06777f]
numa: NODE_DATA(12) on node 0
numa: Initmem setup node 13 [mem 0x188ea000-0x1a765fff]
numa:   NODE_DATA [mem 0x1837cc83d80-0x1837cc8da7f]
numa: NODE_DATA(13) on node 0
numa: Initmem setup node 14 [mem 0x1a766000-0x1c5e2fff]
numa:   NODE_DATA [mem 0x1837c8aa080-0x1837c8b3d7f]
numa: NODE_DATA(14) on node 0
numa: Initmem setup node 15 [mem 0x1c5e3000-0x1e45]
numa:   NODE_DATA [mem 0x1837c4d0380-0x1837c4da07f]
numa: NODE_DATA(15) on node 0
Section 99194 and 99199 (node 0) have a circular dependency on usemap and pgdat allocations
node 1 must be removed before remove section 99193
node 1 must be removed before remove section 99194
node 2 must be removed before remove section 99193
node 4 must be removed before remove section 99193
node 8 must be removed before remove section 99193
node 13 must be removed before remove section 99193
PCI host bridge /pci@8002032  ranges:
 MEM 0x3fd48000..0x3fd4feff -> 0x8000 
 MEM 0x3290..0x329f -> 0x0003d290 
PCI host bridge 

Re: [PATCH 07/34] mm, vmscan: make kswapd reclaim in terms of nodes

2016-08-30 Thread Mel Gorman
On Mon, Aug 29, 2016 at 03:08:44PM +0530, Srikar Dronamraju wrote:
> > Patch "mm: vmscan: Begin reclaiming pages on a per-node basis" started
> > thinking of reclaim in terms of nodes but kswapd is still zone-centric. This
> > patch gets rid of many of the node-based versus zone-based decisions.
> > 
> > o A node is considered balanced when any eligible lower zone is balanced.
> >   This eliminates one class of age-inversion problem because we avoid
> >   reclaiming a newer page just because it's in the wrong zone
> > o pgdat_balanced disappears because we now only care about one zone being
> >   balanced.
> > o Some anomalies related to writeback and congestion tracking being based on
> >   zones disappear.
> > o kswapd no longer has to take care to reclaim zones in the reverse order
> >   that the page allocator uses.
> > o Most importantly of all, reclaim from node 0 with multiple zones will
> >   have similar aging and reclaiming characteristics as every
> >   other node.
> > 
> > Signed-off-by: Mel Gorman 
> > Acked-by: Johannes Weiner 
> > Acked-by: Vlastimil Babka 
> 
> This patch seems to hurt FA_DUMP functionality. This behaviour is not
> seen on v4.7 but only after this patch.
> 
> So when a kernel on a multinode machine with memblock_reserve() such
> that most of the nodes have zero available memory, kswapd seems to be
> consuming 100% of the time.
> 

Why is FA_DUMP specifically the trigger? If the nodes have zero available
memory then is the zone_populated() check failing when FA_DUMP is enabled? If
so, that would both allow kswapd to wake and stay awake.

-- 
Mel Gorman
SUSE Labs


[PATCH 6/6] powerpc/boot: Add support for XZ compression

2016-08-30 Thread Oliver O'Halloran
This patch adds an option to use XZ compression for the kernel image.
Currently this is only enabled for PPC64 targets since the bulk of the
32bit platforms produce uboot images which do not use the wrapper.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/boot/Makefile |  2 ++
 arch/powerpc/boot/decompress.c |  5 +
 arch/powerpc/boot/types.h  | 10 +
 arch/powerpc/boot/xz_config.h  | 39 ++
 arch/powerpc/platforms/Kconfig.cputype |  1 +
 5 files changed, 57 insertions(+)
 create mode 100644 arch/powerpc/boot/xz_config.h

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 482bac2af1ff..de36806c1a73 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -20,6 +20,7 @@
 all: $(obj)/zImage
 
 compress-$(CONFIG_KERNEL_GZIP) := CONFIG_KERNEL_GZIP
+compress-$(CONFIG_KERNEL_XZ)   := CONFIG_KERNEL_XZ
 
 BOOTCFLAGS:= -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
 -fno-strict-aliasing -Os -msoft-float -pipe \
@@ -213,6 +214,7 @@ endif
 endif
 
 compressor-$(CONFIG_KERNEL_GZIP) := gz
+compressor-$(CONFIG_KERNEL_XZ)   := xz
 
 # args (to if_changed): 1 = (this rule), 2 = platform, 3 = dts 4=dtb 5=initrd
 quiet_cmd_wrap = WRAP$@
diff --git a/arch/powerpc/boot/decompress.c b/arch/powerpc/boot/decompress.c
index 60fc6fb26867..8f32ea4289af 100644
--- a/arch/powerpc/boot/decompress.c
+++ b/arch/powerpc/boot/decompress.c
@@ -37,6 +37,11 @@
 #  include "decompress_inflate.c"
 #endif
 
+#ifdef CONFIG_KERNEL_XZ
+#  include "xz_config.h"
+#  include "../../../lib/decompress_unxz.c"
+#endif
+
 /* globals for tracking the state of the decompression */
 static unsigned long decompressed_bytes;
 static unsigned long limit;
diff --git a/arch/powerpc/boot/types.h b/arch/powerpc/boot/types.h
index 85565a89bcc2..0362a262a299 100644
--- a/arch/powerpc/boot/types.h
+++ b/arch/powerpc/boot/types.h
@@ -34,4 +34,14 @@ typedef s64 int64_t;
(void) (&_x == &_y);\
_x > _y ? _x : _y; })
 
+#define min_t(type, a, b) min(((type)(a)), ((type)(b)))
+#define max_t(type, a, b) max(((type)(a)), ((type)(b)))
+
+#ifndef true
+#define true 1
+#endif
+
+#ifndef false
+#define false 0
+#endif
 #endif /* _TYPES_H_ */
diff --git a/arch/powerpc/boot/xz_config.h b/arch/powerpc/boot/xz_config.h
new file mode 100644
index ..5c6afdbca642
--- /dev/null
+++ b/arch/powerpc/boot/xz_config.h
@@ -0,0 +1,39 @@
+#ifndef __XZ_CONFIG_H__
+#define __XZ_CONFIG_H__
+
+/*
+ * most of this is copied from lib/xz/xz_private.h, we can't use their defines
+ * since the boot wrapper is not built in the same environment as the rest of
+ * the kernel.
+ */
+
+#include "types.h"
+#include "swab.h"
+
+static inline uint32_t swab32p(void *p)
+{
+   uint32_t *q = p;
+
+   return swab32(*q);
+}
+
+#ifdef __LITTLE_ENDIAN__
+#define get_le32(p) (*((uint32_t *) (p)))
+#else
+#define get_le32(p) swab32p(p)
+#endif
+
+#define memeq(a, b, size) (memcmp(a, b, size) == 0)
+#define memzero(buf, size) memset(buf, 0, size)
+
+/* prevent the inclusion of the xz-preboot MM headers */
+#define DECOMPR_MM_H
+#define memmove memmove
+#define XZ_EXTERN static
+
+/* xz.h needs to be included directly since we need enum xz_mode */
+#include "../../../include/linux/xz.h"
+
+#undef XZ_EXTERN
+
+#endif
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index f32edec13fd1..d5da55b01027 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -2,6 +2,7 @@ config PPC64
bool "64-bit kernel"
default n
select ZLIB_DEFLATE
+   select HAVE_KERNEL_XZ
help
  This option selects whether a 32-bit or a 64-bit kernel
  will be built.
-- 
2.5.5



[PATCH 5/6] powerpc/boot: add xz support to the wrapper script

2016-08-30 Thread Oliver O'Halloran
This modifies the script so that the -Z option takes an argument to
specify the compression type. It can either be 'gz', 'xz' or 'none'.
The legacy --no-gzip and -z options are still supported and will set
the compression to none and gzip respectively, but they are not
documented.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/boot/Makefile |  7 --
 arch/powerpc/boot/wrapper  | 61 ++
 2 files changed, 50 insertions(+), 18 deletions(-)

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 3fdd74ac2fae..482bac2af1ff 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -212,10 +212,13 @@ CROSSWRAP := -C "$(CROSS_COMPILE)"
 endif
 endif
 
+compressor-$(CONFIG_KERNEL_GZIP) := gz
+
 # args (to if_changed): 1 = (this rule), 2 = platform, 3 = dts 4=dtb 5=initrd
 quiet_cmd_wrap = WRAP$@
-  cmd_wrap =$(CONFIG_SHELL) $(wrapper) -c -o $@ -p $2 $(CROSSWRAP) \
-   $(if $3, -s $3)$(if $4, -d $4)$(if $5, -i $5) vmlinux
+  cmd_wrap =$(CONFIG_SHELL) $(wrapper) -Z $(compressor-y) -c -o $@ -p $2 \
+   $(CROSSWRAP) $(if $3, -s $3)$(if $4, -d $4)$(if $5, -i $5) \
+   vmlinux
 
 image-$(CONFIG_PPC_PSERIES)+= zImage.pseries
 image-$(CONFIG_PPC_POWERNV)+= zImage.pseries
diff --git a/arch/powerpc/boot/wrapper b/arch/powerpc/boot/wrapper
index 6681ec3625c9..cf7631be5007 100755
--- a/arch/powerpc/boot/wrapper
+++ b/arch/powerpc/boot/wrapper
@@ -20,6 +20,8 @@
 # -D dir   specify directory containing data files used by script
 #  (default ./arch/powerpc/boot)
 # -W dir   specify working directory for temporary files (default .)
+# -z   use gzip (legacy)
+# -Z zsuffix    compression to use (gz, xz or none)
 
 # Stop execution if any command fails
 set -e
@@ -38,7 +40,7 @@ dtb=
 dts=
 cacheit=
 binary=
-gzip=.gz
+compression=.gz
 pie=
 format=
 
@@ -59,7 +61,8 @@ tmpdir=.
 usage() {
 echo 'Usage: wrapper [-o output] [-p platform] [-i initrd]' >&2
 echo '   [-d devtree] [-s tree.dts] [-c] [-C cross-prefix]' >&2
-echo '   [-D datadir] [-W workingdir] [--no-gzip] [vmlinux]' >&2
+echo '   [-D datadir] [-W workingdir] [-Z (gz|xz|none)]' >&2
+echo '   [--no-compression] [vmlinux]' >&2
 exit 1
 }
 
@@ -126,8 +129,24 @@ while [ "$#" -gt 0 ]; do
[ "$#" -gt 0 ] || usage
tmpdir="$1"
;;
+-z)
+   compression=.gz
+   ;;
+-Z)
+   shift
+   [ "$#" -gt 0 ] || usage
+[ "$1" = "gz" -o "$1" = "xz" -o "$1" = "none" ] || usage
+
+   compression=".$1"
+
+if [ $compression = ".none" ]; then
+compression=
+fi
+   ;;
 --no-gzip)
-gzip=
+# a "feature" of the wrapper script is that it can be used outside
+# the kernel tree. So keeping this around for backwards compatibility.
+compression=
 ;;
 -?)
usage
@@ -140,6 +159,7 @@ while [ "$#" -gt 0 ]; do
 shift
 done
 
+
 if [ -n "$dts" ]; then
 if [ ! -r "$dts" -a -r "$object/dts/$dts" ]; then
dts="$object/dts/$dts"
@@ -212,7 +232,7 @@ miboot|uboot*)
 ;;
 cuboot*)
 binary=y
-gzip=
+compression=
 case "$platform" in
 *-mpc866ads|*-mpc885ads|*-adder875*|*-ep88xc)
 platformo=$object/cuboot-8xx.o
@@ -243,7 +263,7 @@ cuboot*)
 ps3)
 platformo="$object/ps3-head.o $object/ps3-hvcall.o $object/ps3.o"
 lds=$object/zImage.ps3.lds
-gzip=
+compression=
 ext=bin
 objflags="-O binary --set-section-flags=.bss=contents,alloc,load,data"
 ksection=.kernel:vmlinux.bin
@@ -310,27 +330,37 @@ mvme7100)
 esac
 
 vmz="$tmpdir/`basename \"$kernel\"`.$ext"
-if [ -z "$cacheit" -o ! -f "$vmz$gzip" -o "$vmz$gzip" -ot "$kernel" ]; then
-${CROSS}objcopy $objflags "$kernel" "$vmz.$$"
 
-strip_size=$(stat -c %s $vmz.$$)
+# Calculate the vmlinux.strip size
+${CROSS}objcopy $objflags "$kernel" "$vmz.$$"
+strip_size=$(stat -c %s $vmz.$$)
 
-if [ -n "$gzip" ]; then
+if [ -z "$cacheit" -o ! -f "$vmz$compression" -o "$vmz$compression" -ot "$kernel" ]; then
+# recompress the image if we need to
+case $compression in
+.xz)
+xz --check=crc32 -f -9 "$vmz.$$"
+;;
+.gz)
 gzip -n -f -9 "$vmz.$$"
-fi
+;;
+*)
+# drop the compression suffix so the stripped vmlinux is used
+compression=
+   ;;
+esac
 
 if [ -n "$cacheit" ]; then
-   mv -f "$vmz.$$$gzip" "$vmz$gzip"
+   mv -f "$vmz.$$$compression" "$vmz$compression"
 else
vmz="$vmz.$$"
 fi
 else
-# Calculate the vmlinux.strip size
-${CROSS}objcopy $objflags "$kernel" "$vmz.$$"
-strip_size=$(stat -c %s $vmz.$$)
 rm -f $vmz.$$
 fi
 
+vmz="$vmz$compression"
+
 if [ "$make_space" = "y" ]; then
# Round the size to next higher MB limit
round_size=$(((strip_size + 0xf) & 0xfff0))
@@ 

[PATCH 4/6] powerpc/boot: remove legacy gzip wrapper

2016-08-30 Thread Oliver O'Halloran
This code is no longer used and can be removed.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/boot/gunzip_util.c | 204 
 arch/powerpc/boot/gunzip_util.h |  45 -
 2 files changed, 249 deletions(-)
 delete mode 100644 arch/powerpc/boot/gunzip_util.c
 delete mode 100644 arch/powerpc/boot/gunzip_util.h

diff --git a/arch/powerpc/boot/gunzip_util.c b/arch/powerpc/boot/gunzip_util.c
deleted file mode 100644
index 9dc52501de83..
--- a/arch/powerpc/boot/gunzip_util.c
+++ /dev/null
@@ -1,204 +0,0 @@
-/*
- * Copyright 2007 David Gibson, IBM Corporation.
- * Based on earlier work, Copyright (C) Paul Mackerras 1997.
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#include 
-#include "string.h"
-#include "stdio.h"
-#include "ops.h"
-#include "gunzip_util.h"
-
-#define HEAD_CRC   2
-#define EXTRA_FIELD4
-#define ORIG_NAME  8
-#define COMMENT0x10
-#define RESERVED   0xe0
-
-/**
- * gunzip_start - prepare to decompress gzip data
- * @state: decompressor state structure to be initialized
- * @src:   buffer containing gzip compressed or uncompressed data
- * @srclen:size in bytes of the buffer at src
- *
- * If the buffer at @src contains a gzip header, this function
- * initializes zlib to decompress the data, storing the decompression
- * state in @state.  The other functions in this file can then be used
- * to decompress data from the gzipped stream.
- *
- * If the buffer at @src does not contain a gzip header, it is assumed
- * to contain uncompressed data.  The buffer information is recorded
- * in @state and the other functions in this file will simply copy
- * data from the uncompressed data stream at @src.
- *
- * Any errors, such as bad compressed data, cause an error to be
- * printed an the platform's exit() function to be called.
- */
-void gunzip_start(struct gunzip_state *state, void *src, int srclen)
-{
-   char *hdr = src;
-   int hdrlen = 0;
-
-   memset(state, 0, sizeof(*state));
-
-   /* Check for gzip magic number */
-   if ((hdr[0] == 0x1f) && (hdr[1] == 0x8b)) {
-   /* gzip data, initialize zlib parameters */
-   int r, flags;
-
-   state->s.workspace = state->scratch;
-   if (zlib_inflate_workspacesize() > sizeof(state->scratch))
-   fatal("insufficient scratch space for gunzip\n\r");
-
-   /* skip header */
-   hdrlen = 10;
-   flags = hdr[3];
-   if (hdr[2] != Z_DEFLATED || (flags & RESERVED) != 0)
-   fatal("bad gzipped data\n\r");
-   if ((flags & EXTRA_FIELD) != 0)
-   hdrlen = 12 + hdr[10] + (hdr[11] << 8);
-   if ((flags & ORIG_NAME) != 0)
-   while (hdr[hdrlen++] != 0)
-   ;
-   if ((flags & COMMENT) != 0)
-   while (hdr[hdrlen++] != 0)
-   ;
-   if ((flags & HEAD_CRC) != 0)
-   hdrlen += 2;
-   if (hdrlen >= srclen)
-   fatal("gunzip_start: ran out of data in header\n\r");
-
-   r = zlib_inflateInit2(&state->s, -MAX_WBITS);
-   if (r != Z_OK)
-   fatal("inflateInit2 returned %d\n\r", r);
-   }
-
-   state->s.total_in = hdrlen;
-   state->s.next_in = src + hdrlen;
-   state->s.avail_in = srclen - hdrlen;
-}
-
-/**
- * gunzip_partial - extract bytes from a gzip data stream
- * @state: gzip state structure previously initialized by gunzip_start()
- * @dst:   buffer to store extracted data
- * @dstlen:maximum number of bytes to extract
- *
- * This function extracts at most @dstlen bytes from the data stream
- * previously associated with @state by gunzip_start(), decompressing
- * if necessary.  Exactly @dstlen bytes are extracted unless the data
- * stream doesn't contain enough bytes, in which case the entire
- * remainder of the stream is decompressed.
- *
- * Returns the actual number of bytes extracted.  If any errors occur,
- * such as a corrupted compressed stream, an error is printed an the
- * platform's exit() function is called.
- */
-int gunzip_partial(struct gunzip_state *state, void *dst, int dstlen)
-{
-   int len;
-
-   if (state->s.workspace) {
-   /* gunzipping */
-   int r;
-
-   state->s.next_out = dst;
-   state->s.avail_out = dstlen;
-   r = zlib_inflate(&state->s, Z_FULL_FLUSH);
-   if (r != Z_OK && r != Z_STREAM_END)
-   fatal("inflate returned %d msg: %s\n\r", r, state->s.msg);
-   len = 

[PATCH 3/6] powerpc/boot: use the preboot decompression API

2016-08-30 Thread Oliver O'Halloran
Currently the powerpc boot wrapper has its own wrapper around zlib to
handle decompressing gzipped kernels. The kernel decompressor library
functions now provide a generic interface that can be used in the pre-boot
environment. This allows boot wrappers to easily support different
compression algorithms. This patch converts the wrapper to use this new
API, but does not add support for using new algorithms.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/boot/Makefile |  10 ++-
 arch/powerpc/boot/decompress.c | 142 +
 arch/powerpc/boot/main.c   |  35 +-
 arch/powerpc/boot/ops.h|   3 +
 4 files changed, 170 insertions(+), 20 deletions(-)
 create mode 100644 arch/powerpc/boot/decompress.c

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 5a99a485d80a..3fdd74ac2fae 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -65,11 +65,12 @@ $(obj)/virtex405-head.o: BOOTAFLAGS += -mcpu=405
 
 # the kernel's version of zlib pulls in a lot of other kernel headers
 # which we don't provide inside the wrapper.
+zlib-decomp-$(CONFIG_KERNEL_GZIP) := decompress_inflate.c
 zlib-$(CONFIG_KERNEL_GZIP) := inffast.c inflate.c inftrees.c
 zlibheaders-$(CONFIG_KERNEL_GZIP) := inffast.h inffixed.h inflate.h inftrees.h infutil.h
 zliblinuxheader-$(CONFIG_KERNEL_GZIP) := zlib.h zconf.h zutil.h
 
-$(addprefix $(obj)/,$(zlib-y) cuboot-c2k.o gunzip_util.o main.o): \
+$(addprefix $(obj)/,$(zlib-y) cuboot-c2k.o decompress.o main.o): \
$(addprefix $(obj)/,$(zliblinuxheader-y)) \
$(addprefix $(obj)/,$(zlibheaders-y)) \
$(addprefix $(obj)/,$(zlib-decomp-y))
@@ -80,10 +81,10 @@ libfdtheader := fdt.h libfdt.h libfdt_internal.h
 $(addprefix $(obj)/,$(libfdt) libfdt-wrapper.o simpleboot.o epapr.o opal.o): \
$(addprefix $(obj)/,$(libfdtheader))
 
-src-wlib-y := string.S crt0.S crtsavres.S stdio.c main.c \
+src-wlib-y := string.S crt0.S crtsavres.S stdio.c decompress.c main.c \
$(libfdt) libfdt-wrapper.c \
ns16550.c serial.c simple_alloc.c div64.S util.S \
-   gunzip_util.c elf_util.c $(zlib-y) devtree.c stdlib.c \
+   decompress.o elf_util.c $(zlib-y) devtree.c stdlib.c \
oflib.c ofconsole.c cuboot.c mpsc.c cpm-serial.c \
uartlite.c mpc52xx-psc.c opal.c opal-calls.S
 src-wlib-$(CONFIG_40x) += 4xx.c planetcore.c
@@ -144,6 +145,9 @@ $(addprefix $(obj)/,$(zlibheaders-y)): $(obj)/%: $(srctree)/lib/zlib_inflate/%
 $(addprefix $(obj)/,$(zliblinuxheader-y)): $(obj)/%: $(srctree)/include/linux/%
$(call cmd,copy_kern_src)
 
+$(addprefix $(obj)/,$(zlib-decomp-y)): $(obj)/%: $(srctree)/lib/%
+   $(call cmd,copy_kern_src)
+
 quiet_cmd_copy_libfdt = COPY$@
   cmd_copy_libfdt = cp $< $@
 
diff --git a/arch/powerpc/boot/decompress.c b/arch/powerpc/boot/decompress.c
new file mode 100644
index ..60fc6fb26867
--- /dev/null
+++ b/arch/powerpc/boot/decompress.c
@@ -0,0 +1,142 @@
+/*
+ * Wrapper around the kernel's pre-boot decompression library.
+ *
+ * Copyright (C) IBM Corporation 2016.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include "elf.h"
+#include "page.h"
+#include "string.h"
+#include "stdio.h"
+#include "ops.h"
+#include "reg.h"
+#include "types.h"
+
+/*
+ * The decompressor_*.c files play #ifdef games so they can be used in both
+ * pre-boot and regular kernel code. We need these definitions to make the
+ * includes work.
+ */
+
+#define STATIC static
+#define INIT
+#define __always_inline inline
+
+/*
+ * The build process will copy the required zlib source files and headers
+ * out of lib/ and "fix" the includes so they do not pull in other kernel
+ * headers.
+ */
+
+#ifdef CONFIG_KERNEL_GZIP
+#  include "decompress_inflate.c"
+#endif
+
+/* globals for tracking the state of the decompression */
+static unsigned long decompressed_bytes;
+static unsigned long limit;
+static unsigned long skip;
+static char *output_buffer;
+
+/*
+ * flush() is called by __decompress() when the decompressor's scratch buffer is
+ * full.
+ */
+static long flush(void *v, unsigned long buffer_size)
+{
+   unsigned long end = decompressed_bytes + buffer_size;
+   unsigned long size = buffer_size;
+   unsigned long offset = 0;
+   char *in = v;
+   char *out;
+
+   /*
+* if we hit our decompression limit, we need to fake an error to abort
+* the in-progress decompression.
+*/
+   if (decompressed_bytes >= limit)
+   return -1;
+
+   /* skip this entire block */
+   if (end <= skip) {
+   decompressed_bytes += buffer_size;
+   return buffer_size;
+   }
+
+   /* 

[PATCH 2/6] powerpc/boot: Use CONFIG_KERNEL_GZIP

2016-08-30 Thread Oliver O'Halloran
Most architectures allow the compression algorithm used to produce the
vmlinuz image to be selected as a kernel config option. In preparation
for supporting algorithms other than gzip in the powerpc boot wrapper,
the makefile needs to be modified to use these config options.

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/Kconfig   |  1 +
 arch/powerpc/boot/Makefile | 31 +++
 2 files changed, 20 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 927d2ab2ce08..9f0568852ecf 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -167,6 +167,7 @@ config PPC
select GENERIC_CPU_AUTOPROBE
select HAVE_VIRT_CPU_ACCOUNTING
select HAVE_ARCH_HARDENED_USERCOPY
+   select HAVE_KERNEL_GZIP
 
 config GENERIC_CSUM
def_bool CPU_LITTLE_ENDIAN
diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index f98e42ee2534..5a99a485d80a 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -19,10 +19,14 @@
 
 all: $(obj)/zImage
 
+compress-$(CONFIG_KERNEL_GZIP) := CONFIG_KERNEL_GZIP
+
 BOOTCFLAGS:= -Wall -Wundef -Wstrict-prototypes -Wno-trigraphs \
 -fno-strict-aliasing -Os -msoft-float -pipe \
 -fomit-frame-pointer -fno-builtin -fPIC -nostdinc \
--isystem $(shell $(CROSS32CC) -print-file-name=include)
+-isystem $(shell $(CROSS32CC) -print-file-name=include) \
+-D$(compress-y)
+
 ifdef CONFIG_PPC64_BOOT_WRAPPER
 BOOTCFLAGS += -m64
 endif
@@ -59,13 +63,16 @@ $(obj)/treeboot-currituck.o: BOOTCFLAGS += -mcpu=405
 $(obj)/treeboot-akebono.o: BOOTCFLAGS += -mcpu=405
 $(obj)/virtex405-head.o: BOOTAFLAGS += -mcpu=405
 
+# the kernel's version of zlib pulls in a lot of other kernel headers
+# which we don't provide inside the wrapper.
+zlib-$(CONFIG_KERNEL_GZIP) := inffast.c inflate.c inftrees.c
+zlibheaders-$(CONFIG_KERNEL_GZIP) := inffast.h inffixed.h inflate.h inftrees.h infutil.h
+zliblinuxheader-$(CONFIG_KERNEL_GZIP) := zlib.h zconf.h zutil.h
 
-zlib   := inffast.c inflate.c inftrees.c
-zlibheader := inffast.h inffixed.h inflate.h inftrees.h infutil.h
-zliblinuxheader := zlib.h zconf.h zutil.h
-
-$(addprefix $(obj)/,$(zlib) cuboot-c2k.o gunzip_util.o main.o): \
-   $(addprefix $(obj)/,$(zliblinuxheader)) $(addprefix $(obj)/,$(zlibheader))
+$(addprefix $(obj)/,$(zlib-y) cuboot-c2k.o gunzip_util.o main.o): \
+   $(addprefix $(obj)/,$(zliblinuxheader-y)) \
+   $(addprefix $(obj)/,$(zlibheaders-y)) \
+   $(addprefix $(obj)/,$(zlib-decomp-y))
 
 libfdt   := fdt.c fdt_ro.c fdt_wip.c fdt_sw.c fdt_rw.c fdt_strerror.c
 libfdtheader := fdt.h libfdt.h libfdt_internal.h
@@ -76,7 +83,7 @@ $(addprefix $(obj)/,$(libfdt) libfdt-wrapper.o simpleboot.o epapr.o opal.o): \
 src-wlib-y := string.S crt0.S crtsavres.S stdio.c main.c \
$(libfdt) libfdt-wrapper.c \
ns16550.c serial.c simple_alloc.c div64.S util.S \
-   gunzip_util.c elf_util.c $(zlib) devtree.c stdlib.c \
+   gunzip_util.c elf_util.c $(zlib-y) devtree.c stdlib.c \
oflib.c ofconsole.c cuboot.c mpsc.c cpm-serial.c \
uartlite.c mpc52xx-psc.c opal.c opal-calls.S
 src-wlib-$(CONFIG_40x) += 4xx.c planetcore.c
@@ -128,13 +135,13 @@ obj-plat: $(libfdt)
 quiet_cmd_copy_kern_src = COPY$@
   cmd_copy_kern_src = sed -f $(srctree)/arch/powerpc/boot/fixup-headers.sed $< > $@
 
-$(addprefix $(obj)/,$(zlib)): $(obj)/%: $(srctree)/lib/zlib_inflate/%
+$(addprefix $(obj)/,$(zlib-y)): $(obj)/%: $(srctree)/lib/zlib_inflate/%
$(call cmd,copy_kern_src)
 
-$(addprefix $(obj)/,$(zlibheaders)): $(obj)/%: $(srctree)/lib/zlib_inflate/%
+$(addprefix $(obj)/,$(zlibheaders-y)): $(obj)/%: $(srctree)/lib/zlib_inflate/%
$(call cmd,copy_kern_src)
 
-$(addprefix $(obj)/,$(zliblinuxheader)): $(obj)/%: $(srctree)/include/linux/%
+$(addprefix $(obj)/,$(zliblinuxheader-y)): $(obj)/%: $(srctree)/include/linux/%
$(call cmd,copy_kern_src)
 
 quiet_cmd_copy_libfdt = COPY$@
@@ -153,7 +160,7 @@ $(obj)/zImage.lds: $(obj)/%: $(srctree)/$(src)/%.S
 $(obj)/zImage.coff.lds $(obj)/zImage.ps3.lds : $(obj)/%: $(srctree)/$(src)/%.S
@cp $< $@
 
-clean-files := $(zlib) $(zlibheader) $(zliblinuxheader) \
+clean-files := $(zlib-y) $(zlibheaders-y) $(zliblinuxheader-y) \
$(libfdt) $(libfdtheader) \
empty.c zImage.coff.lds zImage.ps3.lds zImage.lds
 
-- 
2.5.5



XZ compressed zImage support

2016-08-30 Thread Oliver O'Halloran
This series adds support for using XZ compression in addition to gzip in the
kernel boot wrapper. Currently this is only enabled for 64bit Book3S processors
since it seems that some embedded platforms rely on U-Boot (or similar) to
decompress the image rather than having the kernel decompress itself. Enabling
it for other platforms should be fairly straightforward though.
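
As a rough illustration of how the wrapper's -Z argument (gz, xz or none)
could map to a file suffix, here is a minimal sketch. The function name
compression_suffix is hypothetical and does not appear in the series; it only
mirrors the accept/reject behaviour the option is documented to have.

```shell
#!/bin/sh
# Hypothetical helper, not part of the posted wrapper script.
# Maps a -Z argument (gz, xz or none) to the suffix appended to the
# stripped vmlinux, and rejects anything else.
compression_suffix() {
    case "$1" in
    gz|xz)
        printf '.%s\n' "$1"
        ;;
    none)
        # no suffix: the stripped vmlinux is used uncompressed
        printf '\n'
        ;;
    *)
        echo "unsupported compression: $1" >&2
        return 1
        ;;
    esac
}
```

Using a case statement avoids chaining string comparisons with -o, which is
easy to get backwards when the test is negated.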

Supporting other compression algorithms (like ARM and x86 do) is possible, but
painful. Each algorithm includes some kernel headers even when the #defines
that are supposed to make them usable in a pre-boot environment are set.
Including kernel headers is an issue because on powerpc the boot wrapper is
compiled with a different toolchain and possibly for a different target for
backwards compatibility reasons*. This makes it difficult to include kernel
headers since the include paths, etc. are not set up for BOOTCC.

This can be worked around by rewriting parts of each decompressor with sed
scripts, but the rewriting required is specific to each decompressor.

-oliver

*powermacs have 32bit firmware that cannot directly load a 64bit kernel. A
64bit big endian kernel has a 32bit wrapper to work around this. On 64bit little
endian we don't have this legacy problem so the wrapper is also 64bit little
endian, but the toolchain issues are still there.



[PATCH 1/6] powerpc/boot: add sed script

2016-08-30 Thread Oliver O'Halloran
The powerpc boot wrapper is compiled with a separate "bootcc" toolchain
rather than the toolchain used for the rest of the kernel. The main
problem with this is that the wrapper does not have access to the kernel
headers (without a lot of gross hacks). To get around this the required
headers are copied into the build directory via several sed scripts
which rewrite problematic includes. This patch moves these fixups out of
the makefile into a separate .sed script file to clean up the makefile
slightly.
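
For readers unfamiliar with the approach, the kind of include rewriting
described above can be sketched as follows. This is an illustration only: it
assumes the problematic includes have the form #include <linux/NAME>, and the
function name fixup_includes and the sample input are made up for the example
rather than taken from fixup-headers.sed.

```shell
#!/bin/sh
# Illustrative only: the sed expression and sample input are assumptions,
# not the contents of fixup-headers.sed.
fixup_includes() {
    # Turn '#include <linux/NAME>' into '#include "NAME"' so the copied
    # file resolves headers in the build directory rather than through
    # the kernel include paths.
    sed 's@<linux/\([^>]*\)>@"\1"@'
}

# Example: rewrite one include line read from stdin.
printf '#include <linux/zutil.h>\n' | fixup_includes
```

Lines that do not match the pattern pass through unchanged, so the same
filter can be applied to every copied source file and header.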

Signed-off-by: Oliver O'Halloran 
---
 arch/powerpc/boot/Makefile  | 18 ++
 arch/powerpc/boot/fixup-headers.sed | 12 
 2 files changed, 18 insertions(+), 12 deletions(-)
 create mode 100644 arch/powerpc/boot/fixup-headers.sed

diff --git a/arch/powerpc/boot/Makefile b/arch/powerpc/boot/Makefile
index 1a2a6e8dc40d..f98e42ee2534 100644
--- a/arch/powerpc/boot/Makefile
+++ b/arch/powerpc/boot/Makefile
@@ -125,23 +125,17 @@ obj-wlib := $(addsuffix .o, $(basename $(addprefix $(obj)/, $(src-wlib
 obj-plat := $(addsuffix .o, $(basename $(addprefix $(obj)/, $(src-plat
 obj-plat: $(libfdt)
 
-quiet_cmd_copy_zlib = COPY$@
-  cmd_copy_zlib = sed "s@__used@@;s@<linux/\([^>]*\).*@\"\1\"@" $< > $@
-
-quiet_cmd_copy_zlibheader = COPY$@
-  cmd_copy_zlibheader = sed "s@<linux/\([^>]*\).*@\"\1\"@" $< > $@
-# stddef.h for NULL
-quiet_cmd_copy_zliblinuxheader = COPY$@
-  cmd_copy_zliblinuxheader = sed "s@<linux/string.h>@\"string.h\"@;s@<linux/kernel.h>@<stddef.h>@;s@<linux/\([^>]*\).*@\"\1\"@" $< > $@
+quiet_cmd_copy_kern_src = COPY$@
+  cmd_copy_kern_src = sed -f $(srctree)/arch/powerpc/boot/fixup-headers.sed $< > $@
 
 $(addprefix $(obj)/,$(zlib)): $(obj)/%: $(srctree)/lib/zlib_inflate/%
-   $(call cmd,copy_zlib)
+   $(call cmd,copy_kern_src)
 
-$(addprefix $(obj)/,$(zlibheader)): $(obj)/%: $(srctree)/lib/zlib_inflate/%
-   $(call cmd,copy_zlibheader)
+$(addprefix $(obj)/,$(zlibheaders)): $(obj)/%: $(srctree)/lib/zlib_inflate/%
+   $(call cmd,copy_kern_src)
 
 $(addprefix $(obj)/,$(zliblinuxheader)): $(obj)/%: $(srctree)/include/linux/%
-   $(call cmd,copy_zliblinuxheader)
+   $(call cmd,copy_kern_src)
 
 quiet_cmd_copy_libfdt = COPY$@
   cmd_copy_libfdt = cp $< $@
diff --git a/arch/powerpc/boot/fixup-headers.sed b/arch/powerpc/boot/fixup-headers.sed
new file mode 100644
index ..96362428eb37
--- /dev/null
+++ b/arch/powerpc/boot/fixup-headers.sed
@@ -0,0 +1,12 @@
+# Copyright 2016 IBM Corporation.
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms of the GNU General Public License version 2 or later as
+# published by the Free Software Foundation.
+
+s@#include @@;
+s@\"zlib_inflate/\([^\"]*\).*@"\1"@;
+s@<linux/kernel.h>@<stddef.h>@;
+
+s@__used@@;
+s@<linux/\([^>]*\).*@"\1"@;
-- 
2.5.5



Re: hwrng: pasemi_rng.c: Migrate to managed API

2016-08-30 Thread PrasannaKumar Muralidharan
Hi Darren,

>> I wanted to use devm_ioremap_resource but could not find DT entry
>> required for this driver in any of the .dts files. So did not change
>> that. I could not find any dts/dtsi for this platform. So I assume
>> that the dtb is not present in the kernel, dtb is supplied by the
>> bootloader. I may be wrong in this. Can anyone confirm this?
>
> On mine (Amigaone X1000) that is correct, we boot Linux with a vmlinux file,
> and the bootloader (CFE) passes a fixed dtb. I think it is possible to dump
> the tree from inside CFE, if it would help I can investigate?

I don't know if it is possible to get a dts from a dtb even if you manage
to extract the devicetree blob from your system.

Labbe, do you know any way to get a dts from a dtb? Is this step really
required to remove the 0x100 value for this patch, given that the value has
been present here for years? If extracting the dtb and converting it to a dts
is easy and not time consuming, I am in favour of finding a way to remove
the hard-coded value.
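
For reference, one possible way to turn a blob back into source is the device
tree compiler's decompile mode. The sketch below assumes dtc is installed and,
for the directory case, that the running kernel exposes the live tree under
/proc/device-tree. The function name dump_dts and the file names are
placeholders, and nothing here has been tried on the hardware discussed in
this thread.

```shell
#!/bin/sh
# Sketch only: assumes the device tree compiler (dtc) is available.
dump_dts() {
    # $1: a .dtb file, or a directory such as /proc/device-tree
    # $2: path of the .dts file to write
    if ! command -v dtc >/dev/null 2>&1; then
        echo "dtc not found" >&2
        return 127
    fi
    if [ -d "$1" ]; then
        # read the tree from a filesystem hierarchy
        dtc -I fs -O dts -o "$2" "$1"
    else
        # decompile a flattened device tree blob
        dtc -I dtb -O dts -o "$2" "$1"
    fi
}

# Example (placeholder paths):
#   dump_dts /proc/device-tree x1000.dts
#   dump_dts extracted.dtb x1000.dts
```

The resulting dts loses labels and comments from the original source, but
node names, reg values and interrupt properties are preserved, which should be
enough to check what the firmware reports for this device.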