Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time
Hello, On 02/12/2015 05:07 PM, Tom Rini wrote: On Thu, Feb 05, 2015 at 10:51:00AM +0100, Lukasz Majewski wrote: Hi Simon, Hi Lukasz, On 2 February 2015 at 01:46, Lukasz Majewski l.majew...@samsung.com wrote: Dear All, And the next is interesting. odroid_defconfig has more than 80MB for malloc (we need about 64mb for the DFU now, to be able write 32MB file). This is the CONFIG_SYS_MALLOC_LEN. And the memory area for malloc is set to 0 in function mem_malloc_init(). So for this config that function sets more than 80MB to zero. This is not good, because we shouldn't expect zeroed memory returned by malloc pointer. This is a job for calloc. Especially if some command expects zeroed memory after malloc, probably after few next calls - it can crash... I think that the above excerpt is _really_ important and should be discussed. I've cut it from the original post, so it won't get lost between the lines. It seems really strange, that malloc() area is cleared after relocation. Which means that all first malloc'ed buffers get implicitly zeroed. Przemek is right here that this zeroing shouldn't be performed. I'm also concerned about potential bugs, which show up (or even worse - won't show up soon) after this change. Hence, I would like to ask directly the community about the possible solutions. Please look at: ./common/dlmalloc.c mem_alloc_init() function [1]. On the one hand removing memset() at [1] speeds up booting time and makes malloc() doing what is is supposed to do. On the other hand there might be in space some boards, which rely on this memset and without it some wired things may start to happening. I think removing it is a good idea. It was one optimisation that I did for boot time in the Chromium tree. If you do it now (and Tom agrees) then there is plenty of time to test for this release cycle. You could go further and add a test CONFIG which fills it with some other non-zero value. Tom, is such approach acceptable for you? 
I was thinking at first we should default to a poisoned value. But given what we're seeing with generic board updates (lots of boards aren't even build-tested at every release, which isn't really a surprise), I think the funky boards which may exist are probably not going to be seen for a while anyhow, so we'd have to default to a poison for a long while. So yes, let's just add a CONFIG option (and Kconfig line) to optionally do it and default to no memset. ... but I just audited everyone doing malloc ( and found a few things to fixup so we really do want to take a poke around.

The present malloc implementation, which includes the memset at init, can be a little confusing. Because of this memset, the first few calls to malloc always return a pointer to a zeroed memory region. Of course, only calloc does the right job and zeroes a memory region that was used by a previous malloc call. So some code may expect zeroed memory from malloc as well. For this reason, I would like to make skipping this memset an option. This also requires skipping a check in the calloc code. I will send a patch with such a Kconfig option.

Best regards, -- Przemyslaw Marczak Samsung RD Institute Poland Samsung Electronics p.marc...@samsung.com ___ U-Boot mailing list U-Boot@lists.denx.de http://lists.denx.de/mailman/listinfo/u-boot
Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time
Hello Simon, On 02/02/2015 07:15 PM, Simon Glass wrote: Hi Lukasz, On 2 February 2015 at 01:46, Lukasz Majewski l.majew...@samsung.com wrote: Dear All, And the next is interesting. odroid_defconfig has more than 80MB for malloc (we need about 64mb for the DFU now, to be able write 32MB file). This is the CONFIG_SYS_MALLOC_LEN. And the memory area for malloc is set to 0 in function mem_malloc_init(). So for this config that function sets more than 80MB to zero. This is not good, because we shouldn't expect zeroed memory returned by malloc pointer. This is a job for calloc. Especially if some command expects zeroed memory after malloc, probably after few next calls - it can crash... I think that the above excerpt is _really_ important and should be discussed. I've cut it from the original post, so it won't get lost between the lines. It seems really strange, that malloc() area is cleared after relocation. Which means that all first malloc'ed buffers get implicitly zeroed. Przemek is right here that this zeroing shouldn't be performed. I'm also concerned about potential bugs, which show up (or even worse - won't show up soon) after this change. Hence, I would like to ask directly the community about the possible solutions. Please look at: ./common/dlmalloc.c mem_alloc_init() function [1]. On the one hand removing memset() at [1] speeds up booting time and makes malloc() doing what is is supposed to do. On the other hand there might be in space some boards, which rely on this memset and without it some wired things may start to happening. I think removing it is a good idea. It was one optimisation that I did for boot time in the Chromium tree. If you do it now (and Tom agrees) then there is plenty of time to test for this release cycle. You could go further and add a test CONFIG which fills it with some other non-zero value. 
Regards, Simon

Filling the malloc area with a pattern was a good idea: it helped me find out why my trats2 had an issue after I skipped the zeroing memset in malloc init. Actually, the issue was not in the malloc call but in calloc. The present implementation assumes that the memory reserved for malloc is zeroed at init, and calloc checks how much of the allocated memory is fresh (and so doesn't require zeroing). After skipping this fresh-memory check, calloc works fine.

Anyway, I think that this should be optional and tested on every config before it is enabled.

I would like to test something and will send the updated patch set on Monday.

Best regards, -- Przemyslaw Marczak Samsung RD Institute Poland Samsung Electronics p.marc...@samsung.com
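The calloc problem described above can be reproduced with a toy allocator. The following is only a sketch under stated assumptions: it is not the dlmalloc code, calloc is simplified to a single size argument, and every name in it (toy_init, toy_malloc, toy_calloc_trust_fresh, toy_calloc_always_clear, TOY_POISON) is hypothetical. It shows that a calloc which trusts never-used memory to be zero returns the fill pattern as soon as init stops zeroing the arena.

```c
#include <stddef.h>
#include <string.h>

#define TOY_ARENA_SIZE 4096u
#define TOY_POISON     0xA5	/* hypothetical non-zero fill pattern */

static unsigned char toy_arena[TOY_ARENA_SIZE];
static size_t toy_top;	/* bytes at or above this offset were never handed out */

/* Set up the arena; fill is 0 for the legacy zeroing, or a poison byte. */
void toy_init(int fill)
{
	memset(toy_arena, fill, sizeof(toy_arena));
	toy_top = 0;
}

/* Bump allocation with no clearing, like malloc(). */
void *toy_malloc(size_t n)
{
	void *p;

	if (n > TOY_ARENA_SIZE - toy_top)
		return NULL;
	p = &toy_arena[toy_top];
	toy_top += n;
	return p;
}

/* Fragile: treats never-used ("fresh") memory as already zero. */
void *toy_calloc_trust_fresh(size_t n)
{
	return toy_malloc(n);	/* every block from this arena is fresh */
}

/* Robust: clears unconditionally, whatever toy_init() wrote. */
void *toy_calloc_always_clear(size_t n)
{
	void *p = toy_malloc(n);

	if (p)
		memset(p, 0, n);
	return p;
}
```

After toy_init(TOY_POISON), toy_calloc_trust_fresh() hands back bytes that still hold the pattern, while toy_calloc_always_clear() returns zeroes regardless of how the arena was initialised.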
Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time
On Fri, Feb 13, 2015 at 04:48:15PM +0100, Przemyslaw Marczak wrote: Hello, On 02/12/2015 05:07 PM, Tom Rini wrote: On Thu, Feb 05, 2015 at 10:51:00AM +0100, Lukasz Majewski wrote: Hi Simon, Hi Lukasz, On 2 February 2015 at 01:46, Lukasz Majewski l.majew...@samsung.com wrote: Dear All, And the next is interesting. odroid_defconfig has more than 80MB for malloc (we need about 64mb for the DFU now, to be able write 32MB file). This is the CONFIG_SYS_MALLOC_LEN. And the memory area for malloc is set to 0 in function mem_malloc_init(). So for this config that function sets more than 80MB to zero. This is not good, because we shouldn't expect zeroed memory returned by malloc pointer. This is a job for calloc. Especially if some command expects zeroed memory after malloc, probably after few next calls - it can crash... I think that the above excerpt is _really_ important and should be discussed. I've cut it from the original post, so it won't get lost between the lines. It seems really strange, that malloc() area is cleared after relocation. Which means that all first malloc'ed buffers get implicitly zeroed. Przemek is right here that this zeroing shouldn't be performed. I'm also concerned about potential bugs, which show up (or even worse - won't show up soon) after this change. Hence, I would like to ask directly the community about the possible solutions. Please look at: ./common/dlmalloc.c mem_alloc_init() function [1]. On the one hand removing memset() at [1] speeds up booting time and makes malloc() doing what is is supposed to do. On the other hand there might be in space some boards, which rely on this memset and without it some wired things may start to happening. I think removing it is a good idea. It was one optimisation that I did for boot time in the Chromium tree. If you do it now (and Tom agrees) then there is plenty of time to test for this release cycle. You could go further and add a test CONFIG which fills it with some other non-zero value. 
Tom, is such approach acceptable for you?

I was thinking at first we should default to a poisoned value. But given what we're seeing with generic board updates (lots of boards aren't even build-tested at every release, which isn't really a surprise), I think the funky boards which may exist are probably not going to be seen for a while anyhow, so we'd have to default to a poison for a long while. So yes, let's just add a CONFIG option (and Kconfig line) to optionally do it and default to no memset. ... but I just audited everyone doing malloc ( and found a few things to fixup so we really do want to take a poke around.

The present malloc implementation, which includes the memset at init, can be a little confusing. Because of this memset, the first few calls to malloc always return a pointer to a zeroed memory region. Of course, only calloc does the right job and zeroes a memory region that was used by a previous malloc call. So some code may expect zeroed memory from malloc as well. For this reason, I would like to make skipping this memset an option. This also requires skipping a check in the calloc code. I will send a patch with such a Kconfig option.

There might be a few things which are depending on this odd behaviour, but they shouldn't. Most things are zeroing out twice, since it's normal to malloc+memset if you care about the contents being zero. The only real hang-up I see is that we're halfway to release, more or less. Maybe we do this early after the v2015.04 release to give more time to catch the odd cases? And spend some time now auditing.

-- Tom
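The malloc+memset idiom mentioned above, and its calloc equivalent, look like this in standard C. This is a minimal illustration, assuming a hypothetical struct example_desc and hypothetical helper names; it is not taken from the U-Boot tree. Either form stays correct whether or not the heap is cleared at init.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical record type, used only for this illustration. */
struct example_desc {
	int  id;
	char name[16];
};

/* Explicit malloc + memset: correct whether or not the heap was pre-zeroed. */
struct example_desc *desc_new_memset(void)
{
	struct example_desc *d = malloc(sizeof(*d));

	if (d)
		memset(d, 0, sizeof(*d));
	return d;
}

/* calloc: the allocator itself guarantees zeroed bytes. */
struct example_desc *desc_new_calloc(void)
{
	return calloc(1, sizeof(struct example_desc));
}
```

Code that calls plain malloc() and then reads a field without writing it first is the kind of odd case an audit would need to find.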
Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time
On Thu, Feb 05, 2015 at 10:51:00AM +0100, Lukasz Majewski wrote: Hi Simon, Hi Lukasz, On 2 February 2015 at 01:46, Lukasz Majewski l.majew...@samsung.com wrote: Dear All, And the next is interesting. odroid_defconfig has more than 80MB for malloc (we need about 64mb for the DFU now, to be able write 32MB file). This is the CONFIG_SYS_MALLOC_LEN. And the memory area for malloc is set to 0 in function mem_malloc_init(). So for this config that function sets more than 80MB to zero. This is not good, because we shouldn't expect zeroed memory returned by malloc pointer. This is a job for calloc. Especially if some command expects zeroed memory after malloc, probably after few next calls - it can crash... I think that the above excerpt is _really_ important and should be discussed. I've cut it from the original post, so it won't get lost between the lines. It seems really strange, that malloc() area is cleared after relocation. Which means that all first malloc'ed buffers get implicitly zeroed. Przemek is right here that this zeroing shouldn't be performed. I'm also concerned about potential bugs, which show up (or even worse - won't show up soon) after this change. Hence, I would like to ask directly the community about the possible solutions. Please look at: ./common/dlmalloc.c mem_alloc_init() function [1]. On the one hand removing memset() at [1] speeds up booting time and makes malloc() doing what is is supposed to do. On the other hand there might be in space some boards, which rely on this memset and without it some wired things may start to happening. I think removing it is a good idea. It was one optimisation that I did for boot time in the Chromium tree. If you do it now (and Tom agrees) then there is plenty of time to test for this release cycle. You could go further and add a test CONFIG which fills it with some other non-zero value. Tom, is such approach acceptable for you? I was thinking at first we should default to a poisoned value. 
But given what we're seeing with generic board updates (lots of boards aren't even build-tested at every release, which isn't really a surprise), I think the funky boards which may exist are probably not going to be seen for a while anyhow, so we'd have to default to a poison for a long while. So yes, let's just add a CONFIG option (and Kconfig line) to optionally do it and default to no memset. ... but I just audited everyone doing malloc ( and found a few things to fixup so we really do want to take a poke around.

-- Tom
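The three choices discussed here (no memset, the legacy zeroing, or a poison value) can be sketched as one init helper. This is an illustration only: the enum, the POOL_POISON_BYTE value and pool_init() are hypothetical names, not U-Boot options or functions, and a real implementation would more likely select the behaviour at build time through Kconfig than at run time.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical policy names; they are not real U-Boot options. */
enum pool_init_policy {
	POOL_INIT_NONE,		/* leave memory untouched (fastest) */
	POOL_INIT_ZERO,		/* legacy behaviour: clear everything */
	POOL_INIT_POISON	/* debug aid: fill with a non-zero pattern */
};

#define POOL_POISON_BYTE 0x5A	/* arbitrary non-zero pattern */

/* Prepare the allocator's backing memory according to the chosen policy. */
void pool_init(unsigned char *pool, size_t len, enum pool_init_policy policy)
{
	switch (policy) {
	case POOL_INIT_ZERO:
		memset(pool, 0, len);
		break;
	case POOL_INIT_POISON:
		memset(pool, POOL_POISON_BYTE, len);
		break;
	case POOL_INIT_NONE:
	default:
		break;		/* nothing to do: the cost is zero */
	}
}
```

Only POOL_INIT_NONE saves boot time. The poison variant costs as much as the zeroing one, which is why it fits better as a test option than as a default.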
Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time
Hi Simon, Hi Lukasz, On 2 February 2015 at 01:46, Lukasz Majewski l.majew...@samsung.com wrote: Dear All, And the next is interesting. odroid_defconfig has more than 80MB for malloc (we need about 64mb for the DFU now, to be able write 32MB file). This is the CONFIG_SYS_MALLOC_LEN. And the memory area for malloc is set to 0 in function mem_malloc_init(). So for this config that function sets more than 80MB to zero. This is not good, because we shouldn't expect zeroed memory returned by malloc pointer. This is a job for calloc. Especially if some command expects zeroed memory after malloc, probably after few next calls - it can crash... I think that the above excerpt is _really_ important and should be discussed. I've cut it from the original post, so it won't get lost between the lines. It seems really strange, that malloc() area is cleared after relocation. Which means that all first malloc'ed buffers get implicitly zeroed. Przemek is right here that this zeroing shouldn't be performed. I'm also concerned about potential bugs, which show up (or even worse - won't show up soon) after this change. Hence, I would like to ask directly the community about the possible solutions. Please look at: ./common/dlmalloc.c mem_alloc_init() function [1]. On the one hand removing memset() at [1] speeds up booting time and makes malloc() doing what is is supposed to do. On the other hand there might be in space some boards, which rely on this memset and without it some wired things may start to happening. I think removing it is a good idea. It was one optimisation that I did for boot time in the Chromium tree. If you do it now (and Tom agrees) then there is plenty of time to test for this release cycle. You could go further and add a test CONFIG which fills it with some other non-zero value. Tom, is such approach acceptable for you? 
Regards, Simon -- Best regards, Lukasz Majewski Samsung RD Institute Poland (SRPOL) | Linux Platform Group
Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time
Hi Lukasz,

On 2 February 2015 at 01:46, Lukasz Majewski l.majew...@samsung.com wrote:

Dear All,

And the next is interesting. odroid_defconfig has more than 80MB for malloc (we need about 64MB for the DFU now, to be able to write a 32MB file). This is the CONFIG_SYS_MALLOC_LEN. And the memory area for malloc is set to 0 in the function mem_malloc_init(). So for this config that function sets more than 80MB to zero. This is not good, because we shouldn't expect zeroed memory behind a pointer returned by malloc. This is a job for calloc. In particular, if some command expects zeroed memory after malloc, it can crash, probably after the next few calls...

I think that the above excerpt is _really_ important and should be discussed. I've cut it from the original post so it won't get lost between the lines.

It seems really strange that the malloc() area is cleared after relocation. This means that the first malloc'ed buffers all get implicitly zeroed. Przemek is right that this zeroing shouldn't be performed. I'm also concerned about potential bugs which will show up (or, even worse, won't show up soon) after this change. Hence, I would like to ask the community directly about possible solutions.

Please look at the mem_malloc_init() function in ./common/dlmalloc.c [1].

On the one hand, removing the memset() at [1] speeds up booting and makes malloc() do what it is supposed to do. On the other hand, there might be some boards out there which rely on this memset, and without it some weird things may start happening.

I think removing it is a good idea. It was one optimisation that I did for boot time in the Chromium tree. If you do it now (and Tom agrees) then there is plenty of time to test for this release cycle. You could go further and add a test CONFIG which fills it with some other non-zero value.

Regards, Simon
Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time
Dear All,

And the next is interesting. odroid_defconfig has more than 80MB for malloc (we need about 64MB for the DFU now, to be able to write a 32MB file). This is the CONFIG_SYS_MALLOC_LEN. And the memory area for malloc is set to 0 in the function mem_malloc_init(). So for this config that function sets more than 80MB to zero. This is not good, because we shouldn't expect zeroed memory behind a pointer returned by malloc. This is a job for calloc. In particular, if some command expects zeroed memory after malloc, it can crash, probably after the next few calls...

I think that the above excerpt is _really_ important and should be discussed. I've cut it from the original post so it won't get lost between the lines.

It seems really strange that the malloc() area is cleared after relocation. This means that the first malloc'ed buffers all get implicitly zeroed. Przemek is right that this zeroing shouldn't be performed. I'm also concerned about potential bugs which will show up (or, even worse, won't show up soon) after this change. Hence, I would like to ask the community directly about possible solutions.

Please look at the mem_malloc_init() function in ./common/dlmalloc.c [1].

On the one hand, removing the memset() at [1] speeds up booting and makes malloc() do what it is supposed to do. On the other hand, there might be some boards out there which rely on this memset, and without it some weird things may start happening.

-- Best regards, Lukasz Majewski Samsung RD Institute Poland (SRPOL) | Linux Platform Group
Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time
Hello,

On 01/28/2015 01:55 PM, Przemyslaw Marczak wrote:

This patchset reduces the boot time for the ARM architecture, Exynos boards, and boards with DFU enabled (ARM). For the tested Trats2 device, this was done in three steps. The first was to enable the arch memcpy and memset. The second was to use memset for the .bss clear. The third was to keep the .bss section as small as possible.

The .bss section grows if we have a lot of static variables. This section is cleared before the jump to the relocated U-Boot, and it's done word by word. To reduce the time for this step, we can enable the arch memset, which uses multiple ARM registers.

For configs with DFU enabled, we can find the dfu buffer in this section, which has at least 8MB (32MB for trats2). This is a lot of useless data, which is not required for standard boot. So this buffer should be dynamically allocated.

Przemyslaw Marczak (3):
  exynos: config: enable arch memcpy and arch memset
  arm: relocation: clear .bss section with arch memset if defined
  dfu: mmc: file buffer: remove static allocation

 arch/arm/lib/crt0.S             | 10 +-
 drivers/dfu/dfu_mmc.c           | 25 ++---
 include/configs/exynos-common.h |  3 +++
 3 files changed, 34 insertions(+), 4 deletions(-)

So I made some additional tests with the oscilloscope.

Quick notes about the measurement: the board is an Odroid X2 with Exynos4412 (this one has a GPIO header). Time is measured between state changes of one GPIO pin:
- GPIO HI - set the gpio register at the reset label in arch/arm/cpu/armv7/start.S
- GPIO LO - set the gpio register from bootcmd, by writing the register with mw.l ...

${bootdelay}=0
odroid_defconfig = .bss ~32.3MB

I tested a few changes:
- 850ms - no changes
- 840ms - + CONFIG_USE_ARCH_MEMCPY/MEMSET
- 540ms - .bss memset (patch 2)
- 210ms - dynamic allocation dfu file buf (patch 3)

And the next is interesting. odroid_defconfig has more than 80MB for malloc (we need about 64MB for the DFU now, to be able to write a 32MB file). This is the CONFIG_SYS_MALLOC_LEN. And the memory area for malloc is set to 0 in the function mem_malloc_init(). So for this config that function sets more than 80MB to zero. This is not good, because we shouldn't expect zeroed memory behind a pointer returned by malloc. This is a job for calloc. In particular, if some command expects zeroed memory after malloc, it can crash, probably after the next few calls...

For testing purposes I changed the size of the memset area in mem_malloc_init(). The CONFIG_SYS_MALLOC_LEN is unchanged, so the dfu can still alloc 2x32MB...

The results:
- 158ms - malloc memset len: 40MB
- 109ms - malloc memset len: 1MB

And a quick test for Trats2 with the trace clock cycle counter:
- 333ms - malloc memset len: 1MB (for the standard config it was more than 1520ms)

The malloc memset can't be removed now, because that requires checking a lot of calls and changing them to calloc, but the board can boot if I set this to 256K.

So the final improvement which could be achieved for the odroid config is from 850ms to 109ms. This is about 8 times faster.

And the differences between the tested boards:
- Trats2 - 800MHz
- Odroid X2 - 1000MHz
- different BL1/BL2

Now I'm not so sure about the reliability of the measurement using the trace. The Trats2 has no GPIO header, and now I don't have time to test the combinations.

So enabling the DFU in the board config will increase the boot time. But the real reason is that the malloc memory area is set to zero on boot.

I think that we should follow the malloc/calloc/realloc differences as in this description: http://man7.org/linux/man-pages/man3/malloc.3.html

Now I am going on holiday, and I will probably be unreachable until the 9th of February. Sorry for the trouble.

Best regards, -- Przemyslaw Marczak Samsung RD Institute Poland Samsung Electronics p.marc...@samsung.com
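As a cross-check of the "about 8 times faster" figure, the arithmetic on the Odroid X2 numbers reported above can be written out. The timings are the ones quoted in this message; the macro and function names are only illustrative.

```c
/* Boot times in milliseconds, as reported for the Odroid X2 in this thread. */
#define T_BASELINE_MS     850.0	/* no changes */
#define T_ARCH_MEMSET_MS  840.0	/* + CONFIG_USE_ARCH_MEMCPY/MEMSET */
#define T_BSS_MEMSET_MS   540.0	/* .bss cleared with memset (patch 2) */
#define T_DYNAMIC_BUF_MS  210.0	/* dfu file buffer allocated dynamically (patch 3) */
#define T_MALLOC_1MB_MS   109.0	/* malloc init memset limited to 1MB */

/* Ratio of the old time to the new time. */
double speedup(double before_ms, double after_ms)
{
	return before_ms / after_ms;
}

/* Absolute time saved by one step. */
double saved_ms(double before_ms, double after_ms)
{
	return before_ms - after_ms;
}
```

speedup(T_BASELINE_MS, T_MALLOC_1MB_MS) is 850/109, roughly 7.8. The largest single step is moving the DFU buffer out of .bss: saved_ms(T_BSS_MEMSET_MS, T_DYNAMIC_BUF_MS) gives 330ms.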
Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time
Hello, On 01/28/2015 03:34 PM, Pantelis Antoniou wrote: Hi Przemyslaw, On Jan 28, 2015, at 16:30 , Przemyslaw Marczak p.marc...@samsung.com wrote: Hello, On 01/28/2015 03:18 PM, Pantelis Antoniou wrote: Hi Przemyslaw, On Jan 28, 2015, at 16:10 , Przemyslaw Marczak p.marc...@samsung.com wrote: Hello Stefan, On 01/28/2015 02:12 PM, Stefan Roese wrote: Hi Przemyslaw, On 28.01.2015 13:55, Przemyslaw Marczak wrote: This patchset reduces the boot time for ARM architecture, Exynos boards, and boards with DFU enabled(ARM). For tested Trats2 device, this was done in three steps. First was enable the arch memcpy and memset. The second step was enable memset for .bss clear. The third step for reduce this operation is to keep .bss section small as possible. The .bss section will grow if we have a lot of static variables. This section is cleared before jump to the relocated U-Boot, and it's done word by word. To reduce the time for this step, we can enable arch memset, which uses multiple ARM registers. For configs with DFU enabled, we can find the dfu buffer in this section, which has at least 8MB (32MB for trats2). This is a lot of useless data, which is not required for standard boot. So this buffer should be dynamic allocated. Przemyslaw Marczak (3): exynos: config: enable arch memcpy and arch memset arm: relocation: clear .bss section with arch memset if defined dfu: mmc: file buffer: remove static allocation arch/arm/lib/crt0.S | 10 +- drivers/dfu/dfu_mmc.c | 25 ++--- include/configs/exynos-common.h | 3 +++ 3 files changed, 34 insertions(+), 4 deletions(-) Looking at the commit messages of this patchset I can conclude that your overall boot time reduction is: from ~1527ms to ~464ms This is amazing! Congrats. :) Thank you. I was also amazed. The time results are taken with from the clock cycle counter, I think it's reliable. Some day I would like to check it using the oscilloscope. 
We really should in general make more use of the optimized functions and take care that the buffers (e.g. the DFU buffer in this case) are used in a sane way. Thanks, Stefan

Yes you're right, I thought that the Exynos config had the arch memcpy/memset lib enabled, until I checked this…

Those numbers are indeed incredible; I suppose the caches are disabled?

The caches are enabled after the relocation, in one of the board_init_r calls.

How big is this .bss section? We’re talking about something that takes 1.5secs to clear a few MBs of memory? This is horrible. Even in the optimized case .5secs is too much.

The .bss section was about 32.3MB, of which 32MB were required for the dfu file buf. With the third patch, the .bss section is about 300k. I'm not saying that this is fast booting. But the achieved improvement is really good for this config.

Best regards, -- Przemyslaw Marczak Samsung RD Institute Poland Samsung Electronics p.marc...@samsung.com
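The effect of the third patch on the .bss size can be illustrated as follows. This is a hedged sketch and not the actual drivers/dfu/dfu_mmc.c change: the names example_file_buf, example_file_buf_get() and example_file_buf_release() are hypothetical, and only the 32MB size comes from the thread.

```c
#include <stdlib.h>

#define EXAMPLE_FILE_BUF_SIZE (32u * 1024u * 1024u)	/* 32MB, as for trats2 */

/*
 * Before (sketch): a static array like the one below lives in .bss and is
 * cleared on every boot, even when the buffer is never used:
 *
 *     static unsigned char example_file_buf[EXAMPLE_FILE_BUF_SIZE];
 */

/* After (sketch): only this pointer lives in .bss. */
static unsigned char *example_file_buf;

/* Allocate the buffer on first use and reuse it afterwards. */
unsigned char *example_file_buf_get(void)
{
	if (!example_file_buf)
		example_file_buf = malloc(EXAMPLE_FILE_BUF_SIZE);
	return example_file_buf;	/* NULL if the allocation failed */
}

/* Give the memory back once the transfer is finished. */
void example_file_buf_release(void)
{
	free(example_file_buf);
	example_file_buf = NULL;
}
```

With this shape the cost of the buffer is paid only by the code path that needs it, which is consistent with the .bss shrinking from about 32.3MB to about 300k as reported above.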
Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time
Hi Przemyslaw,

On 28.01.2015 13:55, Przemyslaw Marczak wrote:

This patchset reduces the boot time for the ARM architecture, Exynos boards, and boards with DFU enabled (ARM). For the tested Trats2 device, this was done in three steps. The first was to enable the arch memcpy and memset. The second was to use memset for the .bss clear. The third was to keep the .bss section as small as possible.

The .bss section grows if we have a lot of static variables. This section is cleared before the jump to the relocated U-Boot, and it's done word by word. To reduce the time for this step, we can enable the arch memset, which uses multiple ARM registers.

For configs with DFU enabled, we can find the dfu buffer in this section, which has at least 8MB (32MB for trats2). This is a lot of useless data, which is not required for standard boot. So this buffer should be dynamically allocated.

Przemyslaw Marczak (3):
  exynos: config: enable arch memcpy and arch memset
  arm: relocation: clear .bss section with arch memset if defined
  dfu: mmc: file buffer: remove static allocation

 arch/arm/lib/crt0.S             | 10 +-
 drivers/dfu/dfu_mmc.c           | 25 ++---
 include/configs/exynos-common.h |  3 +++
 3 files changed, 34 insertions(+), 4 deletions(-)

Looking at the commit messages of this patchset I can conclude that your overall boot time reduction is: from ~1527ms to ~464ms

This is amazing! Congrats. :)

We really should in general make more use of the optimized functions and take care that the buffers (e.g. the DFU buffer in this case) are used in a sane way.

Thanks, Stefan
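The difference between the word-by-word .bss clear and a memset-based clear can be shown in C. The real code is ARM assembly in arch/arm/lib/crt0.S, so this is only an analogy under that assumption; clear_words() and clear_with_memset() are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Word-by-word clear: one 32-bit store per loop iteration. */
void clear_words(uint32_t *start, uint32_t *end)
{
	while (start < end)
		*start++ = 0;
}

/* Same result through memset(), which an architecture can optimise. */
void clear_with_memset(uint32_t *start, uint32_t *end)
{
	memset(start, 0, (size_t)((char *)end - (char *)start));
}
```

Both functions leave the range [start, end) zeroed. The gain reported in the thread comes from the arch memset moving several registers' worth of data per store instruction instead of one word.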
Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time
Hi Przemyslaw, On Jan 28, 2015, at 16:10 , Przemyslaw Marczak p.marc...@samsung.com wrote: Hello Stefan, On 01/28/2015 02:12 PM, Stefan Roese wrote: Hi Przemyslaw, On 28.01.2015 13:55, Przemyslaw Marczak wrote: This patchset reduces the boot time for ARM architecture, Exynos boards, and boards with DFU enabled(ARM). For tested Trats2 device, this was done in three steps. First was enable the arch memcpy and memset. The second step was enable memset for .bss clear. The third step for reduce this operation is to keep .bss section small as possible. The .bss section will grow if we have a lot of static variables. This section is cleared before jump to the relocated U-Boot, and it's done word by word. To reduce the time for this step, we can enable arch memset, which uses multiple ARM registers. For configs with DFU enabled, we can find the dfu buffer in this section, which has at least 8MB (32MB for trats2). This is a lot of useless data, which is not required for standard boot. So this buffer should be dynamic allocated. Przemyslaw Marczak (3): exynos: config: enable arch memcpy and arch memset arm: relocation: clear .bss section with arch memset if defined dfu: mmc: file buffer: remove static allocation arch/arm/lib/crt0.S | 10 +- drivers/dfu/dfu_mmc.c | 25 ++--- include/configs/exynos-common.h | 3 +++ 3 files changed, 34 insertions(+), 4 deletions(-) Looking at the commit messages of this patchset I can conclude that your overall boot time reduction is: from ~1527ms to ~464ms This is amazing! Congrats. :) Thank you. I was also amazed. The time results are taken with from the clock cycle counter, I think it's reliable. Some day I would like to check it using the oscilloscope. We really should in general make more use of the optimized functions and take care that the buffers (e.g. the DFU buffer in this case) are used in a sane way. 
Thanks, Stefan

Yes you're right, I thought that the Exynos config had the arch memcpy/memset lib enabled, until I checked this…

Those numbers are indeed incredible; I suppose the caches are disabled?

Regards -- Pantelis
Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time
Hi Przemyslaw, On Jan 28, 2015, at 16:30 , Przemyslaw Marczak p.marc...@samsung.com wrote: Hello, On 01/28/2015 03:18 PM, Pantelis Antoniou wrote: Hi Przemyslaw, On Jan 28, 2015, at 16:10 , Przemyslaw Marczak p.marc...@samsung.com wrote: Hello Stefan, On 01/28/2015 02:12 PM, Stefan Roese wrote: Hi Przemyslaw, On 28.01.2015 13:55, Przemyslaw Marczak wrote: This patchset reduces the boot time for ARM architecture, Exynos boards, and boards with DFU enabled(ARM). For tested Trats2 device, this was done in three steps. First was enable the arch memcpy and memset. The second step was enable memset for .bss clear. The third step for reduce this operation is to keep .bss section small as possible. The .bss section will grow if we have a lot of static variables. This section is cleared before jump to the relocated U-Boot, and it's done word by word. To reduce the time for this step, we can enable arch memset, which uses multiple ARM registers. For configs with DFU enabled, we can find the dfu buffer in this section, which has at least 8MB (32MB for trats2). This is a lot of useless data, which is not required for standard boot. So this buffer should be dynamic allocated. Przemyslaw Marczak (3): exynos: config: enable arch memcpy and arch memset arm: relocation: clear .bss section with arch memset if defined dfu: mmc: file buffer: remove static allocation arch/arm/lib/crt0.S | 10 +- drivers/dfu/dfu_mmc.c | 25 ++--- include/configs/exynos-common.h | 3 +++ 3 files changed, 34 insertions(+), 4 deletions(-) Looking at the commit messages of this patchset I can conclude that your overall boot time reduction is: from ~1527ms to ~464ms This is amazing! Congrats. :) Thank you. I was also amazed. The time results are taken with from the clock cycle counter, I think it's reliable. Some day I would like to check it using the oscilloscope. We really should in general make more use of the optimized functions and take care that the buffers (e.g. 
the DFU buffer in this case) are used in a sane way. Thanks, Stefan

Yes you're right, I thought that the Exynos config had the arch memcpy/memset lib enabled, until I checked this…

Those numbers are indeed incredible; I suppose the caches are disabled?

The caches are enabled after the relocation, in one of the board_init_r calls.

How big is this .bss section? We’re talking about something that takes 1.5secs to clear a few MBs of memory? This is horrible. Even in the optimized case .5secs is too much.

Regards -- Pantelis
Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time
Hello Stefan,

On 01/28/2015 02:12 PM, Stefan Roese wrote:
> Hi Przemyslaw,
>
> On 28.01.2015 13:55, Przemyslaw Marczak wrote:
>> This patchset reduces the boot time for the ARM architecture, Exynos boards, and ARM boards with DFU enabled. For the tested Trats2 device, this was done in three steps.
>>
>> The first step was to enable the arch memcpy and memset. The second step was to enable memset for the .bss clear. The third step is to keep the .bss section as small as possible.
>>
>> The .bss section grows if we have a lot of static variables. This section is cleared before the jump to the relocated U-Boot, and the clear is done word by word. To reduce the time for this step, we can enable the arch memset, which uses multiple ARM registers.
>>
>> For configs with DFU enabled, the DFU buffer sits in this section and is at least 8MB (32MB for Trats2). This is a lot of useless data that is not required for a standard boot, so this buffer should be dynamically allocated.
>>
>> Przemyslaw Marczak (3):
>>   exynos: config: enable arch memcpy and arch memset
>>   arm: relocation: clear .bss section with arch memset if defined
>>   dfu: mmc: file buffer: remove static allocation
>>
>>  arch/arm/lib/crt0.S             | 10 +-
>>  drivers/dfu/dfu_mmc.c           | 25 ++---
>>  include/configs/exynos-common.h |  3 +++
>>  3 files changed, 34 insertions(+), 4 deletions(-)
>
> Looking at the commit messages of this patchset I can conclude that your overall boot time reduction is:
>
> from ~1527ms to ~464ms
>
> This is amazing! Congrats. :)

Thank you. I was also amazed. The time results are taken from the clock cycle counter, which I think is reliable. Some day I would like to check it using an oscilloscope.

> We really should in general make more use of the optimized functions and take care that the buffers (e.g. the DFU buffer in this case) are used in a sane way.
>
> Thanks,
> Stefan

Yes, you're right. I thought that the Exynos config had the arch memcpy/memset lib enabled, before I checked this...

Best regards,
--
Przemyslaw Marczak
Samsung R&D Institute Poland
Samsung Electronics
p.marc...@samsung.com
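For readers following the point about using buffers "in a sane way": the sketch below is only an illustration of the idea, not the code from the patch. It keeps just a pointer in .bss and allocates the large buffer on first use, so nothing large has to be cleared at boot. The names dfu_get_file_buf, dfu_free_file_buf and DFU_FILE_BUF_SIZE are hypothetical and are not taken from drivers/dfu/dfu_mmc.c; the 8MB size is the minimum mentioned in the cover letter.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical size; the thread mentions at least 8MB (32MB on Trats2). */
#define DFU_FILE_BUF_SIZE (8u * 1024u * 1024u)

/*
 * Only this pointer lives in .bss, so the .bss clear touches a few
 * bytes instead of the whole buffer.
 */
static unsigned char *dfu_file_buf;

/*
 * Hypothetical helper: allocate the buffer on first use.
 * Returns NULL if the allocation fails. The memory comes from
 * malloc(), so the caller must not assume it is zeroed.
 */
unsigned char *dfu_get_file_buf(void)
{
	if (!dfu_file_buf)
		dfu_file_buf = malloc(DFU_FILE_BUF_SIZE);

	return dfu_file_buf;
}

/* Hypothetical helper: release the buffer when DFU is done. */
void dfu_free_file_buf(void)
{
	free(dfu_file_buf);
	dfu_file_buf = NULL;
}
```

With this shape, a standard boot that never enters DFU pays neither the clear time nor the memory for the buffer; the cost moves to the first DFU transfer.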
Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time
Hello,

On 01/28/2015 03:18 PM, Pantelis Antoniou wrote:
> Hi Przemyslaw,
>
>> On Jan 28, 2015, at 16:10 , Przemyslaw Marczak p.marc...@samsung.com wrote:
>>
>> Hello Stefan,
>>
>> On 01/28/2015 02:12 PM, Stefan Roese wrote:
>>> Hi Przemyslaw,
>>>
>>> On 28.01.2015 13:55, Przemyslaw Marczak wrote:
>>>> This patchset reduces the boot time for the ARM architecture, Exynos boards, and ARM boards with DFU enabled. For the tested Trats2 device, this was done in three steps.
>>>>
>>>> The first step was to enable the arch memcpy and memset. The second step was to enable memset for the .bss clear. The third step is to keep the .bss section as small as possible.
>>>>
>>>> The .bss section grows if we have a lot of static variables. This section is cleared before the jump to the relocated U-Boot, and the clear is done word by word. To reduce the time for this step, we can enable the arch memset, which uses multiple ARM registers.
>>>>
>>>> For configs with DFU enabled, the DFU buffer sits in this section and is at least 8MB (32MB for Trats2). This is a lot of useless data that is not required for a standard boot, so this buffer should be dynamically allocated.
>>>>
>>>> Przemyslaw Marczak (3):
>>>>   exynos: config: enable arch memcpy and arch memset
>>>>   arm: relocation: clear .bss section with arch memset if defined
>>>>   dfu: mmc: file buffer: remove static allocation
>>>>
>>>>  arch/arm/lib/crt0.S             | 10 +-
>>>>  drivers/dfu/dfu_mmc.c           | 25 ++---
>>>>  include/configs/exynos-common.h |  3 +++
>>>>  3 files changed, 34 insertions(+), 4 deletions(-)
>>>
>>> Looking at the commit messages of this patchset I can conclude that your overall boot time reduction is:
>>>
>>> from ~1527ms to ~464ms
>>>
>>> This is amazing! Congrats. :)
>>
>> Thank you. I was also amazed. The time results are taken from the clock cycle counter, which I think is reliable. Some day I would like to check it using an oscilloscope.
>>
>>> We really should in general make more use of the optimized functions and take care that the buffers (e.g. the DFU buffer in this case) are used in a sane way.
>>>
>>> Thanks,
>>> Stefan
>>
>> Yes, you're right. I thought that the Exynos config had the arch memcpy/memset lib enabled, before I checked this...
>
> Those numbers are indeed incredible; I suppose the caches are disabled?

The caches are enabled after the relocation, in one of the board_init_r calls.

>> Best regards,
>> --
>> Przemyslaw Marczak
>> Samsung R&D Institute Poland
>> Samsung Electronics
>> p.marc...@samsung.com
>
> Regards
>
> -- Pantelis

Best regards,
--
Przemyslaw Marczak
Samsung R&D Institute Poland
Samsung Electronics
p.marc...@samsung.com
[U-Boot] [PATCH 0/3] arm: reduce .bss section clear time
This patchset reduces the boot time for the ARM architecture, Exynos boards, and ARM boards with DFU enabled. For the tested Trats2 device, this was done in three steps.

The first step was to enable the arch memcpy and memset. The second step was to enable memset for the .bss clear. The third step is to keep the .bss section as small as possible.

The .bss section grows if we have a lot of static variables. This section is cleared before the jump to the relocated U-Boot, and the clear is done word by word. To reduce the time for this step, we can enable the arch memset, which uses multiple ARM registers.

For configs with DFU enabled, the DFU buffer sits in this section and is at least 8MB (32MB for Trats2). This is a lot of useless data that is not required for a standard boot, so this buffer should be dynamically allocated.

Przemyslaw Marczak (3):
  exynos: config: enable arch memcpy and arch memset
  arm: relocation: clear .bss section with arch memset if defined
  dfu: mmc: file buffer: remove static allocation

 arch/arm/lib/crt0.S             | 10 +-
 drivers/dfu/dfu_mmc.c           | 25 ++---
 include/configs/exynos-common.h |  3 +++
 3 files changed, 34 insertions(+), 4 deletions(-)

--
1.9.1
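To make the second step concrete, here is a minimal C sketch of the two ways of clearing a region such as .bss. It is an illustration only: the real change is in the assembly file arch/arm/lib/crt0.S, and the function names clear_word_by_word and clear_with_memset are hypothetical. Both functions produce the same result; the expected gain comes from an optimized memset() storing several registers per instruction instead of one word per loop iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the word-by-word clear: one 32-bit
 * store per loop iteration, from start up to (not including) end.
 */
void clear_word_by_word(uint32_t *start, uint32_t *end)
{
	while (start < end)
		*start++ = 0;
}

/*
 * Hypothetical stand-in for the memset-based clear: hand the whole
 * region to memset(), which an architecture may implement with
 * wider or multi-register stores.
 */
void clear_with_memset(void *start, void *end)
{
	uintptr_t first = (uintptr_t)start;
	uintptr_t last = (uintptr_t)end;

	/* The region must not be reversed. */
	assert(last >= first);

	memset(start, 0, (size_t)(last - first));
}
```

Either function can be called with the start and end addresses of the region; on the host, the C library's memset() plays the role that the arch memset plays in U-Boot.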