Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time

2015-02-13 Thread Przemyslaw Marczak

Hello,

On 02/12/2015 05:07 PM, Tom Rini wrote:

On Thu, Feb 05, 2015 at 10:51:00AM +0100, Lukasz Majewski wrote:

Hi Simon,


Hi Lukasz,

On 2 February 2015 at 01:46, Lukasz Majewski l.majew...@samsung.com
wrote:

Dear All,


And the next thing is interesting.
   odroid_defconfig has more than 80MB for malloc (we need about
64MB for the DFU now, to be able to write a 32MB file).

This is CONFIG_SYS_MALLOC_LEN. The memory area for malloc
is set to 0 in the function mem_malloc_init(). So for this config that
function sets more than 80MB to zero.

This is not good, because we shouldn't expect zeroed memory
from a pointer returned by malloc. This is a job for calloc.

Especially if some command expects zeroed memory after malloc,
then probably after the next few calls it can crash...


I think that the above excerpt is _really_ important and should be
discussed.

I've cut it from the original post, so it won't get lost between
the lines.

It seems really strange, that malloc() area is cleared after
relocation. Which means that all first malloc'ed buffers get
implicitly zeroed.

Przemek is right here that this zeroing shouldn't be performed.

I'm also concerned about potential bugs, which show up (or even
worse - won't show up soon) after this change.

Hence, I would like to ask directly the community about the possible
solutions.

Please look at the mem_malloc_init() function in ./common/dlmalloc.c [1].

On the one hand, removing the memset() at [1] speeds up booting and
makes malloc() do what it is supposed to do.

On the other hand, there might be some boards out there which rely on
this memset, and without it some weird things may start happening.


I think removing it is a good idea. It was one optimisation that I did
for boot time in the Chromium tree. If you do it now (and Tom agrees)
then there is plenty of time to test for this release cycle. You could
go further and add a test CONFIG which fills it with some other
non-zero value.


Tom, is such approach acceptable for you?


I was thinking at first we should default to a poisoned value.  But
given what we're seeing with generic board updates (lots of boards
aren't even build-tested at every release which isn't really a
surprise), I think the funky boards which may exist are probably not
going to be seen for a while anyhow so we'd have to default to a poison
for a long while.  So yes, lets just add a CONFIG option (and Kconfig
line) to optionally do it and default to no memset.

... but I just audited everyone doing malloc ( and found a few things
to fixup so we really do want to take a poke around.



The present malloc implementation, which includes the memset at init,
can be a little confusing. Because of this memset, the first few calls
of malloc will always return a pointer to a zeroed memory region. Of
course, only calloc does the right job and zeroes a memory region
which was used by any previous malloc call.
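
To illustrate the difference described above, here is a minimal sketch in
standard C (not U-Boot code; the helper names are made up for this example).
It shows the two ways a caller can get zeroed memory without relying on
whatever malloc() happens to return:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the first len bytes of buf are all zero, 0 otherwise. */
static int is_zeroed(const unsigned char *buf, size_t len)
{
	size_t i;

	assert(buf != NULL);
	for (i = 0; i < len; i++)
		if (buf[i] != 0)
			return 0;
	return 1;
}

/*
 * Option 1: malloc() followed by an explicit memset(). malloc() itself
 * promises nothing about the contents of the returned block.
 */
static unsigned char *alloc_zeroed_explicit(size_t len)
{
	unsigned char *buf = malloc(len);

	if (buf)
		memset(buf, 0, len);
	return buf;
}

/* Option 2: calloc(), which is specified to return zero-filled memory. */
static unsigned char *alloc_zeroed_calloc(size_t len)
{
	return calloc(1, len);
}
```

Code that needs zeros should use one of these two forms; code that calls
plain malloc() and reads the block before writing it only works by accident.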


So maybe some code expects zeroed memory when using malloc as well.

For this reason, I would like to make skipping this memset optional.

This also requires skipping a check in the calloc code.

I will send a patch with such a Kconfig option.

Best regards,
--
Przemyslaw Marczak
Samsung RD Institute Poland
Samsung Electronics
p.marc...@samsung.com
___
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot


Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time

2015-02-13 Thread Przemyslaw Marczak

Hello Simon,

On 02/02/2015 07:15 PM, Simon Glass wrote:

Hi Lukasz,

On 2 February 2015 at 01:46, Lukasz Majewski l.majew...@samsung.com wrote:

Dear All,


And the next is interesting.
   odroid_defconfig has more than 80MB for malloc (we need about 64mb
for the DFU now, to be able write 32MB file).

This is the CONFIG_SYS_MALLOC_LEN. And the memory area for malloc is
set to 0 in function mem_malloc_init(). So for this config that
function sets more than 80MB to zero.

This is not good, because we shouldn't expect zeroed memory returned
by malloc pointer. This is a job for calloc.

Especially if some command expects zeroed memory after malloc,
probably after few next calls - it can crash...


I think that the above excerpt is _really_ important and should be
discussed.

I've cut it from the original post, so it won't get lost between the
lines.

It seems really strange, that malloc() area is cleared after
relocation. Which means that all first malloc'ed buffers get
implicitly zeroed.

Przemek is right here that this zeroing shouldn't be performed.

I'm also concerned about potential bugs, which show up (or even worse -
won't show up soon) after this change.

Hence, I would like to ask directly the community about the possible
solutions.

Please look at the mem_malloc_init() function in ./common/dlmalloc.c [1].

On the one hand, removing the memset() at [1] speeds up booting and
makes malloc() do what it is supposed to do.

On the other hand, there might be some boards out there which rely on
this memset, and without it some weird things may start happening.


I think removing it is a good idea. It was one optimisation that I did
for boot time in the Chromium tree. If you do it now (and Tom agrees)
then there is plenty of time to test for this release cycle. You could
go further and add a test CONFIG which fills it with some other
non-zero value.

Regards,
Simon



Filling the malloc area with some pattern was a good idea for finding out
why my trats2 had an issue after skipping the memset with zeros in malloc
init.


And actually the issue was not in the malloc call, but in calloc.

The present implementation assumes that the memory reserved for malloc
is zeroed at init. So calloc checks how much of the allocated memory
is fresh (and so doesn't require zeroing).


After skipping this fresh memory check, calloc works fine.
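
The effect can be shown with a toy model. The sketch below is NOT the
dlmalloc code; it is a simplified bump allocator with made-up names, which
only models the idea of a calloc that trusts "fresh" memory to be zero
already. With trust_fresh set, it clears only the bytes below the old
high-water mark:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 256

/* Toy bump allocator; only a model of the idea, not dlmalloc. */
struct toy_arena {
	unsigned char mem[ARENA_SIZE];
	size_t top;        /* offset of the next free byte */
	size_t high_water; /* highest offset ever handed out */
};

/* Initialise the arena, filling it with `fill` (0 models the init memset). */
static void toy_init(struct toy_arena *a, unsigned char fill)
{
	assert(a != NULL);
	memset(a->mem, fill, ARENA_SIZE);
	a->top = 0;
	a->high_water = 0;
}

/* Release everything; the old contents stay in place, as with free(). */
static void toy_reset(struct toy_arena *a)
{
	a->top = 0;
}

static void *toy_malloc(struct toy_arena *a, size_t len)
{
	void *p;

	if (len > ARENA_SIZE - a->top)
		return NULL;
	p = a->mem + a->top;
	a->top += len;
	if (a->top > a->high_water)
		a->high_water = a->top;
	return p;
}

/*
 * calloc with the "fresh memory" shortcut: only bytes below the old
 * high-water mark are cleared, on the assumption that everything above it
 * is still zero from init. With trust_fresh == 0 the whole block is cleared.
 */
static void *toy_calloc(struct toy_arena *a, size_t len, int trust_fresh)
{
	size_t start = a->top;
	size_t old_high = a->high_water;
	size_t dirty = len;
	unsigned char *p = toy_malloc(a, len);

	if (!p)
		return NULL;
	if (trust_fresh) {
		if (start >= old_high)
			dirty = 0;
		else if (start + len > old_high)
			dirty = old_high - start;
	}
	memset(p, 0, dirty);
	return p;
}
```

If the arena is initialised with zeros, the shortcut is harmless. If it is
left uninitialised or filled with a pattern, toy_calloc() with trust_fresh
set hands out non-zero memory, which matches the kind of failure described
above.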

Anyway, I think that this should be optional and tested on every config
before it is enabled.


I would like to test something and will send the updated patch set on
Monday.


Best regards,
--
Przemyslaw Marczak
Samsung RD Institute Poland
Samsung Electronics
p.marc...@samsung.com


Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time

2015-02-13 Thread Tom Rini
On Fri, Feb 13, 2015 at 04:48:15PM +0100, Przemyslaw Marczak wrote:
 Hello,
 
 On 02/12/2015 05:07 PM, Tom Rini wrote:
 On Thu, Feb 05, 2015 at 10:51:00AM +0100, Lukasz Majewski wrote:
 Hi Simon,
 
 Hi Lukasz,
 
 On 2 February 2015 at 01:46, Lukasz Majewski l.majew...@samsung.com
 wrote:
 Dear All,
 
 And the next is interesting.
odroid_defconfig has more than 80MB for malloc (we need about
 64mb for the DFU now, to be able write 32MB file).
 
 This is the CONFIG_SYS_MALLOC_LEN. And the memory area for malloc
 is set to 0 in function mem_malloc_init(). So for this config that
 function sets more than 80MB to zero.
 
 This is not good, because we shouldn't expect zeroed memory
 returned by malloc pointer. This is a job for calloc.
 
 Especially if some command expects zeroed memory after malloc,
 probably after few next calls - it can crash...
 
 I think that the above excerpt is _really_ important and should be
 discussed.
 
 I've cut it from the original post, so it won't get lost between
 the lines.
 
 It seems really strange, that malloc() area is cleared after
 relocation. Which means that all first malloc'ed buffers get
 implicitly zeroed.
 
 Przemek is right here that this zeroing shouldn't be performed.
 
 I'm also concerned about potential bugs, which show up (or even
 worse - won't show up soon) after this change.
 
 Hence, I would like to ask directly the community about the possible
 solutions.
 
 Please look at the mem_malloc_init() function in ./common/dlmalloc.c [1].
 
 On the one hand, removing the memset() at [1] speeds up booting and
 makes malloc() do what it is supposed to do.
 
 On the other hand, there might be some boards out there which rely on
 this memset, and without it some weird things may start happening.
 
 I think removing it is a good idea. It was one optimisation that I did
 for boot time in the Chromium tree. If you do it now (and Tom agrees)
 then there is plenty of time to test for this release cycle. You could
 go further and add a test CONFIG which fills it with some other
 non-zero value.
 
 Tom, is such approach acceptable for you?
 
 I was thinking at first we should default to a poisoned value.  But
 given what we're seeing with generic board updates (lots of boards
 aren't even build-tested at every release which isn't really a
 surprise), I think the funky boards which may exist are probably not
 going to be seen for a while anyhow so we'd have to default to a poison
 for a long while.  So yes, lets just add a CONFIG option (and Kconfig
 line) to optionally do it and default to no memset.
 
 ... but I just audited everyone doing malloc ( and found a few things
 to fixup so we really do want to take a poke around.
 
 
 The present malloc implementation, which includes the memset at
 init, could be little confusing. Because of this memset. the few
 first calls of malloc will always return a pointer to the zeroed
 memory region. Of course, the only calloc do the right job and set
 with zeros the memory region, which was used by any previous malloc
 call.
 
 So, maybe some code expects zeroed memory also when using malloc.
 
 In this case, I would like to skip this memset as an option.
 
 This also requires skip some checking in calloc code.
 
 I will send the patch with such option for Kconfig.

There might be a few things which are depending on this odd behaviour but
they shouldn't.  Most things are zeroing out twice since it's normal to
malloc+memset if you care about contents being zero.  The only real
hang-up I see is that we're half way to release, more or less.  Maybe we
do this early post v2015.04 release to give more time to catch the odd
cases?  And spend some time now auditing.

-- 
Tom




Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time

2015-02-12 Thread Tom Rini
On Thu, Feb 05, 2015 at 10:51:00AM +0100, Lukasz Majewski wrote:
 Hi Simon,
 
  Hi Lukasz,
  
  On 2 February 2015 at 01:46, Lukasz Majewski l.majew...@samsung.com
  wrote:
   Dear All,
  
   And the next is interesting.
 odroid_defconfig has more than 80MB for malloc (we need about
   64mb for the DFU now, to be able write 32MB file).
  
   This is the CONFIG_SYS_MALLOC_LEN. And the memory area for malloc
   is set to 0 in function mem_malloc_init(). So for this config that
   function sets more than 80MB to zero.
  
   This is not good, because we shouldn't expect zeroed memory
   returned by malloc pointer. This is a job for calloc.
  
   Especially if some command expects zeroed memory after malloc,
   probably after few next calls - it can crash...
  
   I think that the above excerpt is _really_ important and should be
   discussed.
  
   I've cut it from the original post, so it won't get lost between
   the lines.
  
   It seems really strange, that malloc() area is cleared after
   relocation. Which means that all first malloc'ed buffers get
   implicitly zeroed.
  
   Przemek is right here that this zeroing shouldn't be performed.
  
   I'm also concerned about potential bugs, which show up (or even
   worse - won't show up soon) after this change.
  
   Hence, I would like to ask directly the community about the possible
   solutions.
  
   Please look at the mem_malloc_init() function in ./common/dlmalloc.c [1].
  
   On the one hand, removing the memset() at [1] speeds up booting and
   makes malloc() do what it is supposed to do.
  
   On the other hand, there might be some boards out there which rely on
   this memset, and without it some weird things may start happening.
  
  I think removing it is a good idea. It was one optimisation that I did
  for boot time in the Chromium tree. If you do it now (and Tom agrees)
  then there is plenty of time to test for this release cycle. You could
  go further and add a test CONFIG which fills it with some other
  non-zero value.
 
 Tom, is such approach acceptable for you?

I was thinking at first we should default to a poisoned value.  But
given what we're seeing with generic board updates (lots of boards
aren't even build-tested at every release which isn't really a
surprise), I think the funky boards which may exist are probably not
going to be seen for a while anyhow so we'd have to default to a poison
for a long while.  So yes, let's just add a CONFIG option (and Kconfig
line) to optionally do it and default to no memset.

... but I just audited everyone doing malloc ( and found a few things
to fixup so we really do want to take a poke around.

-- 
Tom




Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time

2015-02-05 Thread Lukasz Majewski
Hi Simon,

 Hi Lukasz,
 
 On 2 February 2015 at 01:46, Lukasz Majewski l.majew...@samsung.com
 wrote:
  Dear All,
 
  And the next is interesting.
odroid_defconfig has more than 80MB for malloc (we need about
  64mb for the DFU now, to be able write 32MB file).
 
  This is the CONFIG_SYS_MALLOC_LEN. And the memory area for malloc
  is set to 0 in function mem_malloc_init(). So for this config that
  function sets more than 80MB to zero.
 
  This is not good, because we shouldn't expect zeroed memory
  returned by malloc pointer. This is a job for calloc.
 
  Especially if some command expects zeroed memory after malloc,
  probably after few next calls - it can crash...
 
  I think that the above excerpt is _really_ important and should be
  discussed.
 
  I've cut it from the original post, so it won't get lost between
  the lines.
 
  It seems really strange, that malloc() area is cleared after
  relocation. Which means that all first malloc'ed buffers get
  implicitly zeroed.
 
  Przemek is right here that this zeroing shouldn't be performed.
 
  I'm also concerned about potential bugs, which show up (or even
  worse - won't show up soon) after this change.
 
  Hence, I would like to ask directly the community about the possible
  solutions.
 
  Please look at the mem_malloc_init() function in ./common/dlmalloc.c [1].
 
  On the one hand, removing the memset() at [1] speeds up booting and
  makes malloc() do what it is supposed to do.
 
  On the other hand, there might be some boards out there which rely on
  this memset, and without it some weird things may start happening.
 
 I think removing it is a good idea. It was one optimisation that I did
 for boot time in the Chromium tree. If you do it now (and Tom agrees)
 then there is plenty of time to test for this release cycle. You could
 go further and add a test CONFIG which fills it with some other
 non-zero value.

Tom, is such approach acceptable for you?

 
 Regards,
 Simon



-- 
Best regards,

Lukasz Majewski

Samsung RD Institute Poland (SRPOL) | Linux Platform Group


Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time

2015-02-02 Thread Simon Glass
Hi Lukasz,

On 2 February 2015 at 01:46, Lukasz Majewski l.majew...@samsung.com wrote:
 Dear All,

 And the next is interesting.
   odroid_defconfig has more than 80MB for malloc (we need about 64mb
 for the DFU now, to be able write 32MB file).

 This is the CONFIG_SYS_MALLOC_LEN. And the memory area for malloc is
 set to 0 in function mem_malloc_init(). So for this config that
 function sets more than 80MB to zero.

 This is not good, because we shouldn't expect zeroed memory returned
 by malloc pointer. This is a job for calloc.

 Especially if some command expects zeroed memory after malloc,
 probably after few next calls - it can crash...

 I think that the above excerpt is _really_ important and should be
 discussed.

 I've cut it from the original post, so it won't get lost between the
 lines.

 It seems really strange, that malloc() area is cleared after
 relocation. Which means that all first malloc'ed buffers get
 implicitly zeroed.

 Przemek is right here that this zeroing shouldn't be performed.

 I'm also concerned about potential bugs, which show up (or even worse -
 won't show up soon) after this change.

 Hence, I would like to ask directly the community about the possible
 solutions.

 Please look at the mem_malloc_init() function in ./common/dlmalloc.c [1].

 On the one hand, removing the memset() at [1] speeds up booting and
 makes malloc() do what it is supposed to do.

 On the other hand, there might be some boards out there which rely on
 this memset, and without it some weird things may start happening.

I think removing it is a good idea. It was one optimisation that I did
for boot time in the Chromium tree. If you do it now (and Tom agrees)
then there is plenty of time to test for this release cycle. You could
go further and add a test CONFIG which fills it with some other
non-zero value.
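
A rough sketch of what such a test option could look like follows. The
names below (example_pool_init() and the EXAMPLE_* constants) are purely
illustrative and are not existing U-Boot symbols or Kconfig options; the
point is only that the pool can be left untouched, cleared, or filled with
a non-zero pattern to flush out callers that wrongly expect zeros:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical knobs; these names are illustrative, not real options. */
#define EXAMPLE_FILL_NONE   0 /* leave the pool untouched (fastest) */
#define EXAMPLE_FILL_ZERO   1 /* legacy behaviour: clear the pool */
#define EXAMPLE_FILL_POISON 2 /* debug: fill with a non-zero pattern */

#define EXAMPLE_POISON_BYTE 0xA5

/* Prepare a malloc pool of `len` bytes according to `mode`. */
static void example_pool_init(unsigned char *pool, size_t len, int mode)
{
	assert(pool != NULL);

	switch (mode) {
	case EXAMPLE_FILL_ZERO:
		memset(pool, 0, len);
		break;
	case EXAMPLE_FILL_POISON:
		memset(pool, EXAMPLE_POISON_BYTE, len);
		break;
	default:
		break; /* EXAMPLE_FILL_NONE: nothing to do */
	}
}
```

The poison mode costs as much boot time as the zero fill, so it would only
make sense as a debug setting.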

Regards,
Simon


Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time

2015-02-02 Thread Lukasz Majewski
Dear All,

 And the next is interesting.
   odroid_defconfig has more than 80MB for malloc (we need about 64mb
 for the DFU now, to be able write 32MB file).
 
 This is the CONFIG_SYS_MALLOC_LEN. And the memory area for malloc is
 set to 0 in function mem_malloc_init(). So for this config that
 function sets more than 80MB to zero.
 
 This is not good, because we shouldn't expect zeroed memory returned
 by malloc pointer. This is a job for calloc.
 
 Especially if some command expects zeroed memory after malloc,
 probably after few next calls - it can crash...

I think that the above excerpt is _really_ important and should be
discussed.

I've cut it from the original post, so it won't get lost between the
lines.

It seems really strange that the malloc() area is cleared after
relocation, which means that the first malloc'ed buffers all get
implicitly zeroed.

Przemek is right here that this zeroing shouldn't be performed.

I'm also concerned about potential bugs, which may show up (or, even
worse, won't show up soon) after this change.

Hence, I would like to ask the community directly about possible
solutions.

Please look at the mem_malloc_init() function in ./common/dlmalloc.c [1].

On the one hand, removing the memset() at [1] speeds up booting and
makes malloc() do what it is supposed to do.

On the other hand, there might be some boards out there which rely on
this memset, and without it some weird things may start happening.


-- 
Best regards,

Lukasz Majewski

Samsung RD Institute Poland (SRPOL) | Linux Platform Group


Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time

2015-01-29 Thread Przemyslaw Marczak

Hello,

On 01/28/2015 01:55 PM, Przemyslaw Marczak wrote:

This patchset reduces the boot time for ARM architecture,
Exynos boards, and boards with DFU enabled(ARM).

For the tested Trats2 device, this was done in three steps.

The first was to enable the arch memcpy and memset.
The second step was to enable memset for the .bss clear.
The third step to reduce this operation is to keep the .bss section
as small as possible.

The .bss section will grow if we have a lot of static variables.
This section is cleared before the jump to the relocated U-Boot,
and it's done word by word. To reduce the time for this step,
we can enable the arch memset, which uses multiple ARM registers.
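
As a plain C illustration of the two approaches (the real clear loop lives
in assembly in crt0.S; the function names here are made up for this
sketch), both routines below produce the same result, but the second one
hands the work to memset(), which an architecture port is free to
implement with wide multi-register stores:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Clear a region one 32-bit word at a time, like a simple startup loop. */
static void clear_word_by_word(uint32_t *start, uint32_t *end)
{
	assert(start <= end);
	while (start < end)
		*start++ = 0;
}

/* Clear the same region with memset(), which an arch port may optimise. */
static void clear_with_memset(uint32_t *start, uint32_t *end)
{
	assert(start <= end);
	memset(start, 0, (size_t)(end - start) * sizeof(*start));
}
```

Whether memset() is actually faster depends on the implementation that gets
linked in, so this is only a functional comparison, not a benchmark.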

For configs with DFU enabled, we can find the dfu buffer in this section,
which has at least 8MB (32MB for trats2). This is a lot of useless data,
which is not required for a standard boot. So this buffer should be
dynamically allocated.
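
The idea can be sketched as follows. This is not the code from the dfu_mmc
patch; the names are hypothetical and only show the general pattern of
replacing a large static array (which lands in .bss) with a pointer that is
filled in on first use:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * A static array like the one below would live in .bss and be cleared on
 * every boot, whether or not the buffer is ever used:
 *
 *     static unsigned char example_file_buf[32 * 1024 * 1024];
 *
 * Holding only a pointer keeps .bss small; the memory is taken from the
 * heap the first time the buffer is needed.
 */
static unsigned char *example_file_buf;

/* Return the buffer, allocating it on first use; NULL if out of memory. */
static unsigned char *example_get_file_buf(size_t size)
{
	assert(size > 0);
	if (!example_file_buf)
		example_file_buf = malloc(size);
	return example_file_buf;
}

/* Release the buffer once the transfer is finished. */
static void example_free_file_buf(void)
{
	free(example_file_buf);
	example_file_buf = NULL;
}
```

Note that this sketch allocates with the size of the first request only;
a real driver would have to decide how to handle a later, larger request.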

Przemyslaw Marczak (3):
   exynos: config: enable arch memcpy and arch memset
   arm: relocation: clear .bss section with arch memset if defined
   dfu: mmc: file buffer: remove static allocation

  arch/arm/lib/crt0.S | 10 +-
  drivers/dfu/dfu_mmc.c   | 25 ++---
  include/configs/exynos-common.h |  3 +++
  3 files changed, 34 insertions(+), 4 deletions(-)



So I made some additional tests with the oscilloscope.

Briefly, about the measurement:
The board is an Odroid X2 with Exynos4412 (this one has a GPIO header).

Time is measured between two state changes of one GPIO pin:

GPIO HI - the GPIO register is set at the reset label in
arch/arm/cpu/armv7/start.S
GPIO LO - the GPIO register is set from bootcmd, by writing the register
with mw.l ...

${bootdelay}=0

odroid_defconfig  = .bss ~32.3MB

I tested a few changes:
- 850ms - no changes
- 840ms - + CONFIG_USE_ARCH_MEMCPY/MEMSET
- 540ms - .bss memset (patch 2)
- 210ms - dynamic allocation of the dfu file buf (patch 3)

And the next thing is interesting.
 odroid_defconfig has more than 80MB for malloc (we need about 64MB for
the DFU now, to be able to write a 32MB file).

This is CONFIG_SYS_MALLOC_LEN. The memory area for malloc is set
to 0 in the function mem_malloc_init(). So for this config that function
sets more than 80MB to zero.

This is not good, because we shouldn't expect zeroed memory from a pointer
returned by malloc. This is a job for calloc.

Especially if some command expects zeroed memory after malloc, then probably
after the next few calls it can crash...


For testing purposes I changed the size of the memset area in mem_malloc_init().
CONFIG_SYS_MALLOC_LEN is unchanged, so the DFU can still allocate 2x32MB...

The results:
- 158ms - malloc memset len: 40MB
- 109ms - malloc memset len:  1MB

And a quick test for Trats2 with the trace clock cycle counter:
- 333ms - malloc memset len:  1MB (for the standard config it was more
than 1520ms)


The malloc memset can't be removed now, because that requires checking a lot
of calls and changing them to calloc, but the board can boot if I set this
to 256K.


So the final improvement which could be achieved for the odroid config
is from 850ms to 109ms. This is about 8 times faster (850 / 109 = ~7.8).
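
For reference, the arithmetic behind these figures can be written down as a
small helper (the function names are made up for this sketch; the input
values are the oscilloscope numbers quoted above):

```c
#include <assert.h>

/* Speed-up factor scaled by 100, e.g. 779 means 7.79x. Integer maths only. */
static unsigned int speedup_x100(unsigned int before_ms, unsigned int after_ms)
{
	assert(after_ms > 0);
	return before_ms * 100u / after_ms;
}

/* Time saved by a change, in milliseconds. */
static unsigned int saved_ms(unsigned int before_ms, unsigned int after_ms)
{
	assert(before_ms >= after_ms);
	return before_ms - after_ms;
}
```

With these, speedup_x100(850, 109) gives 779, i.e. roughly 7.8x, and
saved_ms(850, 109) gives 741ms in total. Per step, the .bss memset saves
saved_ms(840, 540) = 300ms and the dynamic dfu buffer saves
saved_ms(540, 210) = 330ms.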


And the differences between the tested boards:
- Trats2 - 800MHz
- Odroid X2 - 1000MHz
- different BL1/BL2

Now I'm not so sure about the reliability of the measurement using the trace.

The Trats2 has no GPIO header, and I don't have time now to try other
combinations.


So enabling the DFU in the board config will increase the boot time.
But the real reason is that the malloc memory area is set to zero on boot.

I think that we should follow the malloc/calloc/realloc differences
as in this description: http://man7.org/linux/man-pages/man3/malloc.3.html


I am now going on holiday, and I will probably be unreachable until
the 9th of February. Sorry for the trouble.


Best regards,
--
Przemyslaw Marczak
Samsung RD Institute Poland
Samsung Electronics
p.marc...@samsung.com


Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time

2015-01-29 Thread Przemyslaw Marczak

Hello,

On 01/28/2015 03:34 PM, Pantelis Antoniou wrote:

Hi Przemyslaw,


On Jan 28, 2015, at 16:30 , Przemyslaw Marczak p.marc...@samsung.com wrote:

Hello,

On 01/28/2015 03:18 PM, Pantelis Antoniou wrote:

Hi Przemyslaw,


On Jan 28, 2015, at 16:10 , Przemyslaw Marczak p.marc...@samsung.com wrote:

Hello Stefan,

On 01/28/2015 02:12 PM, Stefan Roese wrote:

Hi Przemyslaw,

On 28.01.2015 13:55, Przemyslaw Marczak wrote:

This patchset reduces the boot time for ARM architecture,
Exynos boards, and boards with DFU enabled(ARM).

For tested Trats2 device, this was done in three steps.

First was enable the arch memcpy and memset.
The second step was enable memset for .bss clear.
The third step for reduce this operation is to keep .bss section
small as possible.

The .bss section will grow if we have a lot of static variables.
This section is cleared before jump to the relocated U-Boot,
and it's done word by word. To reduce the time for this step,
we can enable arch memset, which uses multiple ARM registers.

For configs with DFU enabled, we can find the dfu buffer in this section,
which has at least 8MB (32MB for trats2). This is a lot of useless data,
which is not required for standard boot. So this buffer should be dynamic
allocated.

Przemyslaw Marczak (3):
   exynos: config: enable arch memcpy and arch memset
   arm: relocation: clear .bss section with arch memset if defined
   dfu: mmc: file buffer: remove static allocation

  arch/arm/lib/crt0.S | 10 +-
  drivers/dfu/dfu_mmc.c   | 25 ++---
  include/configs/exynos-common.h |  3 +++
  3 files changed, 34 insertions(+), 4 deletions(-)


Looking at the commit messages of this patchset I can conclude that your
overall boot time reduction is:

from ~1527ms
to ~464ms

This is amazing! Congrats. :)



Thank you. I was also amazed.

The time results are taken from the clock cycle counter; I think it's
reliable. Some day I would like to check it using the oscilloscope.


We really should in general make more use of the optimized functions and
take care that the buffers (e.g. the DFU buffer in this case) are used
in a sane way.

Thanks,
Stefan




Yes, you're right. I thought that the Exynos config had the arch memcpy/memset
lib enabled, before I checked this…



Those numbers are indeed incredible; I suppose the caches are disabled?




The caches are enabled after the relocation, in one of the board_init_r calls.



How big is this .bss section? We’re talking about something that takes
1.5secs to clear a few MBs of memory? This is horrible.

Even at the optimized case .5secs is too much.



The .bss section was about 32.3MB, of which 32MB was required for the dfu
file buf. With the third patch, the .bss section is about 300k.
I'm not saying that this is fast booting, but the achieved improvement
is really good for this config.




Best regards,
--
Przemyslaw Marczak
Samsung RD Institute Poland
Samsung Electronics
p.marc...@samsung.com


Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time

2015-01-28 Thread Stefan Roese

Hi Przemyslaw,

On 28.01.2015 13:55, Przemyslaw Marczak wrote:

This patchset reduces the boot time for ARM architecture,
Exynos boards, and boards with DFU enabled(ARM).

For tested Trats2 device, this was done in three steps.

First was enable the arch memcpy and memset.
The second step was enable memset for .bss clear.
The third step for reduce this operation is to keep .bss section
small as possible.

The .bss section will grow if we have a lot of static variables.
This section is cleared before jump to the relocated U-Boot,
and it's done word by word. To reduce the time for this step,
we can enable arch memset, which uses multiple ARM registers.

For configs with DFU enabled, we can find the dfu buffer in this section,
which has at least 8MB (32MB for trats2). This is a lot of useless data,
which is not required for standard boot. So this buffer should be dynamic
allocated.

Przemyslaw Marczak (3):
   exynos: config: enable arch memcpy and arch memset
   arm: relocation: clear .bss section with arch memset if defined
   dfu: mmc: file buffer: remove static allocation

  arch/arm/lib/crt0.S | 10 +-
  drivers/dfu/dfu_mmc.c   | 25 ++---
  include/configs/exynos-common.h |  3 +++
  3 files changed, 34 insertions(+), 4 deletions(-)


Looking at the commit messages of this patchset I can conclude that your 
overall boot time reduction is:


from ~1527ms
to ~464ms

This is amazing! Congrats. :)

We really should in general make more use of the optimized functions and 
take care that the buffers (e.g. the DFU buffer in this case) are used 
in a sane way.


Thanks,
Stefan



Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time

2015-01-28 Thread Pantelis Antoniou
Hi Przemyslaw,

 On Jan 28, 2015, at 16:10 , Przemyslaw Marczak p.marc...@samsung.com wrote:
 
 Hello Stefan,
 
 On 01/28/2015 02:12 PM, Stefan Roese wrote:
 Hi Przemyslaw,
 
 On 28.01.2015 13:55, Przemyslaw Marczak wrote:
 This patchset reduces the boot time for ARM architecture,
 Exynos boards, and boards with DFU enabled(ARM).
 
 For tested Trats2 device, this was done in three steps.
 
 First was enable the arch memcpy and memset.
 The second step was enable memset for .bss clear.
 The third step for reduce this operation is to keep .bss section
 small as possible.
 
 The .bss section will grow if we have a lot of static variables.
 This section is cleared before jump to the relocated U-Boot,
 and it's done word by word. To reduce the time for this step,
 we can enable arch memset, which uses multiple ARM registers.
 
 For configs with DFU enabled, we can find the dfu buffer in this section,
 which has at least 8MB (32MB for trats2). This is a lot of useless data,
 which is not required for standard boot. So this buffer should be dynamic
 allocated.
 
 Przemyslaw Marczak (3):
   exynos: config: enable arch memcpy and arch memset
   arm: relocation: clear .bss section with arch memset if defined
   dfu: mmc: file buffer: remove static allocation
 
  arch/arm/lib/crt0.S | 10 +-
  drivers/dfu/dfu_mmc.c   | 25 ++---
  include/configs/exynos-common.h |  3 +++
  3 files changed, 34 insertions(+), 4 deletions(-)
 
 Looking at the commit messages of this patchset I can conclude that your
 overall boot time reduction is:
 
 from ~1527ms
 to ~464ms
 
 This is amazing! Congrats. :)
 
 
 Thank you. I was also amazed.
 
 The time results are taken with from the clock cycle counter, I think it's 
 reliable. Some day I would like to check it using the oscilloscope.
 
 We really should in general make more use of the optimized functions and
 take care that the buffers (e.g. the DFU buffer in this case) are used
 in a sane way.
 
 Thanks,
 Stefan
 
 
 
 Yes you're right, I thought that Exynos config has enabled arch memcpy/set 
 lib, before I checked this…
 

Those numbers are indeed incredible; I suppose the caches are disabled?



Regards

— Pantelis



Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time

2015-01-28 Thread Pantelis Antoniou
Hi Przemyslaw,

 On Jan 28, 2015, at 16:30 , Przemyslaw Marczak p.marc...@samsung.com wrote:
 
 Hello,
 
 On 01/28/2015 03:18 PM, Pantelis Antoniou wrote:
 Hi Przemyslaw,
 
 On Jan 28, 2015, at 16:10 , Przemyslaw Marczak p.marc...@samsung.com 
 wrote:
 
 Hello Stefan,
 
 On 01/28/2015 02:12 PM, Stefan Roese wrote:
 Hi Przemyslaw,
 
 On 28.01.2015 13:55, Przemyslaw Marczak wrote:
 This patchset reduces the boot time for the ARM architecture,
 Exynos boards, and ARM boards with DFU enabled.
 
 For the tested Trats2 device, this was done in three steps.
 
 The first step was to enable the arch memcpy and memset.
 The second step was to use memset for clearing .bss.
 The third step was to keep the .bss section as small as possible,
 which further reduces the time of that clear.
 
 The .bss section grows if we have a lot of static variables.
 This section is cleared before the jump to the relocated U-Boot,
 and the clearing is done word by word. To reduce the time of this step,
 we can enable the arch memset, which uses multiple ARM registers.
 
 For configs with DFU enabled, the DFU buffer is placed in this section
 and is at least 8MB (32MB for trats2). This is a lot of useless data
 that is not required for a standard boot, so this buffer should be
 allocated dynamically.
 
 Przemyslaw Marczak (3):
   exynos: config: enable arch memcpy and arch memset
   arm: relocation: clear .bss section with arch memset if defined
   dfu: mmc: file buffer: remove static allocation
 
  arch/arm/lib/crt0.S | 10 +-
  drivers/dfu/dfu_mmc.c   | 25 ++---
  include/configs/exynos-common.h |  3 +++
  3 files changed, 34 insertions(+), 4 deletions(-)
 
 Looking at the commit messages of this patchset I can conclude that your
 overall boot time reduction is:
 
 from ~1527ms
 to ~464ms
 
 This is amazing! Congrats. :)
 
 
 Thank you. I was also amazed.
 
 The timing results are taken from the clock cycle counter, so I think they are
 reliable. Some day I would like to verify them with an oscilloscope.
 
 We really should in general make more use of the optimized functions and
 take care that the buffers (e.g. the DFU buffer in this case) are used
 in a sane way.
 
 Thanks,
 Stefan
 
 
 
 Yes, you're right. Before I checked, I thought the Exynos config already had
 the arch memcpy/memset lib enabled.
 
 
 Those numbers are indeed incredible; I suppose the caches are disabled?
 
 
 
 The caches are enabled after the relocation, in one of the board_init_r calls.
 

How big is this .bss section? Are we talking about something that takes 1.5 s
to clear a few MB of memory? This is horrible.

Even in the optimized case, 0.5 s is too much.

 Best regards,
 --
 Przemyslaw Marczak
 Samsung R&D Institute Poland
 Samsung Electronics
 p.marc...@samsung.com
 
 Regards
 
 — Pantelis
 
 
 
 Best regards,
 -- 
 Przemyslaw Marczak
 Samsung R&D Institute Poland
 Samsung Electronics
 p.marc...@samsung.com

Regards

— Pantelis



Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time

2015-01-28 Thread Przemyslaw Marczak

Hello Stefan,

On 01/28/2015 02:12 PM, Stefan Roese wrote:

Hi Przemyslaw,

On 28.01.2015 13:55, Przemyslaw Marczak wrote:

This patchset reduces the boot time for the ARM architecture,
Exynos boards, and ARM boards with DFU enabled.

For the tested Trats2 device, this was done in three steps.

The first step was to enable the arch memcpy and memset.
The second step was to use memset for clearing .bss.
The third step was to keep the .bss section as small as possible,
which further reduces the time of that clear.

The .bss section grows if we have a lot of static variables.
This section is cleared before the jump to the relocated U-Boot,
and the clearing is done word by word. To reduce the time of this step,
we can enable the arch memset, which uses multiple ARM registers.

For configs with DFU enabled, the DFU buffer is placed in this section
and is at least 8MB (32MB for trats2). This is a lot of useless data
that is not required for a standard boot, so this buffer should be
allocated dynamically.

Przemyslaw Marczak (3):
   exynos: config: enable arch memcpy and arch memset
   arm: relocation: clear .bss section with arch memset if defined
   dfu: mmc: file buffer: remove static allocation

  arch/arm/lib/crt0.S | 10 +-
  drivers/dfu/dfu_mmc.c   | 25 ++---
  include/configs/exynos-common.h |  3 +++
  3 files changed, 34 insertions(+), 4 deletions(-)


Looking at the commit messages of this patchset I can conclude that your
overall boot time reduction is:

from ~1527ms
to ~464ms

This is amazing! Congrats. :)



Thank you. I was also amazed.

The timing results are taken from the clock cycle counter, so I think
they are reliable. Some day I would like to verify them with an oscilloscope.



We really should in general make more use of the optimized functions and
take care that the buffers (e.g. the DFU buffer in this case) are used
in a sane way.

Thanks,
Stefan




Yes, you're right. Before I checked, I thought the Exynos config already
had the arch memcpy/memset lib enabled.


Best regards,
--
Przemyslaw Marczak
Samsung R&D Institute Poland
Samsung Electronics
p.marc...@samsung.com


Re: [U-Boot] [PATCH 0/3] arm: reduce .bss section clear time

2015-01-28 Thread Przemyslaw Marczak

Hello,

On 01/28/2015 03:18 PM, Pantelis Antoniou wrote:

Hi Przemyslaw,


On Jan 28, 2015, at 16:10 , Przemyslaw Marczak p.marc...@samsung.com wrote:

Hello Stefan,

On 01/28/2015 02:12 PM, Stefan Roese wrote:

Hi Przemyslaw,

On 28.01.2015 13:55, Przemyslaw Marczak wrote:

This patchset reduces the boot time for the ARM architecture,
Exynos boards, and ARM boards with DFU enabled.

For the tested Trats2 device, this was done in three steps.

The first step was to enable the arch memcpy and memset.
The second step was to use memset for clearing .bss.
The third step was to keep the .bss section as small as possible,
which further reduces the time of that clear.

The .bss section grows if we have a lot of static variables.
This section is cleared before the jump to the relocated U-Boot,
and the clearing is done word by word. To reduce the time of this step,
we can enable the arch memset, which uses multiple ARM registers.

For configs with DFU enabled, the DFU buffer is placed in this section
and is at least 8MB (32MB for trats2). This is a lot of useless data
that is not required for a standard boot, so this buffer should be
allocated dynamically.

Przemyslaw Marczak (3):
   exynos: config: enable arch memcpy and arch memset
   arm: relocation: clear .bss section with arch memset if defined
   dfu: mmc: file buffer: remove static allocation

  arch/arm/lib/crt0.S | 10 +-
  drivers/dfu/dfu_mmc.c   | 25 ++---
  include/configs/exynos-common.h |  3 +++
  3 files changed, 34 insertions(+), 4 deletions(-)


Looking at the commit messages of this patchset I can conclude that your
overall boot time reduction is:

from ~1527ms
to ~464ms

This is amazing! Congrats. :)



Thank you. I was also amazed.

The timing results are taken from the clock cycle counter, so I think they are
reliable. Some day I would like to verify them with an oscilloscope.


We really should in general make more use of the optimized functions and
take care that the buffers (e.g. the DFU buffer in this case) are used
in a sane way.

Thanks,
Stefan




Yes, you're right. Before I checked, I thought the Exynos config already had
the arch memcpy/memset lib enabled.



Those numbers are indeed incredible; I suppose the caches are disabled?




The caches are enabled after the relocation, in one of the board_init_r calls.


Best regards,
--
Przemyslaw Marczak
Samsung R&D Institute Poland
Samsung Electronics
p.marc...@samsung.com


Regards

— Pantelis




Best regards,
--
Przemyslaw Marczak
Samsung R&D Institute Poland
Samsung Electronics
p.marc...@samsung.com


[U-Boot] [PATCH 0/3] arm: reduce .bss section clear time

2015-01-28 Thread Przemyslaw Marczak
This patchset reduces the boot time for the ARM architecture,
Exynos boards, and ARM boards with DFU enabled.

For the tested Trats2 device, this was done in three steps.

The first step was to enable the arch memcpy and memset.
The second step was to use memset for clearing .bss.
The third step was to keep the .bss section as small as possible,
which further reduces the time of that clear.

The .bss section grows if we have a lot of static variables.
This section is cleared before the jump to the relocated U-Boot,
and the clearing is done word by word. To reduce the time of this step,
we can enable the arch memset, which uses multiple ARM registers.
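
As a rough illustration of the two clearing strategies described above, here
is a minimal C sketch. It is not the actual crt0.S change, which is ARM
assembly. The function names are hypothetical, and on a host machine memset()
is simply the C library's implementation standing in for the arch-optimized
one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Clear [start, end) one 32-bit word per iteration. This mirrors the
 * "word by word" clear conceptually; the names are hypothetical.
 */
static void clear_word_by_word(uint32_t *start, uint32_t *end)
{
	while (start < end)
		*start++ = 0;
}

/*
 * Clear the same region with a single memset() call. An optimized
 * memset can store several registers per instruction, which is where
 * the speedup is expected to come from.
 */
static void clear_with_memset(uint32_t *start, uint32_t *end)
{
	assert(start <= end);
	memset(start, 0, (size_t)((char *)end - (char *)start));
}
```

Both functions leave the region in the same state; only the number of store
instructions needed to get there differs.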

For configs with DFU enabled, the DFU buffer is placed in this section
and is at least 8MB (32MB for trats2). This is a lot of useless data
that is not required for a standard boot, so this buffer should be
allocated dynamically.
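
To show the idea behind moving the buffer out of .bss, here is a minimal C
sketch. It is not the content of the dfu_mmc.c patch: all names and the
buffer size are hypothetical, and plain malloc() is assumed as the allocator.
Only a pointer stays in .bss, and the large buffer is allocated on first use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical size, chosen only for illustration. */
#define DEMO_FILE_BUF_SIZE (8 * 1024 * 1024)

/* Only this pointer lives in .bss, not the 8MB buffer itself. */
static unsigned char *demo_file_buf;

/* Allocate the buffer on first use; returns NULL if malloc() fails. */
unsigned char *demo_get_file_buf(void)
{
	if (!demo_file_buf)
		demo_file_buf = malloc(DEMO_FILE_BUF_SIZE);
	return demo_file_buf;
}

/* Copy up to DEMO_FILE_BUF_SIZE bytes in; returns the bytes stored. */
size_t demo_store(const void *src, size_t len)
{
	unsigned char *buf = demo_get_file_buf();

	assert(src != NULL);
	if (!buf)
		return 0;
	if (len > DEMO_FILE_BUF_SIZE)
		len = DEMO_FILE_BUF_SIZE;
	memcpy(buf, src, len);
	return len;
}

/* Release the buffer once the transfer is finished. */
void demo_free_file_buf(void)
{
	free(demo_file_buf);
	demo_file_buf = NULL;
}
```

With this layout a standard boot that never uses the buffer pays neither the
memory nor the clearing cost. The contents of a malloc'ed buffer must not be
assumed to be zero.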

Przemyslaw Marczak (3):
  exynos: config: enable arch memcpy and arch memset
  arm: relocation: clear .bss section with arch memset if defined
  dfu: mmc: file buffer: remove static allocation

 arch/arm/lib/crt0.S | 10 +-
 drivers/dfu/dfu_mmc.c   | 25 ++---
 include/configs/exynos-common.h |  3 +++
 3 files changed, 34 insertions(+), 4 deletions(-)

-- 
1.9.1
