
Re: Memory performance / Cache problem

2009-10-15 Thread epsi


 On Wednesday 14 October 2009 17:48:39 ext e...@gmx.de wrote:
  Mem clock is both times 166MHz. I don't know whether are differences in
  cycle access and timing, but memclock is fine.
 
  Following Siarhei hints of initialize the buffers (around 1.2 MByte each)
  I get different results in 22kernel for use of
  malloc alone
  memcpy =   473.764, loop4 =   448.430, loop1 =   102.770, rand =    29.641
  calloc alone
  memcpy =   405.947, loop4 =   361.550, loop1 =    95.441, rand =    21.853
  malloc+memset:
  memcpy =   239.294, loop4 =   188.617, loop1 =    80.871, rand =     4.726
 
  In 31kernel all 3 measures are about the same (unfortunatly low) level of
  malloc+memset in 22.
 
  First of all: What performance can be expected?
  Does 22 make failures if it is so much faster?
  Can the later kernels get a boost in memory handling?
 
 What you see is just a (fake) performance boost because you have a single
 physical page shared between all the virtual pages in the source buffer. So
 you get no cache misses on read operations and everything seems fast.
 
 This is unlikely to happen on real use, and it does not reflect real memory
 performance. So the benchmark is inadequate.

 
 You can get some basic information here:
 http://en.wikipedia.org/wiki/Copy-on-write
 
 Regarding the difference in behavior between .22 and recent kernels: it may be
 some regression in the copy-on-write implementation, or just some change done
 on purpose. That is assuming that the userspace stuff was identical in both
 tests.
 

OK, I understand the difference when the memory is uninitialised.
But why is there a difference between malloc + memset and calloc?
In both cases the memory is cleared.
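
One way to compare the three cases directly is to run the same copy loop on a
source buffer obtained in each of the three ways. The following is only a
minimal sketch, not the benchmark used in this thread: the names alloc_src and
memcpy_mbyte_per_s are hypothetical, and the MByte/s unit is an assumption.
A possible reason for a gap between calloc and malloc+memset, not confirmed
anywhere in this thread, is that calloc may hand back fresh kernel pages that
are already zero without writing to them, whereas memset writes every page.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

enum alloc_mode { ALLOC_MALLOC, ALLOC_CALLOC, ALLOC_MALLOC_MEMSET };

/* Allocate the source buffer the way each benchmark variant does. */
void *alloc_src(size_t size, enum alloc_mode mode)
{
    /* Calling memset through a volatile pointer keeps the compiler
       from merging malloc + memset(0) into a single calloc call. */
    void *(*volatile fill)(void *, int, size_t) = memset;
    void *p;

    switch (mode) {
    case ALLOC_CALLOC:
        /* Assumption: the allocator may return already-zero kernel
           pages here without writing to them. */
        return calloc(1, size);
    case ALLOC_MALLOC_MEMSET:
        p = malloc(size);
        if (p != NULL)
            fill(p, 0, size);    /* writes every page of the buffer */
        return p;
    case ALLOC_MALLOC:
    default:
        return malloc(size);     /* nothing is written to the buffer */
    }
}

/* Copy `size` bytes `reps` times; return MByte/s, or -1.0 on failure. */
double memcpy_mbyte_per_s(size_t size, enum alloc_mode mode, int reps)
{
    /* The volatile pointer keeps the repeated copies from being
       optimized away. */
    void *(*volatile copy)(void *, const void *, size_t) = memcpy;
    char *src, *dst;
    clock_t t0, t1;
    double secs;
    int i;

    assert(size > 0 && reps > 0);
    src = alloc_src(size, mode);
    dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(dst, 1, size);  /* fault in the destination in every variant */

    t0 = clock();
    for (i = 0; i < reps; i++)
        copy(dst, src, size);
    t1 = clock();

    free(src);
    free(dst);
    secs = (double)(t1 - t0) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        secs = 1.0 / CLOCKS_PER_SEC;  /* run was below timer resolution */
    return (double)size * reps / (1024.0 * 1024.0) / secs;
}
```

Calling memcpy_mbyte_per_s(1200 * 1024, mode, 100) once per mode gives three
numbers for a buffer of roughly the 1.2 MByte size mentioned above. If only the
malloc+memset variant is slow, the source pages in the other two variants were
probably never written.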


--
To unsubscribe from this list: send the line unsubscribe linux-omap in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: RE: RE: Memory performance / Cache problem

2009-10-14 Thread epsi

The memory clock is 166 MHz in both cases. I don't know whether there are
differences in access cycles and timing, but the memory clock itself is fine.

Following Siarhei's hint to initialize the buffers (around 1.2 MByte each),
I get different results on the 2.6.22 kernel depending on the allocation:
malloc alone:
memcpy =   473.764, loop4 =   448.430, loop1 =   102.770, rand =    29.641
calloc alone:
memcpy =   405.947, loop4 =   361.550, loop1 =    95.441, rand =    21.853
malloc+memset:
memcpy =   239.294, loop4 =   188.617, loop1 =    80.871, rand =     4.726

On the 2.6.31 kernel all three measurements are at about the same
(unfortunately low) level as malloc+memset on 2.6.22.

First of all: what performance can be expected?
Is 2.6.22 doing something wrong if it is so much faster?
Can the later kernels get a boost in memory handling?

I used the standard memcpy (I think this is from glibc and hence not
NEON-based). I guess it would have to be recompiled to be NEON-based?

How can I find out whether the NEON and cache settings are OK?
I am using an OMAP3530 on an EVM board.
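
For the NEON half of that question, one quick check from userspace is the
"Features" line that ARM kernels print in /proc/cpuinfo. The sketch below is
an illustration under that assumption; features_line_has and cpu_has_feature
are hypothetical names. It only shows whether the kernel reports the unit,
not whether memcpy uses it, and it says nothing about the cache configuration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if `flag` is a whole word on a cpuinfo "Features" line. */
int features_line_has(const char *line, const char *flag)
{
    const char *p;
    size_t n, len;

    assert(line != NULL && flag != NULL);
    n = strlen(flag);
    if (n == 0 || strncmp(line, "Features", 8) != 0)
        return 0;
    p = strchr(line, ':');
    if (p == NULL)
        return 0;
    p++;
    while (*p != '\0') {
        p += strspn(p, " \t\n");       /* skip separators */
        len = strcspn(p, " \t\n");     /* length of the next word */
        if (len == n && strncmp(p, flag, n) == 0)
            return 1;
        p += len;
    }
    return 0;
}

/* Return 1 if /proc/cpuinfo lists `flag`, 0 if not, -1 if unreadable. */
int cpu_has_feature(const char *flag)
{
    char line[1024];
    int found = 0;
    FILE *f = fopen("/proc/cpuinfo", "r");

    if (f == NULL)
        return -1;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = features_line_has(line, flag);
    fclose(f);
    return found;
}
```

On the board, cpu_has_feature("neon") returning 1 under both kernels would rule
out a disabled NEON unit as the cause of the difference.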

Unfortunately I don't have a Lauterbach, just a Spectrum Digital, which only
works until the Linux kernel boots.

Best regards
Steffen


 Original Message 
 Date: Wed, 14 Oct 2009 08:59:05 -0500
 From: Woodruff, Richard r-woodru...@ti.com
 To: e...@gmx.de e...@gmx.de, Premi, Sanjeev pr...@ti.com, 
 linux-omap@vger.kernel.org linux-omap@vger.kernel.org
 Subject: RE: RE: Memory performance / Cache problem

  There is no newer u-boot from TI available. There is a SDK 02.01.03.11
  but it contains the same uboot 2008.10 with the only addition of the second
  generation of EVM boards with another network chip.
 
  So I checked the uboot from git, but this doesn't support Microns NAND Flash
  anymore. It is just working with ONENAND.
 
  I found a patch which shows the L2 Cache status while kernel boot and
  implemented it : L2 Cache seems to be already enabled - so this is not the
  reason.
 
  So any other ideas?
 
 Are you confident your memory bus isn't running at 1/2 speed?
 
 I recall there was a couple day window during wtbu kernel upgrades where
 memory bus speed with pm was running 1/2 speed after kernel started up.
 This was somewhat a side effect of the constraints framework and a regression
 in forward porting. It seems unlikely the psp kernel would have shipped with
 this bug but it's something to check. This would match your results.
 
 If your memcpy() is neon based then I might be worried about
 l1neon-caching effects along with factors of (exclusive-l1-l2-read-allocate
 cache + pld not being effective on l1 only l2).
 
 Which memcpy test are you using? Something in lmbench or just one you
 wrote.  Generally results are a little hard to interpret with exclusive cache
 behavior in 3430's r1px core.  3630's r3p2 core gives more traditional
 results as exclusive feature has been removed by arm.
 
 If you have the ability using Lauterbach + per file will allow internal
 space dump which will show all critical parameters during test.  It's a 1
 minute check for someone who has done it before to ensure the few parameters
 needed are in line.  I can send an example off line of how to do capture.  I
 won't have time to expand on all relevant parameters.
 
 Regards,
 Richard W.


Re: RE: RE: RE: Memory performance / Cache problem

2009-10-14 Thread epsi

  Mem clock is both times 166MHz. I don't know whether are differences in
  cycle access and timing, but memclock is fine.
 
 How did you physically verify this?

The oscilloscope shows 166 MHz, and the kernel messages about the frequency
are the same in both kernels.

  Following Siarhei hints of initialize the buffers (around 1.2 MByte each)
  I get different results in 22kernel for use of
  malloc alone
  memcpy =   473.764, loop4 =   448.430, loop1 =   102.770, rand =    29.641
  calloc alone
  memcpy =   405.947, loop4 =   361.550, loop1 =    95.441, rand =    21.853
  malloc+memset:
  memcpy =   239.294, loop4 =   188.617, loop1 =    80.871, rand =     4.726
 
  In 31kernel all 3 measures are about the same (unfortunatly low) level of
  malloc+memset in 22.
 
 Yes, aligned buffers can make a difference.  But probably more so for small
 copies.  Of course you must touch the memory or mprotect() it so it's
 faulted in, but indications are you have done this.
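
Touching the memory can be as simple as writing one byte in every page before
the timed run. The helper below is a minimal sketch of that idea; touch_pages
is a hypothetical name and the 4096-byte fallback page size is an assumption.
It rewrites the value already stored at each position, so the buffer contents
stay the same while every page still receives a write.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Write one byte per page so that every page of the buffer is faulted
   in before timing starts. Returns the number of page-stride writes. */
size_t touch_pages(void *buf, size_t size)
{
    volatile unsigned char *p = buf;
    long ps = sysconf(_SC_PAGESIZE);
    size_t page = ps > 0 ? (size_t)ps : 4096;  /* fallback is an assumption */
    size_t off, touched = 0;

    assert(buf != NULL);
    for (off = 0; off < size; off += page) {
        p[off] = p[off];   /* same value, but still a write to the page */
        touched++;
    }
    /* The buffer need not start on a page boundary, so its last byte
       can sit in a page the loop above did not reach. */
    if (size > 0)
        p[size - 1] = p[size - 1];
    return touched;
}
```

Running touch_pages on both source and destination before the copy loop makes
the malloc, calloc and malloc+memset variants start from the same state.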

Hmm, malloc already aligns the buffer (to an address), so you probably mean
something different. I don't understand the difference: to me,
malloc+memset equals calloc.
I'll send you the benchmark code if you like.

  I used a standard memcpy (think this is glib and hence not neonbased)?
  To be neonbased I guess it has to be recompiled?
 
 The version of glibc in use can make a difference.  CodeSourcery in 2009
 release added PLD's to mem operations.  This can give a good benefit.  It
 might be you have optimized library in one case and a non-optimized in
 another.

With both kernels I used the same rootfs (via NFS). I did indeed use CS2009q1
and its libs, but we are talking about a factor of 2 to 6. This must be
something serious.

What is your feeling? Is 2.6.22 doing something strange, or are the newer
kernels slower than they have to be?

It would be interesting to see results on other OMAP3 boards with both old and
new kernels.

Best regards
Steffen

Re: RE: Memory performance / Cache problem

2009-10-13 Thread epsi

 
 Can you upgrade to a newer u-boot? Either from the PSP release
 OR u-boot tree hosted at git.denx.de (at least 2009.03)?
 
 Also, it will be good to see the sample program you are using.
 
 ~sanjeev
 

There is no newer U-Boot available from TI. There is an SDK 02.01.03.11,
but it contains the same U-Boot 2008.10; the only addition is support for the
second generation of EVM boards, which have a different network chip.

So I checked the U-Boot from git, but it no longer supports Micron's NAND
flash. It only works with OneNAND.

I found a patch that shows the L2 cache status during kernel boot and
applied it: the L2 cache seems to be enabled already, so this is not the
reason.

So, any other ideas?

Re: Memory performance / Cache problem

2009-10-13 Thread epsi

The L2 cache is set and running.
I don't know - can it be configured or misconfigured somehow?

I just checked the output of 2.6.22 kernel and get these lines (which I don't 
have in newer kernels):

CPU0: D VIPT write-through cache
CPU0: cache: 768 bytes, associativity 1, 8 byte lines, 64 sets
Built 1 zonelists.  Total pages: 32512

I am wondering what this is. My first thought was the L1 cache, but it's too
small.

The benchmark runs on the same hardware, the same U-Boot and the same rootfs;
only the kernel is different.


 On Monday 12 October 2009 10:54:09 ext e...@gmx.de wrote:
  I found the memory performance of newer kernels is quite poor on an
  EVM-Omap3 board. It works with 2-6 times the performance on the old 2.6.22
  kernel from TI's PSP.
 
  Possible reasons:
  - problem in config the kernel (did omap3_evm_defconfig)
  - problem in kernel
  - kernel expects some settings from uboot, which are not done there
 
  I have tried the 2.6.29rc3 (from TI's PSP) and the 2.6.31 from git-tree.
  Both behave quite similarly:
 
  Transport in MByte:
  memcpy =   204.073, loop4 =   183.212, loop1 =    81.693, rand =     4.534
 
  while the 22 kernel:
  memcpy =   453.932, loop4 =   469.934, loop1 =   125.031, rand =    29.631
 
  Can someone give me help or can at least confirm that?
 
 The numbers from 2.6.22 kernel look much better than anything I have ever
 seen with OMAP3.
 
 How are you doing benchmarking? Is source buffer properly initialized?
 
 The point is that if you just happen to allocate a large buffer without
 initializing it, it may end up having all the memory pages referencing to a
 single zero page in physical memory. In this case reading from this buffer
 will in fact be perfectly cached in L1 cache and memcpy would look fast.
 
 If it is not the case, investigating how to boost memory performance in the
 latest kernels is very interesting for sure.
 
 -- 
 Best regards,
 Siarhei Siamashka



RE: Memory performance / Cache problem

2009-10-12 Thread epsi

Linux version 2.6.31 (s...@localhost) (gcc version 4.3.3 (Sourcery G++ Lite 
2009q1-203) ) #1 Mon Oct 12 08:30:58 CEST 2009
CPU: ARMv7 Processor [411fc082] revision 2 (ARMv7), cr=10c53c7f
CPU: VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
Machine: OMAP3 EVM
Memory policy: ECC disabled, Data cache writeback
OMAP3430 ES2.1
SRAM: Mapped pa 0x4020 to va 0xe300 size: 0x10
Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 32512  

 

I see, you get the message about the L2 cache, which I don't have.
Do you know how to enable this?
Shouldn't the kernel configure all these things rather than rely on the
bootloader?
I am using U-Boot 2008.10 (TI's PSP).
The old 2.6.22 kernel is independent of the U-Boot in this respect.

Thanks
Steffen