Re: jffs zlib tuning

2008-01-09 Thread Jordan Crouse
On 08/01/08 17:09 -0800, William Fisher wrote:
 Jordan Crouse wrote:
 On 08/01/08 12:06 -0500, Bernardo Innocenti wrote:
 (cc CP, aleph)

 David Woodhouse wrote:

 1. Did anybody profile the kernel while reading files? Last thing I red 
 on this list is that the profiler does not work on the XO in kernel 
 mode. Did anybody fix that
 I believe that standard kernel profiling (on timer ticks) has always
 worked, and even continues to work even though we use a tickless kernel
 now. I think oprofile also works.
 oprofile works, but for some reason it cannot generate
 call graphs.

 It vaguely remember that the problem was that on the Geode we
 were using sw timers rather than NMI as a timing source.
 Right - but that should only prevent us from benchmarking within 
 interrupts
 in the kernel - it shouldn't have any effect on our userland benchmarking.
 I'm no oprofile expert (I couldn't get it working at all when I tried it
 the other day), but do you have the debug version of libc loaded too?  
 Maybe
 it can't find the symbols.
 Jordan
 If you have NMI interrupts selected for Oprofile, you can also
 get samples from the other lower level interrupt handlers.

 Since OProfile can be run in either NMI interrupts or normal
 timer based interrupts.

We can't use NMI, because we have no mechanism for causing the NMIs.
Modern processors such as the k8 use registers called event counters
to count a number of events between sample periods (events being some
processor quality like number of instructions executed or number of
cache misses).  These event counters can be set up in such a way that
they cause a NMI when the counter rolls over.  This is how oprofile
takes advantage of the silicon.  The kicker is that even though the Geode
has event counters, we cannot set them up to cause a NMI.  So, with that
mechanism lost, we're stuck with the timer tick.

This has been discussed several times over the lifetime of the project - you
can go back over the archives of the list and see our past conclusions
on this matter.

Jordan


___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: jffs zlib tuning

2008-01-09 Thread Tom Sylla
On Jan 9, 2008 10:51 AM, Jordan Crouse [EMAIL PROTECTED] wrote:
 We can't use NMI, because we have no mechanism for causing the NMIs.
 Modern processors such as the k8 use registers called event counters
 to count a number of events between sample periods (events being some
 processor quality like number of instructions executed or number of
 cache misses).  These event counters can be set up in such a way that
 they cause a NMI when the counter rolls over.  This is how oprofile
 takes advantage of the silicon.  The kicker is that even though the Geode
 has event counters, we cannot set them up to cause a NMI.  So, with that
 mechanism lost, we're stuck with the timer tick.

 This has been discussed several times over the lifetime of the project - you
 can go back over the archives of the list and see our past conclusions
 on this matter.

There is still the possibility of using the MFGPT block in 5536 to
cause periodic NMIs. That would be similar to oprofile's RTC or timer
interrupt mode, but with support to profile interrupt handler code,
and pretty-good granularity (14MHz-based)

The perfmon-rollover style (not available on LX) is certainly the best
way to do something like oprofile, but if you can make periodic NMIs
fast enough, you should be able to get pretty close to similar
quality.

This completely new oprofile mode would be very specific to
GX/LX/5536, but it would improve oprofile usefulness if someone wants
to make the (pretty large) effort.
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: jffs zlib tuning

2008-01-08 Thread David Woodhouse

On Mon, 2008-01-07 at 18:15 +0100, NoiseEHC wrote:
 This message is primarily written for Bernardo Innocenti but everybody 
 with relevant knowledge is welcomed to give some insight.
 
 I have decided two months ago that will write an asm implementation for 
 zlib inflate (decompression) since Mitch Bradley said that the read 
 speed is 3MB/sec which is dominated by the decompression code.
 http://lists.laptop.org/pipermail/devel/2007-November/007527.html
 
 
 Since then I went through the pain of installing linux in VirtualPC, 
 compiling the code in linux and ended up with a kernel module which can 
 test zlib code finally (took a month of my spare time, if I would have 
 known this in advance I would not have started...). Now I understand the 
 zlib code but need some info before acting on wrong assumptions:
 
 1. Did anybody profile the kernel while reading files? Last thing I red 
 on this list is that the profiler does not work on the XO in kernel 
 mode. Did anybody fix that

I believe that standard kernel profiling (on timer ticks) has always
worked, and even continues to work even though we use a tickless kernel
now. I think oprofile also works.

 2. How does the file reading work? As I imagine the flash is read by DMA 
 and the resulting data is uncompressed to a buffer. Is it correct?. 

Yes, that's right.

 Is the decompressed data gets copied to the target location or does it gets 
 decompressed to their final place? 

http://dev.laptop.org/git?p=olpc-2.6;a=blob;f=fs/jffs2/read.c#l81

 If it is copied, did somebody profile how much time it takes? These
 questions are important to know how much L2 cache is trashed in the
 process and which data needs prefetching.

That hasn't been profiled specifically, no.

 3. How long is the average data length which jffs2 uses for calling inflate?

The maximum length of uncompressed data is 4KiB. The mean is probably
slightly less than that. You could instrument the jffs2dump program from
mtd-utils to give you more accurate answers.

If you want to get involved with compression, see
http://www.inf.u-szeged.hu/jffs2/bbc.php

-- 
dwmw2

___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: jffs zlib tuning

2008-01-08 Thread Bernardo Innocenti
(cc CP, aleph)

David Woodhouse wrote:

 1. Did anybody profile the kernel while reading files? Last thing I red 
 on this list is that the profiler does not work on the XO in kernel 
 mode. Did anybody fix that
 
 I believe that standard kernel profiling (on timer ticks) has always
 worked, and even continues to work even though we use a tickless kernel
 now. I think oprofile also works.

oprofile works, but for some reason it cannot generate
call graphs.

It vaguely remember that the problem was that on the Geode we
were using sw timers rather than NMI as a timing source.

Without call graphs, benchmarking our rendering stack has been
somewhat harder.  You'd find lots of time spent in generic
functions such as memcpy(), without a clue why.

-- 
 \___/
 |___|   Bernardo Innocenti - http://www.codewiz.org/
  \___\  One Laptop Per Child - http://www.laptop.org/
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: jffs zlib tuning

2008-01-08 Thread Jordan Crouse
On 08/01/08 12:06 -0500, Bernardo Innocenti wrote:
 (cc CP, aleph)

 David Woodhouse wrote:

 1. Did anybody profile the kernel while reading files? Last thing I red 
 on this list is that the profiler does not work on the XO in kernel mode. 
 Did anybody fix that
 I believe that standard kernel profiling (on timer ticks) has always
 worked, and even continues to work even though we use a tickless kernel
 now. I think oprofile also works.

 oprofile works, but for some reason it cannot generate
 call graphs.

 It vaguely remember that the problem was that on the Geode we
 were using sw timers rather than NMI as a timing source.

Right - but that should only prevent us from benchmarking within interrupts
in the kernel - it shouldn't have any effect on our userland benchmarking.

I'm no oprofile expert (I couldn't get it working at all when I tried it
the other day), but do you have the debug version of libc loaded too?  Maybe
it can't find the symbols.

Jordan

-- 
Jordan Crouse
Systems Software Development Engineer 
Advanced Micro Devices, Inc.


___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: jffs zlib tuning

2008-01-08 Thread William Fisher
Jordan Crouse wrote:
 On 08/01/08 12:06 -0500, Bernardo Innocenti wrote:
 (cc CP, aleph)

 David Woodhouse wrote:

 1. Did anybody profile the kernel while reading files? Last thing I red 
 on this list is that the profiler does not work on the XO in kernel mode. 
 Did anybody fix that
 I believe that standard kernel profiling (on timer ticks) has always
 worked, and even continues to work even though we use a tickless kernel
 now. I think oprofile also works.
 oprofile works, but for some reason it cannot generate
 call graphs.

 It vaguely remember that the problem was that on the Geode we
 were using sw timers rather than NMI as a timing source.
 
 Right - but that should only prevent us from benchmarking within interrupts
 in the kernel - it shouldn't have any effect on our userland benchmarking.
 
 I'm no oprofile expert (I couldn't get it working at all when I tried it
 the other day), but do you have the debug version of libc loaded too?  Maybe
 it can't find the symbols.
 
 Jordan
 
If you have NMI interrupts selected for Oprofile, you can also
get samples from the other lower level interrupt handlers.

Since OProfile can be run in either NMI interrupts or normal
timer based interrupts.

Check out the oprofile driver for your kernel and check which
.config option you have selected for your build.

I use NMI based sampling with the AMD  use it to look at the profiles in the
various device driver interrupt handlers for both disk
and ethernet controllers.

Check out: ../kernel/linux-2.6.20/drivers/oprofile/timer_int.c

This contains the ops vector initialization code, from which
the timer_notify() procedure is called.

The other code to look at is ../kernel/profile.c which contains
the general profile support.

-- Bill
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: jffs zlib tuning

2008-01-07 Thread Bernardo Innocenti
NoiseEHC wrote:

 It is unfortunate that it was not discussed on this list. Anyway, at 
 least I have learnt writing Linux drivers in the process...
 What compression can be achieved with LZO? Last time I used it only for 
 real time compression (backup), and not because of the compression 
 ratio. (I suppose that you have tested it.)

Note that we compress 4KB blocks in jffs2.  With small block sizes,
algorithms with better compression tend to loose most of their
advantage.

We can't say much until we benchmark lzo vs gzip with small
files.

-- 
 \___/
 |___|   Bernardo Innocenti - http://www.codewiz.org/
  \___\  One Laptop Per Child - http://www.laptop.org/
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: jffs zlib tuning

2008-01-07 Thread Bernardo Innocenti
NoiseEHC wrote:

 4. Bernardo, as I imagine you are compiling kernels every day. If I send 
 you a patch (normally a rewritten inffast.c), would you test it on a 
 real XO machine, please?

I don't build kernels every day, but I'd be delighted to test
such an improvement.  And if you make it clean enough, I'm sure
it would have an easy way into the mainline kernel.

However, I recommend hacking in libz first, making it work with
gzip, and staert porting it to the kernel next step. Debugging
and benchmarking in userspace is *so* much easier.

Also, I'd be very surprised if such an obvious optimization
hadn't been tried already in 20+ years of gzip.  Try digging
around: you may find that it's not worth it.

-- 
 \___/
 |___|   Bernardo Innocenti - http://www.codewiz.org/
  \___\  One Laptop Per Child - http://www.laptop.org/
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: jffs zlib tuning

2008-01-07 Thread NoiseEHC

 However, I recommend hacking in libz first, making it work with
 gzip, and staert porting it to the kernel next step. Debugging
 and benchmarking in userspace is *so* much easier.

Ah, I tried it. Unfortunately the zlib in the kernel is a heavily 
modified zlib and I was not able to compile it in user space. So I have 
learnt writing kernel modules (took just 2 days, it is so simple 
compared to windows drivers that I still cannot believe it). And there 
is a comment in the kernel zlib that user-space support was removed...
 Also, I'd be very surprised if such an obvious optimization
 hadn't been tried already in 20+ years of gzip.  Try digging
 around: you may find that it's not worth it.

The optimization that I thought of is absolutely Geode specific. First 
it needs some prefetch, and secondly it has a lot of branches. The Geode 
has a very simple 1 bit branch predictor (it seems like that but not 
documented) so it can waste 20-40 cycles for every run (every 
length/distance code). I know that it is hard to create better code than 
a C compiler nowadays so I am sure that simply rewriting the code in asm 
would not speed things up (5 years ago LZO had several asm 
implementations for 486/586/686 but ironically all were slower than the 
compiler generated one.)

Now that you have mentioned that jffs2 uses only 4K blocks, it can be 
possible that the bottleneck is not in inffast.c. Do you have ANY 
perf/profile data, please?
All I would like to know whether the bottleneck lies in inffast or not:
/*
   When large enough input and output buffers are supplied to inflate(), for
   example, a 16K input buffer and a 64K output buffer, more than 95% of the
   inflate execution time is spent in this routine.
*/


___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: jffs zlib tuning

2008-01-07 Thread imm
 On 7 Jan 2008, at 19:28, Benjamin M. Schwartz wrote:

 Nonetheless, I decided to benchmark the compressors on
 http://xs-dev.laptop.org/~cscott/olpc/streams/joyride/build1513/ 
 devel_jffs2/xo-1-olpc-stream-joyride-build-1513-20080105_1602- 
 devel_jffs2-tree.tar.bz2

 Results:
 Uncompressed 732M
 LZO -1 (fast)345M
 LZO -9 (slow)298M
 Gzip (defaults)  269M
 Bzip2 (as downloaded)236M

 Note that fast vs. slow refers to compression speed.   
 Decompression is
 the same speed in either case.

 These tests were run on the whole tar-file.  As noted by Bernie, the
 differences would likely be smaller on 4KB blocks.

 I had some notes, which I can't now find, for a real-time sample  
 logger we built...
 Anyway, we did some tests compressing small blocks, and I am sure  
 we saw much less of a difference than Benjamin saw on his big block.
 Also, I'm pretty sure bzip2 fared very badly in our test, the small  
 blocks didn't suit it at all (and it was slower, of course.)
 Of course, the statistics of our sample data was probably not  
 representative, so YMMV...




___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel


Re: jffs zlib tuning

2008-01-07 Thread NoiseEHC
Then it is the last nail to the coffin of asm-izing zlib. As I see the 
LZO code is cache friendly so there is no way to speed it up.
I am sorry wasting you time. :)


imm wrote:
 On 7 Jan 2008, at 19:28, Benjamin M. Schwartz wrote:

 
 Nonetheless, I decided to benchmark the compressors on
 http://xs-dev.laptop.org/~cscott/olpc/streams/joyride/build1513/ 
 devel_jffs2/xo-1-olpc-stream-joyride-build-1513-20080105_1602- 
 devel_jffs2-tree.tar.bz2

 Results:
 Uncompressed 732M
 LZO -1 (fast)345M
 LZO -9 (slow)298M
 Gzip (defaults)  269M
 Bzip2 (as downloaded)236M

 Note that fast vs. slow refers to compression speed.   
 Decompression is
 the same speed in either case.

 These tests were run on the whole tar-file.  As noted by Bernie, the
 differences would likely be smaller on 4KB blocks.
   
 I had some notes, which I can't now find, for a real-time sample  
 logger we built...
 Anyway, we did some tests compressing small blocks, and I am sure  
 we saw much less of a difference than Benjamin saw on his big block.
 Also, I'm pretty sure bzip2 fared very badly in our test, the small  
 blocks didn't suit it at all (and it was slower, of course.)
 Of course, the statistics of our sample data was probably not  
 representative, so YMMV...



 

 ___
 Devel mailing list
 Devel@lists.laptop.org
 http://lists.laptop.org/listinfo/devel


   
___
Devel mailing list
Devel@lists.laptop.org
http://lists.laptop.org/listinfo/devel