Re: [PATCH 06/14] Pramfs: Include files

2009-06-22 Thread Jörn Engel
On Mon, 22 June 2009 20:31:10 +0100, Chris Simmonds wrote:
 
 I disagree: that adds an unnecessary overhead for those architectures 
 where the cpu byte order does not match the data structure ordering. I 
 think the data structures should be native endian and when mkpramfs is 
 written it can take a flag (e.g. -r) in the same way mkcramfs does.

Just to quantify this point, I've written a small crap program:
#include <stdio.h>
#include <stdint.h>
#include <byteswap.h>
#include <sys/time.h>

long long delta(struct timeval *t1, struct timeval *t2)
{
	long long delta;

	delta  = 1000000ull * t2->tv_sec + t2->tv_usec;
	delta -= 1000000ull * t1->tv_sec + t1->tv_usec;
	return delta;
}

#define LOOPS 100000000
int main(void)
{
	long native = 0;
	uint32_t narrow = 0;
	uint64_t wide = 0, native_wide = 0;
	struct timeval t1, t2, t3, t4, t5;
	int i;

	gettimeofday(&t1, NULL);
	for (i = 0; i < LOOPS; i++)
		native++;
	gettimeofday(&t2, NULL);
	for (i = 0; i < LOOPS; i++)
		narrow = bswap_32(bswap_32(narrow) + 1);
	gettimeofday(&t3, NULL);
	for (i = 0; i < LOOPS; i++)
		native_wide++;
	gettimeofday(&t4, NULL);
	for (i = 0; i < LOOPS; i++)
		wide = bswap_64(bswap_64(wide) + 1);
	gettimeofday(&t5, NULL);
	printf("long:  %9lld us\n", delta(&t1, &t2));
	printf("we32:  %9lld us\n", delta(&t2, &t3));
	printf("u64:   %9lld us\n", delta(&t3, &t4));
	printf("we64:  %9lld us\n", delta(&t4, &t5));
	printf("loops: %9d\n", LOOPS);
	return 0;
}

Four loops doing the same increment with different data types: long,
u64, we32 (wrong-endian) and we64.  Compile with _no_ optimizations.

Results on my i386 notebook:
long:     453953 us
we32:     880273 us
u64:      504214 us
we64:    2259953 us
loops: 100000000

Or thereabouts, not completely stable.  Increasing the data width makes it
10% slower, the 32bit endianness conversion is 2x slower and the 64bit
conversion is 5x slower.

However, even the we64 loop still munches through 353MB/s (100M
conversions in 2.2s, 8 bytes per conversion.  Double the number if you
count both conversions to/from wrong endianness).  Elsewhere in this
thread someone claimed the filesystem peaks out at 13MB/s.  One might
further note that only filesystem metadata has to go through endianness
conversion, so on this particular machine it is completely lost in the
noise.
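
(For reference, the arithmetic behind that figure: 100,000,000 conversions
times 8 bytes is 800 MB, and 800 MB / 2.26 s comes out at roughly the
353MB/s quoted above.)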

Feel free to run the program on any machine you care about.  If you get
numbers to back up your position, I'm willing to be convinced.  Until
then, I consider the alleged overhead of endianness conversion a prime
example of premature optimization.

Jörn

-- 
Joern's library part 7:
http://www.usenix.org/publications/library/proceedings/neworl/full_papers/mckusick.a


Re: [PATCH 06/14] Pramfs: Include files

2009-06-22 Thread Jörn Engel
On Mon, 22 June 2009 23:20:39 +0100, David Woodhouse wrote:
 On Mon, 2009-06-22 at 23:41 +0200, Jörn Engel wrote:
  Four loops doing the same increment with different data types: long,
  u64, we32 (wrong-endian) and we64.  Compile with _no_ optimizations.
 
 That's a bit of a poor test then. Especially on architectures with a
 load-and-swap instruction where it really shouldn't be any slower at
 all.
 
 (Although since GCC doesn't have an __attribute__((littleendian)) I'm
 not entirely sure how to entice it into _using_ said instruction for the
 purpose of the test... I think the kernel does manage somehow though, if
 you get the sources _just_ right.)

Feel free to improve the test.  It is admittedly crap and designed to
support Chris' argument.  But seeing that it still fails to do so and
Arnd has already shown one improvement that weakened Chris' argument, I
guess we can all agree that further improvements won't change the
conclusion, can we? ;)

Jörn

-- 
It's just what we asked for, but not what we want!
-- anonymous


Re: LZMA inclusion

2008-12-08 Thread Jörn Engel
On Sun, 7 December 2008 23:32:32 +, Phillip Lougher wrote:
 
 Currently, as mentioned above, Squashfs decompresses into a single 
 contiguous output buffer.  But, due to the linux kernel mailing list's 
 dislike of vmalloc, this is being changed.

Don't blame lkml, blame Intel and IBM.  Back in the days of the 386, a
beefy machine had 8MB of physical memory and 4GB of virtual memory
space.  No one had to worry about fragmentation anymore.  If you needed a
1MB buffer, you'd just round up some 256 pages and instruct the mmu to
map them into a large contiguous address range in the virtual address
space.  Life was good indeed.

But physical memory has constantly grown since, while the virtual memory
space has for a long time stagnated.  Intel even introduced some
hardware hacks to use up to 64GB of physical memory with a measly 4GB of
virtual memory.  Now it was _virtual_ memory fragmentation that you had
to worry about.

These days most CPUs you'd buy are 64bit, so virtual memory space has
become useful again.  But as a kernel hacker, you have little control
over what hardware everyone is using.  And those weird systems with
more physical than virtual memory are still around. :(

Jörn

-- 
Don't patch bad code, rewrite it.
-- Kernighan and Pike, according to Rusty


Re: LZMA inclusion

2008-12-08 Thread Jörn Engel
On Mon, 8 December 2008 21:47:37 +, Phillip Lougher wrote:
 
 Yes, I'm aware of the issues with vmalloc on older hardware.

It's not even limited to older hardware.  Blue Gene supercomputers are
large clusters of ppc440 machines.  Iirc each node consists of two 32bit
cpus and up to 4GB of RAM.  Not likely to run squashfs, but hardly old
hardware either.

Or for a living room example, take a barebone with a VIA C7.  And maybe
fairly soon a large number of mobile phones will have close to 4GB RAM,
yet still run on 32bit ARM processors.  I fear those troubles are far
from gone.

Jörn

-- 
All art is but imitation of nature.
-- Lucius Annaeus Seneca


Re: [PATCH V2 10/16] Squashfs: cache operations

2008-10-31 Thread Jörn Engel
On Fri, 31 October 2008 04:43:46 +, Phillip Lougher wrote:
 
 Simplicity and speed is extremely important.  The 
 squashfs_metadata_read() wrapper around the cache is designed to step 
 through the metadata a structure at a time (20 or so bytes), updating 
 the read position in the metadata each call, with more metadata cache 
 blocks being read and decompressed as necessary.  The common case where 
 the metadata is already in the cache (because we're stepping through it 
 20 or so bytes at a time), is designed to be extremely fast - a spinlock 
 and array search only.  I recently optimised the cache to use spinlocks 
 rather than mutexes and reduced the lock holding time (necessary to move 
 to spinlocks), and this resulted in a 20%+ speed improvement in reading 
 squashfs filesystems.

For the page cache, you can use read_cache_page() and
page_cache_release().  This does not even take a spinlock, hence
multiple readers can work in parallel.  The radix tree walk will take a
similar time to your array search.

You are aware that multiple cpus will play pingpong with your spinlock?
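
To make that concrete, here is a rough sketch of what a metadata read
through the page cache could look like.  Everything here is illustrative:
it assumes a dedicated metadata address_space hanging off the superblock
and a made-up squashfs_meta_filler() that decompresses one block into the
page; none of these names come from the posted patches.

static int squashfs_meta_filler(void *data, struct page *page)
{
	struct super_block *sb = data;

	/* read and decompress the metadata block covering this page index */
	...
	SetPageUptodate(page);
	unlock_page(page);
	return 0;
}

/* at the call site: no spinlock, multiple readers run in parallel */
page = read_cache_page(mapping, index, squashfs_meta_filler, sb);
if (IS_ERR(page))
	return PTR_ERR(page);
/* ... use the decompressed data ... */
page_cache_release(page);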

 Given the above using an address space in the page cache will result in 
 greater complexity, more memory overhead, and much slower operation. 
 There's a number of reasons for this.
 
 1. The amount and life-span of the data stored in the page cache is 
 outside of Squashfs' control.  As explained above it only makes sense to 
 temporarily cache the last couple of metadata and fragment blocks. 
 Using the page cache (if a 'large' address space is used) for these 
 keeps more of them around for longer than necessary, and will 
 potentially cause more worthy datablocks to be flushed on memory pressure.

I personally find it hard to guess access patterns.  If I constructed a
workload just large enough that your cache is slightly too small, it
will start thrashing.  Hit rates will go close to zero.  And unless I
missed something, such a workload can be constructed and may even be
used in real life.  With growing numbers of cores, this becomes
increasingly likely.

In other cases, an idle squashfs will still hold the 64k of unused cache
for ransom.  By giving up control, you allow your cache to grow and
shrink as desired and can avoid both cases.

 2. The address space will be caching uncompressed data, the squashfs 
 references to this data are the compressed locations within the 
 filesystem.  There doesn't exist a one-to-one linear mapping from 
 compressed location to decompressed location in the address space.  This 
 means a lookup table still needs to be used to store the mapping from 
 compressed location to decompressed location in the address space.  Now 
 this lookup table (however implemented) is itself at least as complex as 
 my current cache implementation.

You are currently using the physical offset on the medium to address
blocks in the cache, which is 64bit.  The page cache may use 32bit to
address pages.  And since blocks can be larger than pages, you effectively
need some extra bits.  This is indeed a problem.

 3. Once the locations of the decompressed pages in the address space 
 have been found, they'll need to be looked up in the page cache, and 
 this has to be done for every 4K page.  With the default fragment size 
 of 128 KiB this means 32 separate lookups.  Somewhat slower than one 
 spinlock and array search per 128 KiB block in the squashfs cache 
 implementation.

Above you claimed the complete cache to be just 64k in size.  How can
you access 128k blocks that way?  One of the numbers appears wrong.

Ignoring that, you don't need to either take the spinlock or do a lookup
in the fast path.  If you currently have this code:

for (some condition) {
	err = squashfs_read_metadata(address, ...);
}

You can transform it to:

void *handle = squashfs_get_metadata(address, ...);
for (some condition) {
	err = squashfs_read_metadata(handle, address, ...);
}
squashfs_put_metadata(handle);

With the handle you can keep a reference count to your cached object.
squashfs_read_metadata() only has to call squashfs_cache_get() or
squashfs_cache_put() when moving across object boundaries.  In the
common case it simply returns data from the object referenced through
the handle.

This might be a worthwhile optimization independently of whether you use
the page cache or not.
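
A rough sketch of what such a handle could look like - all of the names
below are made up for illustration, they are not part of the posted code:

struct squashfs_meta_handle {
	struct squashfs_cache_entry *entry;	/* currently pinned block */
	long long start;			/* disk location it covers */
};

void *squashfs_get_metadata(struct super_block *sb, long long block)
{
	struct squashfs_meta_handle *h = kmalloc(sizeof(*h), GFP_KERNEL);

	if (h) {
		h->entry = squashfs_cache_get(sb, ..., block, ...);
		h->start = block;
	}
	return h;
}

void squashfs_put_metadata(void *handle)
{
	struct squashfs_meta_handle *h = handle;

	squashfs_cache_put(..., h->entry);
	kfree(h);
}

The read function would then re-acquire a cache entry only when the
address crosses out of handle->entry, as described above.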

 Comments, especially those of the form you've got this completely 
 wrong, and you can use the page cache like this, which will be simpler 
 and faster than your current implementation welcome :)  I'm not adverse 
  to using the page cache, but I can't see how it will be simpler or 
 faster than the current implementation.

Only one of your problems seems to be real.  Not sure if or how we can
solve that one, though.

Jörn

-- 
Security vulnerabilities are here to stay.
-- Scott Culp, Manager of the Microsoft Security Response Center, 2001

Re: [PATCH V2 10/16] Squashfs: cache operations

2008-10-29 Thread Jörn Engel
On Wed, 29 October 2008 01:49:56 +, Phillip Lougher wrote:
 +/*
 + * Blocks in Squashfs are compressed.  To avoid repeatedly decompressing
 + * recently accessed data Squashfs uses two small metadata and fragment 
 caches.
 + *
 + * This file implements a generic cache implementation used for both caches,
 + * plus functions layered ontop of the generic cache implementation to
 + * access the metadata and fragment caches.
 + */

I tend to agree with Andrew that a lot of this should be done by the
page cache instead.  One of the problems seems to be that your blocksize
can exceed page size and there really isn't any infrastructure to deal
with such cases yet.  Bufferheads deal with blocks smaller than a page,
not the other way around.

Another is that address spaces are limited to 16TB on 32bit
architectures.  I guess that should be good enough for a while.  I've
heard some rumors that btrfs actually uses multiple address spaces to
handle this problem, so a good strategy may be to sit back, wait and
simply copy what btrfs does once the issue becomes pressing.

To deal with large blocks, you most likely want to keep your struct
squashfs_cache_entry around and have page->private point to it.  But be
warned: the whole page->private business is - shall we say - fragile.
You need to supply a number of methods to make things work.  And if you
fail to set any one of them, core code will assume a default, which is
to call into the bufferhead code.  And the backtrace you will receive
some time later has little or no indication that you actually only
missed one method.  I've been meaning to clean this up, but never found
the courage to actually do it.
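
Purely to illustrate the kind of wiring involved (method names are from
roughly the 2.6.27-era API and the bodies are only sketched, this is not
code from any patch):

static int squashfs_releasepage(struct page *page, gfp_t gfp)
{
	struct squashfs_cache_entry *entry = (void *)page_private(page);

	/* drop the reference this page holds instead of letting core
	 * code fall back to try_to_free_buffers() */
	...
	ClearPagePrivate(page);
	set_page_private(page, 0);
	return 1;
}

static const struct address_space_operations squashfs_meta_aops = {
	.readpage	= ...,
	.releasepage	= squashfs_releasepage,
	.invalidatepage	= ...,
};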

 +/*
 + * Look-up block in cache, and increment usage count.  If not in cache, read
 + * and decompress it from disk.
 + */
 +struct squashfs_cache_entry *squashfs_cache_get(struct super_block *sb,
 + struct squashfs_cache *cache, long long block, int length)

I personally prefer u64 instead of long long.  It is a device address
for a 64bit filesystem after all.  Same for next_index.

 +	if (i == cache->entries) {
 +		/*
 +		 * Block not in cache, if all cache entries are locked
 +		 * go to sleep waiting for one to become available.
 +		 */
 +		if (cache->unused == 0) {
 +			cache->waiting++;
 +			spin_unlock(&cache->lock);
 +			wait_event(cache->wait_queue, cache->unused);
 +			spin_lock(&cache->lock);
 +			cache->waiting--;

Maybe rename to no_waiters?  waiting looks more like a boolean.

 +	entry->length = squashfs_read_data(sb, entry->data,
 +			block, length, &entry->next_index,
 +			cache->block_size);
 +
 +	spin_lock(&cache->lock);
 +
 +	if (entry->length < 0)
 +		entry->error = entry->length;

entry->error is of type char.  We actually have errno's defined up to
131, so if by whatever freak chance the error is -ENOTRECOVERABLE, this
will convert it to a positive number.  I wouldn't want to debug that.
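
A small userspace illustration of that truncation (assuming the usual
Linux value of 131 for ENOTRECOVERABLE):

#include <stdio.h>
#include <errno.h>

int main(void)
{
	char error = -ENOTRECOVERABLE;	/* -131 does not fit into a char */

	printf("%d\n", error);		/* prints 125 where char is signed 8bit */
	return 0;
}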

 +void squashfs_cache_put(struct squashfs_cache *cache,
 +	struct squashfs_cache_entry *entry)
 +{
 +	spin_lock(&cache->lock);
 +	entry->locked--;
 +	if (entry->locked == 0) {

You might want to rename this to refcount, just to make the name match
the behaviour.

Jörn

-- 
Measure. Don't tune for speed until you've measured, and even then
don't unless one part of the code overwhelms the rest.
-- Rob Pike


Re: Subject: [PATCH 01/16] Squashfs: inode operations

2008-10-21 Thread Jörn Engel
On Tue, 21 October 2008 12:14:26 -0400, David P. Quigley wrote:
 On Fri, 2008-10-17 at 18:53 +0200, Jörn Engel wrote:
  None of the comments below are a reason against mainline inclusion, imo.
  They should get handled, but whether that happens before or after a
  merge doesn't really matter.
  
  On Fri, 17 October 2008 16:42:50 +0100, Phillip Lougher wrote:
   
   +#include <linux/squashfs_fs.h>
   +#include <linux/squashfs_fs_sb.h>
   +#include <linux/squashfs_fs_i.h>
  
  Current verdict seems to be that these files should live in fs/squashfs/,
  not include/linux/.  No kernel code beside squashfs needs the headers
  and userspace tools should have a private copy.
  
 [Snip]
 
 I looked at where filesystems such as ext3 store these and it seems to
 be in include/linux. I'm assuming this is because userspace utilities
 like fsck need them. It seems wrong for userspace tools to have their
 own private copy since you can potentially have them out of sync with
 the kernel you are running and it provides more chance for you
 forgetting to update a structure somewhere. 

Existing headers remain where they are.  New headers are supposed to
go... or at least that's what I was told to do.

And being out of sync is definitely not an argument you can use with a
filesystem.  The data on your disk doesn't magically change when you
upgrade a kernel.  Nor can you assume that any given filesystem is
accessed only by Linux.  If you change the format, then locating
external copies of the header will be the least of your problems.

Jörn

-- 
Do not stop an army on its way home.
-- Sun Tzu


Re: Subject: [PATCH 01/16] Squashfs: inode operations

2008-10-17 Thread Jörn Engel
None of the comments below are a reason against mainline inclusion, imo.
They should get handled, but whether that happens before or after a
merge doesn't really matter.

On Fri, 17 October 2008 16:42:50 +0100, Phillip Lougher wrote:
 
 +#include <linux/squashfs_fs.h>
 +#include <linux/squashfs_fs_sb.h>
 +#include <linux/squashfs_fs_i.h>

Current verdict seems to be that these files should live in fs/squashfs/,
not include/linux/.  No kernel code beside squashfs needs the headers
and userspace tools should have a private copy.

 +static int squashfs_new_inode(struct super_block *s, struct inode *i,
 +		struct squashfs_base_inode *inodeb)
 +{
 +	if (squashfs_get_id(s, le16_to_cpu(inodeb->uid), &i->i_uid) == 0)
 +		goto out;
 +	if (squashfs_get_id(s, le16_to_cpu(inodeb->guid), &i->i_gid) == 0)
 +		goto out;
 +
 +	i->i_ino = le32_to_cpu(inodeb->inode_number);
 +	i->i_mtime.tv_sec = le32_to_cpu(inodeb->mtime);
 +	i->i_atime.tv_sec = i->i_mtime.tv_sec;
 +	i->i_ctime.tv_sec = i->i_mtime.tv_sec;
 +	i->i_mode = le16_to_cpu(inodeb->mode);
 +	i->i_size = 0;
 +
 +	return 1;
 +
 +out:
 +	return 0;
 +}

Most code uses sb and inode, which I consider easier to read - if
only for consistency.

 +int squashfs_read_inode(struct inode *i, long long inode)

Is your long long inode what most filesystems call inode->i_ino?  It
seems to be.

 +	if (squashfs_new_inode(s, i, inodeb) == 0)
 +		goto failed_read;

Most linux functions return 0 on success and -ESOMETHING on error.  You
return 0 on error and 1 on success.  That makes it likely for someone
else to do something like

err = squashfs_foo(bar);
if (err)
	goto fail;

Oops.

Jörn

-- 
Measure. Don't tune for speed until you've measured, and even then
don't unless one part of the code overwhelms the rest.
-- Rob Pike


Re: Subject: [PATCH 00/16] Squashfs: compressed read-only filesystem

2008-10-17 Thread Jörn Engel
On Fri, 17 October 2008 16:42:50 +0100, Phillip Lougher wrote:
 
 Codewise all of the packed bit-fields and the swap macros have been removed in
 favour of aligned structures and in-line swapping using leXX_to_cpu().  The
 code has also been extensively restructured, reformatted to kernel coding
 standards and commented.

Excellent!  The data structures look good and I don't see a reason for
another format change.  Which means the main reason against merging the
code has gone.  Your style differs from other kernel code and in a
number of cases it would be nice to be more consistent with existing
conventions.  It would certainly help others when reading the code.  And
of course, one way to do so is to just merge and wait for some janitors
to notice squashfs and send patches. :)

I have to admit I am scared of this function:
+int squashfs_read_metadata(struct super_block *s, void *buffer,
+   long long block, unsigned int offset,
+   int length, long long *next_block,
+   unsigned int *next_offset)

It takes seven parameters, five of which look deceptively similar to me.
Almost every time I see a call to this function, my mind goes blank.

There must be some way to make this function a bit more agreeable.  One
option is to fuse the block and offset parameters into a struct and
just pass two sets of this struct.  Another would be to combine the two
sets of addresses into a single one.  A quick look at some of the
callers seems to favor that approach.

squashfs_read_metadata(..., block, offset, ..., block, offset)
Could become
squashfs_read_metadata(..., block, offset, ...)
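
For illustration, the fused variant could look something like this (the
struct and field names are invented here, they are not from the patch):

struct squashfs_meta_pos {
	long long	block;		/* compressed location on the medium */
	unsigned int	offset;		/* byte offset in the decompressed block */
};

int squashfs_read_metadata(struct super_block *s, void *buffer,
		struct squashfs_meta_pos *pos, int length);

The function would advance *pos past whatever it consumed, so callers
never juggle next_block/next_offset by hand.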

But again, such a change is no showstopper for mainline inclusion.

 Anyway that's my case for inclusion.  If any readers want Squashfs
 mainlined it's probably now a good time to offer support!

Please no.  A large amount of popular support would only bring you into
the reiser4 league.  Bad arguments don't improve when repeated.

Support in the form of patches would be a different matter, though.

Jörn

-- 
Mac is for working,
Linux is for Networking,
Windows is for Solitaire!
-- stolen from dc


Re: RFC - size tool for kernel build system

2008-10-09 Thread Jörn Engel
On Thu, 9 October 2008 18:21:51 +0300, Adrian Bunk wrote:
 
 The building blocks that would be useful are IMHO:
 - a make target that generates a report for one kernel
   (like the checkstack or export_report targets)
 - a script that compares two such reports and outputs the
   size differences
 
 That's also easy to do, and if that's what's wanted I can send a patch 
 that does it.
 
 Everything else is IMHO overdesigned.

Please do.

 The real problem is that dumping some scripts into the kernel sources 
 or publishing some data on a webpage doesn't make people use them.
 
 Like if you run make checkstack on the kernel today you can see that 
 drivers allocate arrays > 1 kB on the stack despite checkstack being 
 available...

Funny you should mention that.  Yesterday I noticed that make checkstack
had been ported to five more architectures since my last look at the
code.  It doesn't seem likely that those ports were required by some
pointy-haired boss for feature-completeness.  Someone must actually be
using it.

The very beauty of make checkstack is that you don't even notice whether
it is being used or not.  You point to some drivers that apparently
didn't use it, which is fine.  But how many drivers _did_ use it?  How
many problems have been solved before the patches have ever been posted
for review?  No one knows.  And that is a good thing.  We want the
problems to get solved and not become visible in the first place.

Bloatwatch imo has the design flaw that it is a central tool hosted on
some server somewhere and only documents the damage once it has
happened.  It would be much better if every developer could run
something simple locally and clean up the mess before anyone else
notices.

I partially agree with you on one point.  It would be even better if
checkstack, bloatcheck, etc. were run automatically on every kernel
compile and developers were forced to look at any problems that come up.
But that would increase compile time, which is bad.  So there needs to
be an off button as well, as there is for sparse - where off is the
default.

Jörn

-- 
But this is not to say that the main benefit of Linux and other GPL
software is lower-cost. Control is the main benefit--cost is secondary.
-- Bruce Perens


Re: [PATCH 00/10] AXFS: Advanced XIP filesystem

2008-09-02 Thread Jörn Engel
On Tue, 2 September 2008 09:44:19 -0700, Jared Hulbert wrote:
 
 How is one expected to read those last 4 bytes of a loopbacked file?
 Are they unreadable?  We can add the padding.   I am just wondering if
 this is a bug or a known limitation in the loopback handling or if
 there is a different safer way of reading block devs with truncated
 last blocks.

Can't you just include the final magic into the last block, thereby
making the size a clean multiple of 4k?  It looks as if you have some
padding before the magic anyway.  So you just have to make sure the
padding is at least 4 bytes and write the magic to the end of it.  Apart
from solving this bug, it should also save you some space. ;)
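
For illustration, the image-side arithmetic could be as simple as this
(constant names are made up; it just assumes 4 KiB blocks and a 4-byte
magic at the very end of the image):

#define IMAGE_BLOCK_SIZE	4096
#define MAGIC_LEN		4

/* padding to insert before the magic so that
 * payload + padding + magic ends exactly on a block boundary */
static unsigned int pad_before_magic(unsigned long long payload)
{
	unsigned long long used = payload + MAGIC_LEN;

	return (IMAGE_BLOCK_SIZE - used % IMAGE_BLOCK_SIZE) % IMAGE_BLOCK_SIZE;
}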


Jörn

-- 
Invincibility is in oneself, vulnerability is in the opponent.
-- Sun Tzu


Re: Recommendation for activating a deferred module init in the kernel

2008-06-17 Thread Jörn Engel
On Tue, 17 June 2008 11:23:18 -0700, Tim Bird wrote:
 
 I'm not that happy using an ioctl for this trigger.  What is
 the preferred method of activating a kernel feature like this?
 I presume something in /proc or /sys, but I'm not sure.

I personally would be unhappy with any kind of interface for this.  It
would be much nicer to make it transparent and still get the benefits.
One option would be to start a kernel thread for the initialization and
renice it to 19 or so.

If you want an explicit trigger, you could either hook into init_post()
or have hooks in the open functions of drivers with deferred
initialization.  Obviously you need to wait for completion here anyway,
so adding a trigger wouldn't be too expensive.

Jörn

-- 
Joern's library part 13:
http://www.chip-architect.com/


Re: Recommendation for activating a deferred module init in the kernel

2008-06-17 Thread Jörn Engel
On Tue, 17 June 2008 12:52:22 -0700, Tim Bird wrote:
 Jörn Engel wrote:
  On Tue, 17 June 2008 11:23:18 -0700, Tim Bird wrote:
  I'm not that happy using an ioctl for this trigger.  What is
  the preferred method of activating a kernel feature like this?
  I presume something in /proc or /sys, but I'm not sure.
  
  I personally would be unhappy with any kind of interface for this.  It
  would be much nicer to make it transparent and still get the benefits.
  One option would be to start a kernel thread for the initialization and
  renice it to 19 or so.
 
 That's an interesting idea. I'm pretty sure the product guys want
 an explicit trigger, so they can make sure they've got the main
 application well underway before this deferred initialization occurs.

Well, there should be a way to ensure this doesn't hog the cpu at all -
unless it is idle or someone is actually waiting for the initialization
to finish.  Not sure if nice 19 is good enough for that.

  If you want an explicit trigger, you could either hook into init_post()
  or have hooks in the open functions of drivers with deferred
  initialization.
 
 This would presumably require multiple calls (one to the open of
 each deferred module).  I would still need a trigger for the memory
 free operation, unless I hardcode the order of the opening and just
 know that the last one should free the memory.  I'll have to see
 if all the modules being loaded like this have open()s.

If you want to keep things simple - and I believe initially you should -
you can simply do all initializations in one go.  Something like this:

int foo_open(...)
{
	wait_for_deferred_init();
	...
}

static DECLARE_COMPLETION(init_complete);

void wait_for_deferred_init(void)
{
	static atomic_t in_progress = ATOMIC_INIT(-1);

	if (!atomic_inc_not_zero(&in_progress)) {
		wait_for_completion(&init_complete);
		return;
	}

	for (all deferred initcalls)
		foo_init();

	complete_all(&init_complete);	/* wake current and future waiters */
	free_memory();
}

Jörn

-- 
Anything that can go wrong, will.
-- Finagle's Law


Re: Recommendation for activating a deferred module init in the kernel

2008-06-17 Thread Jörn Engel
On Tue, 17 June 2008 12:55:31 -0700, Tim Bird wrote:
 
 Sorry - I responded too quickly.  I'm not sure I follow the
 original suggestion.  How would I call the open function of
 a module that is not initialized yet?

Hmm, good point.  I guess that suggestion has just failed the reality
test.

Jörn

-- 
You ain't got no problem, Jules. I'm on the motherfucker. Go back in
there, chill them niggers out and wait for the Wolf, who should be
coming directly.
-- Marsellus Wallace


Re: Not as much ccache win as I expected

2008-06-15 Thread Jörn Engel
On Fri, 13 June 2008 14:10:29 -0700, Tim Bird wrote:
 
 Maybe I should just be grateful for any ccache hits I get.

ccache's usefulness depends on your workload.  If you make a change to
include/linux/fs.h, close to 100% of the kernel is rebuilt, with or
without ccache.  But when you revert that change, the build time differs
dramatically.  Without ccache, fs.h was simply changed again and
everything is rebuilt.  With ccache, there are hits for the old version
and all is pulled from the cache - provided you have allotted enough
disk for it.

If you never revert to an old version or do some equivalent operation,
ccache can even be a net loss.  On a fast machine, the additional disk
accesses are easily more expensive than the minimal cpu gains.

Jörn

-- 
Public Domain  - Free as in Beer
General Public - Free as in Speech
BSD License- Free as in Enterprise
Shared Source  - Free as in Work will make you...


Re: [PATCH] add diffconfig utility

2008-06-10 Thread Jörn Engel
On Tue, 10 June 2008 12:41:54 -0700, Tim Bird wrote:
 Delivery-date: Tue, 10 Jun 2008 21:44:08 +0200
 From: Tim Bird [EMAIL PROTECTED]
 To: linux-embedded linux-embedded@vger.kernel.org
 CC: linux kernel [EMAIL PROTECTED]
 Subject: [PATCH] add diffconfig utility

Neat.  But I have one nagging question: who do you expect to merge this
patch? ;)

Jörn

-- 
Premature optimization is the root of all evil.
-- Donald Knuth