I can't vouch for the accuracy of the log (trace) I sent here;
I just ran "dmesg", that's all, so it is real output. I don't know what's
going on inside the kernel, I don't understand these things, and I don't
claim it's anyone's fault. :-)

I'm just working in a "strange" environment where some problems occur,
even though I don't experience them in my installed system (running on
the same hardware with the same kernel).

Today I tested the kernel with ACPI compiled in, but SMP turned off,
and it's working like a charm even in my chrooted env.

To Phillip: I'm using squashfs 3.0pre from CVS (yesterday).
The CVS version is _not_ proven by many years of use :)
But again, I don't claim it's anyone's fault.

I will try more combinations (older unionfs and squashfs versions)
and I will post my results here.


Tomas


Phillip Lougher wrote:
There are a large number of things wrong with that stack trace. Firstly,
it is very deep; secondly, the sequence of function calls is impossible.

The stack trace starts with a VFS read, which continues through to
page cache readahead.  This should enter Squashfs at
squashfs_readpage.  It, however, enters Squashfs at squashfs_lookup,
which is part of directory entry lookup, i.e. wrong.  The trace
then shows a squashfs_readpage call, which is right, but which will
never be called from squashfs_lookup.  This then returns to
squashfs_lookup, which creates a new inode, which in turn reads a
fragment, where it segfaults inside zlib decompression.  Again,
fragments are never read as part of inode lookup.
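
To see why that call chain is impossible: the two functions are wired
into entirely separate VFS operation tables.  A sketch of the general
shape (illustrative only, not the actual Squashfs source):

#include <linux/fs.h>
#include <linux/pagemap.h>

/*
 * Sketch, not the actual Squashfs source.  readpage is reached through
 * the page cache on file reads; lookup is reached through the VFS path
 * walk on directory lookups.  Neither table calls into the other, so a
 * squashfs_readpage frame underneath squashfs_lookup cannot be a real
 * call chain.
 */
static int squashfs_readpage(struct file *file, struct page *page);
static struct dentry *squashfs_lookup(struct inode *dir,
        struct dentry *dentry, struct nameidata *nd);

static struct address_space_operations squashfs_aops = {
        .readpage = squashfs_readpage,
};

static struct inode_operations squashfs_dir_inode_ops = {
        .lookup = squashfs_lookup,
};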

This is a nonsense stack trace which appears to consist of two
separate stack traces merged into one.

In addition, this and the previous stack trace show the kernel
segfaulting in different kernel routines called from Squashfs, in
different places and under different circumstances.  Either there are
multiple Squashfs bugs which have never shown up before, or Squashfs
is running in a corrupt environment.  Experience has shown this is
normally because multiple threads are trampling over the same memory.
There is, obviously, something wrong here, but I feel it is something
more fundamental than a bug in Squashfs which has never shown up
before.

A final point: what you're doing is reading applications from a
Squashfs filesystem off a liveCD.  In such circumstances Squashfs will
obviously show up in the traces (along with unionfs), because it is
the filesystem serving the file data.  The fact that it does show up
in the stack traces is no indication that it is causing the problem.

So far I've seen nothing that convinces me Squashfs is bugged; in fact,
everything I've seen indicates a severely mucked-up kernel.  If there
were something that pointed to a Squashfs bug I could do something
about it; however, I don't think that is the case.

Phillip



On 2/12/06, Tomas M <[EMAIL PROTECTED]> wrote:
Well, I was referring to different dmesg logs than the one sent to this
mailing list, and there were a lot of "squashfs" strings in those
kernel oopses :)
Please see the new attachment with the mentioned dmesg output.

When I compile the kernel without ACPI at all, it doesn't behave
like an SMP kernel - only one processor is shown in /proc/cpuinfo
and the system works perfectly.
It seems to me that it's not even possible to build an SMP kernel
without ACPI, but I don't understand these things much, I am sorry.

Please note that these logs are produced in a LiveCD environment,
in other words in a chrooted squashfs+unionfs setup, which is sometimes
a very strange combination.

Tomas M



Phillip Lougher wrote:
On 2/12/06, Tomas M <[EMAIL PROTECTED]> wrote:

I examined the dmesg again, and it seems to be related to squashfs
rather than unionfs...
So I would like to apologize for bothering you on the wrong mailing list :)

Unfortunately, the dmesg is very inconclusive as to what is going
wrong.  The kernel isn't panicking (in the one dmesg given in this
thread) in SquashFS, but in the lower-level block I/O handling code.
Again, the SquashFS error messages you reported (off the mailing
list) are generated when the block I/O system has reported errors.
Given this, I'm inclined to believe the bugs are caused by the kernel
operating unreliably because of the previously mentioned ACPI hardware
issues.
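
The usual shape of that relationship, as a rough sketch (the function
name here is made up, and this is not the actual Squashfs source):

#include <linux/kernel.h>
#include <linux/errno.h>
#include <linux/fs.h>
#include <linux/buffer_head.h>

/*
 * Rough sketch, not the actual Squashfs source: the filesystem asks
 * the block layer for a buffer and only prints its own error message
 * after the I/O layer has already failed underneath it, so the
 * filesystem error is a symptom, not the cause.
 */
static int read_metadata_block(struct super_block *sb, sector_t block)
{
        struct buffer_head *bh = sb_bread(sb, block);

        if (bh == NULL) {
                /* the block layer failed before the fs could do anything */
                printk(KERN_ERR "sqfs: unable to read block 0x%llx\n",
                        (unsigned long long) block);
                return -EIO;
        }
        /* ... decompress / use bh->b_data here ... */
        brelse(bh);
        return 0;
}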

You could try building a non-ACPI SMP kernel to prove it is the ACPI
settings causing the problem.  SquashFS has been around for quite a
while and I've had no SMP problems reported for many years, so I
believe the locking to be pretty stable.  If there were problems with
SMP I would expect quite different behaviour, not the error messages
you're getting.
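
For reference, such a test build would be a .config along these lines
(assuming a 2.6-era i386 tree):

CONFIG_SMP=y
# CONFIG_ACPI is not set

As far as I know, the secondary CPU then has to be discovered through
the board's MPS table rather than the ACPI MADT, which may be why your
non-ACPI build only shows one processor in /proc/cpuinfo.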

Regards

Phillip Lougher


I'm sorry; next time I will be more careful.


Tomas M


Josef Sipek wrote:

On Sat, Feb 11, 2006 at 06:38:03PM +0100, Tomas M wrote:


When I use unionfs on my Linux notebook, everything is OK.  But when I
use the same unionfs.ko module on the P4 machine (with the same
kernel), it causes many SEGFAULTs etc.


What does dmesg say?



Then it works flawlessly. Maybe it's only a bug in the Linux kernel's
ACPI or SMP code,


ACPI is notorious for being mis-implemented by hardware manufacturers;
kernel developers can't really do anything about that except blacklist
the worst cases and work around the better ones.
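
That blacklisting is done with DMI matching; a rough sketch in the
style of arch/i386/kernel/acpi/boot.c (the vendor/board strings and
the callback name here are made up):

#include <linux/dmi.h>
#include <linux/init.h>
#include <linux/kernel.h>
#include <asm/acpi.h>

/* Sketch of a DMI-based ACPI blacklist entry; the identifiers below
 * are illustrative, not a real blacklisted board.
 */
static int __init broken_board_disable_acpi(struct dmi_system_id *d)
{
        printk(KERN_NOTICE "%s detected: disabling ACPI\n", d->ident);
        disable_acpi();
        return 0;
}

static struct dmi_system_id __initdata acpi_blacklist[] = {
        {
                .callback = broken_board_disable_acpi,
                .ident = "Example Broken Board",
                .matches = {
                        DMI_MATCH(DMI_BOARD_VENDOR, "Example Vendor"),
                        DMI_MATCH(DMI_BOARD_NAME, "EX-1234"),
                },
        },
        { }  /* terminator */
};

/* called once during early setup: dmi_check_system(acpi_blacklist); */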



nevertheless I would like to ask a general question: is running
unionfs on multiprocessor machines a special usage case?  In other
words, do you, the developers, think about multiprocessor
configurations while developing unionfs?  Is there any difference?


I personally try to think about concurrency issues whenever I can. As
you may have noticed, the unionfs code wasn't always written with them
in mind (for example, branch management is currently quite racy). This,
I think, is because unionfs was developed very quickly on
single-processor systems running 2.4 kernels (which do not have
preemption). 2.6's preemption, and now the easy access to "SMP"
systems, is making many kernel bugs appear seemingly out of nowhere:
bugs which for a long time sat in the code, waiting for the right
sequence of circumstances to occur.
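
A minimal sketch of the kind of pattern I mean (illustrative only, not
actual unionfs code):

#include <linux/spinlock.h>

/*
 * An unprotected read-modify-write like this is effectively safe on a
 * non-preemptive uniprocessor 2.4 kernel, but races on SMP or under
 * 2.6 preemption: two CPUs can both read the old value of nbranches,
 * and one update is lost.
 */
static int nbranches;
static DEFINE_SPINLOCK(branch_lock);

static void add_branch_racy(void)
{
        nbranches++;            /* racy read-modify-write */
}

static void add_branch_safe(void)
{
        spin_lock(&branch_lock);
        nbranches++;            /* serialized against other CPUs */
        spin_unlock(&branch_lock);
}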



Basically, I don't depend on the answer; I just wanted to bring this
to your attention.


Thank you. I'm sure that once we submit the code for review for eventual
inclusion into the vanilla kernel, we'll get plenty of comments about
locking :)

Jeff.


Unable to handle kernel NULL pointer dereference at virtual address 00000014
 printing eip:
c0272be5
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: nls_iso8859_2 nls_iso8859_1 nls_cp437 unionfs squashfs
CPU:    0
EIP:    0060:[<c0272be5>]    Not tainted VLI
EFLAGS: 00010246   (2.6.15.4)
EIP is at zlib_inflate+0x25e/0x3a5
eax: 00000000   ebx: f8ce9580   ecx: 00000004   edx: f8ce9580
esi: 0000001f   edi: fffffffb   ebp: 00000001   esp: f56c59e0
ds: 007b   es: 007b   ss: 0068
Process startkde (pid: 3863, threadinfo=f56c4000 task=f7db4030)
Stack: 000000da 00000005 000000c6 f48544c6 f7336239 f56c5b4c f8ce0482 c02727a4
       f4840788 f48407bc f48407f0 f4840824 f48406b8 f48406ec f4840720 f4840754
       f48405e8 f484061c f4840650 f4840684 f4840518 f484054c f4840580 f48405b4
Call Trace:
 [<f8ce0482>] squashfs_read_data+0x30e/0x3c6 [squashfs]
 [<c02727a4>] zlib_inflateEnd+0x29/0x34
 [<f8ce0984>] get_fragment_location+0xf8/0x12a [squashfs]
 [<f8ce4f9c>] squashfs_alloc_inode+0xf/0x1b [squashfs]
 [<c015e696>] alloc_inode+0x12/0x13b
 [<c015ef0e>] new_inode+0x24/0x7f
 [<f8ce0c19>] squashfs_new_inode+0x10/0x82 [squashfs]
 [<c015f563>] __insert_inode_hash+0x43/0x5e
 [<f8ce2260>] squashfs_iget+0x15d5/0x15fe [squashfs]
 [<f8ce4dc4>] squashfs_lookup+0x381/0x45c [squashfs]
 [<f8ce0b7d>] get_cached_fragment+0x18d/0x219 [squashfs]
 [<c0135bb2>] get_page_from_freelist+0x72/0x89
 [<f8ce3c53>] squashfs_readpage+0x202/0x4ae [squashfs]
 [<c015dfe5>] d_rehash+0x4d/0x5d
 [<f8ce4e92>] squashfs_lookup+0x44f/0x45c [squashfs]
 [<c013551e>] prep_new_page+0x5a/0x61
 [<c0135a11>] buffered_rmqueue+0x141/0x1d1
 [<c0135bb2>] get_page_from_freelist+0x72/0x89
 [<c026e0a4>] radix_tree_node_alloc+0x10/0x49
 [<c026e24e>] radix_tree_insert+0x5e/0xe7
 [<c0132075>] add_to_page_cache+0x2f/0x94
 [<c0137b5b>] read_pages+0x8d/0xd0
 [<c0135bb2>] get_page_from_freelist+0x72/0x89
 [<c0135c10>] __alloc_pages+0x47/0x259
 [<c0137cae>] __do_page_cache_readahead+0x110/0x12e
 [<c0137dbf>] blockable_page_cache_readahead+0x46/0x97
 [<c0137f28>] page_cache_readahead+0x7f/0x115
 [<c0132624>] do_generic_mapping_read+0x11d/0x3ce
 [<c0132b34>] __generic_file_aio_read+0x195/0x1c7
 [<c01328d5>] file_read_actor+0x0/0xca
 [<c0132bad>] generic_file_read+0x0/0xab
 [<c0132c41>] generic_file_read+0x94/0xab
 [<c0462815>] schedule+0x9e1/0xa2c
 [<c012966a>] autoremove_wake_function+0x0/0x2d
 [<c04628e1>] wait_for_completion+0x81/0xae
 [<f8e98fc1>] unionfs_read+0x71/0xd0 [unionfs]
 [<c014a49d>] vfs_read+0x85/0x122
 [<c0152adf>] kernel_read+0x31/0x3b
 [<c01534d5>] prepare_binprm+0xbf/0xc8
 [<c0153885>] do_execve+0xeb/0x1c4
 [<c01015f3>] sys_execve+0x2b/0x6d
 [<c01026e9>] syscall_call+0x7/0xb
Code: 00 0d 00 00 00 8b 43 1c c7 40 04 00 00 00 00 e9 f4 fd ff ff 85 ed 75 02 89 fd 
83 fd 01 0f 85 23 01 00 00 8b 43 1c 8d 48 04 89 da <8b> 40 14 e8 bb e5 ff ff 8b 
43 1c 83 78 0c 00 89 fd 74 0b c7 00




