Hi all,

I found the reason behind my earlier posting, in which ufs_dirremove
panicked with "namlen == 0".

We have an HA application that reads directory contents on one machine
and sends them to the other. The second machine compares the directory
contents sent by the first machine with those present locally. The data
received from the first machine arrives corrupted, and that is what
caused the panic.

When the first machine has only a small number of directories,
everything works fine, whereas with around 2000 directories the data
does not arrive correctly. The same code works on the SPARC platform;
the problem occurs only on the Intel (x86) architecture.
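
One area that may be worth checking, assuming the directory list travels
over a stream socket (the message does not say how it is sent): a single
read() on a stream may return fewer bytes than the sender wrote, and that
kind of bug often stays hidden until the payload grows (for example to
around 2000 entries). Below is a minimal sketch of a loop that keeps
reading until the expected length has arrived; read_full is a
hypothetical helper name, not part of any existing API.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: read exactly len bytes from fd into buf.
 * Retries on short reads and on EINTR.
 * Returns 0 on success, -1 on I/O error or if the peer closes the
 * stream before len bytes have arrived.
 */
static int read_full(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    size_t done = 0;

    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry */
            return -1;      /* real I/O error */
        }
        if (n == 0)
            return -1;      /* end of stream before len bytes */
        done += (size_t)n;  /* short read: keep going */
    }
    return 0;
}
```

The receiver would typically read a fixed-size length header with
read_full first and then the body of that exact length, rather than
trusting whatever one read() call happens to return.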

Can you suggest what areas I should look into to solve this issue?


Thanks
Priya

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: Thursday, January 03, 2008 3:06 PM
To: Vamsee Priya
Cc: [EMAIL PROTECTED]; [email protected]
Subject: RE: [osol-discuss] SIGSEGV in libc.so.1`_malloc_unlocked on Solaris x86 machine


Try to run this on OpenSolaris, not on something older.
The advantages are:

        - the failure mode below doesn't exist in OpenSolaris
          (check the code - you won't find that ufs_fault call anymore)

        - you can DTrace on function arguments easily (ok, that's on
          S10 as well)

        - you get function arguments even in a kernel crashdump just
          by "<frameptr>$C".

For S10, the strategy for prying function arguments out of kernel
stacks is outlined in this document:

http://opensolaris.org/os/community/documentation/files/book.pdf

Read chapter 3 and examples 6 and 7.


Best wishes,
happy new year!
FrankH.



On Thu, 3 Jan 2008, Vamsee Priya wrote:

> Hi
>
> Thanks a lot for your help. I found the bug in my program: after I
> corrected one of the data types, everything worked fine.
> I have a kernel module that uses this user program, and I am getting a
> panic with the following stack trace.
>
> Jan  3 10:42:16 upsuite1 genunix: [ID 938853 kern.notice]
> ufs_dirremove: namlen == 0
> Jan  3 10:42:16 upsuite1 genunix: [ID 938853 kern.notice]
> ufs_dirremove: namlen == 0
> Jan  3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice]
> fffffe8000851770 genunix:vcmn_err+13 (fffffe80008517a0, ffffffff8)
> Jan  3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice]
> fffffe80008517a0 ufs:real_panic_v+120 ()
> Jan  3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice]
> fffffe80008517f0 ufs:ufs_fault_v+b6 ()
> Jan  3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice]
> fffffe80008518d0 ufs:ufs_fault+9b ()
> Jan  3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice]
> fffffe80008519a0 ufs:ufs_dirremove+245 ()
> Jan  3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice]
> fffffe8000851a10 ufs:ufs_rmdir+ad ()
> Jan  3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice]
> fffffe8000851a20 genunix:fop_rmdir+e ()
> Jan  3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice]
> fffffe8000851a20 genunix:fop_rmdir+e ()
> Jan  3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice]
> fffffe8000851ae0 ipfs:ipfs_lose+36d ()
> Jan  3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice]
> fffffe8000851de0 ipfs:ipfs_ioctl+2075 ()
> Jan  3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice]
> fffffe8000851df0 genunix:fop_ioctl+b ()
> Jan  3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice]
> fffffe8000851ed0 genunix:ioctl+ac ()
>
> When does the name length passed to ufs_rmdir come in as zero? I tried
> adding some print statements to see the actual name and length, but
> nothing gets printed.
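
As far as I know, ufs_dirremove computes namlen as the length of the name
string it is handed, so "namlen == 0" most likely means the name that
ipfs_lose passed down through fop_rmdir was empty (its first byte was
NUL), which would fit corrupted data arriving from the other machine.
Below is a minimal sketch of a sanity check that either the user program
or the module could apply before issuing the rmdir. It assumes the name
arrives in a fixed-size buffer (for example a field of the ioctl
argument); dirent_name_ok is a hypothetical helper, not an existing
Solaris interface.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical check for a single directory-entry name stored in a
 * fixed-size buffer. Returns 1 if the name looks safe to hand to an
 * rmdir-style operation, 0 otherwise.
 */
static int dirent_name_ok(const char *buf, size_t buflen)
{
    const char *nul = memchr(buf, '\0', buflen);
    size_t len;

    if (nul == NULL)
        return 0;   /* not NUL-terminated inside the buffer */
    len = (size_t)(nul - buf);
    if (len == 0)
        return 0;   /* empty name: the "namlen == 0" case */
    if (strcmp(buf, ".") == 0 || strcmp(buf, "..") == 0)
        return 0;   /* "." and ".." cannot be removed */
    if (memchr(buf, '/', len) != NULL)
        return 0;   /* must be a single path component */
    return 1;
}
```

Logging and rejecting a bad name at this point would show what the
corrupted data looks like instead of letting UFS hit the panic.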
>
>
> Thanks
> Priya
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf
> Of [EMAIL PROTECTED]
> Sent: Thursday, December 27, 2007 2:48 PM
> To: Vamsee Priya
> Cc: [EMAIL PROTECTED]; [email protected]
> Subject: Re: [osol-discuss] SIGSEGV in
> libc.so.1`_malloc_unlockedonSolarisx86machine
>
>
>> Hi
>>
>> I have tried LD_PRELOAD and UMEM_DEBUG with my program on SPARC.
>> Everything worked. I am also unable to find any bug in my program.
>>
>> I have no clue who the culprit is.
>
> You will need to go over your code and check it carefully.
> Something is copying a few extra bytes into a structure.
>
> (Note that structure alignments and sizes are different on x86
> (smaller), and therefore overruns which happen on x86 may not happen
> on SPARC.)
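
To illustrate the alignment point: under the 32-bit x86 ABI a 64-bit
integer inside a struct is aligned on 4 bytes, while SPARC aligns it on
8, so the same struct definition can have a different sizeof and
different member offsets on the two platforms. Code that copies a
hard-coded byte count, or sends a raw struct to another machine, can then
overrun or misread data. Below is a minimal sketch of encoding a record
field by field into a fixed wire layout instead; struct rec, its fields,
and REC_WIRE_SIZE are hypothetical and only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: the compiler may add padding after 'tag', and
 * the amount depends on the ABI. */
struct rec {
    char     tag;
    uint64_t inode;
    uint32_t namelen;
};

#define REC_WIRE_SIZE 13   /* 1 + 8 + 4 bytes, no padding */

/* Encode into a fixed big-endian layout that does not depend on the
 * in-memory struct layout. buf must hold REC_WIRE_SIZE bytes. */
static void rec_encode(const struct rec *r, unsigned char *buf)
{
    int i;

    buf[0] = (unsigned char)r->tag;
    for (i = 0; i < 8; i++)
        buf[1 + i] = (unsigned char)(r->inode >> (56 - 8 * i));
    for (i = 0; i < 4; i++)
        buf[9 + i] = (unsigned char)(r->namelen >> (24 - 8 * i));
}

/* Decode the same fixed layout back into a native struct. */
static void rec_decode(const unsigned char *buf, struct rec *r)
{
    int i;

    r->tag = (char)buf[0];
    r->inode = 0;
    for (i = 0; i < 8; i++)
        r->inode = (r->inode << 8) | buf[1 + i];
    r->namelen = 0;
    for (i = 0; i < 4; i++)
        r->namelen = (r->namelen << 8) | buf[9 + i];
}
```

Because both sides agree on REC_WIRE_SIZE and the byte order, the result
is the same whatever padding each compiler inserts into struct rec.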
>
> Casper
>
>
>
>


_______________________________________________
opensolaris-discuss mailing list
[email protected]