Hi all,

I found the reason for the panic in my earlier post, where ufs_dirremove panicked with "namlen == 0".
We have an HA application that reads directory contents on one machine and sends them to the other. The second machine compares the directory contents it receives with those on the local machine. The data received from the first machine is wrong, and that is what caused the panic. When there are only a few directories on the first machine, everything works fine, whereas with around 2000 directories the data does not arrive correctly. The same code works on the SPARC platform; the problem occurs only on the Intel architecture.

Can you suggest which areas I should look into to solve this issue?

Thanks
Priya

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 03, 2008 3:06 PM
To: Vamsee Priya
Cc: [EMAIL PROTECTED]; [email protected]
Subject: RE: [osol-discuss] SIGSEGV in libc.so.1`_malloc_unlocked on Solaris x86 machine

Try to run this on OpenSolaris, not on something older. The advantages are:

- the failure mode below doesn't exist in OpenSolaris (check the code - you won't find that ufs_fault call anymore)
- you can DTrace on function arguments easily (ok, that's on S10 as well)
- you get function arguments even in a kernel crashdump just by "<frameptr>$C".

For S10, the strategy how to pry func arguments out of kernel stacks is outlined in this piece:

http://opensolaris.org/os/community/documentation/files/book.pdf

Read chapters 3 and the examples 6/7.

Best wishes, happy new year !
FrankH.

On Thu, 3 Jan 2008, Vamsee Priya wrote:

> Hi
>
> Thanks a lot for your help....I could find the bug in my program....I
> corrected one of the data types and everything worked fine....
> I have a kernel module which uses this user program...I am getting a
> panic with the following stack trace.
>
> Jan 3 10:42:16 upsuite1 genunix: [ID 938853 kern.notice] ufs_dirremove: namlen == 0
> Jan 3 10:42:16 upsuite1 genunix: [ID 938853 kern.notice] ufs_dirremove: namlen == 0
> Jan 3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice] fffffe8000851770 genunix:vcmn_err+13 (fffffe80008517a0, ffffffff8)
> Jan 3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice] fffffe80008517a0 ufs:real_panic_v+120 ()
> Jan 3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice] fffffe80008517f0 ufs:ufs_fault_v+b6 ()
> Jan 3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice] fffffe80008518d0 ufs:ufs_fault+9b ()
> Jan 3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice] fffffe80008519a0 ufs:ufs_dirremove+245 ()
> Jan 3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice] fffffe8000851a10 ufs:ufs_rmdir+ad ()
> Jan 3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice] fffffe8000851a20 genunix:fop_rmdir+e ()
> Jan 3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice] fffffe8000851a20 genunix:fop_rmdir+e ()
> Jan 3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice] fffffe8000851ae0 ipfs:ipfs_lose+36d ()
> Jan 3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice] fffffe8000851de0 ipfs:ipfs_ioctl+2075 ()
> Jan 3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice] fffffe8000851df0 genunix:fop_ioctl+b ()
> Jan 3 10:42:16 upsuite1 genunix: [ID 655072 kern.notice] fffffe8000851ed0 genunix:ioctl+ac ()
>
> When does name length for ufs_rmdir comes as zero? I tried to print in
> some statements to get what is the actual name and length. But I don't
> get them printed....
>
> Thanks
> Priya
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
> Sent: Thursday, December 27, 2007 2:48 PM
> To: Vamsee Priya
> Cc: [EMAIL PROTECTED]; [email protected]
> Subject: Re: [osol-discuss] SIGSEGV in libc.so.1`_malloc_unlocked on Solaris x86 machine
>
>> Hi
>>
>> I have tried LD_PRELOAD and UMEM_DEBUG with my program on Sparc.
>> Everything worked. I also am unable to find any bug in my program.
>>
>> No clue as to who is the culprit..
>
> You will need to go over your code and check it carefully.
> Something is copying a few extra bytes into a structure.
>
> (Note that structure alignments and sizes are different on x86
> (smaller) and that therefore overruns which happen on x86 may not
> happen on SPARC.)
>
> Casper
>

_______________________________________________
opensolaris-discuss mailing list
[email protected]
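
Casper's note about structure alignment and size can be illustrated with a small sketch. It assumes the application sends a C struct as raw bytes between the two machines, which is a guess, since the actual code is not shown in this thread. In that case the padding inside the struct can differ between SPARC and x86, and the byte order differs as well (SPARC is big-endian, x86 is little-endian). One way to avoid depending on either is to encode each field explicitly. The names dir_rec, rec_encode and rec_decode, and the 10-byte header layout, are hypothetical and chosen only for illustration. The decoder also rejects a zero name length, so a damaged record could be dropped before it is handed to a call such as rmdir.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory record (not from the application in this thread).
 * Even with fixed-width fields, padding and byte order depend on the
 * platform, so the struct itself is never copied to the wire here. */
struct dir_rec {
    uint64_t inode;
    uint16_t namelen;
    char     name[256];
};

/* Wire layout assumed for this sketch: 8-byte big-endian inode,
 * 2-byte big-endian name length, then the name bytes (no NUL). */
#define REC_HDR_LEN 10

/* Returns bytes written, or 0 if the name is empty, too long,
 * or the output buffer is too small. */
size_t rec_encode(unsigned char *out, size_t cap, uint64_t inode,
                  const char *name, size_t namelen)
{
    if (namelen == 0 || namelen > 255 || cap < REC_HDR_LEN + namelen)
        return 0;
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(inode >> (56 - 8 * i));
    out[8] = (unsigned char)(namelen >> 8);
    out[9] = (unsigned char)(namelen & 0xff);
    memcpy(out + REC_HDR_LEN, name, namelen);
    return REC_HDR_LEN + namelen;
}

/* Returns bytes consumed, or 0 if the buffer is truncated or the
 * encoded name length is zero or out of range. */
size_t rec_decode(const unsigned char *in, size_t len, struct dir_rec *rec)
{
    if (len < REC_HDR_LEN)
        return 0;
    uint64_t inode = 0;
    for (int i = 0; i < 8; i++)
        inode = (inode << 8) | in[i];
    size_t namelen = ((size_t)in[8] << 8) | in[9];
    if (namelen == 0 || namelen > 255 || len < REC_HDR_LEN + namelen)
        return 0;
    rec->inode = inode;
    rec->namelen = (uint16_t)namelen;
    memcpy(rec->name, in + REC_HDR_LEN, namelen);
    rec->name[namelen] = '\0';
    return REC_HDR_LEN + namelen;
}
```

Because both sides agree on byte positions rather than on a compiler's struct layout, the same bytes should decode identically on SPARC and x86. Printing sizeof and offsetof for the real structures on both platforms would be a quick way to check whether the layouts actually differ.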

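
On the question of which areas to look into: one candidate, offered as an assumption and not as a diagnosis, is short reads. If the directory data travels over a socket or pipe, a single read() may return fewer bytes than requested, and that becomes more likely as the message grows, which would fit a transfer that works for a few directories but not for around 2000. The helper below is a minimal sketch using only POSIX read(); the name read_full is hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Reads exactly len bytes from fd.
 * Returns 0 on success, -1 on error or if EOF arrives first. */
int read_full(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;        /* interrupted by a signal: retry */
            return -1;           /* real I/O error */
        }
        if (n == 0)
            return -1;           /* peer closed before the full message */
        p += n;                  /* advance past the bytes received */
        len -= (size_t)n;
    }
    return 0;
}
```

A receiver written this way would first call read_full() for a fixed-size length header, then call it again for exactly that many bytes, and treat a -1 return as a reason to discard the whole message instead of comparing partial data.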