NFS Locking Problems (Was: Re: I'm impressed, but ...)
On Saturday 07 December 2002 05:40 pm, Philip Paeps wrote: On 2002-12-07 23:10:18 (+0100), Cliff Sarginson [EMAIL PROTECTED] wrote: On Sat, Dec 07, 2002 at 08:41:35PM +0100, Philip Paeps wrote: On 2002-11-25 01:49:34 (+0100), Philip Paeps [EMAIL PROTECTED] wrote: Perhaps someone can make sense of this? I'm happy I can read my mail now, without having to kick the power button every so often, but I'd prefer to store my mail here and have the mailserver write over NFS. (Mainly for speed reasons). I suspect file locks across NFS as a possible source of this kind of problem. Locking is always a problem over NFS :-/ It's one of the reasons I'm using maildirs instead of normal happy mboxes. You could use IMAP instead, allowing you to read and sort your mail from anywhere. Mutt is capable of IMAP, no? Theoretically - correct me if I'm wrong - file locks shouldn't matter with maildirs as once a file is written, there's not much chance of it having to be written again, let alone by more than one process? This is up to the processes that have opened the files. File locks over NFS are obtained by userland programs via the fcntl(2) system call, and it's quite likely (and desired, in fact) that mutt is obtaining, at the least, read locks on the files that it has opened. How would one verify that NFS locking is causing pain? There's some NFS debugging stuff in NOTES... I'm willing to try anything to help fix this :-) A pair of simple test programs works well. One program creates a file, opens it, and locks it. The other program attempts to obtain either conflicting (e.g., try to grab a write-lock while the other process has a read-lock) or non-conflicting locks (try to grab a read-lock while the other process has a read-lock). It is best to run the programs on different hosts, against the same NFS filesystem.
I've discovered that fcntl(2) on 4.x systems misbehaves in that the client host keeps track of the locks, so that different programs on the same client host (say, host A) honor the locks, but the lock is not propagated to the NFS server; therefore, processes on other hosts are not restrained by the locks that have been acquired by processes on host A. I have not tried these tests on current. To be fair, I only recently read (here) that rpc.lockd has to run on the client side as well as on the server side, and I have not yet repeated my tests. That's on my List of Things to Do RSN. -- Chris BeHanna http://www.pennasoft.com Principal Consultant PennaSoft Corporation [EMAIL PROTECTED] To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
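The pair of test programs described above could be sketched roughly as follows. This is a minimal illustration, not the code Chris actually ran: the function name `try_lock`, the `LOCKTEST_MAIN` build switch, and the command-line layout are all assumptions made for the example. It uses only the standard fcntl(2) advisory-lock interface (`F_SETLK` with a `struct flock`).

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Try to take (or release) a whole-file advisory lock without blocking.
 * type is F_RDLCK, F_WRLCK or F_UNLCK.  Returns 0 on success, -1 with
 * errno set (typically EAGAIN or EACCES when another process holds a
 * conflicting lock).
 */
int try_lock(int fd, short type)
{
    struct flock fl;

    assert(fd >= 0);
    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;               /* 0 means "through end of file" */
    return fcntl(fd, F_SETLK, &fl);
}

#ifdef LOCKTEST_MAIN
/* Hypothetical usage: locktest <file-on-nfs> r|w [hold-seconds] */
int main(int argc, char **argv)
{
    int fd, hold;
    short type;

    if (argc < 3) {
        fprintf(stderr, "usage: %s file r|w [hold-seconds]\n", argv[0]);
        return EXIT_FAILURE;
    }
    type = (argv[2][0] == 'w') ? F_WRLCK : F_RDLCK;
    hold = (argc > 3) ? atoi(argv[3]) : 30;

    fd = open(argv[1], O_RDWR | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return EXIT_FAILURE;
    }
    if (try_lock(fd, type) == -1) {
        /* For a conflicting request this is the expected outcome. */
        fprintf(stderr, "lock refused: %s\n", strerror(errno));
        close(fd);
        return EXIT_FAILURE;
    }
    printf("pid %ld holds a %s lock for %d seconds\n",
        (long)getpid(), type == F_WRLCK ? "write" : "read", hold);
    sleep((unsigned)hold);
    close(fd);                  /* closing the descriptor drops the lock */
    return EXIT_SUCCESS;
}
#endif
```

Built with `-DLOCKTEST_MAIN`, one would run it with `r` on host A and, while it sleeps, with `w` on host B against the same NFS-mounted file. If the second run reports that it holds a write lock instead of being refused, the lock taken on host A was not visible to host B.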
Re: NFS Locking Problems (Was: Re: I'm impressed, but ...)
On Sun, 8 Dec 2002, Chris BeHanna wrote: Locking is always a problem over NFS :-/ It's one of the reasons I'm using maildirs instead of normal happy mboxes. NFS locking is a problem on 4.X. NFS locks work on -current and have for about a year. FreeBSD -current NFS locks work with both FreeBSD and Solaris clients and servers. FreeBSD -current NFS locks are known to fail with Linux, and the problem seems to be on the Linux side (at least that's what I will maintain until someone proves otherwise ;) ). The first bug report on the rpc.lockd rewrite was someone using NFS locking on mboxes (FreeBSD client and Solaris server). Consequently, we know that even mbox locking works in at least one situation. To be fair, I only recently read (here) that rpc.lockd has to run on the client side as well as on the server side, and I have not yet repeated my tests. That's on my List of Things to Do RSN. In order to honor NFS locks, *all* NFS machines must run rpc.lockd or its equivalent. NFS file locking is not part of the NFS protocol; it has its own RPC-based protocol for the purpose. FreeBSD NFS locking on -current also seems to require rpc.statd on both NFS clients and servers. This is the daemon which recovers locks when an NFS server crashes. I need to look at the standard and see if this is really required or could be made optional. -a
Re: I'm impressed, but ...
On 2002-11-25 01:49:34 (+0100), Philip Paeps [EMAIL PROTECTED] wrote: 2. This one's the most irritating. I use Mutt as my mailclient using Maildirs for storage. It occasionally happens that Mutt just 'hangs' reading a directory, and there's no way for me to kill it. Ps axl shows it as being in state Ds or Ds+ and blocked by ufs. I've fiddled a bit with this one. I still can't reproduce it reliably, but I got it to go away :-) Before, I had my maildir living on this machine (running -current), and my mailserver (running stable) mounted it over NFS and wrote mails to it. Often, when reading things locally in the directory, it just hung. I haven't been able to pinpoint why it hung, but I suspect it could be that it didn't like mails being written into it? Problem with nfsd perhaps? I've now turned the system around. My maildir now lives on the mailserver, and I mount it here. Problem has gone away. Perhaps someone can make sense of this? I'm happy I can read my mail now, without having to kick the power button every so often, but I'd prefer to store my mail here and have the mailserver write over NFS. (Mainly for speed reasons). - Philip -- Philip Paeps Please don't CC me, I am [EMAIL PROTECTED] subscribed to the list. The length of a marriage is inversely proportional to the amount spent on the wedding.
Re: I'm impressed, but ...
On Sat, Dec 07, 2002 at 08:41:35PM +0100, Philip Paeps wrote: On 2002-11-25 01:49:34 (+0100), Philip Paeps [EMAIL PROTECTED] wrote: Perhaps someone can make sense of this? I'm happy I can read my mail now, without having to kick the power button every so often, but I'd prefer to store my mail here and have the mailserver write over NFS. (Mainly for speed reasons). I suspect file locks across NFS as a possible source of this kind of problem. -- Regards Cliff Sarginson The Netherlands
Re: I'm impressed, but ...
On 2002-12-07 23:10:18 (+0100), Cliff Sarginson [EMAIL PROTECTED] wrote: On Sat, Dec 07, 2002 at 08:41:35PM +0100, Philip Paeps wrote: On 2002-11-25 01:49:34 (+0100), Philip Paeps [EMAIL PROTECTED] wrote: Perhaps someone can make sense of this? I'm happy I can read my mail now, without having to kick the power button every so often, but I'd prefer to store my mail here and have the mailserver write over NFS. (Mainly for speed reasons). I suspect file locks across NFS as a possible source of this kind of problem. Locking is always a problem over NFS :-/ It's one of the reasons I'm using maildirs instead of normal happy mboxes. Theoretically - correct me if I'm wrong - file locks shouldn't matter with maildirs as once a file is written, there's not much chance of it having to be written again, let alone by more than one process? How would one verify that NFS locking is causing pain? There's some NFS debugging stuff in NOTES... I'm willing to try anything to help fix this :-) - Philip -- Philip Paeps Please don't CC me, I am [EMAIL PROTECTED] subscribed to the list. If you see a man approaching you with the obvious intent of doing you good, you should run for your life.
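The reasoning that maildirs should not need locks rests on how a message is delivered: the file is written in full under `tmp/` and only then moved into `new/`, so a reader never sees a half-written message. A minimal sketch of that idea follows. The function names `maildir_create` and `maildir_deliver` are made up for this example, the caller is assumed to supply a unique file name, and `rename(2)` is used here for brevity where some delivery agents use `link(2)` plus `unlink(2)`. It says nothing about what mutt or the mailserver in this thread actually do.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create maildir and its tmp/, new/ and cur/ subdirectories if missing. */
int maildir_create(const char *maildir)
{
    static const char *sub[] = { "", "/tmp", "/new", "/cur" };
    char path[1024];
    size_t i;

    assert(maildir != NULL);
    for (i = 0; i < sizeof(sub) / sizeof(sub[0]); i++) {
        int n = snprintf(path, sizeof(path), "%s%s", maildir, sub[i]);
        if (n < 0 || (size_t)n >= sizeof(path))
            return -1;
        if (mkdir(path, 0700) == -1 && errno != EEXIST)
            return -1;
    }
    return 0;
}

/*
 * Deliver one message without any locking: write the whole file under
 * tmp/, flush it, then rename it into new/.  "unique" must be a file
 * name no other delivery will use.  Returns 0 on success, -1 on failure.
 */
int maildir_deliver(const char *maildir, const char *unique,
    const char *msg, size_t len)
{
    char tmp[1024], dst[1024];
    int fd, n;
    ssize_t w;

    assert(maildir != NULL && unique != NULL && msg != NULL);
    n = snprintf(tmp, sizeof(tmp), "%s/tmp/%s", maildir, unique);
    if (n < 0 || (size_t)n >= sizeof(tmp))
        return -1;
    n = snprintf(dst, sizeof(dst), "%s/new/%s", maildir, unique);
    if (n < 0 || (size_t)n >= sizeof(dst))
        return -1;

    fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    while (len > 0) {
        w = write(fd, msg, len);
        if (w <= 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        msg += w;
        len -= (size_t)w;
    }
    if (fsync(fd) == -1) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) == -1 || rename(tmp, dst) == -1) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```

With this scheme the only shared step is the final rename, which is why a reader scanning `new/` does not need a lock to avoid partial files. Whether a given mail client still takes fcntl(2) locks on the files it opens is a separate question.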
Re: I'm impressed, but ...
On 2002-11-26 12:37:25 (+0100), Robert Drehmel [EMAIL PROTECTED] wrote: On Mon, Nov 25, 2002 at 10:04:23PM +0100, Philip Paeps wrote: Before it is a few megs of the same. Basically Mutt reading my mailbox. Anything else I can do to help? You could give the attached patch a try. I'm afraid it doesn't solve the problem :-/ Still hung on me just now. It looked like it might have 'improved matters', though, but it might have been just a fluke. I'll keep testing and will try to find a way to reproduce it reliably, but it's being a pain to track down. - Philip -- Philip Paeps Please don't CC me, I am [EMAIL PROTECTED] subscribed to the list. The wrong quarterback is the one that's in there.
Re: I'm impressed, but ...
On 2002-11-25 20:11:01 (-0500), Jeff Roberson [EMAIL PROTECTED] wrote: On Tue, 26 Nov 2002, Philip Paeps wrote: I was also starting to think in the NFS direction. It's one of the reasons I use Maildirs. My setup is a bit convoluted too. This machine (the one now running -current and hanging) plays NFS server. The mailserver mounts the maildir and writes mail into it. Now, the 'hangs' appear to be somewhat random, I can open a few mailfolders without issues, then suddenly I get a hang if I try another one. Also, I notice that when I send a mail when I'm in a folder which is set up to save mails in itself (what a ridiculous sentence), they don't get saved until a while later. I'm going to fiddle a bit with my setup and see if I can reliably reproduce hangs. 'Random' is very difficult to debug, I know :-/ If you do not have it compiled in already, please add DDB to your kernel config. The next time it hangs type CTRL+ALT+ESC to enter the debugger. Then please provide the output of 'ps' and 'show lockedvnods'. You may then use 'call boot(0)' to reboot your system. Ps inside the debugger gives me the following about Mutt: 638 c2ea0938 da743000 1001 634 638 0004002 norm[SLPQ ufs c4cb68dc[SLP] mutt Show lockedvnods gave me no output at all. - Philip -- Philip Paeps Please don't CC me, I am [EMAIL PROTECTED] subscribed to the list. BOFH Excuse #83: Support staff hung over, send aspirin and come back LATER.
Re: I'm impressed, but ...
On Mon, Nov 25, 2002 at 10:04:23PM +0100, Philip Paeps wrote: Before it is a few megs of the same. Basically Mutt reading my mailbox. Anything else I can do to help? You could give the attached patch a try. Were you by any chance using truss(1) just before mutt started to hang? ciao, -robert
Index: src/sys/alpha/alpha/trap.c
===================================================================
RCS file: /home/ncvs/src/sys/alpha/alpha/trap.c,v
retrieving revision 1.102
diff -u -r1.102 trap.c
--- src/sys/alpha/alpha/trap.c 24 Oct 2002 23:09:47 - 1.102
+++ src/sys/alpha/alpha/trap.c 26 Nov 2002 10:51:25 -
@@ -738,7 +738,8 @@
         td->td_retval[0] = 0;
         td->td_retval[1] = 0;
-        STOPEVENT(p, S_SCE, (callp->sy_narg & SYF_ARGMASK));
+        STOPEVENT(p, S_SCE, (callp->sy_narg & SYF_ARGMASK),
+            "syscall entry");
         error = (*callp->sy_call)(td, args + hidden);
     }
@@ -784,7 +785,7 @@
      * register set. If we ever support an emulation where this
      * is not the case, this code will need to be revisited.
      */
-    STOPEVENT(p, S_SCX, code);
+    STOPEVENT(p, S_SCX, code, "syscall exit");
 #ifdef DIAGNOSTIC
     cred_free_thread(td);
Index: src/sys/i386/i386/trap.c
===================================================================
RCS file: /home/ncvs/src/sys/i386/i386/trap.c,v
retrieving revision 1.237
diff -u -r1.237 trap.c
--- src/sys/i386/i386/trap.c 7 Nov 2002 01:34:23 - 1.237
+++ src/sys/i386/i386/trap.c 26 Nov 2002 10:51:26 -
@@ -1028,7 +1028,7 @@
         td->td_retval[0] = 0;
         td->td_retval[1] = frame.tf_edx;
-        STOPEVENT(p, S_SCE, narg);
+        STOPEVENT(p, S_SCE, narg, "syscall entry");
         error = (*callp->sy_call)(td, args);
     }
@@ -1092,7 +1092,7 @@
      * register set. If we ever support an emulation where this
      * is not the case, this code will need to be revisited.
      */
-    STOPEVENT(p, S_SCX, code);
+    STOPEVENT(p, S_SCX, code, "syscall exit");
 #ifdef DIAGNOSTIC
     cred_free_thread(td);
Index: src/sys/ia64/ia64/trap.c
===================================================================
RCS file: /home/ncvs/src/sys/ia64/ia64/trap.c,v
retrieving revision 1.65
diff -u -r1.65 trap.c
--- src/sys/ia64/ia64/trap.c 24 Oct 2002 23:09:48 - 1.65
+++ src/sys/ia64/ia64/trap.c 26 Nov 2002 10:51:28 -
@@ -852,7 +852,8 @@
         td->td_retval[0] = 0;
         td->td_retval[1] = 0;
-        STOPEVENT(p, S_SCE, (callp->sy_narg & SYF_ARGMASK));
+        STOPEVENT(p, S_SCE, (callp->sy_narg & SYF_ARGMASK),
+            "syscall entry");
         error = (*callp->sy_call)(td, args);
     }
@@ -900,7 +901,7 @@
      * register set. If we ever support an emulation where this
      * is not the case, this code will need to be revisited.
      */
-    STOPEVENT(p, S_SCX, code);
+    STOPEVENT(p, S_SCX, code, "syscall exit");
 #ifdef DIAGNOSTIC
     cred_free_thread(td);
@@ -1013,7 +1014,7 @@
         td->td_retval[0] = 0;
         td->td_retval[1] = framep->tf_r[FRAME_R10]; /* edx */
-        STOPEVENT(p, S_SCE, narg);
+        STOPEVENT(p, S_SCE, narg, "ia32 syscall entry");
         error = (*callp->sy_call)(td, args64);
     }
@@ -1077,7 +1078,7 @@
      * register set. If we ever support an emulation where this
      * is not the case, this code will need to be revisited.
      */
-    STOPEVENT(p, S_SCX, code);
+    STOPEVENT(p, S_SCX, code, "ia32 syscall exit");
 #ifdef DIAGNOSTIC
     cred_free_thread(td);
Index: src/sys/kern/kern_exec.c
===================================================================
RCS file: /home/ncvs/src/sys/kern/kern_exec.c,v
retrieving revision 1.201
diff -u -r1.201 kern_exec.c
--- src/sys/kern/kern_exec.c 25 Nov 2002 04:37:44 - 1.201
+++ src/sys/kern/kern_exec.c 26 Nov 2002 10:51:29 -
@@ -563,12 +563,6 @@
     KNOTE(&p->p_klist, NOTE_EXEC);
     p->p_flag &= ~P_INEXEC;
-    /*
-     * If tracing the process, trap to debugger so breakpoints
-     * can be set before the program executes.
-     */
-    _STOPEVENT(p, S_EXEC, 0);
-
     if (p->p_flag & P_TRACED)
         psignal(p, SIGTRAP);
@@ -640,8 +634,14 @@
     if (imgp->object)
         vm_object_deallocate(imgp->object);
-    if (error == 0)
+    if (error == 0) {
+        /*
+         * If tracing the process, trap to debugger so breakpoints
+         * can be set before the program executes.
+         */
+        STOPEVENT(p, S_EXEC, 0, "exec");
         goto done2;
+    }
 exec_fail:
     /* we're done here, clear P_INEXEC */
Index: src/sys/kern/kern_exit.c
===================================================================
RCS file:
Re: I'm impressed, but ...
On 2002-11-26 12:37:25 (+0100), Robert Drehmel [EMAIL PROTECTED] wrote: On Mon, Nov 25, 2002 at 10:04:23PM +0100, Philip Paeps wrote: Before it is a few megs of the same. Basically Mutt reading my mailbox. Anything else I can do to help? You could give the attached patch a try. Okay. I'll let you know if it helps :-) Were you by any chance using truss(1) just before mutt started to hang? No. I used truss after it hung - and after I rebooted - to try and figure out why it hangs the next time it hung. I'll get back to you after my kernel is compiled. - Philip -- Philip Paeps Please don't CC me, I am [EMAIL PROTECTED] subscribed to the list. If at first you don't succeed, try something else.
Re: I'm impressed, but ...
Hello Philip, On Mon, Nov 25, 2002 at 01:49:34AM +0100, Philip Paeps wrote: [reformatted] 2. This one's the most irritating. I use Mutt as my mailclient using Maildirs for storage. It occasionally happens that Mutt just 'hangs' reading a directory, and there's no way for me to kill it. Ps axl shows it as being in state Ds or Ds+ and blocked by ufs. do you use truss(1)? ciao, -robert
Re: I'm impressed, but ...
On 2002-11-25 11:45:36 (+0100), Robert Drehmel [EMAIL PROTECTED] wrote: On Mon, Nov 25, 2002 at 01:49:34AM +0100, Philip Paeps wrote: [reformatted] 2. This one's the most irritating. I use Mutt as my mailclient using Maildirs for storage. It occasionally happens that Mutt just 'hangs' reading a directory, and there's no way for me to kill it. Ps axl shows it as being in state Ds or Ds+ and blocked by ufs. do you use truss(1)? Frequently, but I hadn't thought about it in this case :-) Next time it just sits there, I'll try to find out what truss tells me. If nothing, I'll try to reproduce the problem running inside truss. I'll get back with more info as soon as things die. - Philip -- Philip Paeps Please don't CC me, I am [EMAIL PROTECTED] subscribed to the list. BOFH Excuse #101: Collapsed Backbone
Re: I'm impressed, but ...
Philip Paeps wrote: Hi guys - I accidentally upgraded my workstation to -CURRENT over the weekend (forgot a tag=RELENG_4 in my supfile and let it cook). It being a workstation and containing no critical data, I decided to just stick with it and give it a go. The machine's been running for a day and a bit now, and it appears to be quite stable, but there's a couple of things worrying me. I'd be happy to help analyse the problems further if someone tells me how :-) 1. When I boot my machine, it gives me the following messages:
| [...]
| vga0: Generic ISA VGA at port 0x3c0-0x3df iomem 0xa-0xb on isa0
| unknown: PNP0303 can't assign resources (port)
| unknown: PNP0f13 can't assign resources (irq)
| unknown: PNP0c02 can't assign resources (port)
| unknown: PNP0501 can't assign resources (port)
| unknown: PNP0700 can't assign resources (port)
| unknown: PNP0401 can't assign resources (port)
| unknown: PNP0501 can't assign resources (port)
| Timecounters tick every 10.000 msec
| ahc0: Someone reset channel A
| [...]
All my hardware (the stuff I've tested anyway) appears to work. Any idea which device is being unknown, or how I could find out? Do you also get an 'unable to initialize ACPI' message when your system boots? I stopped getting this message when I compiled ACPI support into the kernel:
device acpi
options ACPI_DEBUG
Someone here will probably say that you don't need to compile it into the kernel, you can use the kernel module and you can use loader.conf to do this. See /usr/src/sys/boot/forth/loader.conf and loader.conf(5) for more details. FWIW, this file should probably have been installed into /boot/loader.conf.(default|sample|etc), then lazy people like me would have noticed a significant difference in loader.conf from 4.7 to current and investigated further. Ian
Re: I'm impressed, but ...
On 2002-11-25 13:09:56 (+0100), Philip Paeps [EMAIL PROTECTED] wrote: On 2002-11-25 11:45:36 (+0100), Robert Drehmel [EMAIL PROTECTED] wrote: On Mon, Nov 25, 2002 at 01:49:34AM +0100, Philip Paeps wrote: [reformatted] 2. This one's the most irritating. I use Mutt as my mailclient using Maildirs for storage. It occasionally happens that Mutt just 'hangs' reading a directory, and there's no way for me to kill it. Ps axl shows it as being in state Ds or Ds+ and blocked by ufs. do you use truss(1)? Frequently, but I hadn't thought about it in this case :-) Next time it just sits there, I'll try to find out what truss tells me. If nothing, I'll try to reproduce the problem running inside truss. I'll get with more info as soon as things die. Mmm, truss doesn't give me anything particularly useful. The last few lines when it hangs are:
| read(0x0,0xbfbfe19b,0x1) = 1 (0x1)
| write(1,0x80e1000,6) = 6 (0x6)
| write(1,0x80e1000,6) = 6 (0x6)
| stat("/etc/nsswitch.conf",0xbfbfd950) ERR#2 'No such file or directory'
| geteuid() = 1001 (0x3e9)
| stat("/etc/pwd.db",0xbfbfd860) = 0 (0x0)
| open("/etc/pwd.db",0x0,00) = 4 (0x4)
| fcntl(0x4,0x2,0x1) = 0 (0x0)
| read(0x4,0x8133a00,0x104) = 260 (0x104)
| lseek(4,0x5000,0) = 20480 (0x5000)
| read(0x4,0x8477000,0x1000) = 4096 (0x1000)
| lseek(4,0x4000,0) = 16384 (0x4000)
| read(0x4,0x8478000,0x1000) = 4096 (0x1000)
| lseek(4,0x6000,0) = 24576 (0x6000)
| read(0x4,0x8479000,0x1000) = 4096 (0x1000)
| lseek(4,0x7000,0) = 28672 (0x7000)
| read(0x4,0x847a000,0x1000) = 4096 (0x1000)
| ls
...and then it just sits there... It doesn't even finish printing the line. Ps axl tells me it's waiting on ufs, and there's no way to kill it, other than a reboot. When rebooting, it tells me it gives up on one buffer, and then just stays hanging there. Perhaps breaking into a debugger will provide some more useful information. I'll try that next. - Philip -- Philip Paeps Please don't CC me, I am [EMAIL PROTECTED] subscribed to the list. Real programmers don't notch their desks for each completed service request.
Re: I'm impressed, but ...
On 2002-11-25 17:23:50 (+0200), [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: Philip Paeps wrote: 1. When I boot my machine, it gives me the following messages:
| [...]
| vga0: Generic ISA VGA at port 0x3c0-0x3df iomem 0xa-0xb on isa0
| unknown: PNP0303 can't assign resources (port)
| unknown: PNP0f13 can't assign resources (irq)
| unknown: PNP0c02 can't assign resources (port)
| unknown: PNP0501 can't assign resources (port)
| unknown: PNP0700 can't assign resources (port)
| unknown: PNP0401 can't assign resources (port)
| unknown: PNP0501 can't assign resources (port)
| Timecounters tick every 10.000 msec
| ahc0: Someone reset channel A
| [...]
All my hardware (the stuff I've tested anyway) appears to work. Any idea which device is being unknown, or how I could find out? Do you also get an 'unable to initialize ACPI' message when your system boots? Nope, I don't use ACPI. I didn't have it in my kernel, and don't load it dynamically either. I stopped getting this message when I compiled ACPI support into the kernel:
device acpi
options ACPI_DEBUG
I tried that, I still get the same message as above, in addition to some new happy messages:
| ACPI-0159: *** Error: AcpiLoadTables: Could not get RSDP, AE_NO_ACPI_TABLES
| ACPI-0213: *** Error: AcpiLoadTables: Could not load tables: AE_NO_ACPI_TABLES
| ACPI: table load failed: AE_NO_ACPI_TABLES
I've probably turned ACPI off in the BIOS (haven't checked), if there's even ACPI stuff available on this machine. Someone here will probably say that you don't need to compile it into the kernel, you can use the kernel module and you can use loader.conf to do this. See /usr/src/sys/boot/forth/loader.conf and loader.conf(5) for more details. FWIW, this file should probably have been installed into /boot/loader.conf.(default|sample|etc), then lazy people like me would have noticed a significant difference in loader.conf from 4.7 to current and investigated further. Loader.conf works nicely, but putting acpi in there, or in the kernel, gives exactly the same results as above: the PNP messages, plus the ACPI complaints. - Philip -- Philip Paeps Please don't CC me, I am [EMAIL PROTECTED] subscribed to the list. BOFH Excuse #282: High altitude condensation from U.S.A.F prototype aircraft has contaminated the primary subnet mask. Turn off your computer for 9 days to avoid damaging it.
Re: I'm impressed, but ...
On Mon, Nov 25, 2002 at 01:49:34AM +0100, Philip Paeps wrote the words in effect of:
| [...]
| vga0: Generic ISA VGA at port 0x3c0-0x3df iomem 0xa-0xb on isa0
| unknown: PNP0303 can't assign resources (port)
| unknown: PNP0f13 can't assign resources (irq)
| unknown: PNP0c02 can't assign resources (port)
| unknown: PNP0501 can't assign resources (port)
| unknown: PNP0700 can't assign resources (port)
| unknown: PNP0401 can't assign resources (port)
| unknown: PNP0501 can't assign resources (port)
| Timecounters tick every 10.000 msec
| ahc0: Someone reset channel A
| [...]
All my hardware (the stuff I've tested anyway) appears to work. Any idea which device is being unknown, or how I could find out? Hi there. Can you try changing the hardware tunable, hw.pci.allow_unsupported_io_range, to the value of 1 in your loader.conf. I think this should do it. You can then check this value after you booted by `sysctl hw.pci`. 2. This one's the most irritating. I use Mutt as my mailclient using Maildirs for storage. It occasionally happens that Mutt just 'hangs' reading a directory, and there's no way for me to kill it. Ps axl shows it as being in state Ds or Ds+ and blocked by ufs. Hmm, this also happens in the case of dd(1). If you invoke dd(1) as:
# dd if=/dev/zero of=/tmp/somefile
As you can see, it gets stuck when not provided with a count variable. It hangs in the `ufs' state. I am currently looking into this. I am thinking, this is because a 0 byte file is found disturbing. 3. I can't seem to restart my machine properly. This might be related to the above, as the only reason for me to restart the machine is the fact that I can't kill Mutt however much I try, and really would like to read my mail. It will sync disks and say 'done', but then it just sits there doing nothing until I flip the power-switch. Exactly the same thing happens when any process hangs in the `ufs' state. It syncs the disks, when you `shutdown` or `fastboot`. This indicates a bug in the kernel. As I said, I'm happy to help analyse and debug these issues, but I don't know where to look :-) Can you try using `ktrace`, like this:
root# ktrace mutt (or the command which makes it hang)
root# kdump -f ktrace.out (this is the output needed)
Cheers. -- Hiten Pandya ([EMAIL PROTECTED], [EMAIL PROTECTED]) http://www.unixdaemons.com/~hiten/
Re: I'm impressed, but ...
On 2002-11-25 14:41:22 (-0500), Hiten Pandya [EMAIL PROTECTED] wrote: On Mon, Nov 25, 2002 at 01:49:34AM +0100, Philip Paeps wrote the words in effect of: | unknown: PNP0401 can't assign resources (port) | unknown: PNP0501 can't assign resources (port) Can you try changing the hardware tunable, hw.pci.allow_unsupported_io_range, to the value of 1 in your loader.conf. I think this should do it. You can then check this value after you booted by `sysctl hw.pci`. I'm afraid that doesn't cure the 'problem'.
(juno:/home/philip)# sysctl hw.pci
hw.pci.enable_io_modes: 1
hw.pci.allow_unsupported_io_range: 1
Exactly the same output as above. 2. This one's the most irritating. I use Mutt as my mailclient using Maildirs for storage. It occasionally happens that Mutt just 'hangs' reading a directory, and there's no way for me to kill it. Ps axl shows it as being in state Ds or Ds+ and blocked by ufs. Hmm, this also happens in the case of dd(1). If you invoke dd(1) as: # dd if=/dev/zero of=/tmp/somefile As you can see, it gets stuck when not provided with a count variable. It hangs in the `ufs' state. I am currently looking into this. I am thinking, this is because a 0 byte file is found disturbing. Mmm, this doesn't seem to hang for me. It just keeps filling the file, but doesn't hang.
(juno:/usr/src)# dd if=/dev/zero of=/tmp/somefile
^C980608+0 records in
980607+0 records out
502070784 bytes transferred in 32.172603 secs (15605538 bytes/sec)
Can you try using `ktrace`, like this: root# ktrace mutt (or the command which makes it hang) root# kdump -f ktrace.out (this is the output needed) Will do. Just building a kernel with ktrace, as I accidentally removed it. I'll get back with more info. - Philip -- Philip Paeps Please don't CC me, I am [EMAIL PROTECTED] subscribed to the list. (1) If the weather is extremely bad, church attendance will be down. (2) If the weather is extremely good, church attendance will be down. (3) If the bulletin covers are in short supply church attendance will exceed all expectations.
Re: I'm impressed, but ...
On 2002-11-25 14:41:22 (-0500), Hiten Pandya [EMAIL PROTECTED] wrote: On Mon, Nov 25, 2002 at 01:49:34AM +0100, Philip Paeps wrote the words in effect of: As I said, I'm happy to help analyse and debug these issues, but I don't know where to look :-) Can you try using `ktrace`, like this: root# ktrace mutt (or the command which makes it hang) root# kdump -f ktrace.out (this is the output needed) This is the last screenfull of kdump:
567 mutt CALL stat(0xbfbfebb0,0xbfbfeb50)
567 mutt NAMI /home/philip/Maildir/lists/bsd/freebsd-current/cur
567 mutt RET stat 0
567 mutt CALL stat(0xbfbfebb0,0xbfbfeb50)
567 mutt NAMI /home/philip/Maildir/lists/bsd/freebsd-current/new
567 mutt RET stat 0
567 mutt CALL stat(0xbfbfebb0,0xbfbfeae0)
567 mutt NAMI /home/philip/Maildir/lists/bsd/freebsd-current/new
567 mutt RET stat 0
567 mutt CALL open(0xbfbfebb0,0x4,0xbfbfe9e8)
567 mutt NAMI /home/philip/Maildir/lists/bsd/freebsd-current/new
567 mutt RET open 3
567 mutt CALL fstat(0x3,0xbfbfeae0)
567 mutt RET fstat 0
567 mutt CALL fcntl(0x3,0x2,0x1)
567 mutt RET fcntl 0
567 mutt CALL fstatfs(0x3,0xbfbfe9e0)
567 mutt RET fstatfs 0
567 mutt CALL getdirentries(0x3,0x80f4000,0x1000,0x80f3054)
567 mutt RET getdirentries 12/0x200
567 mutt CALL write(0x1,0x80e2000,0x2)
567 mutt GIO fd 1 wrote 2 bytes 1
567 mutt RET write 2
567 mutt CALL getdirentries(0x3,0x80f4000,0x1000,0x80f3054)
567 mutt RET getdirentries 0
567 mutt CALL lseek(0x3,0,0,0,0)
567 mutt RET lseek 0
567 mutt CALL close(0x3)
567 mutt RET close 0
567 mutt CALL open(0xbfbfebb0,0,0x1b6)
567 mutt NAMI /home/philip/Maildir/lists/bsd/freebsd-current/new/1038256312.43736_1.fortuna.home.paeps.cx
Before it is a few megs of the same. Basically Mutt reading my mailbox. Anything else I can do to help? - Philip -- Philip Paeps Please don't CC me, I am [EMAIL PROTECTED] subscribed to the list. BOFH Excuse #225: It's those computer people in X {city of world}. They keep stuffing things up.
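Since the thread suspects fcntl(2) locking, it may help to decode the one fcntl call that shows up in the truss and kdump output, `fcntl(0x3,0x2,0x1)`. The sketch below maps command numbers to names using whatever `<fcntl.h>` defines on the machine where it is compiled; the helper names `fcntl_cmd_name` and `print_fcntl_table` are invented for this example. As far as I know, command 2 is `F_SETFD` and argument 1 is `FD_CLOEXEC` on both FreeBSD and Linux, which would make that call a close-on-exec request rather than a lock request, but the numeric values are platform-specific, so it is worth printing them on the affected host instead of trusting this note.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>

/*
 * Map an fcntl(2) command number to its symbolic name on this platform.
 * Returns NULL for commands this small table does not cover.
 */
const char *fcntl_cmd_name(int cmd)
{
    switch (cmd) {
    case F_DUPFD:  return "F_DUPFD";
    case F_GETFD:  return "F_GETFD";
    case F_SETFD:  return "F_SETFD";
    case F_GETFL:  return "F_GETFL";
    case F_SETFL:  return "F_SETFL";
    case F_GETLK:  return "F_GETLK";
    case F_SETLK:  return "F_SETLK";
    case F_SETLKW: return "F_SETLKW";
    default:       return NULL;
    }
}

/* Print the local numeric values so a raw trace line can be decoded. */
void print_fcntl_table(void)
{
    static const int cmds[] = {
        F_DUPFD, F_GETFD, F_SETFD, F_GETFL, F_SETFL,
        F_GETLK, F_SETLK, F_SETLKW
    };
    size_t i;

    for (i = 0; i < sizeof(cmds) / sizeof(cmds[0]); i++) {
        const char *name = fcntl_cmd_name(cmds[i]);
        assert(name != NULL);
        printf("%-10s = %d\n", name, cmds[i]);
    }
    printf("%-10s = %d\n", "FD_CLOEXEC", FD_CLOEXEC);
}
```

Calling `print_fcntl_table()` from a small `main` on the machine that produced the trace shows which command the second argument corresponds to. A lock request would appear as `F_GETLK`, `F_SETLK` or `F_SETLKW`.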
Re: I'm impressed, but ...
Philip Paeps wrote: On 2002-11-25 14:41:22 (-0500), Hiten Pandya wrote: On Mon, Nov 25, 2002 at 01:49:34AM +0100, Philip Paeps wrote: | unknown: PNP0401 can't assign resources (port) | unknown: PNP0501 can't assign resources (port) Can you try changing the hardware tunable, hw.pci.allow_unsupported_io_range, to the value of 1 in your loader.conf. I think this should do it. You can then check this value after you booted by `sysctl hw.pci`. I'm afraid that doesn't cure the 'problem'. I think Hiten responded based on the 'can't assign resources' messages, without reading all the way through; I sometimes do knee-jerk responses to problem reports, as well. The reason his advice didn't help you suppress the messages is that the failure is in port and IRQ assignments, not in memory window assignments. The problem is related to multiple claimants for the device: the BIOS vs. the OS. If you change the BIOS settings for PnP OS, the messages should go away. Note that the messages are just warnings; they will not make anything not work, given your configuration. The maildirs issue, I won't comment on, at this time. -- Terry
Re: I'm impressed, but ...
Well, since no one seems to have mentioned it... There is a note on the TODO list that there are race conditions with truss. Perhaps mutt is freezing because you are using truss elsewhere?

Philip Paeps wrote:
On 2002-11-25 13:09:56 (+0100), Philip Paeps [EMAIL PROTECTED] wrote:
On 2002-11-25 11:45:36 (+0100), Robert Drehmel [EMAIL PROTECTED] wrote:
On Mon, Nov 25, 2002 at 01:49:34AM +0100, Philip Paeps wrote:

[reformatted] 2. This one's the most irritating. I use Mutt as my mailclient using Maildirs for storage. It occasionally happens that Mutt just 'hangs' reading a directory, and there's no way for me to kill it. Ps axl shows it as being in state Ds or Ds+ and blocked by ufs.

do you use truss(1)?

Frequently, but I hadn't thought about it in this case :-) Next time it just sits there, I'll try to find out what truss tells me. If nothing, I'll try to reproduce the problem running inside truss. I'll get back with more info as soon as things die.

Mmm, truss doesn't give me anything particularly useful. The last few lines when it hangs are:

| read(0x0,0xbfbfe19b,0x1) = 1 (0x1)
| write(1,0x80e1000,6) = 6 (0x6)
| write(1,0x80e1000,6) = 6 (0x6)
| stat(/etc/nsswitch.conf,0xbfbfd950) ERR#2 'No such file or directory'
| geteuid() = 1001 (0x3e9)
| stat(/etc/pwd.db,0xbfbfd860) = 0 (0x0)
| open(/etc/pwd.db,0x0,00) = 4 (0x4)
| fcntl(0x4,0x2,0x1) = 0 (0x0)
| read(0x4,0x8133a00,0x104) = 260 (0x104)
| lseek(4,0x5000,0) = 20480 (0x5000)
| read(0x4,0x8477000,0x1000) = 4096 (0x1000)
| lseek(4,0x4000,0) = 16384 (0x4000)
| read(0x4,0x8478000,0x1000) = 4096 (0x1000)
| lseek(4,0x6000,0) = 24576 (0x6000)
| read(0x4,0x8479000,0x1000) = 4096 (0x1000)
| lseek(4,0x7000,0) = 28672 (0x7000)
| read(0x4,0x847a000,0x1000) = 4096 (0x1000)
| ls

...and then it just sits there... It doesn't even finish printing the line. Ps axl tells me it's waiting on ufs, and there's no way to kill it, other than a reboot. When rebooting, it tells me it gives up on one buffer, and then just stays hanging there.

Perhaps breaking into a debugger will provide some more useful information. I'll try that next.

- Philip

-- 
Philip Paeps Please don't CC me, I am [EMAIL PROTECTED] subscribed to the list.
Real programmers don't notch their desks for each completed service request.

-- 
Daniel C. Sobral (8-DCS) [EMAIL PROTECTED] [EMAIL PROTECTED] [EMAIL PROTECTED]
Fundamentalist Debianites, core children of the Linuxen sounds like it could come from the Book of Mormon, or Tolkien on a bad day...
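The symptom reported in this message (a process shown by ps in state Ds or Ds+, blocked on ufs) can be watched for from a second terminal. What follows is only a minimal sketch, not something posted to the thread: the function names are made up for illustration, and it assumes a ps that accepts `-axo pid,stat,wchan,command` and prints "-" for an empty WCHAN column.

```python
import subprocess


def blocked_processes(ps_output):
    """Return (pid, stat, wchan, command) for every row whose STAT starts with 'D'.

    Expects the text produced by `ps -axo pid,stat,wchan,command`:
    one header row, then four whitespace-separated columns per process.
    """
    rows = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 3)  # the command column may contain spaces
        if len(fields) < 4:
            continue
        pid, stat, wchan, command = fields
        if stat.startswith("D"):  # 'D' = uninterruptible (disk) wait
            rows.append((int(pid), stat, wchan, command))
    return rows


def snapshot():
    """Run ps once and return the processes currently in disk wait."""
    result = subprocess.run(
        ["ps", "-axo", "pid,stat,wchan,command"],
        capture_output=True,
        text=True,
        check=True,
    )
    return blocked_processes(result.stdout)
```

Calling snapshot() every few seconds and noting which wait channel a stuck mutt reports would give the list a little more to go on than "it hangs". Brief D states are normal during ordinary disk I/O, so only a process that stays in the list is interesting.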
Re: I'm impressed, but ...
On Mon, 25 Nov 2002, Philip Paeps wrote:
On 2002-11-25 14:41:22 (-0500), Hiten Pandya [EMAIL PROTECTED] wrote:
On Mon, Nov 25, 2002 at 01:49:34AM +0100, Philip Paeps wrote the words in effect of:

| unknown: PNP0401 can't assign resources (port)
| unknown: PNP0501 can't assign resources (port)

Can you try changing the hardware tunable, hw.pci.allow_unsupported_io_range, to the value of 1 in your loader.conf? I think this should do it. You can then check this value after you have booted with `sysctl hw.pci`.

I'm afraid that doesn't cure the 'problem'.

(juno:/home/philip)# sysctl hw.pci
hw.pci.enable_io_modes: 1
hw.pci.allow_unsupported_io_range: 1

Exactly the same output as above.

The messages are essentially harmless and won't affect your system's functionality. I believe Jeff submitted a patch for this earlier on arch@. You can't fix this with the sysctls above.

-Nate
Re: I'm impressed, but ...
On 2002-11-25 14:32:27 (-0800), Terry Lambert [EMAIL PROTECTED] wrote:
Philip Paeps wrote:
On 2002-11-25 14:41:22 (-0500), Hiten Pandya wrote:
On Mon, Nov 25, 2002 at 01:49:34AM +0100, Philip Paeps wrote:

| unknown: PNP0401 can't assign resources (port)
| unknown: PNP0501 can't assign resources (port)

Can you try changing the hardware tunable, hw.pci.allow_unsupported_io_range, to the value of 1 in your loader.conf? I think this should do it. You can then check this value after you have booted with `sysctl hw.pci`.

I'm afraid that doesn't cure the 'problem'.

I think Hiten responded based on the "can't assign resource" messages, without reading all the way through; I sometimes do knee-jerk responses to problem reports, as well. The reason his advice didn't help you suppress the messages is that the failure is in port and IRQ assignments, not in memory window assignments.

Aha, okay. I've just learned something new. :-)

The problem is related to multiple claimants for the device: the BIOS, vs. the OS. If you change the BIOS settings for PnP OS, the messages should go away. Note that the messages are just warnings; they will not make anything not work, given your configuration.

Thanks for the hint! I went fiddling with settings in my BIOS, and turned on ACPI (there was no 'PnP OS' setting, but someone else mentioned ACPI earlier), and the message is now gone. It's been replaced with a lot of messages telling me that ACPI is working and happy to serve me.

The maildirs issue, I won't comment on, at this time.

I hope I can provide enough information for someone to solve it though :-) It would be nice to be able to read my mail 'reliably' :-)

- Philip

-- 
Philip Paeps Please don't CC me, I am [EMAIL PROTECTED] subscribed to the list.
BOFH Excuse #299: The data on your hard drive is out of balance.
Re: I'm impressed, but ...
Philip Paeps wrote:

The maildirs issue, I won't comment on, at this time.

I hope I can provide enough information for someone to solve it though :-) It would be nice to be able to read my mail 'reliably' :-)

The problem is not the amount, but the type of information. You need to characterize the problem well enough that you can write a little program that can repeat it on someone else's machine, without them having to create an installation identical to yours on a scratch box ...particularly when it looks like if they tried that, it would work for them.

Right now, there are other people using the same software that can't repeat the problem. Without knowing whether or not you are both/neither/one-or-the-other using NFS, etc., it's really impossible to even point you in the right direction (NFS is my hunch, in this case; it's a common reason for use of maildirs, to try and side-step locking issues).

You probably need to get together with the other person who said they were *not* having a problem, and do a detailed compare on system configuration, if all other things are equal.

-- Terry
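As one possible starting point for the "little program" Terry asks for, here is a hedged sketch that takes mutt out of the picture and simply reads every message in one maildir. It is an illustration, not a known reproducer: the function name is hypothetical, it assumes the standard new/cur/tmp maildir layout, and nothing in the thread establishes that plain reads are enough to trigger the hang.

```python
import os
import sys
import time


def scan_maildir(root, log=sys.stderr):
    """Read every message under root/{new,cur,tmp}; return how many were read."""
    count = 0
    for sub in ("new", "cur", "tmp"):
        folder = os.path.join(root, sub)
        if not os.path.isdir(folder):
            continue
        for name in sorted(os.listdir(folder)):
            path = os.path.join(folder, name)
            # Log *before* touching the file, so that if the process wedges
            # the last line printed names the file it was working on.
            print("reading", path, file=log, flush=True)
            if not os.path.isfile(path):
                continue
            start = time.monotonic()
            with open(path, "rb") as fh:
                while fh.read(65536):  # read to EOF in 64 KiB chunks
                    pass
            elapsed = time.monotonic() - start
            print("  ok in %.3fs" % elapsed, file=log, flush=True)
            count += 1
    return count


if __name__ == "__main__" and len(sys.argv) > 1:
    print(scan_maildir(sys.argv[1]), "messages read")
```

Running it in a loop against the same folder while the mailserver is delivering over NFS, and again with delivery stopped, would at least show whether concurrent writers matter. If it never hangs, that is information too: the trigger is then more likely something mutt does beyond reading.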
Re: I'm impressed, but ...
On 2002-11-25 16:00:52 (-0800), Terry Lambert [EMAIL PROTECTED] wrote:
Philip Paeps wrote:

The maildirs issue, I won't comment on, at this time.

I hope I can provide enough information for someone to solve it though :-) It would be nice to be able to read my mail 'reliably' :-)

The problem is not the amount, but the type of information. You need to characterize the problem well enough that you can write a little program that can repeat it on someone else's machine, without them having to create an installation identical to yours on a scratch box ...particularly when it looks like if they tried that, it would work for them.

That sounds like the first law of debugging to me :-)

Right now, there are other people using the same software that can't repeat the problem. Without knowing whether or not you are both/neither/or-or-the-other using NFS, etc., it's really impossible to even point you in the right direction (NFS is my hunch, in this case; it's a common reason for use of maildirs, to try and side-step locking issues).

I was also starting to think in the NFS direction. It's one of the reasons I use Maildirs. My setup is a bit convoluted too. This machine (the one now running -current and hanging) plays NFS server. The mailserver mounts the maildir and writes mail into it.

Now, the 'hangs' appear to be somewhat random: I can open a few mailfolders without issues, then suddenly I get a hang if I try another one. Also, I notice that when I send a mail when I'm in a folder which is set up to save mails in itself (what a ridiculous sentence), they don't get saved until a while later.

I'm going to fiddle a bit with my setup and see if I can reliably reproduce hangs. 'Random' is very difficult to debug, I know :-/

You probably need to get together with the other person who said they were *not* having a problem, and do a detailed compare on system configuration, if all other things are equal.

So who is this mysterious other person? :-)

I'll get back with some more concrete information as soon as I have some. It's an intriguing challenge.

- Philip

-- 
Philip Paeps Please don't CC me, I am [EMAIL PROTECTED] subscribed to the list.
BOFH Excuse #2: solar flares
Re: I'm impressed, but ...
On Tue, 26 Nov 2002, Philip Paeps wrote:

I was also starting to think in the NFS direction. It's one of the reasons I use Maildirs. My setup is a bit convoluted too. This machine (the one now running -current and hanging) plays NFS server. The mailserver mounts the maildir and writes mail into it.

Now, the 'hangs' appear to be somewhat random: I can open a few mailfolders without issues, then suddenly I get a hang if I try another one. Also, I notice that when I send a mail when I'm in a folder which is set up to save mails in itself (what a ridiculous sentence), they don't get saved until a while later.

I'm going to fiddle a bit with my setup and see if I can reliably reproduce hangs. 'Random' is very difficult to debug, I know :-/

If you do not have it compiled in already, please add DDB to your kernel config. The next time it hangs, type CTRL+ALT+ESC to enter the debugger. Then please provide the output of 'ps' and 'show lockedvnods'. You may then use 'call boot(0)' to reboot your system.

Thanks!
Jeff