how to diagnose server freeze with ddb?
Hello, I have a server running FreeBSD 7.3 that sometimes freezes under high load. It responds neither over the network nor to the keyboard. At the same time, I can press Ctrl-Alt-Esc and enter the debugger, and that works. What can I try in DDB to find out why the server freezes? Thanks in advance! -- // cronfy ___ freebsd-questions@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-questions To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"
7.3 - optimizing filesystem - cache all metadata
Hello!

I have a FreeBSD 7.3 server that is used for web sites. It performs many filesystem operations, so filesystem performance is very important, and I am looking for ways to improve it.

I already use vfs.lookup_shared=1; some time ago it helped me decrease the CPU time spent on filesystem operations. I also increased vfs.ufs.dirhash_maxmem to 67108864. But it still sometimes takes several seconds to ls a directory that is not in the cache, and fstat() calls are sometimes slow when I/O is high.

Can filesystem performance be improved further? I think performance would benefit from increasing the memory used for the file metadata cache. One of the most frequent operations is fstat(). If it were possible to tell FreeBSD to keep the whole metadata cache in memory and never clear it, all repeated fstat() operations would become very fast.

How can I see how much memory is used for the filesystem cache? Is it possible to increase this memory and to increase the time that a cache entry is kept in memory (possibly to infinity)?

Thanks in advance.

-- Олег Петрачев
memoryuse vs vmemoryuse
Hello! I am trying to set user limits in login.conf, and I see there are 'memoryuse' and 'vmemoryuse'. The Handbook describes only the former. What is the difference between them? -- // cronfy
Rescan hard drives
Hello,

I am using an Adaptec 3405 with FreeBSD 7.3. After hot-swapping some hard drives and deleting/creating a RAID1 mirror, I am stuck in a situation where I cannot work with the newly created device. It is aacd1. Previously it was a 1000G drive without RAID. Now it is a 300G RAID1 mirror. But `geom disk list` reports:

    Geom name: aacd1
    Providers:
    1. Name: aacd1
       Mediasize: 999642103808 (931G)
       Sectorsize: 512
       Mode: r0w0e0

    Geom name: aacd1
    Providers:
    1. Name: aacd1
       Mediasize: 299563483136 (279G)
       Sectorsize: 512
       Mode: r0w0e0
       fwsectors: 63
       fwheads: 255

If I delete this RAID1 with arcconf, `geom disk list` still reports the 931G drive at aacd1, though it was swapped out of the server.

`dmesg | tail` says the kernel detected a new drive:

    aacd1: on aac0
    aacd1: 285686MB (585084928 sectors)

But fdisk, bsdlabel, dd and others do not work with aacd1; they fail with the error message 'Device not configured'. Also, the old partition table of the previous disk is still visible in `ls /dev/aacd1*`, though the new disk was not partitioned.

Is there a way to reread the information about connected drives in FreeBSD? I tried:

    # atacontrol list
    ATA channel 2:
        Master: no device present
        Slave: no device present
    ATA channel 3:
        Master: no device present
        Slave: no device present
    ATA channel 4:
        Master: no device present
        Slave: no device present
    ATA channel 5:
        Master: no device present
        Slave: no device present

And

    # camcontrol devlist
    camcontrol: couldn't open /dev/xpt0: No such file or directory

They do not work :(

I would appreciate any hint.

-- // cronfy
Re: zfs on 7.3 with 7.2 world
Hello,

> > I want to start using ZFS v13 and I have FreeBSD 7.2 world with 7.3 kernel.
> >
> > And if I need to upgrade something in the world - what should it be?
>
> Why do you not update FreeBSD properly? If you want to use 7.3, install
> kernel _and_ world. (I would suggest using 8.1 though.)

If this were my own desktop, I would certainly upgrade it as described and switch to 8.1 too. But this is a production server, so I am trying to keep changes as small as possible and to make them only when they are really required.

-- // cronfy
zfs on 7.3 with 7.2 world
Hello, I want to start using ZFS v13 and I have a FreeBSD 7.2 world with a 7.3 kernel. Do I have to upgrade the zfs/zpool binaries (and maybe some libraries) to 7.3, or is a recent kernel version alone enough to work with ZFS v13 safely? And if I need to upgrade something in the world, what should it be? Thanks. -- // cronfy
fsck reports errors on clean filesystem (mounted rw)
Hello.

I ran fsck on my filesystems while the system was running (the partitions were mounted rw with moderate FS usage). fsck reported errors (INCORRECT BLOCK COUNT and others). I decided to reboot into single-user mode and check all filesystems. But in single-user mode fsck did not find any errors.

1. Can I be sure my filesystem is consistent?
2. If fsck reports nonexistent errors (and will probably try to fix them if asked), isn't it actually dangerous to run fsck on a running system?
3. How can I check (not fix) filesystems while the partitions are mounted rw and in use?

FreeBSD 7.3/kernel, 7.2/world.

Thanks in advance.

-- // cronfy
quotaon stuck in 'syncer' state
Hello,

On FreeBSD 7.3-STABLE I have a job in my root crontab that is executed every night:

    5 0 * * * /usr/sbin/quotaoff -a; /sbin/quotacheck -aug; /usr/sbin/quotaon -a;

Today I found out that two quotaon processes are stuck in the 'syncer' state in top ('D' state in ps). No quotaon/quotaoff can be started now:

    # /usr/sbin/quotaoff -a
    quotaoff: /home: Operation already in progress
    quotaoff: /home: Operation already in progress

And these processes cannot be killed. Here is the ps/top output:

    # ps auwwx | grep 'quot[a]'
    root  2462 0.0 0.0 4608 912 ?? D Thu12AM 0:00.03 /usr/sbin/quotaon -a
    root 60450 0.0 0.0 4608 928 ?? D Fri12AM 0:00.04 /usr/sbin/quotaon -a

    # top -b -Uroot 100500 | grep quota
    60450 root 1 -40 4608K 928K syncer 4 0:00 0.00% quotaon
     2462 root 1 -40 4608K 912K syncer 5 0:00 0.00% quotaon

Is there any way to finish these stuck processes without a reboot?

-- // cronfy
Re: User cpu time VS system cpu time
Hello,

>> I want to understand difference between user CPU time and system CPU
>> time in system accounting.
> But keep in mind that "kernel time" is a broad category - while IO time in
> itself does not count as CPU time, file system operations for example do,
> because they really can be CPU intensive.

Ivan, thanks for the great explanation. I think that I can measure per-user filesystem usage with sa, since it reports the number of I/O operations per user/command.

In which other cases is kernel time charged to a process instead of user time? I do not mean all of them, just those that usually occur in practice.

I've noticed that there are moments when the system time shown in top is very high (60-80%, while user time is 15-25%; this also produces a very high load average). All processes that run at such a moment show high kernel time usage, although they usually do not. The system gets back to normal after an Apache restart (I think this is somehow related to Apache shared memory, but I am not sure).

This makes me suspect that system time in sa cannot be relied on when measuring per-user system usage, because under some circumstances it varies notably for the same operations. Am I wrong?

-- // cronfy
User cpu time VS system cpu time
Hello, I want to understand the difference between user CPU time and system CPU time in system accounting. When a process uses a lot of system CPU time, does it really mean that the process produces heavy load on the server and takes up resources that could be used by other tasks instead? Or does it only mean that this process spends much time waiting for, say, I/O operations? Thanks in advance! -- // cronfy
Unique process id (not pid) and accounting daemon
Hello.

I am trying to create an accounting daemon that would be more precise than the usual BSD system accounting. It should read the whole process tree from time to time (say, every 10 seconds) and log changes in CPU usage, I/O operations and memory per process. After the daemon notices a process exit, it should read /var/account/acct to get the last portion of accounting data and make a final entry for the process. The daemon should also read /var/account/acct to find information about processes that started and finished between two process tree snapshots.

There is a problem: it is not always possible to link a process in the process tree to the matching process in the accounting file. Only the command name, user/group id and start time will match, but:

* the start time may change (e.g. after ntpdate);
* the command name saved in /var/account/acct is 15 characters max (AC_COMM_LEN in sys/sys/acct.h), while the command name in the process tree is 19 characters max (MAXCOMLEN in sys/sys/param.h).

To ensure that the process in the process tree and the process in the accounting file are the same, I want to add a unique process identifier (uint64_t) to the 'proc' struct in sys/sys/proc.h and increment it on every process fork. I see it is possible to do this just before sx_sunlock() in fork1() in sys/kern/kern_fork.c. I'll have to add saving of this identifier in kern_acct.c, of course.

This way it will be extremely easy to remember a process in the process tree and find the matching one in the accounting file after it finishes.

Am I looking in the right direction, or should I try some other way?

Thanks in advance.

-- // cronfy
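The matching problem described above can be illustrated with a minimal sketch. It is written in Python purely for illustration and does not read the real accounting file: the dict layout, the `may_match` function and the `slack` tolerance are hypothetical, and the 15-character limit is taken from the message rather than checked against the headers.

```python
ACCT_NAME_MAX = 15  # characters kept in the accounting file, per the message


def may_match(proc, acct, slack=2.0):
    """Return True if an accounting record could belong to a snapshot process.

    Both arguments are hypothetical dicts with 'comm', 'uid', 'gid', 'start'.
    Only the first ACCT_NAME_MAX characters of the name can be compared, and
    the start time is compared with a tolerance (slack, in seconds).
    """
    return (
        proc["comm"][:ACCT_NAME_MAX] == acct["comm"]
        and proc["uid"] == acct["uid"]
        and proc["gid"] == acct["gid"]
        and abs(proc["start"] - acct["start"]) <= slack
    )


# Two different commands whose names share the first 15 characters.
proc_a = {"comm": "backup-database-a", "uid": 1001, "gid": 1001, "start": 500.0}
proc_b = {"comm": "backup-database-b", "uid": 1001, "gid": 1001, "start": 501.0}
record = {"comm": "backup-database", "uid": 1001, "gid": 1001, "start": 500.0}

candidates = [p["comm"] for p in (proc_a, proc_b) if may_match(p, record)]
print(candidates)
```

Both snapshot processes are accepted as candidates for the same record, which is the ambiguity a unique per-fork identifier would remove.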
Unique id of a process (not pid)
Hello, Is there any unique identifier of a process in FreeBSD (other than the PID)? I am trying to get the list of processes and watch for changes with kvm_getprocs(). I want to catch every process start and exit (except for processes that started and finished between two calls to kvm_getprocs()). But between calls to this function, one process may exit and be replaced by another process with the same pid and the same command name. The only difference is the start time of the processes. This looks like a solution, but the process start time may change if the system time is shifted (e.g. with ntpdate). I can track these shifts too, but that looks too complex. Is there any simpler way to identify a process? Thanks in advance. -- // cronfy
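The snapshot comparison described above can be sketched as follows. This is only an illustration in Python, not kvm_getprocs() code: the `Proc` record and the `diff_snapshots` function are hypothetical names, and each snapshot is assumed to be already loaded as a list of records carrying the pid, command name and start time.

```python
from typing import NamedTuple


class Proc(NamedTuple):
    """Hypothetical record for one entry of a process-table snapshot."""
    pid: int
    comm: str
    start: float  # start time in seconds since the epoch


def diff_snapshots(old, new):
    """Return (started, exited) between two snapshots.

    A process is identified by (pid, start time), so a reused pid with a
    different start time counts as one exit plus one start.
    """
    old_by_key = {(p.pid, p.start): p for p in old}
    new_by_key = {(p.pid, p.start): p for p in new}
    started = [p for key, p in new_by_key.items() if key not in old_by_key]
    exited = [p for key, p in old_by_key.items() if key not in new_by_key]
    return started, exited


before = [Proc(100, "httpd", 1000.0), Proc(101, "php", 1005.0)]
# pid 101 exited and was reused by another "php" with a later start time.
after = [Proc(100, "httpd", 1000.0), Proc(101, "php", 1012.0)]

started, exited = diff_snapshots(before, after)
print("started:", started)
print("exited:", exited)
```

The sketch inherits the weakness raised in the question: if reported start times shift between snapshots, every process would look as if it had exited and started again.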
Re: Max kernel dump size
> > How can I calculate max kernel dump size? I want to create my swap partition
> > as small as possible, just to fit kernel dump needs.
>
> I'm not sure you really can. You'll definitely have enough if you allow
> a bit more than you have memory, but these days that's going to be
> overkill most of the time.

Yes, at the moment I use the formula SWAP = RAM + 1G. And yes, this is overkill, especially for expensive SAS drives.

I've noticed that my kernel dumps usually do not exceed 2-3.5G. Maybe I can collect stats on the amount of Active memory in use and assume that a kernel dump will not get larger than, say, Active memory + 50%?

-- // cronfy
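The two sizing rules mentioned above can be compared with a small arithmetic sketch. The RAM size and the Active memory peak below are made-up figures for illustration, the function names are hypothetical, and whether Active memory really bounds the dump size is exactly the open question, so the second rule is only the guess from the message, not a verified bound.

```python
GIB = 1024 ** 3


def swap_ram_plus_1g(ram_bytes):
    """Current rule from the message: SWAP = RAM + 1G."""
    return ram_bytes + GIB


def swap_active_plus_half(peak_active_bytes):
    """Proposed guess from the message: Active memory + 50%."""
    return peak_active_bytes + peak_active_bytes // 2


ram = 16 * GIB          # assumed RAM size, for illustration only
peak_active = 3 * GIB   # assumed peak of collected Active memory samples

print(swap_ram_plus_1g(ram) / GIB, "GiB with RAM + 1G")
print(swap_active_plus_half(peak_active) / GIB, "GiB with Active + 50%")
```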
Max kernel dump size
Hello everybody. How can I calculate the maximum kernel dump size? I want to make my swap partition as small as possible, just large enough to fit a kernel dump. Thanks. -- // cronfy
Re: FreeBSD is too filesystem errors sensitive
>> After panic data *is* getting corrupted anyway - MySQL tables that were
>> open are broken, soft-updates are unsync'ed etc etc.
> If it's an option for you, you may want to look into disabling soft
> updates as well so that you don't have to just hope that everything gets
> synced before the end of the world. Depending on your usage, however,
> this might result in unacceptably poor performance.

I am thinking about it. I am using a RAID controller with a battery and write cache enabled, but I have not tested the performance of soft updates versus no soft updates + write cache. Perhaps someone has done this already?

-- // cronfy
Re: FreeBSD is too filesystem errors sensitive
>> panics like 'freeing free block' or 'ffs_valloc: dup alloc'
>> Is there a way to say "Dear kernel, don't panic, I am holding your hand, keep working please-please-please?" If so, can it lead to complete filesystem corruption indeed or it is not so serious?
> Afaik you can't do this. And you shouldn't do if it'd be possible. The file system errors you mention above should not happen under any normal circumstances. They may happen after a crash caused by other reasons but should get repaired by fsck. The kernel cannot continue with such errors because the whole file system metadata cannot be trusted anymore until repaired.

Thanks. What I can definitely state is that nothing will get any better after a reboot. I will have the same filesystem with the same errors, plus new errors that appeared because soft updates were not synced, and I will have fsck running in the background. I'd prefer to just start fsck in the background, skipping that annoying reboot phase ;-) Am I asking for something strange?
Re: FreeBSD is too filesystem errors sensitive
>> ... Is there a way to say "Dear kernel, don't panic, I am holding your hand, keep working please-please-please?" If so, can it lead to complete filesystem corruption indeed or it is not so serious?
> Drop to DDB, fix it, and 'continue'?

If I type 'continue', the kernel says 'Dumping... rebooting...'. What magic am I missing that you probably meant by "fix it"?
Re: FreeBSD is too filesystem errors sensitive
>>> .. but why the hell is it required to panic and kill everything that would be working happily even if something very disastrous happens to the /backup partition, for example?
>> All those errors indicate file system corruption. To protect other data from getting corrupted (e.g. by invalid pointers or calculations), the kernel panics.
> ...and (hopefully) reboots, determines that there were filesystem errors, and attempts to correct them with fsck(8)

Why isn't it possible to do the same without a reboot?
Re: FreeBSD is too filesystem errors sensitive
>> Please forgive me for probably a very stupid question. But why is FreeBSD so sensitive to filesystem errors that it ends up with panics like 'freeing free block' or 'ffs_valloc: dup alloc'? I just can't get it. Failed to allocate vnode? Go allocate another one! Freeing free block? Leave it free then! I understand these situations should never happen, but why the hell is it required to panic and kill everything that would be working happily even if something very disastrous happens to the /backup partition, for example?
> Probably because UFS is not designed to be a backup file system but a working one :) All those errors indicate file system corruption. To protect other data from getting corrupted (e.g. by invalid pointers or calculations), the kernel panics.

To protect us against terrorists, our government does strange things too ;-)

After a panic, data *is* getting corrupted anyway: MySQL tables that were open are broken, soft updates are unsynced, etc. The server has to reboot and run fsck, and time is wasted while this happens. Why should all this happen because of a single vnode failure? Why not just write a message to /var/log/messages, return "oh, I failed to save a file" to the process that initiated the operation, and go on? Are the consequences of an attempt to "free an already free block" *so* dangerous that it is necessary to give up on EVERYTHING?

Let's say it was not the /backup partition; OK, it was /var/tmp/some-php-session or even a /var/cron/tabs/someuser file that failed. So what? Even corruption of /boot/kernel/kernel is not critical if you are not going to reboot right now (or if you have /boot/kernel.old :)

Is there a way to say "Dear kernel, don't panic, I am holding your hand, keep working please-please-please?" If so, can it really lead to complete filesystem corruption, or is it not so serious?

Thanks.
FreeBSD is too filesystem errors sensitive
Hello. Please forgive me for probably a very stupid question. But why is FreeBSD so sensitive to filesystem errors that it ends up with panics like 'freeing free block' or 'ffs_valloc: dup alloc'? I just can't get it. Failed to allocate vnode? Go allocate another one! Freeing free block? Leave it free then! I understand these situations should never happen, but why the hell is it required to panic and kill everything that would be working happily even if something very disastrous happens to the /backup partition, for example? I would very much appreciate it if someone could explain that... thanks.
nice for disk I/O
Hello. It is well known that nice allows changing the CPU scheduling priority. But is there something that would tune the disk I/O priority of a particular process? Thanks in advance. -- cronfy
2 processes reproducibly read the same file at different speeds
Hello.

I've noticed very weird behavior of 2 Apache processes that should read the same file to process a request (they are configured to read it on every request). One spends about 6ms reading the file, and the second spends about 114ms (I used ktrace to find this out). The problem is reproducible every time, on every request.

The Apaches are the same; the only difference between them is that they run as different users to serve different sites. Same binary, same config. The first Apache used to behave the same way some time ago: it spent ~120ms reading the file. But at some point it changed, and now it is fast. Restarting Apache does not seem to affect anything.

The file that Apache should read is 315k long. Apache reads it in small blocks of 4096 bytes each. Maybe FreeBSD keeps some memory of how a process works with files and after some time enables some optimization or caching? I just do not have any clue... :(

Can anyone explain this please?
FreeBSD 7.2 Fatal trap 9 - general protection fault while in kernel mode
Hello,

I get "Fatal trap 9: general protection fault while in kernel mode" with FreeBSD 7.2 and a kernel csup'ed and built on 22 Oct using standard-supfile. How can I find out what the problem is?

Message:

    Fatal trap 9: general protection fault while in kernel mode
    cpuid = 11; apic id = 13
    instruction pointer = 0x8:0x802a65c1
    stack pointer       = 0x10:0x79d75380
    frame pointer       = 0x10:0x79d753a0
    code segment        = base 0x0, limit 0xf, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
    processor eflags    = interrupt enabled, resume, IOPL = 0
    current process     = 114 (php)

Backtrace:

    db> bt
    Tracing pid 114 tid 100403 td 0xff00452ec370
    devstat_start_transaction() at devstat_start_transaction+0x11
    g_io_request() at g_io_request+0x11f
    breadn() at breadn+0xd3
    bread() at bread+0x1e
    ffs_vgetf() at ffs_vgetf+0x2dc
    ufs_root() at ufs_root+0x21
    lookup() at lookup+0x981
    namei() at namei+0x33e
    kern_statfs() at kern_statfs+0x60
    statfs() at statfs+0x2a
    syscall() at syscall+0x256
    Xfast_syscall() at Xfast_syscall+0xab
    --- syscall (396, FreeBSD ELF64, statfs), rip = 0x8022ade1c, rsp = 0x7fffc528, rbp = 0x802536bb8 ---

Kernel config: the GENERIC config was changed. I disabled options for hardware that I do not need on this server and added some options:

    options QUOTA
    options KDB
    options DDB

I am using the aacu RAID driver from Adaptec's site:

    aacu0: Adaptec 2405, aac driver 2.2.8-17517

What can it be? Soft updates? An aacu driver problem? Something else?

Any help would be appreciated.
Re: get accounting info for running process
> Is it possible to find out how much a process have used CPU user time/system time/IO operations for now by it's pid? Like in sa, but for running process.

Dan, Mel, thanks for your answers. I examined the 'ps' sources and decided to use kvm_getprocs() and the rusage structure.

I am trying to create a daemon that would report system accounting stats every X seconds, let's say 10. 'sa' reports on terminated processes only, but it would be nice to have more detailed system usage stats per user for a given time interval (e.g. the last 10 seconds), including tasks that have not finished at the moment of querying. I can achieve this by querying the list of processes every 10 seconds, producing diffs between the previous and the current list, saving these to a log and combining the data with the /var/account/acct file.

The only thing I do not want to do is reinvent the wheel ;-) I googled a lot for such solutions, but did not find any. Maybe someone knows of an existing product that already has this functionality?

Thanks in advance.
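The diff step of such a daemon can be sketched roughly as below. This is an illustration in Python under stated assumptions, not the daemon itself: `usage_deltas` is a hypothetical function, each snapshot is assumed to be a dict keyed by some stable process key and holding cumulative user and system CPU seconds (as one might fill in from the rusage data), and all figures are made up.

```python
from collections import defaultdict


def usage_deltas(prev, curr):
    """Per-user CPU seconds consumed between two snapshots.

    prev and curr are hypothetical dicts mapping a process key to
    (uid, user_seconds, system_seconds) with cumulative counters.
    Processes absent from prev are counted from zero; processes that
    exited before curr are not covered here (the accounting file
    would have to supply their final figures).
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for key, (uid, utime, stime) in curr.items():
        _, prev_utime, prev_stime = prev.get(key, (uid, 0.0, 0.0))
        totals[uid][0] += utime - prev_utime
        totals[uid][1] += stime - prev_stime
    return {uid: tuple(t) for uid, t in totals.items()}


prev = {(100, 1000.0): (80, 12.0, 3.0), (101, 1005.0): (1001, 1.0, 0.5)}
curr = {
    (100, 1000.0): (80, 14.5, 3.5),     # still running
    (102, 1011.0): (1001, 0.25, 0.25),  # started during the interval
}

print(usage_deltas(prev, curr))
```

Keeping the counters cumulative and subtracting at read time means a missed polling cycle only widens the interval instead of losing usage.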
get accounting info for running process
Hello. Is it possible to find out, by its pid, how much CPU user time, system time and how many I/O operations a process has used so far? Like in sa, but for a running process. Thanks in advance.