Do you have the panic output, i.e. the kernel stack trace? We'll need that to figure this out. Without it, we can only speculate.
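
On the O2CB_HEARTBEAT_THRESHOLD question in the quoted thread below: as far as I recall, the OCFS2 FAQ gives the disk heartbeat timeout as (threshold - 1) * 2 seconds. That matches Tao's "61 is about 120 secs" and would put the configured value of 7 at roughly 12 secs. A minimal Python sketch of that arithmetic, assuming the 2-second heartbeat interval from the FAQ (the function names are made up for illustration):

```python
# Assumption: one disk heartbeat iteration every 2 seconds, per the OCFS2 FAQ formula.
HEARTBEAT_INTERVAL_SECS = 2


def heartbeat_timeout_secs(threshold: int) -> int:
    """Seconds before a node is considered dead for a given O2CB_HEARTBEAT_THRESHOLD."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECS


def threshold_for_timeout(timeout_secs: int) -> int:
    """O2CB_HEARTBEAT_THRESHOLD needed for a desired timeout in seconds."""
    return timeout_secs // HEARTBEAT_INTERVAL_SECS + 1


if __name__ == "__main__":
    # Values taken from the quoted thread: 7 (configured) and 61 (commonly used).
    for threshold in (7, 61):
        print(threshold, "->", heartbeat_timeout_secs(threshold), "secs")
```

Whether 12 seconds is too short depends on the shared storage, so treat this as a way to read the setting, not as a recommendation.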

mike wrote:
> On 4/21/08, Tao Ma <[EMAIL PROTECTED]> wrote:
>
>> mike wrote:
>>
>>> I have changed my kernel back to 2.6.22-14-server, and now I don't get
>>> the kernel panics. It seems like an issue with 2.6.24-16 and some i/o
>>> made it crash...
>>>
>> OK, so it seems that it is a bug for ocfs2 kernel, not the ocfs2-tools. :)
>> Then could you please describe it in more detail about how the kernel panic
>> happens?
>>
>
> Yeah, this specific issue seems like a kernel issue.
>
> I don't know, these are production systems and I am already getting
> angry customers. I can't really test anymore. Both are standard Ubuntu
> kernels.
>
> Okay: 2.6.22-14-server (I think still minor file access issues)
> Breaks under load: 2.6.24-16-server
>
>>> However I am still getting file access timeouts once in a while. I am
>>> nervous about putting more load on the setup.
>>>
>> Also please provide more details about it.
>>
>
> I am using nginx for a frontend load balancer, and nginx for a
> webserver as well. This doesn't seem to be related to the webserver at
> all though, it was happening before this.
>
> lvs01 proxies traffic in to web01, web02, and web03 (currently using
> nginx, before I was using LVS/ipvsadm)
>
> Every so often, one of the webservers sends me back
>
>>> [EMAIL PROTECTED] .batch]# cat /etc/default/o2cb
>>>
>>> # O2CB_ENABLED: 'true' means to load the driver on boot.
>>> O2CB_ENABLED=true
>>>
>>> # O2CB_BOOTCLUSTER: If not empty, the name of a cluster to start.
>>> O2CB_BOOTCLUSTER=mycluster
>>>
>>> # O2CB_HEARTBEAT_THRESHOLD: Iterations before a node is considered dead.
>>> O2CB_HEARTBEAT_THRESHOLD=7
>>>
>> This value is a little smaller, so how did you build up your shared
>> disk(iSCSI or ...)? The most common value I heard of is 61. It is about 120
>> secs. I don't know the reason and maybe Sunil can tell you. ;)
>> You can also refer to
>> http://oss.oracle.com/projects/ocfs2/dist/documentation/ocfs2_faq.html#TIMEOUT.
>>
>>> # O2CB_IDLE_TIMEOUT_MS: Time in ms before a network connection is
>>> considered dead.
>>> O2CB_IDLE_TIMEOUT_MS=10000
>>>
>>> # O2CB_KEEPALIVE_DELAY_MS: Max time in ms before a keepalive packet is
>>> sent
>>> O2CB_KEEPALIVE_DELAY_MS=5000
>>>
>>> # O2CB_RECONNECT_DELAY_MS: Min time in ms between connection attempts
>>> O2CB_RECONNECT_DELAY_MS=2000
>>>
>>>
>>> On 4/21/08, Tao Ma <[EMAIL PROTECTED]> wrote:
>>>
>>>> Hi Mike,
>>>> Are you sure it is caused by the update of ocfs2-tools?
>>>> AFAIK, the ocfs2-tools only include tools like mkfs, fsck and tunefs etc. So
>>>> if you don't make any change to the disk(by using this new tools), it
>>>> shouldn't cause the problem of kernel panic since they are all user space
>>>> tools.
>>>> Then there is only one thing maybe. Have you modify /etc/sysconfig/o2cb(This
>>>> is the place for RHEL, not sure the place in ubuntu)? I have checked the rpm
>>>> package for RHEL, it will update /etc/sysconfig/o2cb and this file has some
>>>> timeouts defined in it.
>>>> So do you have some backups for this file? If yes, please restore it to see
>>>> whether it helps(I can't say it for sure).
>>>> If not, do you remember the old value of some timeouts you set for ocfs2? If
>>>> yes, you can use o2cb configure to set them by yourself.
>>>>
>>
>
> _______________________________________________
> Ocfs2-users mailing list
> [email protected]
> http://oss.oracle.com/mailman/listinfo/ocfs2-users
