Hi Mi,

I don't think there have been any applicable commits to Orange-Branch since 06/28 that would address this issue. Is the panic consistently reproducible? If so, what workload leads to the panic? Single client with writes to a single file? I'll look at the logs to see if anything stands out; otherwise I may need to reproduce the issue locally to track down what's going on.

Thanks for reporting the issue,
Michael

On Tue, Jul 12, 2011 at 5:43 PM, Mi Zhou <[email protected]> wrote:

> Hi,
>
> I checked out the code from the cvs branch on 6/28, I don't see an
> immediate kernel panic any more, but still got kernel panic after some
> intensive write to the file system (pls see attached screen shot).
>
> This is what is in the server log:
> pvfs02: [E 07/12/2011 16:05:30] Error: encourage_recv_incoming: mop_id 17bb01c0 in RTS_DONE message not found.
> pvfs02: [E 07/12/2011 16:05:30] [bt] /opt/pvfs/pvfs/sbin/pvfs2-server(error+0xca) [0x449d3a]
> pvfs02: [E 07/12/2011 16:05:30] [bt] /opt/pvfs/pvfs/sbin/pvfs2-server [0x446f30]
> pvfs02: [E 07/12/2011 16:05:30] [bt] /opt/pvfs/pvfs/sbin/pvfs2-server [0x448a79]
> pvfs02: [E 07/12/2011 16:05:30] [bt] /opt/pvfs/pvfs/sbin/pvfs2-server(BMI_testunexpected+0x383) [0x445263]
> pvfs02: [E 07/12/2011 16:05:30] [bt] /opt/pvfs/pvfs/sbin/pvfs2-server [0x47b27c]
> pvfs02: [E 07/12/2011 16:05:30] [bt] /lib64/libpthread.so.0 [0x3ba820673d]
> pvfs02: [E 07/12/2011 16:05:30] [bt] /lib64/libc.so.6(clone+0x6d) [0x3ba7ad44bd]
>
> And this is the client log:
>
> [E 16:10:17.359727] job_time_mgr_expire: job time out: cancelling bmi operation, job_id: 173191.
> [E 16:10:19.371926] Warning: ib_tcp_client_connect: connect to server pvfs02:3337: Connection refused.
> [E 16:10:19.371943] Receive immediately failed: Connection refused
> [E 16:10:21.382875] Warning: ib_tcp_client_connect: connect to server pvfs02:3337: Connection refused.
> [E 16:10:21.382888] Receive immediately failed: Connection refused
>
> We have pvfs01-03 as running the pvfs-server. Both client and server are
> on centos 5 x86_64, kernel version 2.6.18-238.9.1.el5.
>
> Any advice?
>
> Thanks,
>
> Mi
>
>
> On Thu, 2011-07-07 at 12:33 -0500, Ted Hesselroth wrote:
> > That did resolve the problem. Thanks.
> >
> > On 7/7/2011 11:19 AM, Michael Moore wrote:
> > > Hi Ted,
> > >
> > > There was a regression when adding support for newer kernels that
> > > made it in to the 2.8.4 release. I believe that's the issue you're
> > > seeing (a kernel panic immediately on modprobe/insmod). The next
> > > release will include that fix. Until then, if you can check out the
> > > latest version of the code from CVS, it should resolve the issue.
> > > The CVS branch is Orange-Branch, full directions for CVS checkout
> > > at http://www.orangefs.org/support/
> > >
> > > We are currently running the kernel module with the latest code on
> > > CentOS 5 and SL 6 systems. Let me know how it goes.
> > >
> > > For anyone interested, the commit to resolve the issue was:
> > > http://www.pvfs.org/fisheye/changelog/~br=Orange-Branch/PVFS/?cs=Orange-Branch:mtmoore:20110530154853
> > >
> > > Michael
> > >
> > >
> > > On Thu, Jul 7, 2011 at 11:36 AM, Ted Hesselroth<[email protected]> wrote:
> > >
> > >> I have built the kernel module from orangefs-2.8.4 source against a
> > >> 64-bit 2.6.18-238.12.1 linux kernel source, and against a 32-bit
> > >> 2.6.18-238.9.1 source. In both cases, the kernel hung when the
> > >> module was inserted with insmod. The first did report
> > >> "kernel: Oops: 0000 [1] SMP". The distributions are Scientific
> > >> Linux 5.x, which is rpm-based and similar to Centos.
> > >>
> > >> Are there kernels for this scenario for which the build is known to
> > >> work? The server build and install went fine, but I would like to
> > >> configure some clients to access orangefs through a mount point.
> > >>
> > >> Thanks.
> > >>
> > >> _______________________________________________
> > >> Pvfs2-users mailing list
> > >> Pvfs2-users@beowulf-underground.org
> > >> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
> > >>
> >
> > _______________________________________________
> > Pvfs2-users mailing list
> > [email protected]
> > http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>
> --
>
> Mi Zhou
> System Integration Engineer
> Information Sciences
> St. Jude Children's Research Hospital
> 262 Danny Thomas Pl. MS 312
> Memphis, TN 38105
> 901.595.5771
>
> Email Disclaimer: www.stjude.org/emaildisclaimer
>
_______________________________________________ Pvfs2-users mailing list [email protected] http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
