Re: [lustre-discuss] old Lustre 2.8.0 panic'ing continously

2020-03-13 Thread Andreas Dilger
One thing to check, if you are not seeing any benefit from running e2fsck, is to make sure you are using the latest e2fsprogs-1.45.2.wc1. You could also try upgrading the server to Lustre 2.10.8. Based on the kernel version, it looks like RHEL 6.7, which should still work with 2.10 (the previous
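The advice above (check the e2fsprogs version, then run a full check) can be sketched roughly as follows; the device path is a hypothetical placeholder and the target must be unmounted first:

```shell
# Confirm the installed e2fsprogs version (the advice above suggests
# the Whamcloud build, e2fsprogs-1.45.2.wc1 or later).
rpm -q e2fsprogs

# Run a full, forced filesystem check on the unmounted ldiskfs target.
# /dev/mapper/ost0 is a placeholder -- substitute your actual OST/MDT device.
e2fsck -f /dev/mapper/ost0
```

A full `e2fsck -f` both rebuilds the quota tables and verifies the usage data going into them, which is why it was recommended over `tune2fs` alone.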

Re: [lustre-discuss] old Lustre 2.8.0 panic'ing continously

2020-03-12 Thread Torsten Harenberg
Dear all, On 10.03.20 at 08:18, Torsten Harenberg wrote: > During the last days (since Thursday), our Lustre instance was > surprisingly stable. We lowered the load a bit by limiting the number of > running jobs, which might also have helped to stabilize the system. > > We enabled kdump, so if another
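For reference, enabling kdump on a CentOS 6 system like the one described might look roughly like this; the crashkernel reservation size is an assumption and depends on kernel version and RAM:

```shell
# Hedged sketch: minimal kdump setup on CentOS 6.
yum install -y kexec-tools

# Reserve memory for the capture kernel by appending e.g.
#   crashkernel=128M
# to the kernel line in /boot/grub/grub.conf, then reboot.

# Enable and start the kdump service.
chkconfig kdump on
service kdump start

# After a subsequent panic, the vmcore is written under /var/crash/
# by default, ready for analysis with the "crash" utility.
```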

Re: [lustre-discuss] old Lustre 2.8.0 panic'ing continously

2020-03-10 Thread Torsten Harenberg
Dear all, thanks for all of your replies. On 09.03.20 at 13:32, Andreas Dilger wrote: > It would be better to run a full e2fsck, since that not only rebuilds > the quota tables, but also ensures that the values going into the quota > tables are correct.  Since the time taken by "tune2fs -O

Re: [lustre-discuss] old Lustre 2.8.0 panic'ing continously

2020-03-09 Thread Andreas Dilger
On Mar 5, 2020, at 09:11, Mohr Jr, Richard Frank <rm...@utk.edu> wrote: On Mar 5, 2020, at 2:48 AM, Torsten Harenberg <harenb...@physik.uni-wuppertal.de> wrote: [QUOTA WARNING] Usage inconsistent for ID 2901:actual (757747712, 217) != expected (664182784, 215) I assume you

Re: [lustre-discuss] old Lustre 2.8.0 panic'ing continously

2020-03-05 Thread Mohr Jr, Richard Frank
> On Mar 5, 2020, at 2:48 AM, Torsten Harenberg wrote: > > [QUOTA WARNING] Usage inconsistent for ID 2901:actual (757747712, 217) > != expected (664182784, 215) I assume you are running ldiskfs as the backend? If so, have you tried regenerating the quota info for the OST? I believe the
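The quota regeneration alluded to above is typically done by toggling the ldiskfs quota feature on the unmounted target; a hedged sketch, with a hypothetical device path:

```shell
# Drop the quota feature from the unmounted ldiskfs target
# (/dev/mapper/ost0 is a placeholder), discarding the stale tables.
tune2fs -O ^quota /dev/mapper/ost0

# Re-enable it; the quota tables are rebuilt from current usage.
tune2fs -O quota /dev/mapper/ost0
```

Note that a later reply in this thread points out that a full `e2fsck` is preferable, since `tune2fs` rebuilds the tables but does not verify that the underlying usage values are correct.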

Re: [lustre-discuss] old Lustre 2.8.0 panic'ing continously

2020-03-05 Thread Degremont, Aurelien
I understand you want to avoid deploying kdump, but you should focus on saving your console history somewhere. It will be difficult to help without the panic message. For 2.10, maybe I was a bit optimistic. I think you should be able to build the RPMs from sources, but pre-built packages are
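One lightweight way to save console history without full kdump, as suggested above, is the kernel's netconsole module, which streams console output to a remote host over UDP; all addresses, the interface name, and the MAC below are hypothetical placeholders:

```shell
# Hedged sketch: stream kernel console messages (including panic
# output) to a remote log host over UDP via netconsole.
# Syntax: netconsole=[src-port]@[src-ip]/[dev],[dst-port]@<dst-ip>/[dst-mac]
modprobe netconsole \
  netconsole=6665@192.168.1.10/eth0,514@192.168.1.1/00:11:22:33:44:55

# On the receiving host, capture the stream with e.g.:
#   nc -l -u 514 | tee lustre-console.log
```

A serial console or an out-of-band BMC console log would serve the same purpose.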

Re: [lustre-discuss] old Lustre 2.8.0 panic'ing continously

2020-03-05 Thread Torsten Harenberg
Hi Aurélien, thanks for your quick reply, really appreciate it. On 05.03.20 at 10:20, Degremont, Aurelien wrote: > - What is the exact error message when the panic happens? Could you > copy/paste few log messages from this panic message? a running ssh shell only shows kernel panic - not

Re: [lustre-discuss] old Lustre 2.8.0 panic'ing continously

2020-03-05 Thread Degremont, Aurelien
Hello Torsten, - What is the exact error message when the panic happens? Could you copy/paste a few log messages from this panic? - Did you try searching for this pattern on jira.whamcloud.com, to see if this is an already known bug? - It seems related to quota. Is disabling quota an
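If disabling quota were attempted as suggested, on Lustre 2.x it is typically done from the MGS with `lctl conf_param`; the filesystem name below is a placeholder:

```shell
# Hedged sketch: disable quota enforcement for a Lustre filesystem
# named "lustre" (placeholder), run on the MGS node.
lctl conf_param lustre.quota.ost=none
lctl conf_param lustre.quota.mdt=none

# Quota enforcement can later be re-enabled per target type, e.g.:
#   lctl conf_param lustre.quota.ost=ug
```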

[lustre-discuss] old Lustre 2.8.0 panic'ing continously

2020-03-04 Thread Torsten Harenberg
Dear all, I know it is daring to ask for help with such an old system. We still run a CentOS 6 based Lustre 2.8.0 system (kernel-2.6.32-573.12.1.el6_lustre.x86_64, lustre-2.8.0-2.6.32_573.12.1.el6_lustre.x86_64.x86_64). It is out of warranty and about to be replaced. The approval process for the