Re: [lustre-discuss] OpenSFS / EOFS Presentations index

2015-05-11 Thread Alexander I Kulyavtsev
DocDB can be handy to manage documents.
http://docdb-v.sourceforge.net/

Check "public" instance here to see examples:
https://cd-docdb.fnal.gov/

Alex.

On May 11, 2015, at 8:46 PM, Scott Nolin <scott.no...@ssec.wisc.edu> wrote:

It would be really convenient if all the presentations for various LUG, LAD, 
and similar meetings were available in one page.

Ideally there would also be some kind of keywords for each presentation for 
easy searches, but even just having a comprehensive list of links would be 
valuable I think.

Scott
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



[lustre-discuss] OpenSFS / EOFS Presentations index

2015-05-11 Thread Scott Nolin
It would be really convenient if all the presentations for various LUG, 
LAD, and similar meetings were available in one page.


Ideally there would also be some kind of keywords for each presentation 
for easy searches, but even just having a comprehensive list of links 
would be valuable I think.


Scott


Re: [lustre-discuss] lustre issue with OST setting to read-only mode as soon as writes are attempted. using Lustre 1.8.8

2015-05-11 Thread Kurt Strosahl
It took a while but now it has finished.


e2fsck -fy -C 0 /dev/sdc2 -j /dev/sdd5
e2fsck 1.42.3.wc3 (15-Aug-2012)  
Pass 1: Checking inodes, blocks, and sizes   
Pass 2: Checking directory structure   
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts  
Pass 5: Checking group summary information 
Block bitmap differences:  -1845243648 -1845243668 -(1845243713--1845243714) 
-1845243738 -1845243742 -(1845243751--1845243753) -(1845243756--1845243761) 
-1845243763 -1845243765 -1845243767 -1845243769 -(1845243776--1845243778) 
-(1845243781--1845243786) -1845243790 -1845243793 -(1845243816--1845243817) 
-1845243819 -1845243822 -(1845243824--1845243826) -(1845243829--1845243831) 
-(1845243890--1845243894) -(1845243899--1845243902) -(1845244225--1845244227) 
-1845244247 -1845244275 -1845244290 -1845244294 -1845244296 -1845244301 
-1845244304 -1845244311 -1845244319 -(1845244322--1845244324) -1845244330 
-(1845244348--1845244349) -1845244352 -1845244354 -1845244360 -1845244367 
-1845244371 -1845244374 -1845244381 -(1845244385--1845244386) 
-(1845244395--1845244399) -(1845244409--1845244413) 
   
Fix? yes

lustre-OST0060: ***** FILE SYSTEM WAS MODIFIED *****
lustre-OST0060: 451137/22888704 files (39.9% non-contiguous), 
2331868992/2929721492 blocks

I mounted the ost but haven't set it back to read-write yet, because of the
errors below:
  Lustre: lustre-OST0060: sending delayed replies to recovered clients
LustreError: 12922:0:(filter_log.c:135:filter_cancel_cookies_cb()) error 
cancelling log cookies: rc = -19
LustreError: 12922:0:(filter_log.c:135:filter_cancel_cookies_cb()) Skipped 2 
previous similar messages

These are the same error messages it was getting before.

As an aside, over the weekend we had a large number of client nodes reboot.  
When they came back up they were unable to reach the ost (it showed as 
inactive).  It wasn't displaying this behaviour before, and clients that hadn't 
rebooted were still able to see it.
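(In case it is useful to anyone hitting the same thing: on Lustre 1.8 an OST
that shows as inactive can be inspected, and if it was administratively
deactivated, brought back, along these lines. This is only a sketch; the
device number is illustrative and must be read from the lctl dl output on
your own MDS.)

```sh
# On the MDS: list configured devices; the status column shows UP / INactive
lctl dl

# Reactivate the OSC entry for the affected OST, using the device
# number shown by "lctl dl" (12 here is only an example)
lctl --device 12 activate
```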

w/r,
Kurt
- Original Message -
From: "Kurt Strosahl" 
To: "Colin Faber" 
Cc: lustre-discuss@lists.lustre.org
Sent: Monday, May 11, 2015 9:17:44 AM
Subject: Re: [lustre-discuss] lustre issue with OST setting to read-only mode 
as soon as writes are attempted. using Lustre 1.8.8

e2fsck 1.42.3.wc3 (15-Aug-2012)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -1845243648 -1845243668 -(1845243713--1845243714) 
-1845243738 -1845243742 -(1845243751--1845243753) -(1845243756--1845243761) 
-1845243763 -1845243765 -1845243767 -1845243769 -(1845243776--1845243778) 
-(1845243781--1845243786) -1845243790 -1845243793 -(1845243816--1845243817) 
-1845243819 -1845243822 -(1845243824--1845243826) -(1845243829--1845243831) 
-(1845243890--1845243894) -(1845243899--1845243902) -(1845244225--1845244227) 
-1845244247 -1845244275 -1845244290 -1845244294 -1845244296 -1845244301 
-1845244304 -1845244311 -1845244319 -(1845244322--1845244324) -1845244330 
-(1845244348--1845244349) -1845244352 -1845244354 -1845244360 -1845244367 
-1845244371 -1845244374 -1845244381 -(1845244385--1845244386) 
-(1845244395--1845244399) -(1845244409--1845244413)
Fix? no

Free blocks count wrong for group #56312 (4585, counted=4499).
Fix? no

Free blocks count wrong (597852500, counted=597852414).
Fix? no


lustre-OST0060: ********** WARNING: Filesystem still has errors **********

lustre-OST0060: 451137/22888704 files (39.9% non-contiguous), 
2331868992/2929721492 blocks

After some discussion here we are going to run the check again and let e2fsck 
fix the problems it finds.
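For anyone following the thread, the check-then-fix sequence we are using
amounts to roughly the following (a sketch: the device names /dev/sdc2 and
the external journal /dev/sdd5 are from this system, and the mount point is
illustrative):

```sh
# Unmount the OST before running any fsck against it
umount /mnt/lustre-OST0060   # mount point is illustrative

# First pass: read-only check (-n answers "no" to every prompt),
# with -j naming the external journal device
e2fsck -fn /dev/sdc2 -j /dev/sdd5

# Second pass: if the reported damage looks sane, let e2fsck repair it
# (-y answers "yes" to every prompt; -C 0 prints a progress bar)
e2fsck -fy -C 0 /dev/sdc2 -j /dev/sdd5
```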

w/r,
Kurt


- Original Message -
From: "Colin Faber" 
To: "Kurt Strosahl" 
Cc: lustre-discuss@lists.lustre.org
Sent: Thursday, May 7, 2015 5:05:06 PM
Subject: Re: [lustre-discuss] lustre issue with OST setting to read-only mode 
as soon as writes are attempted. using Lustre 1.8.8

Hi Kurt,

What does e2fsck -fn against the target look like? Does it find issues?

Also, there are a few known fixes for similar issues such as the one you
describe above. Unfortunately I don't have the bug number handy; maybe
someone from Intel remembers which bug it is.

-cf


On Thu, May 7, 2015 at 11:15 AM, Kurt Strosahl  wrote:

> Nothing presently wrong with sdc2, it is a partition on a raid6 disk array
> so smartctl doesn't see anything (nor does the raid controller report a

Re: [lustre-discuss] lustre issue with OST setting to read-only mode as soon as writes are attempted. using Lustre 1.8.8

2015-05-11 Thread Kurt Strosahl
e2fsck 1.42.3.wc3 (15-Aug-2012)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences:  -1845243648 -1845243668 -(1845243713--1845243714) 
-1845243738 -1845243742 -(1845243751--1845243753) -(1845243756--1845243761) 
-1845243763 -1845243765 -1845243767 -1845243769 -(1845243776--1845243778) 
-(1845243781--1845243786) -1845243790 -1845243793 -(1845243816--1845243817) 
-1845243819 -1845243822 -(1845243824--1845243826) -(1845243829--1845243831) 
-(1845243890--1845243894) -(1845243899--1845243902) -(1845244225--1845244227) 
-1845244247 -1845244275 -1845244290 -1845244294 -1845244296 -1845244301 
-1845244304 -1845244311 -1845244319 -(1845244322--1845244324) -1845244330 
-(1845244348--1845244349) -1845244352 -1845244354 -1845244360 -1845244367 
-1845244371 -1845244374 -1845244381 -(1845244385--1845244386) 
-(1845244395--1845244399) -(1845244409--1845244413)
Fix? no

Free blocks count wrong for group #56312 (4585, counted=4499).
Fix? no

Free blocks count wrong (597852500, counted=597852414).
Fix? no


lustre-OST0060: ********** WARNING: Filesystem still has errors **********

lustre-OST0060: 451137/22888704 files (39.9% non-contiguous), 
2331868992/2929721492 blocks

After some discussion here we are going to run the check again and let e2fsck 
fix the problems it finds.

w/r,
Kurt


- Original Message -
From: "Colin Faber" 
To: "Kurt Strosahl" 
Cc: lustre-discuss@lists.lustre.org
Sent: Thursday, May 7, 2015 5:05:06 PM
Subject: Re: [lustre-discuss] lustre issue with OST setting to read-only mode 
as soon as writes are attempted. using Lustre 1.8.8

Hi Kurt,

What does e2fsck -fn against the target look like? Does it find issues?

Also, there are a few known fixes for similar issues such as the one you
describe above. Unfortunately I don't have the bug number handy; maybe
someone from Intel remembers which bug it is.

-cf


On Thu, May 7, 2015 at 11:15 AM, Kurt Strosahl  wrote:

> Nothing presently wrong with sdc2; it is a partition on a raid6 disk array,
> so smartctl doesn't see anything (nor does the raid controller report any
> problems).  The raid array did have a failed drive, but the drive was
> replaced, and the rebuild had started over an hour before the first time it
> went read-only.
>
> Looking back in the logs I see the below error (which I thought I'd put in
> my original email).
> LDISKFS-fs error (device sdc2): ldiskfs_mb_check_ondisk_bitmap: on-disk
> bitmap for group 56312 corrupted: 4499 blocks free in bitmap, 4585 - in gd
>
> - Original Message -
> From: "Colin Faber" 
> To: "Kurt Strosahl" 
> Cc: lustre-discuss@lists.lustre.org
> Sent: Thursday, May 7, 2015 11:59:35 AM
> Subject: Re: [lustre-discuss] lustre issue with OST setting to read-only
> mode as soon as writes are attempted. using Lustre 1.8.8
>
> Whoops, meant to respond here...
>
> Anyways, it seems something is wrong with sdc2. What does SMART tell you?
> Any notices about it in dmesg?
>
> On Thu, May 7, 2015 at 8:54 AM, Kurt Strosahl  wrote:
>
> > Good Morning,
> >
> > We recently had an ost encounter an issue with what appears to be its
> > journal...  The ost is sitting as a partition atop a raid6 array, which
> > was rebuilding due to a failed disk.  The ost has a journal on an
> > external mirrored disk.  We unmounted the ost, and ran the following:
> > e2fsck -y -C 0 /dev/sdc2 -j /dev/sdd5
> >
> > After that we remounted the ost, and as soon as the first client
> > tried to write to it after recovery it went back to read-only.  We
> > unmounted it again, ran e2fsck again, and again it flipped to read-only
> > the second writes tried to go to it (I had set it to read-only on the
> > mds, and let it sit for a few minutes before setting it back to
> > read/write, to make sure that the problem only happened on a write).
> >
> > May  7 10:28:48  kernel:
> > May  7 10:28:48  kernel: Aborting journal on device sdd5.
> > May  7 10:28:48  kernel: LDISKFS-fs (sdc2): Remounting filesystem
> read-only
> > May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in
> > ldiskfs_mb_free_blocks: IO failure
> > May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in
> > ldiskfs_reserve_inode_write: Journal has aborted
> > May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in
> > ldiskfs_reserve_inode_write: Journal has aborted
> > May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in
> > ldiskfs_ext_remove_space: Journal has aborted
> > May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in
> > ldiskfs_reserve_inode_write: Journal has aborted
> > May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in
> > ldiskfs_orphan_del: Journal has aborted
> > May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in
> > ldiskfs_reserve_inode_write: Journal has aborted
> > May  7 10:28:48  kernel: LDISKFS-fs error (device sdc2) in
> > ld