Re: [lustre-discuss] [EXTERNAL] Re: Data recovery with lost MDT data

2023-09-22 Thread Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.] via lustre-discuss
I’m only showing you the last 10 directories below but there are about 30 or 40 
directories with a pretty uniform distribution between 6/20 and now.  If it was 
a situation where we had been rolled back to 6/20 but directories were starting 
to be updated again, there should be a big gap with no updates.  The rollback 
(when we deleted the “snapshot”) happened on Monday, 9/18.  We could do another 
snapshot of the MDT, mount it read only and poke around in there if you think 
that would help.  Actually, our backup process (which is running normally 
again) is doing just that.  It takes quite a long time to complete so there is 
opportunity for me to investigate.
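
Roughly, that read-only inspection looks like the following (a sketch only; the
pool and dataset names here are placeholders, not our real ones):

# Snapshot and clone the zvol backing the ldiskfs MDT, then mount the clone read-only.
zfs snapshot mdtpool/mdt1@inspect
zfs clone mdtpool/mdt1@inspect mdtpool/mdt1_inspect   # clone appears as another /dev/zdN (like the /dev/zd16 below)
mkdir -p /mnt/mdt_inspect
mount -t ldiskfs -o ro /dev/zvol/mdtpool/mdt1_inspect /mnt/mdt_inspect
ls /mnt/mdt_inspect/ROOT                              # poke around without touching anything
umount /mnt/mdt_inspect
zfs destroy mdtpool/mdt1_inspect
zfs destroy mdtpool/mdt1@inspect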



From: Andreas Dilger 
Date: Friday, September 22, 2023 at 1:36 AM
To: "Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.]" 

Cc: "lustre-discuss@lists.lustre.org" 
Subject: Re: [EXTERNAL] Re: [lustre-discuss] Data recovery with lost MDT data



On Sep 21, 2023, at 16:06, Vicker, Darby J. (JSC-EG111)[Jacobs Technology, 
Inc.] <darby.vicke...@nasa.gov> wrote:

I knew an lfsck would identify the orphaned objects.  That’s great that it will 
move those objects to an area we can triage.  With ownership still intact (and 
I assume time stamps too), I think this will be helpful for at least some of 
the users to recover some of their data.  Thanks Andreas.

I do have another question.  Even with the MDT loss, the top level user 
directories on the file system are still showing current modification times.  I 
was a little surprised to see this – my expectation was that the most current 
time would be from the snapshot that we accidentally reverted to, 6/20/2023 in 
this case.  Does this make sense?

The timestamps of the directories are only stored on the MDT (unlike regular 
files, which keep timestamps on both the MDT and OST).  Is it possible 
that users (or possibly recovered clients with existing mountpoints) have 
started to access the filesystem in the past few days since it was recovered, 
or an admin was doing something that would have caused the directories to be 
modified?


Is it possible you have a newer copy of the MDT than you thought?


[dvicker@dvicker ~]$ ls -lrt /ephemeral/ | tail
  4 drwx------  2 abjuarez   abjuarez 4096 Sep 12 13:24 abjuarez/
  4 drwxr-x---  2 ksmith29   ksmith29 4096 Sep 13 15:37 ksmith29/
  4 drwxr-xr-x 55 bjjohn10   bjjohn10 4096 Sep 13 16:36 bjjohn10/
  4 drwxrwx---  3 cbrownsc   ccp_fast 4096 Sep 14 12:27 cbrownsc/
  4 drwx------  3 fgholiza   fgholiza 4096 Sep 18 06:41 fgholiza/
  4 drwx------  5 mtfoste2   mtfoste2 4096 Sep 19 11:35 mtfoste2/
  4 drwx------  4 abenini    abenini  4096 Sep 19 15:33 abenini/
  4 drwx------  9 pdetremp   pdetremp 4096 Sep 19 16:49 pdetremp/
[dvicker@dvicker ~]$



From: Andreas Dilger <adil...@whamcloud.com>
Date: Thursday, September 21, 2023 at 2:33 PM
To: "Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.]" 
<darby.vicke...@nasa.gov>
Cc: "lustre-discuss@lists.lustre.org" 
<lustre-discuss@lists.lustre.org>
Subject: [EXTERNAL] Re: [lustre-discuss] Data recovery with lost MDT data




In the absence of backups, you could try LFSCK to link all of the orphan OST 
objects into .lustre/lost+found (see lctl-lfsck_start.8 man page for details).
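
A rough sketch of that invocation, assuming a single MDT0000 and the fsname
"lustrefs" (both placeholders):

# On the MDS: start a layout LFSCK and link orphan OST objects into lost+found.
lctl lfsck_start -M lustrefs-MDT0000 -t layout -o
# Watch progress until the scan completes:
lctl get_param mdd.lustrefs-MDT0000.lfsck_layout
# On a client, the recovered objects then show up under:
ls /mnt/lustrefs/.lustre/lost+found/MDT0000/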

The data is still in the objects, and they should have UID/GID/PRJID assigned 
(if used) but they have no filenames.  It would be up to you to make e.g. 
per-user lost+found directories in their home directories and move the files 
where they could access them and see if they want to keep or delete the files.
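
A hypothetical triage script along those lines (the mount point and the
per-user home layout are assumptions, not anything from this thread):

#!/bin/bash
# Sort recovered objects into per-user lost+found directories, keyed on the
# UID/GID that survived on each OST object.
LF=/mnt/lustrefs/.lustre/lost+found/MDT0000      # assumed client-side path
for f in "$LF"/*; do
    owner=$(stat -c %U "$f")                     # ownership is preserved on the objects
    dest="/mnt/lustrefs/home/$owner/lost+found"  # assumed per-user home layout
    mkdir -p "$dest"
    chown "$owner" "$dest"
    mv "$f" "$dest/"
done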

How easy/hard this is to do depends on whether the files have any content that 
can help identify them.

There was a Lustre hackathon project to save the Lustre JobID in a "user.job" 
xattr on every object, exactly to help identify the provenance of files after 
the fact (regardless of whether there is corruption), but it only just landed 
to master and will be in 2.16. That is cold comfort, but would help in the 
future.
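
Once on 2.16, reading it back should just be a normal xattr lookup, something
like this (sketch; the path is a placeholder):

# Show the JobID recorded on a file (Lustre 2.16+).
getfattr -n user.job /mnt/lustrefs/some/recovered/file
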
Cheers, Andreas



On Sep 20, 2023, at 15:34, Vicker, Darby J. (JSC-EG111)[Jacobs Technology, 
Inc.] via lustre-discuss 
<lustre-discuss@lists.lustre.org> wrote:
Hello,

We have recently accidentally deleted some of our MDT data.  I think it's gone 
for good, but I'm looking for advice to see if there is any way to recover.  
Thoughts appreciated.

Re: [lustre-discuss] [EXTERNAL EMAIL] Re: Lustre 2.15.3: patching the kernel fails

2023-09-22 Thread Andreas Dilger via lustre-discuss
On Sep 22, 2023, at 01:45, Jan Andersen <j...@comind.io> 
wrote:

Hi Andreas,

Thank you for your insightful reply. I didn't know Rocky; I see there's a 
version 9 as well - is ver 8 better, since it is more mature?

There is an el9.2 ldiskfs series that would likely also apply to the Rocky9.2 
kernel of the same version.  We are currently using el8.8 servers in production 
and I'm not sure how many people are using 9.2 yet.  On the client side, 
Debian/Ubuntu are widely used.

You mention ZFS, which I really liked when I worked on Solaris, but when I 
tried it on Linux it seemed to perform poorly - though that was on Ubuntu.  Is it 
better on Red Hat et al.?

I would think Ubuntu/Debian works better with ZFS (and may even have ZFS 
.deb packages available in the distro, which RHEL will likely never have).  
It's true that ZFS performance is worse than ldiskfs, but it can be easier to 
use.  That is up to you.
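
For example, on Debian the rough path would be something like this (sketch
only; the package names are from Debian's contrib archive and may differ by
release):

# Install ZFS from the distro, then build Lustre without ldiskfs.
apt install zfs-dkms zfsutils-linux libzfslinux-dev
cd lustre-release
./configure --disable-ldiskfs
make debs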

Cheers, Andreas


/jan

On 21/09/2023 18:40, Andreas Dilger wrote:
The first question to ask is: what is your end goal?  If you just want to build 
a client that mounts an existing server, then you can disable the 
server functionality:
./configure --disable-server
and it should build fine.
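
Putting the client-only path together end to end, roughly (a sketch using the
same tree and steps mentioned elsewhere in this thread):

# Client-only build against the running kernel's headers.
git clone git://git.whamcloud.com/fs/lustre-release.git
cd lustre-release
git checkout 2.15.3
sh ./autogen.sh
./configure --disable-server
make debs        # or 'make rpms' on RPM-based distros
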
If you want to also build a server, and *really* want it to run Debian instead 
of e.g. Rocky 8, then you could disable ldiskfs and use ZFS:
./configure --disable-ldiskfs
You need to have installed ZFS first (either pre-packaged or built yourself), 
but it is less kernel-specific than ldiskfs.
Cheers, Andreas
On Sep 21, 2023, at 10:35, Jan Andersen <j...@comind.io> 
wrote:

My system: Debian 11, kernel version 5.10.0-13-amd64; I have the following 
source code:

# ll /usr/src/
total 117916
drwxr-xr-x  2 root root  4096 Aug 21 09:19 linux-config-5.10/
drwxr-xr-x  4 root root  4096 Jul 25  2022 linux-headers-5.10.0-12-amd64/
drwxr-xr-x  4 root root  4096 Jul 25  2022 linux-headers-5.10.0-12-common/
drwxr-xr-x  4 root root  4096 Jul 25  2022 linux-headers-5.10.0-13-amd64/
drwxr-xr-x  4 root root  4096 Jul 25  2022 linux-headers-5.10.0-13-common/
drwxr-xr-x  4 root root  4096 Aug 11 09:59 linux-headers-5.10.0-24-amd64/
drwxr-xr-x  4 root root  4096 Aug 11 09:59 linux-headers-5.10.0-24-common/
drwxr-xr-x  4 root root  4096 Aug 21 09:19 linux-headers-5.10.0-25-amd64/
drwxr-xr-x  4 root root  4096 Aug 21 09:19 linux-headers-5.10.0-25-common/
lrwxrwxrwx  1 root root        24 Jun 30  2022 linux-kbuild-5.10 -> ../lib/linux-kbuild-5.10
-rw-r--r--  1 root root    161868 Aug 16 21:52 linux-patch-5.10-rt.patch.xz
drwxr-xr-x 25 root root      4096 Jul 14 21:24 linux-source-5.10/
-rw-r--r--  1 root root 120529768 Aug 16 21:52 linux-source-5.10.tar.xz
drwxr-xr-x  2 root root      4096 Jan 30  2023 percona-server/
lrwxrwxrwx  1 root root        28 Jul 29  2022 vboxhost-6.1.36 -> /opt/VirtualBox/src/vboxhost
lrwxrwxrwx  1 root root        32 Apr 17 19:32 vboxhost-7.0.8 -> ../share/virtualbox/src/vboxhost


I have downloaded the source code of lustre 2.15.3:

# git clone git://git.whamcloud.com/fs/lustre-release.git
# git checkout 2.15.3

- and I'm trying to build it, following https://wiki.lustre.org/Compiling_Lustre

I've got through 'autogen.sh' and 'configure' and most of 'make debs', but when 
it comes to patching:

cd linux-stage && quilt push -a -q
Applying patch patches/rhel8/ext4-inode-version.patch
Applying patch patches/linux-5.4/ext4-lookup-dotdot.patch
Applying patch patches/suse15/ext4-print-inum-in-htree-warning.patch
Applying patch patches/linux-5.8/ext4-prealloc.patch
Applying patch patches/ubuntu18/ext4-osd-iop-common.patch
Applying patch patches/linux-5.10/ext4-misc.patch
1 out of 4 hunks FAILED
Patch patches/linux-5.10/ext4-misc.patch does not apply (enforce with -f)
make[2]: *** [autoMakefile:645: sources] Error 1
make[2]: Leaving directory '/root/repos/lustre-release/ldiskfs'
make[1]: *** [autoMakefile:652: all-recursive] Error 1
make[1]: Leaving directory '/root/repos/lustre-release'
make: *** [autoMakefile:524: all] Error 2


My best guess is that it is because the running kernel version doesn't exactly 
match the kernel source tree, but I can't seem to find that version. Am I right 
- and if so, where would I go to download the right kernel tree?
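
One quick sanity check (a sketch) is to compare the running kernel against the
source/header packages actually installed:

# Compare the running kernel with what's installed.
uname -r                                    # e.g. 5.10.0-13-amd64
dpkg -l 'linux-headers-5.10.0-*' 'linux-source-5.10' | grep '^ii'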

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

Cheers, Andreas
--
Andreas Dilger
Lustre Principal Architect
Whamcloud







___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] [EXTERNAL] Re: Data recovery with lost MDT data

2023-09-22 Thread Andreas Dilger via lustre-discuss
On Sep 21, 2023, at 16:06, Vicker, Darby J. (JSC-EG111)[Jacobs Technology, 
Inc.] <darby.vicke...@nasa.gov> wrote:

I knew an lfsck would identify the orphaned objects.  That’s great that it will 
move those objects to an area we can triage.  With ownership still intact (and 
I assume time stamps too), I think this will be helpful for at least some of 
the users to recover some of their data.  Thanks Andreas.

I do have another question.  Even with the MDT loss, the top level user 
directories on the file system are still showing current modification times.  I 
was a little surprised to see this – my expectation was that the most current 
time would be from the snapshot that we accidentally reverted to, 6/20/2023 in 
this case.  Does this make sense?

The timestamps of the directories are only stored on the MDT (unlike regular 
files, which keep timestamps on both the MDT and OST).  Is it possible 
that users (or possibly recovered clients with existing mountpoints) have 
started to access the filesystem in the past few days since it was recovered, 
or an admin was doing something that would have caused the directories to be 
modified?


Is it possible you have a newer copy of the MDT than you thought?

[dvicker@dvicker ~]$ ls -lrt /ephemeral/ | tail
  4 drwx------  2 abjuarez   abjuarez 4096 Sep 12 13:24 abjuarez/
  4 drwxr-x---  2 ksmith29   ksmith29 4096 Sep 13 15:37 ksmith29/
  4 drwxr-xr-x 55 bjjohn10   bjjohn10 4096 Sep 13 16:36 bjjohn10/
  4 drwxrwx---  3 cbrownsc   ccp_fast 4096 Sep 14 12:27 cbrownsc/
  4 drwx------  3 fgholiza   fgholiza 4096 Sep 18 06:41 fgholiza/
  4 drwx------  5 mtfoste2   mtfoste2 4096 Sep 19 11:35 mtfoste2/
  4 drwx------  4 abenini    abenini  4096 Sep 19 15:33 abenini/
  4 drwx------  9 pdetremp   pdetremp 4096 Sep 19 16:49 pdetremp/
[dvicker@dvicker ~]$



From: Andreas Dilger <adil...@whamcloud.com>
Date: Thursday, September 21, 2023 at 2:33 PM
To: "Vicker, Darby J. (JSC-EG111)[Jacobs Technology, Inc.]" 
<darby.vicke...@nasa.gov>
Cc: "lustre-discuss@lists.lustre.org" 
<lustre-discuss@lists.lustre.org>
Subject: [EXTERNAL] Re: [lustre-discuss] Data recovery with lost MDT data



In the absence of backups, you could try LFSCK to link all of the orphan OST 
objects into .lustre/lost+found (see lctl-lfsck_start.8 man page for details).

The data is still in the objects, and they should have UID/GID/PRJID assigned 
(if used) but they have no filenames.  It would be up to you to make e.g. 
per-user lost+found directories in their home directories and move the files 
where they could access them and see if they want to keep or delete the files.

How easy/hard this is to do depends on whether the files have any content that 
can help identify them.

There was a Lustre hackathon project to save the Lustre JobID in a "user.job" 
xattr on every object, exactly to help identify the provenance of files after 
the fact (regardless of whether there is corruption), but it only just landed 
to master and will be in 2.16. That is cold comfort, but would help in the 
future.
Cheers, Andreas


On Sep 20, 2023, at 15:34, Vicker, Darby J. (JSC-EG111)[Jacobs Technology, 
Inc.] via lustre-discuss 
<lustre-discuss@lists.lustre.org> wrote:
Hello,

We have recently accidentally deleted some of our MDT data.  I think it's gone 
for good, but I'm looking for advice to see if there is any way to recover.  
Thoughts appreciated.

We run two LFS’s on the same set of hardware.  We didn’t set out to do this, 
but it kind of evolved.  The original setup was only a single filesystem and 
was all ZFS – MDT and OST’s.  Eventually, we had some small file workflows that 
we wanted to get better performance on.  To address this, we stood up another 
filesystem on the same hardware and used an ldiskfs MDT.  However, since we were 
already using ZFS, under the hood the storage device we build the ldiskfs MDT on 
comes from ZFS.  That gets presented to the OS as /dev/zd0.  We do a nightly 
backup of the MDT by cloning the ZFS dataset (this creates /dev/zd16, for 
whatever reason), snapshot the clone, mount that as ldiskfs, tar up the data 
and then destroy the snapshot and clone.  Well, occasionally this process gets 
interrupted, leaving the ZFS snapshot and clone hanging around.  This is where 
things go south.  Something happens that swaps the clone with the primary 
dataset.  ZFS says you’re working with the primary but it’s really the clone, 
and vice versa.  This happened about a year ago and we caught it, and were able to 
“zfs pro