Goncalo,
run "lsof |grep deleted" on all nodes. If that lists any, it means some
process still has the file open. That file will not get cleaned till the
process exits or closes the file
If that command doesn't list any, there is a way(in ocfs2-1.4.2) to
clean, but it needs an unmount/mount of all nodes but you can do it on
one node at a time. Do it once and see if the orphans are cleaned, if
not do it second time. Second time it should clean.
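For example, a rolling remount could look like this untested sketch (it
reuses the node list and the /site06 mount point from Goncalo's mail, and
assumes root ssh access plus an /etc/fstab entry for the volume):

    # Remount /site06 on one node at a time, so the filesystem
    # stays available on the remaining nodes throughout.
    for i in 07 08 09 10 11 12 21 22 23 24 25 26; do
        echo "### core$i ###"
        ssh core$i "umount /site06 && mount /site06"
    done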
thanks,
--Srini
Gonçalo Borges wrote:
Hi Karim...
Running the commands (on ALL clients) to identify the application/node
associated with the orphan_dir does not produce any output.
[r...@fw01 ~]# for i in 07 08 09 10 11 12 21 22 23 24 25 26; do echo
"### core$i ###"; ssh core$i "find /proc -name fd -exec ls -l {} \; |
grep deleted; lsof | grep -i deleted"; done
### core07 ###
### core08 ###
### core09 ###
### core10 ###
### core11 ###
### core12 ###
### core21 ###
### core22 ###
### core23 ###
### core24 ###
### core25 ###
### core26 ###
I've also tried "mount -o remount /site06", and several syncs, in all
clients, but without success.
The orphan file continues there... :(
Cheers
Goncalo
On 07/27/2009 04:33 PM, Karim Alkhayer wrote:
Hi Goncalo,
Here are some guidelines that should help rectify your issue:
*_Identify cluster node and application associated with orphan_dir_*
Run the following command(s) on each cluster node to identify which
node, application or user (holders) are associated with orphan_dir
entries.
# find /proc -name fd -exec ls -l {} \; | grep deleted
or
# lsof | grep -i deleted
Next, review the output of the above command(s), noting any entries that
relate to the OCFS2 filesystem in question.
At this point, you should be able to determine the holding process id
(pid).
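For example, to pull out just the PIDs of the holders of deleted files on
one volume, something like this untested sketch should work (giving lsof a
mount point should list the files open on that filesystem; your /site06 is
used here as the example):

    # Print the unique PIDs holding deleted files on the volume
    # mounted at /site06 (column 2 of lsof output is the PID).
    lsof /site06 | grep -i deleted | awk '{print $2}' | sort -u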
*_Releasing disk space associated with OCFS2 orphan directories_*
The above step allows you to identify the pid associated with
orphaned files.
If the holding process(es) can still be interacted with gracefully
via their user interface, and you are certain that the process is
safe to stop without adverse effect on your environment, then shut
down the process(es) in question. Once the process(es) close
their open file descriptors, the orphaned files will be deleted and the
associated disk space made available.
If the process(es) in question cannot be interacted with via their
user interface, or if you are certain the processes are no longer
required, then kill the associated process(es), e.g. `kill <pid>`. If
any process(es) are no longer responsive (e.g. zombies) or cannot be
successfully killed, a forced unmount of the OCFS2 volume in question
and/or a reboot of the associated cluster node may be necessary in
order to recover the disk space associated with orphaned files.
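As a sketch, the escalation path might look like this (<pid> and /site06
are placeholders; a forced unmount will disrupt any remaining users of the
volume, so treat it as a last resort):

    # Ask the process to terminate, then escalate if it ignores the request.
    kill <pid>        # sends SIGTERM; lets the process clean up
    kill -9 <pid>     # sends SIGKILL; no chance to clean up

    # If the volume still cannot be released, force the unmount.
    umount -f /site06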
Let us know how it goes!
Best regards,
Karim Alkhayer
*From:* ocfs2-users-boun...@oss.oracle.com
[mailto:ocfs2-users-boun...@oss.oracle.com] *On Behalf Of *Gonçalo Borges
*Sent:* Monday, July 27, 2009 4:35 PM
*To:* ocfs2-users@oss.oracle.com
*Subject:* [Ocfs2-users] How to clean orphan metadata?
Hi All...
1) I have recently deleted a big 100 GB file from an OCFS2 partition.
The problem is that a "df" command still shows that partition with
142 GB of used space when it should report ~42 GB of used space (look
at */site06*):
[r...@core23 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 87G 2.4G 80G 3% /
tmpfs 512M 0 512M 0% /dev/shm
none 512M 104K 512M 1% /var/lib/xenstored
/dev/mapper/iscsi04-lun1p1
851G 63G 788G 8% /site04
/dev/mapper/iscsi05-lun1p1
851G 65G 787G 8% /site05
/dev/mapper/iscsi06-lun2p1
884G 100G 785G 12% /apoio06
/dev/mapper/iscsi06-lun1p1
851G 142G 709G 17% /site06
2) Running "debugfs.ocfs2 /dev/mapper/iscsi06-lun1p1", I found the
following relevant file:
debugfs: ls -l //orphan_dir:0001
        13  drwxr-xr-x   2  0  0  3896          27-Jul-2009 09:55  .
         6  drwxr-xr-x  18  0  0  4096           9-Jul-2009 12:24  ..
    524781  -rw-r--r--   0  0  0  104857600000  24-Jul-2009 16:35  00000000000801ed
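To check the other slots too, a loop such as this untested sketch should
work (it assumes one orphan_dir per slot, 0000 through 0011 for my 12
clients, and that debugfs.ocfs2 supports running a single command via its
-R option):

    # Dump the orphan directory of every slot on the volume.
    for slot in $(seq -f '%04g' 0 11); do
        echo "### //orphan_dir:$slot ###"
        debugfs.ocfs2 -R "ls -l //orphan_dir:$slot" /dev/mapper/iscsi06-lun1p1
    done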
3) I need to clean this metadata information, but I cannot run
"fsck.ocfs2 -f" because this is a production filesystem being
accessed by 12 clients. To run "fsck.ocfs2 -f" I would have to
unmount the partition from all the clients, and that is not an
option at this time. The software I'm currently using is:
[r...@core09 log]# cat /etc/redhat-release
Scientific Linux SL release 5.3 (Boron)
[r...@core09 log]# uname -a
Linux core09.ncg.ingrid.pt 2.6.18-128.1.16.el5xen #1 SMP Tue Jun 30
07:06:24 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
[r...@core09 log]# rpm -qa | grep ocfs2
ocfs2-2.6.18-128.1.16.el5xen-1.4.2-1.el5
ocfs2-tools-1.4.2-1.el5
ocfs2console-1.4.2-1.el5
Is there a workaround for this?
Cheers
Goncalo
------------------------------------------------------------------------
_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users