Re: [Lustre-discuss] Time needed to enable quota

2011-06-16 Thread Roland Laifer
On Wed, Jun 15, 2011 at 01:30:08PM +0100, Guy Coates wrote:
 On 15/06/11 13:14, Frank Heckes wrote:
  Hi all,
  
  we're planning to enable quota on our Lustre file systems running
  version 1.8.4. We'd like to estimate the downtime needed to run
  quotacheck.
  
 
 
 Hi,
 
 We recently did a quotacheck on a filesystem with 40M inodes and 160 TB
 in use. It took ~20 mins (DDN 9900 backend storage).

Hi, 

I can report similar values: with 20M inodes and 100 TB in use, 
lfs quotacheck on a Lustre 1.8.4 filesystem took ~10 mins. 
The backend storage for this filesystem is an HP MSA2000. 
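
For anyone estimating the downtime: quotacheck is the only step that needs the
outage, the limits can be set afterwards. A rough sketch of the command
sequence (the mount point, user name and limits below are placeholders, not
our real values):

  # one-time scan of existing usage; run while the filesystem is quiet
  lfs quotacheck -ug /mnt/lustre
  # then set per-user limits (block limits are in kB), e.g.
  lfs setquota -u someuser -b 1000000000 -B 1100000000 -i 1000000 -I 1100000 /mnt/lustre
  # and verify
  lfs quota -u someuser /mnt/lustre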

Regards, 
  Roland 

-- 
Karlsruhe Institute of Technology (KIT)
Steinbuch Centre for Computing (SCC)

Roland Laifer
Scientific Computing Services (SCS)

Zirkel 2, Building 20.21, Room 209
76131 Karlsruhe, Germany
Phone: +49 721 608 44861
Fax: +49 721 32550
Email: roland.lai...@kit.edu
Web: http://www.scc.kit.edu

KIT – University of the State of Baden-Wuerttemberg and 
National Laboratory of the Helmholtz Association
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Error when mv from Lustre to system

2011-06-16 Thread styr
It works. :)

Thanks Christian and Andreas for your time.


- Original Message -
From: Christian Becker christian.bec...@math.tu-dortmund.de
To: Andreas Dilger adil...@whamcloud.com
Cc: s...@free.fr, lustre-discuss@lists.lustre.org
Sent: Wednesday, 15 June 2011 19:53:54 GMT +01:00 Amsterdam / Berlin / Bern / 
Rome / Stockholm / Vienna
Subject: Re: [Lustre-discuss] Error when mv from Lustre to system



Andreas Dilger wrote:
 SLES cp and mv try to preserve xattrs, but I suspect they get an error when 
 trying to copy the lustre.lov xattr to a non-Lustre filesystem. 
 
 The message is just letting you know that some attributes are not copied, but 
 hopefully this does not cause the mv to return an error code. 
 

To disable this warning, add the following line

lustre.lov  skip

to the file /etc/xattr.conf. Works fine for us.
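
For example, a one-liner to apply that on each client (a sketch; it assumes
the usual /etc/xattr.conf format of pattern, whitespace, action):

  grep -q '^lustre\.lov' /etc/xattr.conf 2>/dev/null || \
      printf 'lustre.lov\tskip\n' >> /etc/xattr.conf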

best regards,
Christian


 Cheers, Andreas
 
 On 2011-06-15, at 4:17 AM, s...@free.fr wrote:
 
 Hi Lustre subscribers,

 I have a problem which seems related to the discussion 
 http://groups.google.com/group/lustre-discuss-list/browse_thread/thread/1092ff06ae1fb58f/82695528cc76ced6?lnk=gstq=error+mv#82695528cc76ced6

 But it happens on SLES nodes with no selinux.

 When I try to mv a file from Lustre to a node's local filesystem, I get this 
 error: 
 mv: setting attributes for `/tmp/foo': Operation not supported

 But the file or directory is correctly moved from lustre to /tmp/foo.

 I'm using Lustre 1.8.5 on SLES 11SP1 nodes, and SLES 10 OSS and MDS.

 Do you have any clue?

 Thanks,

 --

 Jay N.
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Path lost when accessing files

2011-06-16 Thread styr
Hi Lustre users,

we are currently having some problems with jobs running on our cluster and 
using Lustre. Sometimes we get these errors: 
forrtl: No such file or directory
forrtl: severe (29): file not found, unit 213, file �@/suivi.d000

This does not only happen with forrtl but also sometimes with other files. The 
program tries to access a file located at �@/suivi.d000. We have also seen 
errors where it tried to access files as if they were at the root of the FS, 
in this example /suivi.d000.

It's as if it were losing or corrupting the PWD environment variable.

The funny thing is that when we execute the same job again, it works 
perfectly. We have not managed to reproduce the errors, but they still happen 
from time to time.

I didn't find any Lustre errors in my logs related to these problems.

We're using Lustre 1.8.5 on SLES 11SP1 nodes, and SLES 10 OSS and MDS.

Do you have any clue?

Thanks, 

Jay N.
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] lfs quotacheck -ug /lfs01/ sleeps

2011-06-16 Thread Mohamed Adel
Dear all,

I'm trying to enable quota on my lustre file system.
Issuing the lfs quotacheck -ug /lfs01/ command doesn't produce anything.
And ps aux | grep lfs command shows that the process is sleeping.
I don't know where to go from here.

Any idea to discover what went wrong?

thanks in advance,
M.Adel
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lfs quotacheck -ug /lfs01/ sleeps

2011-06-16 Thread Ashley Pittman

On 16 Jun 2011, at 11:33, Mohamed Adel wrote:

 Dear all,
 
 I'm trying to enable quota on my lustre file system.
 Issuing the lfs quotacheck -ug /lfs01/ command doesn't produce anything.
 And ps aux | grep lfs command shows that the process is sleeping.
 I don't know where to go from here.
 
 Any idea to discover what went wrong?

This is correct. Look for a kernel process called quotacheck on the Lustre 
servers; when all those threads have exited, lfs should also exit. As came up 
yesterday, this could take a few tens of minutes.

Ashley.
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lfs quotacheck -ug /lfs01/ sleeps

2011-06-16 Thread Mohamed Adel
Dear Ashley,

Thanks for your quick response.
 This is correct, look for a kernel process called quotacheck on the Lustre 
 servers, when all those threads have exited then lfs should also exit.  As 
 came up yesterday this could take a few tens of minutes.

Issuing ps aux | grep quotacheck on all the Lustre servers (MDS and OSSes) 
didn't show any quotacheck process running, although lfs quotacheck is still 
sleeping on the client from which I issued it. Does that mean the process has 
finished, or did something else go wrong?

thanks in advance,
M.Adel
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lfs quotacheck -ug /lfs01/ sleeps

2011-06-16 Thread Ashley Pittman

On 16 Jun 2011, at 11:54, Mohamed Adel wrote:

 Dear Ashley,
 
 Thanks for your quick response.
 This is correct, look for a kernel process called quotacheck on the Lustre 
 servers, when all those threads have exited then lfs should also exit.  As 
 came up yesterday this could take a few tens of minutes.
 
 Issuing ps aux | grep quotacheck command on all lustre servers (mds and 
 oss) didn't show any quotacheck process running though lfs quotacheck is 
 still sleeping on the client from which I issued the lfs quotacheck. Does 
 that mean the processes has finished? or something else went wrong?

This is what I see when I test it on our cluster here. This is a demo 
filesystem and very small, so the quotacheck completes very quickly; on a real 
filesystem you should see more processes.

$ pdsh -a ps auwx | grep quota
sabina-client0: root  6621  0.0  0.0   4280   464 pts/1S+   13:36   0:00 lfs quotacheck /lustre/sab/client
sabina-oss1: root 30278  0.0  0.0  0 0 ?D13:36   0:00 [quotacheck]
sabina-mds1: root 12310  0.0  0.0  61160   776 pts/0S+   13:36   0:00 grep quota

This is on a 1.8 filesystem; I believe it used to work differently on 1.6.

Ashley.
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] $MOUNT2 in acc-sm

2011-06-16 Thread Brian J. Murrell
On 11-06-15 05:58 PM, Jay Lan wrote:
 
 I found my problem!
 
 I defined MOUNT=/mnt/nbp0 and MOUNT2=/mnt/nbp0-2.
 Bad idea!!!
 
 The sanity_mount_check* scripts use `grep` to search for
 $MOUNT and $MOUNT2. Since $MOUNT is a substring
  of $MOUNT2, `grep` in some situations returns the wrong count!

That sounds like a bug.  Can you please file a ticket at
http://jira.whamcould.com/ detailing your problem and solution?
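
For what it's worth, the substring issue is easy to demonstrate (hypothetical
mount list, just to show the counting problem):

  $ printf '/mnt/nbp0\n/mnt/nbp0-2\n' | grep -c /mnt/nbp0
  2
  $ printf '/mnt/nbp0\n/mnt/nbp0-2\n' | grep -cx /mnt/nbp0
  1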

Thanx,
b.

-- 
Brian J. Murrell
Senior Software Engineer
Whamcloud, Inc.



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] $MOUNT2 in acc-sm

2011-06-16 Thread Peter Jones
A slight typo - http://jira.whamcloud.com

On 11-06-16 5:07 AM, Brian J. Murrell wrote:
 snip
 That sounds like a bug.  Can you please file a ticket at
 http://jira.whamcould.com/ detailing your problem and solution?



-- 
Peter Jones
Whamcloud, Inc.
www.whamcloud.com

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] $MOUNT2 in acc-sm

2011-06-16 Thread Brian J. Murrell
On 11-06-16 10:15 AM, Peter Jones wrote:
 A slight typo - http://jira.whamcloud.com

Thanks Peter.

 On 11-06-16 5:07 AM, Brian J. Murrell wrote:
 snip
 That sounds like a bug.  Can you please file a ticket at
 http://jira.whamcould.com/ detailing your problem and solution?
   ^
LOL.

b.


-- 
Brian J. Murrell
Senior Software Engineer
Whamcloud, Inc.



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] LustreError: 26019:0:(file.c:3143:ll_inode_revalidate_fini()) failure -2 inode

2011-06-16 Thread fenix . serega
Hi

Lustre 1.8

A lot of LustreErrors on client:

LustreError: 8747:0:(file.c:3143:ll_inode_revalidate_fini()) Skipped 6
previous similar messages
LustreError: 8747:0:(file.c:3143:ll_inode_revalidate_fini()) failure -2
inode 63486047
LustreError: 8747:0:(file.c:3143:ll_inode_revalidate_fini()) Skipped 4
previous similar messages
LustreError: 26019:0:(file.c:3143:ll_inode_revalidate_fini()) failure -2
inode 54366423
LustreError: 26019:0:(file.c:3143:ll_inode_revalidate_fini()) Skipped 7
previous similar messages
LustreError: 26019:0:(file.c:3143:ll_inode_revalidate_fini()) failure -2
inode 43338388
LustreError: 26019:0:(file.c:3143:ll_inode_revalidate_fini()) Skipped 3
previous similar messages
LustreError: 26019:0:(file.c:3143:ll_inode_revalidate_fini()) failure -2
inode 14273218
LustreError: 26019:0:(file.c:3143:ll_inode_revalidate_fini()) Skipped 10
previous similar messages
LustreError: 26019:0:(file.c:3143:ll_inode_revalidate_fini()) failure -2
inode 10272497
LustreError: 26019:0:(file.c:3143:ll_inode_revalidate_fini()) failure -2
inode 32001327
LustreError: 26019:0:(file.c:3143:ll_inode_revalidate_fini()) Skipped 1
previous similar message
LustreError: 8747:0:(file.c:3143:ll_inode_revalidate_fini()) failure -2
inode 50378921
LustreError: 8747:0:(file.c:3143:ll_inode_revalidate_fini()) Skipped 1
previous similar message

What does failure -2 mean? Everything seems to be working correctly; there are
no errors on the OSSes or the MDS.


Thanks
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Unexpected file system error during normal operation

2011-06-16 Thread Piotr Przybylo
We have a problem with Lustre; in connection with this I wanted to ask 
whether you can help us.

We are seeing unexpected file system errors during normal operation:
Jun 13 15:00:30 ossw12 kernel: LDISKFS-fs error (device dm-9):
mb_free_blocks: double-free of inode 82041293's block 346591170(bit 4034
in group 10577)
Jun 13 15:00:30 ossw12 kernel:
Jun 13 15:00:30 ossw12 kernel: Aborting journal on device dm-9.
Jun 13 15:00:30 ossw12 kernel: Remounting filesystem read-only
Jun 13 15:00:30 ossw12 kernel: LDISKFS-fs error (device dm-9):
mb_free_blocks: 3LustreError:
4026:0:(fsfilt-ldiskfs.c:280:fsfilt_ldiskfs_start()) error starting
handle for op 8 (106 credits): rc -30
Jun 13 15:00:30 ossw12 kernel: double-free of inode 82041293's block
346591171(bit 4035 in group 10577)


Jun 13 15:06:53 ossw12 kernel: LDISKFS-fs error (device dm-12):
mb_free_blocks: double-free of inode 90143054's block 125314561(bit 9729
in group 3824)
Jun 13 15:06:53 ossw12 kernel:
Jun 13 15:06:53 ossw12 kernel: Aborting journal on device dm-12.
Jun 13 15:06:53 ossw12 kernel: Remounting filesystem read-only
Jun 13 15:06:53 ossw12 kernel: ldiskfs_abort called.
Jun 13 15:06:53 ossw12 kernel: LDISKFS-fs error (device dm-12):
ldiskfs_journal_start_sb: Detected aborted journal
Jun 13 15:06:53 ossw12 kernel: Remounting filesystem read-only


Another attempt to mount the file system:
Jun 13 15:12:24 ossw12 kernel: kjournald starting.  Commit interval 5
seconds
Jun 13 15:12:24 ossw12 kernel: LDISKFS-fs warning (device dm-9):
ldiskfs_clear_journal_err: Filesystem error recorded from previous
mount: IO failure
Jun 13 15:12:24 ossw12 kernel: LDISKFS-fs warning (device dm-9):
ldiskfs_clear_journal_err: Marking fs in need of filesystem check.
Jun 13 15:12:24 ossw12 kernel: LDISKFS-fs warning: mounting fs with
errors, running e2fsck is recommended
Jun 13 15:12:24 ossw12 kernel: LDISKFS FS on dm-9, internal journal
Jun 13 15:12:24 ossw12 kernel: LDISKFS-fs: recovery complete.
Jun 13 15:12:24 ossw12 kernel: LDISKFS-fs: mounted filesystem with
ordered data mode.

Jun 13 15:16:48 ossw12 kernel: kjournald starting.  Commit interval 5
seconds
Jun 13 15:16:48 ossw12 kernel: LDISKFS-fs warning (device dm-12):
ldiskfs_clear_journal_err: Filesystem error recorded from previous
mount: IO failure
Jun 13 15:16:48 ossw12 kernel: LDISKFS-fs warning (device dm-12):
ldiskfs_clear_journal_err: Marking fs in need of filesystem check.
Jun 13 15:16:48 ossw12 kernel: LDISKFS-fs warning: mounting fs with
errors, running e2fsck is recommended
Jun 13 15:16:48 ossw12 kernel: LDISKFS FS on dm-12, internal journal
Jun 13 15:16:48 ossw12 kernel: LDISKFS-fs: recovery complete.
Jun 13 15:16:48 ossw12 kernel: LDISKFS-fs: mounted filesystem with
ordered data mode.

How can we recover or repair the data on these devices?
fsck repaired some errors, but when we try to mount the file system we get 
these errors:
Jun 13 18:39:17 ossw12 kernel: LDISKFS-fs error (device dm-9): 
mb_free_blocks: double-free of inode 82041293's block 346591170(bit 4034 
in group 10577)

Jun 13 18:39:17 ossw12 kernel:
Jun 13 18:39:17 ossw12 kernel: Aborting journal on device dm-9.
Jun 13 18:39:17 ossw12 kernel: Remounting filesystem read-only
Jun 13 18:39:17 ossw12 kernel: LDISKFS-fs error (device dm-9): 
mb_free_blocks: double-free of inode 82041293's block 346591171(bit 4035 
in group 10577)

Jun 13 18:39:17 ossw12 kernel:
Jun 13 18:39:17 ossw12 kernel: LDISKFS-fs error (device dm-9): 
mb_free_blocks: double-free of inode 82041293's block 346591172(bit 4036 
in group 10577)


The hardware doesn't report any problems.

--

Regards

| Piotr Przybylo | Technical Support Engineer | Polcom Sp z o.o. |
| ul. Krakowska 43 | 32-050 Skawina, Poland |
| mobile: +48609539945 | tel: +48 12 652 8682 |

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Unexpected file system error during normal operation

2011-06-16 Thread Wojciech Turek
Hi Piotr,

Which Lustre version is this? Also, which version of e2fsprogs are you using?
Is the backend disk software RAID or hardware RAID? If you cannot see any
errors on your hardware, I would recommend running fsck a few times until it
no longer finds any problems. I also highly recommend collecting the logs from
each fsck run in case they are needed for further debugging. If you are not
sure that your hardware is OK, you may want to run fsck with the -n switch and
send the output to the mailing list.
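
A sketch of what I mean, using one of the devices from your log (run it on the
OSS with the target unmounted, and preferably with the Lustre-patched
e2fsprogs):

  # read-only pass first, to see the extent of the damage
  e2fsck -fn /dev/dm-9 2>&1 | tee /root/e2fsck-dm-9.ro.log
  # then repair passes, repeated until a pass finds nothing to fix
  e2fsck -fy /dev/dm-9 2>&1 | tee /root/e2fsck-dm-9.fix.log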

Best regards,

Wojciech

On 16 June 2011 13:33, Piotr Przybylo piotr_przyb...@polcom.com.pl wrote:

 snip




-- 
Wojciech Turek
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] What exactly is the punch statistic?

2011-06-16 Thread Mervini, Joseph A
Hi,

I have been covertly trying for a long time to find out what punch means in 
Lustre llobdstat output, but have not really found anything definitive.

Can someone answer that for me? (BTW: I am not alone in my ignorance... :) )

Thanks.


Joe Mervini
Sandia National Laboratories
High Performance Computing
505.844.6770
jame...@sandia.gov




___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] What exactly is the punch statistic?

2011-06-16 Thread Cliff White
It is called when truncating a file. AFAIK it is showing you the number of
truncates, more or less.
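
If you want to see the counter directly on an OSS, something like this should
work (the OST name is just an example, paths as on our 1.8 systems):

  grep punch /proc/fs/lustre/obdfilter/lustre-OST0000/stats
  llobdstat lustre-OST0000 10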
cliffw



On Thu, Jun 16, 2011 at 10:52 AM, Mervini, Joseph A jame...@sandia.gov wrote:

 Hi,

 I have been covertly trying for a long time to find out what punch means as
 far a lustre llobdstat output but have not really found anything definitive.

 Can someone answer that for me? (BTW: I am not alone in my ignorance... :)
 )

 Thanks.
 

 Joe Mervini
 Sandia National Laboratories
 High Performance Computing
 505.844.6770
 jame...@sandia.gov








-- 
cliffw
Support Guy
WhamCloud, Inc.
www.whamcloud.com
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Path lost when accessing files

2011-06-16 Thread Peter Kjellström
On Thursday, June 16, 2011 03:30:38 PM Sebastien Piechurski wrote:
 Hi,
 
 This problem is documented in bug 23978
 (http://bugzilla.lustre.org/show_bug.cgi?id=23978). To summarize: the
 Fortran runtime makes a call to getcwd() to get the full path to a file
 which was given as a relative path. Lustre sometimes fails to answer
 this syscall, which then returns an uninitialized buffer and an error
 code, BUT the Fortran runtime does not test the getcwd() return code and
 uses the buffer as-is.
 
 The uninitialized buffer is what you see as  @, followed by the relative
 path.

 A patch is currently under inspection.

Perfectly summarized. I'll just add two things.

1) The patch didn't help :-(
 
2) There are two work-arounds listed in the bz: patch the kernel to retry the 
getcwd, or build and use an LD_PRELOAD wrapper to retry the getcwd (see the 
sketch below).
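
For anyone who wants to try the LD_PRELOAD route, a minimal sketch of such a
wrapper (NOT the wrapper from the bz, just an illustration of the idea; the
retry count, sleep interval and job name are placeholders, and it only covers
dynamically linked binaries):

cat > getcwd_retry.c <<'EOF'
/* Retry getcwd() a few times before giving up, to work around the
 * intermittent failures described in bug 23978.  Illustrative only. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>
#include <unistd.h>

char *getcwd(char *buf, size_t size)
{
    static char *(*real_getcwd)(char *, size_t);
    char *ret = NULL;
    int i;

    if (!real_getcwd)
        real_getcwd = (char *(*)(char *, size_t))dlsym(RTLD_NEXT, "getcwd");

    for (i = 0; i < 5; i++) {
        ret = real_getcwd(buf, size);
        if (ret)                /* success, stop retrying */
            break;
        usleep(10000);          /* short pause before the next attempt */
    }
    return ret;
}
EOF
gcc -shared -fPIC -o getcwd_retry.so getcwd_retry.c -ldl
LD_PRELOAD=$PWD/getcwd_retry.so ./my_fortran_job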

/Peter

  From: lustre-discuss-boun...@lists.lustre.org
  we actually a little problems with jobs running on our
  cluster and using Lustre. Sometimes, we have these errors :
  forrtl: No such file or directory
  
  forrtl: severe (29): file not found, unit 213, file  @/suivi.d000
...


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss