Re: [lustre-discuss] refresh file layout error
On Wed, Sep 2, 2015 at 8:47 PM, Wahl, Edward wrote:

> I've seen this kind of error before when using samba to do something
> stupid (and let's face it, that's most everything with samba). It was a
> locking issue, I think. Things were being changed/deleted (unlinked, in
> actuality) as the client was trying to do something with them.
>
> Is the Apache process or its spawned app(s) still working on the files
> in question while serving them up?

Not as far as I know; these are result files that were generated days ago (possibly more) and should be static by now. But I'll double-check with the people behind the app.

> That would be my guess here. Any chance this is across NFS? Seen that a
> great deal with this error; it used to cause crashes.

Strictly speaking it is not, but it may be, because a part of the path the server 'sees'/'knows' is a symlink to the Lustre filesystem, which lives on NFS...

Thanks,
Eli

> Ed Wahl
> OSC
>
> --
> From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of E.S. Rosenberg [esr+lus...@mail.hebrew.edu]
> Sent: Wednesday, September 02, 2015 7:57 AM
> To: lustre-discuss@lists.lustre.org
> Subject: [lustre-discuss] refresh file layout error
>
> Hi all,
>
> I am seeing an interesting/annoying problem with Lustre and am not really sure what/where to look.
>
> When a webserver (galaxy using wsgi/apache2) tries to serve (large) files stored on Lustre it fails to send the full file and I see the following errors in syslog:
>
> Sep 2 11:50:17 hm-02 kernel: LustreError: 6973:0:(vvp_io.c:1197:vvp_io_init()) fs01: refresh file layout [0x28815:0x217e:0x0] error -13.
> Sep 2 11:50:17 hm-02 kernel: LustreError: 6973:0:(file.c:179:ll_close_inode_openhandle()) inode 144115772543738238 mdc close failed: rc = -13
>
> If I try to access the files through their direct path (copying to tmp / md5sum / sha512sum) it seems to work without a problem (full file is copied and sums agree, from different nodes).
>
> When we switched the storage backend to NFS the server worked fine, so my guess is that there is an issue with the way python tries to read from the 'disk'...
>
> Is anyone familiar with the error above?
>
> Thanks,
> Eli

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
Re: [lustre-discuss] refresh file layout error
On Wed, Sep 2, 2015 at 6:51 PM, Stearman, Marc wrote:

> -13 is permission denied (EACCES). Does your webserver have access to
> those files? If it is running as nobody, do you have the proper
> ownership/permissions set on all the directories and files?

The files were created by the web/python server to begin with, though the Apache process is running as a different user; but they are set to 644. Also, if it were permissions, shouldn't it fail to return any data at all? Here it's returning chunks of varying sizes every time.

Thanks,
Eli

> -Marc
>
> D. Marc Stearman
> Lustre Operations Lead
> stearm...@llnl.gov
> 925.423.9670
>
> On Sep 2, 2015, at 4:57 AM, E.S. Rosenberg wrote:
>
> > Hi all,
> >
> > I am seeing an interesting/annoying problem with Lustre and am not really sure what/where to look.
> >
> > When a webserver (galaxy using wsgi/apache2) tries to serve (large) files stored on Lustre it fails to send the full file and I see the following errors in syslog:
> >
> > Sep 2 11:50:17 hm-02 kernel: LustreError: 6973:0:(vvp_io.c:1197:vvp_io_init()) fs01: refresh file layout [0x28815:0x217e:0x0] error -13.
> > Sep 2 11:50:17 hm-02 kernel: LustreError: 6973:0:(file.c:179:ll_close_inode_openhandle()) inode 144115772543738238 mdc close failed: rc = -13
> >
> > If I try to access the files through their direct path (copying to tmp / md5sum / sha512sum) it seems to work without a problem (full file is copied and sums agree, from different nodes).
> >
> > When we switched the storage backend to NFS the server worked fine, so my guess is that there is an issue with the way python tries to read from the 'disk'...
> >
> > Is anyone familiar with the error above?
> >
> > Thanks,
> > Eli
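[Editor's note] Marc's point above - that the rc = -13 in the LustreError lines is just the kernel errno EACCES - can be checked mechanically from the errno tables. A small standard-library sketch (the function name is made up for illustration) that decodes the negative return codes seen in such messages:

```python
import errno
import os

def decode_lustre_rc(rc):
    """Map a kernel return code as seen in LustreError messages
    (e.g. 'rc = -13') to its errno name and description."""
    code = -rc if rc < 0 else rc
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)

print(decode_lustre_rc(-13))  # ('EACCES', 'Permission denied')
```

This only names the error; as the thread shows, the interesting question is why a process that can read the file via its direct path gets EACCES through the layout-refresh code path.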
[lustre-discuss] Are there any published results of how Lustre performance varies with changing Stripe size with File size
Hi,

I wanted to know if there are any published results on how Lustre performance changes with stripe size, for different file sizes and varying numbers of processes. Any help would be appreciated.

Thanks and Regards,
Prakrati
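[Editor's note] Independent of any published benchmark numbers, the trade-off is easier to reason about from the layout math. A sketch (assuming Lustre's plain RAID-0 striping for ordinary, non-composite layouts; the function name is mine) of how stripe size and stripe count decide which OST object serves a given byte offset:

```python
def ost_for_offset(offset, stripe_size, stripe_count):
    """Return (stripe_index, object_offset) for a byte offset in a file
    striped RAID-0 style across stripe_count OST objects."""
    stripe_number = offset // stripe_size        # which stripe, file-wide
    stripe_index = stripe_number % stripe_count  # which OST object it lands on
    # Byte offset within that OST object:
    object_offset = (stripe_number // stripe_count) * stripe_size \
        + offset % stripe_size
    return stripe_index, object_offset

# With a 1 MiB stripe size over 4 OSTs, bytes 0..1MiB-1 land on object 0,
# the next MiB on object 1, and so on, wrapping every 4 MiB:
MiB = 1 << 20
print(ost_for_offset(5 * MiB, MiB, 4))  # (1, 1048576)
```

The qualitative consequence: a smaller stripe size (or larger stripe count) spreads one process's I/O over more OSTs, which helps large single-file bandwidth but adds per-RPC and seek overhead for small files; how that nets out for a given file size and process count is exactly what the requested benchmarks would have to measure.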
Re: [lustre-discuss] dne2: lfs setdirstripe
Olaf,

I can explain the rationale for the restrictions, though I have not verified whether the root-only one applies to striped as well as remote directories. (It's a simple test, though; I'm just not where I can reach a test system.)

Note, to be clear: DNE 2 does not replace DNE 1. Remote directories and striped directories are different things and can coexist. enable_remote_dir applies only to remote directories, not striped directories.

As for the rationale: if enabled, it complicates things notably from an administrative perspective. If you have multiple MDT changes in a path, it is harder to know what is where, and files on, for example, MDT2 or MDT0 can become unreachable if MDT1 is lost. Also, if you think carefully, it doesn't really enable any use cases that can't be done otherwise - at least, none that we could find that seemed practical.

As for the root-only thing: imagine you are trying to split the load between your MDTs by assigning particular users to particular MDTs. If your users can create their own remote directories, they can escape this restriction. Also, you can open up permission by setting it to -1.

I learned this by a mix of reading design docs, experimenting, and being at least tangentially involved via the PAC. I'd suggest the design docs as a good place to look for more.

- Patrick Farrell

From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of Faaland, Olaf P. [faala...@llnl.gov]
Sent: Wednesday, September 02, 2015 5:21 PM
To: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] dne2: lfs setdirstripe

The Lustre we are testing with is built from commit ea383222e031cdceffbdf2e3afab3b6d1fd53c8e, which is after tag 2.7.57 but before 2.7.59; so recent but not entirely current.

Olaf P. Faaland
Livermore Computing
phone: 925-422-2263

From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of Faaland, Olaf P. [faala...@llnl.gov]
Sent: Wednesday, September 02, 2015 3:17 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] dne2: lfs setdirstripe

Hi,

We have begun work on testing DNE with a ZFS backend. So far we've only done the installation of the filesystem and begun educating ourselves.

I see in man lfs that "lfs setdirstripe" has some restrictions by default:
- only executable by root unless "mdt.*.enable_remote_dir_gid" is set
- only directories on MDT0000 can contain directories that are not on the same MDT unless "mdt.*.enable_remote_dir" is set

1. Are those restrictions still current, or do they refer to DNE phase 1 restrictions that no longer apply?
2. If the first (allowing only root to invoke "lfs setdirstripe") is current, what is the rationale?
3. Is there documentation, or a mailing list thread, that we should read prior to posting questions?

Thanks,

Olaf P. Faaland
Livermore Computing
phone: 925-422-2263
Re: [lustre-discuss] dne2: lfs setdirstripe
The Lustre we are testing with is built from commit ea383222e031cdceffbdf2e3afab3b6d1fd53c8e, which is after tag 2.7.57 but before 2.7.59; so recent but not entirely current.

Olaf P. Faaland
Livermore Computing
phone: 925-422-2263

From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of Faaland, Olaf P. [faala...@llnl.gov]
Sent: Wednesday, September 02, 2015 3:17 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] dne2: lfs setdirstripe

Hi,

We have begun work on testing DNE with a ZFS backend. So far we've only done the installation of the filesystem and begun educating ourselves.

I see in man lfs that "lfs setdirstripe" has some restrictions by default:
- only executable by root unless "mdt.*.enable_remote_dir_gid" is set
- only directories on MDT0000 can contain directories that are not on the same MDT unless "mdt.*.enable_remote_dir" is set

1. Are those restrictions still current, or do they refer to DNE phase 1 restrictions that no longer apply?
2. If the first (allowing only root to invoke "lfs setdirstripe") is current, what is the rationale?
3. Is there documentation, or a mailing list thread, that we should read prior to posting questions?

Thanks,

Olaf P. Faaland
Livermore Computing
phone: 925-422-2263
[lustre-discuss] dne2: lfs setdirstripe
Hi,

We have begun work on testing DNE with a ZFS backend. So far we've only done the installation of the filesystem and begun educating ourselves.

I see in man lfs that "lfs setdirstripe" has some restrictions by default:
- only executable by root unless "mdt.*.enable_remote_dir_gid" is set
- only directories on MDT0000 can contain directories that are not on the same MDT unless "mdt.*.enable_remote_dir" is set

1. Are those restrictions still current, or do they refer to DNE phase 1 restrictions that no longer apply?
2. If the first (allowing only root to invoke "lfs setdirstripe") is current, what is the rationale?
3. Is there documentation, or a mailing list thread, that we should read prior to posting questions?

Thanks,

Olaf P. Faaland
Livermore Computing
phone: 925-422-2263
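[Editor's note] For concreteness, the tunables and commands discussed in this thread look roughly like the following. This is a sketch, not a recipe: the mount point, MDT index, and directory names are made up, and the lctl lines are run on the MDS.

```shell
# On the MDS: allow non-root users to create remote directories.
# Setting the GID to -1 opens it to everyone (as Patrick notes);
# a specific GID restricts the ability to that group.
lctl set_param mdt.*.enable_remote_dir_gid=-1

# On the MDS: allow remote directories below parents that are not
# themselves on MDT0000.
lctl set_param mdt.*.enable_remote_dir=1

# On a client: create a remote directory on MDT index 1 (DNE phase 1)...
lfs setdirstripe -i 1 /mnt/fs01/remote_dir

# ...or a directory striped across 2 MDTs (DNE phase 2).
lfs setdirstripe -c 2 /mnt/fs01/striped_dir

# Inspect the result.
lfs getdirstripe /mnt/fs01/striped_dir
```

Note that lctl set_param changes are not persistent across server restarts; a permanent setting would go through the configuration log instead.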
Re: [lustre-discuss] refresh file layout error
I've seen this kind of error before when using samba to do something stupid (and let's face it, that's most everything with samba). It was a locking issue, I think. Things were being changed/deleted (unlinked, in actuality) as the client was trying to do something with them.

Is the Apache process or its spawned app(s) still working on the files in question while serving them up? That would be my guess here.

Any chance this is across NFS? Seen that a great deal with this error; it used to cause crashes.

Ed Wahl
OSC

From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of E.S. Rosenberg [esr+lus...@mail.hebrew.edu]
Sent: Wednesday, September 02, 2015 7:57 AM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] refresh file layout error

Hi all,

I am seeing an interesting/annoying problem with Lustre and am not really sure what/where to look.

When a webserver (galaxy using wsgi/apache2) tries to serve (large) files stored on Lustre it fails to send the full file and I see the following errors in syslog:

Sep 2 11:50:17 hm-02 kernel: LustreError: 6973:0:(vvp_io.c:1197:vvp_io_init()) fs01: refresh file layout [0x28815:0x217e:0x0] error -13.
Sep 2 11:50:17 hm-02 kernel: LustreError: 6973:0:(file.c:179:ll_close_inode_openhandle()) inode 144115772543738238 mdc close failed: rc = -13

If I try to access the files through their direct path (copying to tmp / md5sum / sha512sum) it seems to work without a problem (full file is copied and sums agree, from different nodes).

When we switched the storage backend to NFS the server worked fine, so my guess is that there is an issue with the way python tries to read from the 'disk'...

Is anyone familiar with the error above?

Thanks,
Eli
[lustre-discuss] refresh file layout error
Hi all,

I am seeing an interesting/annoying problem with Lustre and am not really sure what/where to look.

When a webserver (galaxy using wsgi/apache2) tries to serve (large) files stored on Lustre it fails to send the full file and I see the following errors in syslog:

Sep 2 11:50:17 hm-02 kernel: LustreError: 6973:0:(vvp_io.c:1197:vvp_io_init()) fs01: refresh file layout [0x28815:0x217e:0x0] error -13.
Sep 2 11:50:17 hm-02 kernel: LustreError: 6973:0:(file.c:179:ll_close_inode_openhandle()) inode 144115772543738238 mdc close failed: rc = -13

If I try to access the files through their direct path (copying to tmp / md5sum / sha512sum) it seems to work without a problem (full file is copied and sums agree, from different nodes).

When we switched the storage backend to NFS the server worked fine, so my guess is that there is an issue with the way python tries to read from the 'disk'...

Is anyone familiar with the error above?

Thanks,
Eli
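[Editor's note] One way to chase an error like this from the log alone: the FID in the vvp_io_init() line can be mapped back to a pathname with lfs fid2path, and the resolved path's ownership and permissions checked from there. A sketch, assuming a client mount point of /mnt/fs01 (made up; the log only shows the fs name fs01):

```shell
# Map the FID from the LustreError line back to a pathname.
lfs fid2path /mnt/fs01 "[0x28815:0x217e:0x0]"

# The reverse direction, from a known path to its FID.
lfs path2fid /mnt/fs01/some/result/file

# Check ownership/permissions of the file the kernel was complaining about.
ls -l "$(lfs fid2path /mnt/fs01 '[0x28815:0x217e:0x0]')"
```

If fid2path fails, the object may already have been unlinked, which would fit the unlink-while-serving theory raised elsewhere in the thread.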