Re: [lustre-discuss] refresh file layout error

2015-09-02 Thread E.S. Rosenberg
On Wed, Sep 2, 2015 at 8:47 PM, Wahl, Edward  wrote:

> I've seen this kind of error before when using samba to do something
> stupid (and let's face it, that's most everything with samba).  It was a
> locking issue, I think.  Things were being changed/deleted (unlinked, in
> actuality) as the client was trying to do something with them.
>
>  Is the Apache process or its spawned app(s) still working on the files
> in question while serving them up?
>
Not as far as I know; these are result files that were generated days ago
(possibly more) and should be static by now.
But I'll double-check with the people behind the app.
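
In the meantime, a minimal check I can run from a client (assuming the
filesystem is mounted at /mnt/fs01; the file path below is hypothetical):

  # map the FID from the LustreError line back to a path
  lfs fid2path /mnt/fs01 "[0x28815:0x217e:0x0]"
  # see whether any local process still has the file open
  lsof /mnt/fs01/galaxy/results/somefile.dat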

> That would be my guess here.  Any chance this is across NFS?  I've seen
> that a great deal with this error; it used to cause crashes.
>
Strictly speaking it is not, but it may be: part of the path the server
'sees'/'knows' is a symlink to the lustre filesystem, which lives on
nfs...
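
A way to see what the server actually ends up reading (the symlinked path
below is hypothetical):

  # follow the symlink chain to the real path
  readlink -f /var/www/galaxy/files
  # report the filesystem type backing that path ("lustre" vs "nfs")
  stat -f -c %T /var/www/galaxy/files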


Thanks,
Eli

>
> Ed Wahl
> OSC
>
>
> --
> *From:* lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on
> behalf of E.S. Rosenberg [esr+lus...@mail.hebrew.edu]
> *Sent:* Wednesday, September 02, 2015 7:57 AM
> *To:* lustre-discuss@lists.lustre.org
> *Subject:* [lustre-discuss] refresh file layout error
>
> Hi all,
>
> I am seeing an interesting/annoying problem with lustre and am not really
> sure what to look for or where.
>
> When a webserver (galaxy using wsgi/apache2) tries to serve (large) files
> stored on lustre it fails to send the full file, and I see the following
> errors in syslog:
>
> Sep  2 11:50:17 hm-02 kernel: LustreError:
> 6973:0:(vvp_io.c:1197:vvp_io_init()) fs01: refresh file layout
> [0x28815:0x217e:0x0] error -13.
> Sep  2 11:50:17 hm-02 kernel: LustreError:
> 6973:0:(file.c:179:ll_close_inode_openhandle()) inode 144115772543738238
> mdc close failed: rc = -13
>
> If I try to access the files through their direct path (copying to
> tmp/md5sum/sha512sum) it seems to work without a problem (full file is
> copied and sums agree, from different nodes).
>
> When we switched the storage backend to NFS the server worked fine, so my
> guess is that there is an issue with the way python tries to read from the
> 'disk'...
>
> Is anyone familiar with the error above?
>
> Thanks,
> Eli
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] refresh file layout error

2015-09-02 Thread E.S. Rosenberg
On Wed, Sep 2, 2015 at 6:51 PM, Stearman, Marc  wrote:

> -13 is permission denied (EACCES).  Does your webserver have access to
> those files?  If it is running as nobody, do you have the proper
> ownership/permissions set on all the directories and files?
>
The files were created by the web/python server to begin with; the apache
process is running as a different user, but the files are set to 644.
Also, if it were permissions, shouldn't it fail to return any data at all?
Here it returns chunks of varying sizes every time.
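
Though to rule that out, I can re-read the file as the user apache runs as
(paths and user below are hypothetical), since -13/EACCES can also come
from a missing execute bit on a parent directory rather than the file's
own 644:

  # reproduce the read as the web server's user
  sudo -u www-data md5sum /mnt/fs01/galaxy/results/somefile.dat
  # show owner/permissions of every component along the path
  namei -l /mnt/fs01/galaxy/results/somefile.dat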

Thanks,
Eli

>
> -Marc
>
> 
> D. Marc Stearman
> Lustre Operations Lead
> stearm...@llnl.gov
> 925.423.9670
>
>
>
>
> On Sep 2, 2015, at 4:57 AM, E.S. Rosenberg  wrote:
>
> > Hi all,
> >
> > I am seeing an interesting/annoying problem with lustre and am not
> > really sure what to look for or where.
> >
> > When a webserver (galaxy using wsgi/apache2) tries to serve (large)
> > files stored on lustre it fails to send the full file, and I see the
> > following errors in syslog:
> >
> > Sep  2 11:50:17 hm-02 kernel: LustreError:
> > 6973:0:(vvp_io.c:1197:vvp_io_init()) fs01: refresh file layout
> > [0x28815:0x217e:0x0] error -13.
> > Sep  2 11:50:17 hm-02 kernel: LustreError:
> > 6973:0:(file.c:179:ll_close_inode_openhandle()) inode 144115772543738238
> > mdc close failed: rc = -13
> >
> > If I try to access the files through their direct path (copying to
> > tmp/md5sum/sha512sum) it seems to work without a problem (full file is
> > copied and sums agree, from different nodes).
> >
> > When we switched the storage backend to NFS the server worked fine, so
> > my guess is that there is an issue with the way python tries to read from
> > the 'disk'...
> >
> > Is anyone familiar with the error above?
> >
> > Thanks,
> > Eli
> > ___
> > lustre-discuss mailing list
> > lustre-discuss@lists.lustre.org
> > http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>
>
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Are there any published results of how Lustre performance varies with changing Stripe size with File size

2015-09-02 Thread Prakrati.Agrawal
Hi,

I wanted to know if there are any published results on how Lustre performance 
changes with stripe size, for different file sizes and varying numbers of 
processes.
Any help would be appreciated.
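
In case it matters, the kind of sweep I have in mind would vary the stripe
geometry per directory, e.g. (mountpoint and values are only illustrative):

  # directories whose new files inherit different stripe geometries
  mkdir -p /mnt/fs01/bench/s1M_c1 /mnt/fs01/bench/s4M_c4
  lfs setstripe -S 1M -c 1 /mnt/fs01/bench/s1M_c1
  lfs setstripe -S 4M -c 4 /mnt/fs01/bench/s4M_c4
  # confirm the layout new files will actually get
  lfs getstripe /mnt/fs01/bench/s4M_c4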

Thanks and Regards,
Prakrati
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] dne2: lfs setdirstripe

2015-09-02 Thread Patrick Farrell
Olaf,

I can explain the rationale for the restrictions, though I have not verified 
whether the root-only one applies to striped as well as remote directories.  
(It's a simple test, though; I'm just not somewhere I can reach a test system.)

Note, to be clear: DNE 2 does not replace DNE 1.  Remote directories and 
striped directories are different things and can coexist.

As for enable_remote_dir: it applies only to remote directories, not striped 
directories.

As for the rationale: If enabled, it complicates things notably from an 
administrative perspective...  If you have multiple MDT changes in a path, it 
makes it harder to know what is where, and can cause files on, for example, 
MDT2 or MDT0, to become unreachable if MDT1 is lost.  Also, if you think 
carefully, it doesn't really enable any use cases that can't be done otherwise 
- at least, none that we could find that seemed practical.

As for the root-only restriction:
Imagine you are trying to split the load between your MDTs by assigning 
particular users to particular MDTs.  If your users can create their own remote 
directories, they can escape that assignment.  Also, you can open permission up 
to everyone by setting mdt.*.enable_remote_dir_gid to -1.
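
To make that concrete, a sketch of the relevant knobs (parameter and option
names as in the lctl/lfs man pages; the mountpoint and MDT indices are
illustrative):

  # allow remote directories below MDTs other than MDT0000
  lctl set_param mdt.*.enable_remote_dir=1
  # allow any group, not just root, to create remote directories
  lctl set_param mdt.*.enable_remote_dir_gid=-1
  # DNE1-style remote directory placed on MDT index 1
  lfs setdirstripe -i 1 /mnt/fs01/remote_dir
  # DNE2-style directory striped across 2 MDTs
  lfs setdirstripe -c 2 /mnt/fs01/striped_dir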

I learned this by a mix of reading design docs, experimenting, and being at 
least tangentially involved via the PAC.
I'd suggest design docs as a good place to look for more.

- Patrick Farrell

From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of 
Faaland, Olaf P. [faala...@llnl.gov]
Sent: Wednesday, September 02, 2015 5:21 PM
To: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] dne2: lfs setdirstripe

The lustre we are testing with is built from commit

ea383222e031cdceffbdf2e3afab3b6d1fd53c8e

which is after tag 2.7.57 but before 2.7.59; so recent but not entirely current.

Olaf P. Faaland
Livermore Computing
phone : 925-422-2263

From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of 
Faaland, Olaf P. [faala...@llnl.gov]
Sent: Wednesday, September 02, 2015 3:17 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] dne2: lfs setdirstripe

Hi,

We have begun testing DNE with a ZFS backend.  So far we've only installed the 
filesystem and begun educating ourselves.

I see in man lfs that "lfs setdirstripe" has some restrictions by default:
 - only executable by root unless "mdt.*.enable_remote_dir_gid" is set
 - only directories on MDT0000 can contain directories that are not on the 
same MDT unless "mdt.*.enable_remote_dir" is set

1. Are those restrictions still current, or do they refer to DNE phase 1 
restrictions that no longer apply?

2. If the former (allowing only root to invoke "lfs setdirstripe") is still 
current, what is the rationale?

3. Is there documentation, or a mailing list thread, that we should read prior 
to posting questions?

Thanks,

Olaf P. Faaland
Livermore Computing
phone : 925-422-2263
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] dne2: lfs setdirstripe

2015-09-02 Thread Faaland, Olaf P.
The lustre we are testing with is built from commit

ea383222e031cdceffbdf2e3afab3b6d1fd53c8e

which is after tag 2.7.57 but before 2.7.59; so recent but not entirely current.
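
(If it helps to place that commit relative to the tags, in a lustre git
checkout:)

  # prints the nearest earlier tag plus the number of commits past it
  git describe ea383222e031cdceffbdf2e3afab3b6d1fd53c8e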

Olaf P. Faaland
Livermore Computing
phone : 925-422-2263

From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of 
Faaland, Olaf P. [faala...@llnl.gov]
Sent: Wednesday, September 02, 2015 3:17 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] dne2: lfs setdirstripe

Hi,

We have begun testing DNE with a ZFS backend.  So far we've only installed the 
filesystem and begun educating ourselves.

I see in man lfs that "lfs setdirstripe" has some restrictions by default:
 - only executable by root unless "mdt.*.enable_remote_dir_gid" is set
 - only directories on MDT0000 can contain directories that are not on the 
same MDT unless "mdt.*.enable_remote_dir" is set

1. Are those restrictions still current, or do they refer to DNE phase 1 
restrictions that no longer apply?

2. If the former (allowing only root to invoke "lfs setdirstripe") is still 
current, what is the rationale?

3. Is there documentation, or a mailing list thread, that we should read prior 
to posting questions?

Thanks,

Olaf P. Faaland
Livermore Computing
phone : 925-422-2263
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] dne2: lfs setdirstripe

2015-09-02 Thread Faaland, Olaf P.
Hi,

We have begun testing DNE with a ZFS backend.  So far we've only installed the 
filesystem and begun educating ourselves.

I see in man lfs that "lfs setdirstripe" has some restrictions by default:
 - only executable by root unless "mdt.*.enable_remote_dir_gid" is set
 - only directories on MDT0000 can contain directories that are not on the 
same MDT unless "mdt.*.enable_remote_dir" is set

1. Are those restrictions still current, or do they refer to DNE phase 1 
restrictions that no longer apply?

2. If the former (allowing only root to invoke "lfs setdirstripe") is still 
current, what is the rationale?

3. Is there documentation, or a mailing list thread, that we should read prior 
to posting questions?

Thanks,

Olaf P. Faaland
Livermore Computing
phone : 925-422-2263
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] refresh file layout error

2015-09-02 Thread Wahl, Edward
I've seen this kind of error before when using samba to do something stupid 
(and let's face it, that's most everything with samba).  It was a locking 
issue, I think.  Things were being changed/deleted (unlinked, in actuality) 
as the client was trying to do something with them.

Is the Apache process or its spawned app(s) still working on the files in 
question while serving them up?
That would be my guess here.  Any chance this is across NFS?  I've seen that 
a great deal with this error; it used to cause crashes.

Ed Wahl
OSC



From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of 
E.S. Rosenberg [esr+lus...@mail.hebrew.edu]
Sent: Wednesday, September 02, 2015 7:57 AM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] refresh file layout error

Hi all,

I am seeing an interesting/annoying problem with lustre and am not really 
sure what to look for or where.

When a webserver (galaxy using wsgi/apache2) tries to serve (large) files 
stored on lustre it fails to send the full file, and I see the following 
errors in syslog:

Sep  2 11:50:17 hm-02 kernel: LustreError: 6973:0:(vvp_io.c:1197:vvp_io_init()) 
fs01: refresh file layout [0x28815:0x217e:0x0] error -13.
Sep  2 11:50:17 hm-02 kernel: LustreError: 
6973:0:(file.c:179:ll_close_inode_openhandle()) inode 144115772543738238 mdc 
close failed: rc = -13

If I try to access the files through their direct path (copying to 
tmp/md5sum/sha512sum) it seems to work without a problem (full file is copied 
and sums agree, from different nodes).

When we switched the storage backend to NFS the server worked fine, so my guess 
is that there is an issue with the way python tries to read from the 'disk'...

Is anyone familiar with the error above?

Thanks,
Eli
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] refresh file layout error

2015-09-02 Thread E.S. Rosenberg
Hi all,

I am seeing an interesting/annoying problem with lustre and am not really
sure what to look for or where.

When a webserver (galaxy using wsgi/apache2) tries to serve (large) files
stored on lustre it fails to send the full file, and I see the following
errors in syslog:

Sep  2 11:50:17 hm-02 kernel: LustreError:
6973:0:(vvp_io.c:1197:vvp_io_init()) fs01: refresh file layout
[0x28815:0x217e:0x0] error -13.
Sep  2 11:50:17 hm-02 kernel: LustreError:
6973:0:(file.c:179:ll_close_inode_openhandle()) inode 144115772543738238
mdc close failed: rc = -13

If I try to access the files through their direct path (copying to
tmp/md5sum/sha512sum) it seems to work without a problem (full file is
copied and sums agree, from different nodes).

When we switched the storage backend to NFS the server worked fine, so my
guess is that there is an issue with the way python tries to read from the
'disk'...

Is anyone familiar with the error above?

Thanks,
Eli
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org