Re: [lustre-discuss] Robinhood exhausting RPC resources against 2.5.5 lustre file systems

2017-05-19 Thread Jessica Otey

Hi Megan (et al.),

I don't understand the behavior, either... I've worked successfully with 
changelogs in the past, and indeed it is very lightweight. (Since 
robinhood has not been running anywhere, I'd already removed all the 
changelog readers from the various MDTs for the reasons you noted.)


Whatever my problem is, it does not manifest as a load issue on either 
the client or the MDT side. It manifests rather as some sort of connection 
failure. Here's the most recent example, which may generate more 
ideas as to the cause.


On our third lustre fs (one we use for backups), I was able to complete 
a file system scan to populate the database, but then when I activated 
changelogs, the client almost immediately experienced the disconnections 
we've seen on the other two systems.


Here's the log from the MDT (heinlein, 10.7.17.126). The robinhood 
client is akebono (10.7.17.122):

May 16 16:05:51 heinlein kernel: Lustre: lard-MDD: changelog on
May 16 16:05:51 heinlein kernel: Lustre: Modifying parameter 
general.mdd.lard-MDT*.changelog_mask in log params
May 16 16:13:16 heinlein kernel: Lustre: lard-MDT: Client 
2d1aedc0-1f5e-2741-689a-169922a2593b (at 10.7.17.122@o2ib) reconnecting
May 16 16:13:17 heinlein kernel: Lustre: lard-MDT: Client 
2d1aedc0-1f5e-2741-689a-169922a2593b (at 10.7.17.122@o2ib) reconnecting
May 16 16:13:17 heinlein kernel: Lustre: Skipped 7458 previous similar 
messages


Here's what akebono (10.7.17.122) reported:

May 16 16:13:16 akebono kernel: LustreError: 11-0: 
lard-MDT-mdc-880fd68d7000: Communicating with 10.7.17.126@o2ib, 
operation llog_origin_handle_destroy failed with -19.
May 16 16:13:16 akebono kernel: Lustre: 
lard-MDT-mdc-880fd68d7000: Connection to lard-MDT (at 
10.7.17.126@o2ib) was lost; in progress operations using this service 
will wait for recovery to complete
May 16 16:13:16 akebono kernel: Lustre: 
lard-MDT-mdc-880fd68d7000: Connection restored to lard-MDT 
(at 10.7.17.126@o2ib)
May 16 16:13:17 akebono kernel: LustreError: 11-0: 
lard-MDT-mdc-880fd68d7000: Communicating with 10.7.17.126@o2ib, 
operation llog_origin_handle_destroy failed with -19.
May 16 16:13:17 akebono kernel: LustreError: Skipped 7458 previous 
similar messages
May 16 16:13:17 akebono kernel: Lustre: 
lard-MDT-mdc-880fd68d7000: Connection to lard-MDT (at 
10.7.17.126@o2ib) was lost; in progress operations using this service 
will wait for recovery to complete
May 16 16:13:17 akebono kernel: Lustre: Skipped 7458 previous similar 
messages
May 16 16:13:17 akebono kernel: Lustre: 
lard-MDT-mdc-880fd68d7000: Connection restored to lard-MDT 
(at 10.7.17.126@o2ib)
May 16 16:13:17 akebono kernel: Lustre: Skipped 7458 previous similar 
messages
May 16 16:13:18 akebono kernel: LustreError: 11-0: 
lard-MDT-mdc-880fd68d7000: Communicating with 10.7.17.126@o2ib, 
operation llog_origin_handle_destroy failed with -19.
May 16 16:13:18 akebono kernel: LustreError: Skipped 14924 previous 
similar messages


Jessica

On 5/19/17 8:58 AM, Ms. Megan Larko wrote:

Greetings Jessica,

I'm not sure I am correctly understanding the behavior "robinhood 
activity floods the MDT".   The robinhood program as you (and I) are 
using it is consuming the MDT CHANGELOG via a reader_id which was 
assigned when the CHANGELOG was enabled on the MDT.   You can check 
the MDS for these readers via "lctl get_param mdd.*.changelog_users".  
Each CHANGELOG reader must either be consumed by a process or 
destroyed; otherwise, the CHANGELOG will grow until it consumes 
sufficient space to stop the MDT from functioning correctly.  So 
robinhood should consume and then clear the CHANGELOG via this 
reader_id.  This implementation of robinhood is actually a rather 
light-weight process as far as the MDS is concerned.   The load issues 
I encountered were on the robinhood server itself which is a separate 
server from the Lustre MGS/MDS server.
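For example, the check and the cleanup of an unconsumed reader look 
roughly like this on the MDS (the MDT name and the cl1 reader id below 
are placeholders for whatever your changelog_users output shows):

   lctl get_param mdd.*.changelog_users                      # list readers and their record positions
   lctl --device <fsname>-MDT0000 changelog_deregister cl1   # destroy a reader nothing is consuming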


Just curious, have you checked for multiple reader_id's on your MDS 
for this Lustre file system?


P.S. My robinhood configuration file is using nb_threads = 8, just for 
a data point.


Cheers,
megan





___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Robinhood exhausting RPC resources against 2.5.5 lustre file systems

2017-05-19 Thread Jessica Otey
I think that may be a red herring related to rsyslog?  When we most 
recently rebooted the MDT, this is the log (still on the box, not on the 
log server):


May  3 14:24:22 asimov kernel: LNet: HW CPU cores: 12, npartitions: 4
May  3 14:24:30 asimov kernel: LNet: Added LNI 10.7.17.8@o2ib [8/256/0/180]

And lctl list_nids gives it once:

[root@asimov ~]# lctl list_nids
10.7.17.8@o2ib

Jessica

On 5/19/17 10:13 AM, Jeff Johnson wrote:

Jessica,

You are getting a NID registering twice. Doug noticed and pointed it 
out. I'd look to see if that is one machine doing something twice or 
two machines with the same NID.


--Jeff

On Fri, May 19, 2017 at 05:58 Ms. Megan Larko <dobsonu...@gmail.com 
<mailto:dobsonu...@gmail.com>> wrote:


Greetings Jessica,

I'm not sure I am correctly understanding the behavior "robinhood
activity floods the MDT".   The robinhood program as you (and I)
are using it is consuming the MDT CHANGELOG via a reader_id which
was assigned when the CHANGELOG was enabled on the MDT. You can
check the MDS for these readers via "lctl get_param
mdd.*.changelog_users".  Each CHANGELOG reader must either be
consumed by a process or destroyed; otherwise, the CHANGELOG will
grow until it consumes sufficient space to stop the MDT from
functioning correctly.  So robinhood should consume and then clear
the CHANGELOG via this reader_id.  This implementation of
robinhood is actually a rather light-weight process as far as the
MDS is concerned.   The load issues I encountered were on the
robinhood server itself which is a separate server from the Lustre
MGS/MDS server.

Just curious, have you checked for multiple reader_id's on your
MDS for this Lustre file system?

P.S. My robinhood configuration file is using nb_threads = 8, just
for a data point.

Cheers,
megan

On Thu, May 18, 2017 at 2:36 PM, Jessica Otey <jo...@nrao.edu
<mailto:jo...@nrao.edu>> wrote:

Hi Megan,

Thanks for your input. We use percona, a drop-in replacement
for mysql... The robinhood activity floods the MDT, but it
does not seem to produce any excessive load on the robinhood
box...

Anyway, FWIW...

~]# mysql --version
mysql  Ver 14.14 Distrib 5.5.54-38.6, for Linux (x86_64) using
readline 5.1

Product: robinhood
Version: 3.0-1
Build:   2017-03-13 10:29:26

Compilation switches:
Lustre filesystems
Lustre Version: 2.5
Address entries by FID
MDT Changelogs supported

Database binding: MySQL

RPM: robinhood-lustre-3.0-1.lustre2.5.el6.x86_64

Lustre rpms:

lustre-client-2.5.5-2.6.32_642.15.1.el6.x86_64_g22a210f.x86_64
lustre-client-modules-2.5.5-2.6.32_642.15.1.el6.x86_64_g22a210f.x86_64


On 5/18/17 11:55 AM, Ms. Megan Larko wrote:

With regards to (WRT) Subject "Robinhood exhausting RPC
resources against 2.5.5  lustre file systems", what version
of robinhood and what version of MySQL database?   I mention
this because I have been working with robinhood-3.0-0.rc1 and
initially MySQL-5.5.32 and Lustre 2.5.42.1 on
kernel-2.6.32-573 and had issues in which the robinhood
server consumed more than the total amount of 32 CPU cores on
the robinhood server (with 128 G RAM) and would functionally
hang the robinhood server.   The issue was solved for me by
changing to MySQL-5.6.35.   It was the "sort" command in
robinhood that was not working well with the MySQL-5.5.32.

Cheers,
megan




___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
<mailto:lustre-discuss@lists.lustre.org>
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

--
--
Jeff Johnson
Co-Founder
Aeon Computing

jeff.john...@aeoncomputing.com <mailto:jeff.john...@aeoncomputing.com>
www.aeoncomputing.com <http://www.aeoncomputing.com>
t: 858-412-3810 x1001   f: 858-412-3845
m: 619-204-9061

4170 Morena Boulevard, Suite D - San Diego, CA 92117

High-Performance Computing / Lustre Filesystems / Scale-out Storage


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Robinhood exhausting RPC resources against 2.5.5 lustre file systems

2017-05-18 Thread Jessica Otey

Hi Megan,

Thanks for your input. We use percona, a drop-in replacement for 
mysql... The robinhood activity floods the MDT, but it does not seem to 
produce any excessive load on the robinhood box...


Anyway, FWIW...

~]# mysql --version
mysql  Ver 14.14 Distrib 5.5.54-38.6, for Linux (x86_64) using readline 5.1

Product: robinhood
Version: 3.0-1
Build:   2017-03-13 10:29:26

Compilation switches:
Lustre filesystems
Lustre Version: 2.5
Address entries by FID
MDT Changelogs supported

Database binding: MySQL

RPM: robinhood-lustre-3.0-1.lustre2.5.el6.x86_64

Lustre rpms:

lustre-client-2.5.5-2.6.32_642.15.1.el6.x86_64_g22a210f.x86_64
lustre-client-modules-2.5.5-2.6.32_642.15.1.el6.x86_64_g22a210f.x86_64


On 5/18/17 11:55 AM, Ms. Megan Larko wrote:
With regards to (WRT) Subject "Robinhood exhausting RPC resources 
against 2.5.5   lustre file systems", what version of robinhood and 
what version of MySQL database?   I mention this because I have been 
working with robinhood-3.0-0.rc1 and initially MySQL-5.5.32 and Lustre 
2.5.42.1 on kernel-2.6.32-573 and had issues in which the robinhood 
server consumed more than the total amount of 32 CPU cores on the 
robinhood server (with 128 G RAM) and would functionally hang the 
robinhood server.   The issue was solved for me by changing to 
MySQL-5.6.35.   It was the "sort" command in robinhood that was not 
working well with the MySQL-5.5.32.


Cheers,
megan



___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Robinhood exhausting RPC resources against 2.5.5 lustre file systems

2017-05-17 Thread Jessica Otey
Update #1. Robinhood change log consumption is also producing the same 
effect against a native 2.x file system instance. So the 'legacy' aspect 
of our two production instances does not seem to be a factor...


Update #2. Currently running, per Colin Faber's suggestion: find 
/mnt/lustre -exec lfs path2fid {} \;


This does not (so far) provoke a disconnection.

Jessica

On 5/17/17 2:04 PM, Jessica Otey wrote:


We also have a third Lustre file system that originated as 2.4.3, and 
has since been upgraded to 2.5.5, against which Robinhood is currently 
operating as expected. This leads me to suppose that the issue may 
have to do with the interaction between Robinhood and a 
legacy-1.8.x-now-lustre-2.5.5 system. But I don't know.





___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Robinhood exhausting RPC resources against 2.5.5 lustre file systems

2017-05-17 Thread Jessica Otey

All,

We have observed an unfortunate interaction between Robinhood and two 
Lustre 2.5.5 file systems (both of which originated as 1.8.9 file systems).


Robinhood was used successfully against these file systems when they 
were both 1.8.9, 2.4.3, and then 2.5.3 (a total time span of about 11 
months).


We also have a third Lustre file system that originated as 2.4.3, and 
has since been upgraded to 2.5.5, against which Robinhood is currently 
operating as expected. This leads me to suppose that the issue may have 
to do with the interaction between Robinhood and a 
legacy-1.8.x-now-lustre-2.5.5 system. But I don't know.


The problem manifests itself as follows: Either a Robinhood file scan or 
the initiation of the consumption of changelogs results in the 
consumption of all the available RPC resources on the MDT. This in turn 
leads to the MDT not being able to satisfy any other requests from 
clients, which in turn leads to client disconnections (the MDT thinks 
they are dead and evicts them). Meanwhile, Robinhood itself is unable to 
traverse the file system to gather the information it seeks, and so its 
scans either hang (due to the client disconnect) or run at a rate such 
that they would never complete (less than 1 file per second).


If we don't run robinhood at all, the file system performs (after a 
remount of the MDT) as expected.


Initially, we thought that the difficulty might be that we neglected to 
activate the FID-in-dirent feature when we upgraded to 2.4.3. We did so 
on one of these systems, and ran an lfsck oi_scrub, but that did not 
ameliorate the problem.
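For reference, the scrub was started and watched roughly as below; the 
MDT name is a placeholder and the osd-ldiskfs parameter path is from 
memory, so treat this as a sketch rather than exact syntax for 2.5.5:

   lctl lfsck_start --device <fsname>-MDT0000
   lctl get_param osd-ldiskfs.<fsname>-MDT0000.oi_scrub   # scrub status/progress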


Any thoughts on this matter would be appreciated. (We miss using Robinhood!)

Thanks,

Jessica



More data for those who cannot help themselves:

April 2016 - Robinhood comes into production use against both our 1.8.9 
file systems.


July 2016 - Upgrade to 2.4.3 (on both production lustre file systems) -- 
Robinhood rebuilt against 2.4.3 client; changelog consumption now included.


Lustre "reconnects" (from /var/log/messages on one of the MDTs):

July 2016: 4

Aug 2016: 20

Sept 2016: 8

Oct 2016: 8

Nov 4-6, 2016 - Upgrade to 2.5.3 (on both production lustre file 
systems) -- Robinhood rebuilt against 2.5.3 client.


Lustre "reconnects":

Nov. 2016: 180

Dec. 2016: 62

Jan. 2017: 96

Feb 1-24, 2017: 2

Feb 24, 2017 - Upgrade to 2.5.5 (on both production lustre file systems)

 NAASC-Lustre MDT coming back 

Feb 24 20:46:44 10.7.7.8 kernel: Lustre: Lustre: Build Version: 
2.5.5-g22a210f-CHANGED-2.6.32-642.6.2.el6_lustre.2.5.5.x86_64
Feb 24 20:46:44 10.7.7.8 kernel: Lustre: Lustre: Build Version: 
2.5.5-g22a210f-CHANGED-2.6.32-642.6.2.el6_lustre.2.5.5.x86_64
Feb 24 20:46:44 10.7.7.8 kernel: LNet: Added LNI 10.7.17.8@o2ib 
[8/256/0/180]
Feb 24 20:46:44 10.7.7.8 kernel: LNet: Added LNI 10.7.17.8@o2ib 
[8/256/0/180]
Feb 24 20:46:45 10.7.7.8 kernel: LDISKFS-fs (md127): mounted filesystem 
with ordered data mode. quota=off. Opts:
Feb 24 20:46:45 10.7.7.8 kernel: LDISKFS-fs (md127): mounted filesystem 
with ordered data mode. quota=off. Opts:
Feb 24 20:46:46 10.7.7.8 kernel: Lustre: MGC10.7.17.8@o2ib: Connection 
restored to MGS (at 0@lo)
Feb 24 20:46:46 10.7.7.8 kernel: Lustre: MGC10.7.17.8@o2ib: Connection 
restored to MGS (at 0@lo)
Feb 24 20:46:47 10.7.7.8 kernel: Lustre: naaschpc-MDT: used disk, 
loading
Feb 24 20:46:47 10.7.7.8 kernel: Lustre: naaschpc-MDT: used disk, 
loading


The night after this upgrade, a regular rsync to the backup Lustre 
system provokes a failure/client disconnect. (Unfortunately, I don't 
have the logs to look at Robinhood activity from this time, but I 
believe I restarted the service after the system came back.)


Feb 25 02:14:24 10.7.7.8 kernel: LustreError: 
25103:0:(service.c:2020:ptlrpc_server_handle_request()) @@@ Dropping 
timed-out request from 12345-10.7.17.123@o2ib: deadline 6:11s ago
Feb 25 02:14:24 10.7.7.8 kernel: LustreError: 
25103:0:(service.c:2020:ptlrpc_server_handle_request()) @@@ Dropping 
timed-out request from 12345-10.7.17.123@o2ib: deadline 6:11s ago
Feb 25 02:14:24 10.7.7.8 kernel:  req@88045b3a2050 
x1560271381909936/t0(0) 
o103->bb228923-4216-cc59-d847-38b543af1ae2@10.7.17.123@o2ib:0/0 lens 
3584/0 e 0 to 0 dl 1488006853 ref 1 fl Interpret:/0/ rc 0/-1
Feb 25 02:14:24 10.7.7.8 kernel:  req@88045b3a2050 
x1560271381909936/t0(0) 
o103->bb228923-4216-cc59-d847-38b543af1ae2@10.7.17.123@o2ib:0/0 lens 
3584/0 e 0 to 0 dl 1488006853 ref 1 fl Interpret:/0/ rc 0/-1
Feb 25 02:14:24 10.7.7.8 kernel: Lustre: 
25111:0:(service.c:2052:ptlrpc_server_handle_request()) @@@ Request took 
longer than estimated (6:11s); client may timeout. req@88045b3a2850 
x1560271381909940/t0(0) 
o103->bb228923-4216-cc59-d847-38b543af1ae2@10.7.17.123@o2ib:0/0 lens 
3584/0 e 0 to 0 dl 1488006853 ref 1 fl Interpret:/0/ rc 0/-1
Feb 25 02:14:24 10.7.7.8 kernel: 

[lustre-discuss] using an lnet router for certain connections... but not for others

2017-03-14 Thread Jessica Otey
We have three lustre systems, two "production" and one "recovery" 
system, which is in essence a (partial) backup of ONE of the production 
systems.


The main difference between the two production systems is that one uses 
IB and one uses 10G ethernet among the OSSes.


I am in the process of moving my robinhood instance from one box to 
another. The previous box was a normal "production" client, meaning it 
accessed both production file systems via our lnet routers (which are identical, and are 
used by both production lustre file systems).


The new box is a "data mover" for the backup lustre file system, which 
was configured WITHOUT an lnet router. Because this box has IB, I am 
able to connect directly to production lustre file system that also has 
IB. The thing is, I would also like to keep using robinhood for the 
other file system, the one using 10gig ethernet.


Is there a way to specify in the lustre.conf configuration a setup 
whereby the lnet routers could be used to access only the production 
file systems but not the backup file system?
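What I am imagining is something along these lines in 
/etc/modprobe.d/lustre.conf on the new box, though I am not sure it is 
the right approach (the network names, interfaces, and router NID below 
are placeholders, not our real ones):

   options lnet networks="o2ib0(ib0),tcp0(em1)" routes="tcp1 10.7.17.1@o2ib0"

That is, list a routes entry only for the remote network that sits 
behind the routers; my understanding is that LNet only uses a router 
for networks it has no local NID on, so traffic to the o2ib file system 
would still go direct.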


Any leads appreciated.

Thanks,

Jessica


--
Jessica Otey
System Administrator II
North American ALMA Science Center (NAASC)
National Radio Astronomy Observatory (NRAO)
Charlottesville, Virginia (USA)

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] replacing empty OST

2017-01-19 Thread Jessica Otey

Patrick,

If I'm understanding you correctly, I think you just need to pass 
--replace when you run mkfs.lustre.


from man mkfs.lustre

  --replace
         Used to initialize a target with the same --index as a previously
         used target if the old target was permanently lost for some reason
         (e.g. multiple disk failure or massive corruption). This avoids
         having the target try to register as a new target with the MGS.


This will allow you to specify the index number you used previously. You 
will probably need --reformat as well if it detects a filesystem already 
there.
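Something along these lines, with the fsname, index, MGS NID, and OST 
device swapped in for your setup (the values below are just 
illustrative placeholders):

   mkfs.lustre --ost --reformat --replace --fsname=lustre --index=9 \
       --mgsnode=<mgs_nid> /dev/<ost_device>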


I've done this and it doesn't require unmounting anything.

Jessica


On 01/19/2017 05:35 PM, Patrick Shopbell wrote:


Thank you very much for the reply, Marion. But I did indeed use
that option. In fact, I think that option is a safety that keeps one
from inadvertently overwriting a Lustre volume. If mkfs.lustre
detects a Lustre file system on the volume being formatted, it
notes that you have to specify "--reformat" to force the format
to overwrite the old filesystem.

It doesn't seem to have reset the info on the MGS, from what I
can tell.

I was hoping there is some way to do this without having to
unmount the entire filesystem everywhere... But perhaps there
is not. Maybe I will just skip OST 9 and move on to 10...

Thanks anyway.
Patrick


On 1/18/17 5:58 PM, Marion Hakanson wrote:

Patrick,

I'm no guru, but there's a "--reformat" option to the mkfs.lustre command
which you can use when re-creating a lost/destroyed OSS.  That should
tell the MGS that you intend to re-use the index.

Regards,

Marion



To: <lustre-discuss@lists.lustre.org>
From: Patrick Shopbell <p...@astro.caltech.edu>
Date: Wed, 18 Jan 2017 17:18:05 -0800
Subject: [lustre-discuss] replacing empty OST

Hi Lustre gurus -
Quick question about replacing a new, empty OST: I installed an
OST (#9) briefly from a specific machine, and then ended up
aborting that install. (It didn't work due to some version mismatch
errors.) I've since solved all those problems, reinstalled the OSS,
and reformatted the OST. (Maybe I should not have done that...)

Anyway, now I can add the OST, and it sort of works, except it
notes that OST #9 is already assigned. So I get an error like this:

[date] astrolmgs kernel: LustreError: 140-5: Server lustre-OST0009
requested index 9, but that index is already in use. Use --writeconf to
force.

Since I don't care about the data on there (because there isn't
any), is there any shortcut to getting this to work? Or do I just
need to shut everything down, run writeconf on the MGS and
OSS units, then start everything back up? Is there any way to
make the system think that this OST #9 volume is the same as
the earlier failed volume - since it really is the same thing,
meaning a new empty OST.

I know I could just disable OST 9 everywhere and call this one
OST 10, but I'd rather not...

I am running the old Lustre 2.5.2.

Thanks a lot,
Patrick

--

**
| Patrick Shopbell   Department of Astronomy |
| p...@astro.caltech.edu  Mail Code 249-17|
| (626) 395-4097 California Institute of Technology  |
| (626) 568-9352  (FAX)  Pasadena, CA 91125 |
| WWW: http://www.astro.caltech.edu/~pls/ |
**

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org








--
Jessica Otey
System Administrator II
North American ALMA Science Center (NAASC)
National Radio Astronomy Observatory (NRAO)
Charlottesville, Virginia (USA)

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Round robin allocation (in general and in buggy 2.5.3)

2016-12-20 Thread Jessica Otey

All,

I am looking for a more complete understanding of how the two settings 
qos_prio_free and qos_threshold_rr function together.


My current understanding, which may be inaccurate, is the following:

qos_prio_free
This setting controls how much Lustre prioritizes free space (versus 
location for the sake of performance) in allocation.
The higher this number, the more Lustre takes empty space on an OST into 
consideration for its allocation.
When set to 100%, Lustre uses ONLY empty space as the deciding factor 
for writes.


qos_threshold_rr
This setting controls how much consideration should be given to QoS in 
allocation.

The higher this number, the more QoS is taken into consideration.
When set to 100%, Lustre ignores the QoS variable and hits all OSTs equally.
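For reference, I have been reading (and temporarily changing) these on 
the MDS roughly as follows; the parameter paths are from memory and may 
differ by version, and set_param does not survive a remount:

   lctl get_param lov.*.qos_prio_free lov.*.qos_threshold_rr
   lctl set_param lov.<fsname>-MDT0000-mdtlov.qos_threshold_rr=100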

I'm looking for several answers:

1) Is my basic understanding of the above settings correct?

2) How does lustre deal with OSTs that are 100% full? I'm curious about 
this under two conditions.


2a) When you set qos_threshold_rr=100 -- meaning, go and hit all the 
OSTs the same amount.


On one of our 2.5.3 lustre filesystems, the allocator is not working (a 
known bug, but why it seems to be behaving fine on the other one, I 
couldn't say...) and so we have configured qos_threshold_rr=100. Since 
our OSTs are pretty dramatically unbalanced, it has happened that 
attempts to write to full OSTs have caused write failures. Data deletes 
have gotten us below 90% on all OSTs now, and while I can certainly take 
the fullest OSTs them out of write mode if that is needed, it would seem 
to me that lustre should, no matter what your qos_threshold_rr setting, 
treat OSTs that are 100% full differently, meaning, it should no longer 
attempt to write to them. In short, this seems like a bug to me... 
although, granted, I suppose if you are overriding the allocator, it's 
caveat user at that point.


2b) When you set qos_threshold_rr != 100 -- meaning, the allocator is 
working


On the other lustre 2.5.3 system, the system defaults 
(qos_prio_free=91%; qos_threshold_rr=17%) are hitting all the OSTs when 
I run my test*, so I have not changed them. Several of the OSTs in this 
file system are at 100%. I get that we are not seeing write failures 
because the allocator is not allocating to these OSTs as frequently, 
based on how full they are. But I know from my test that these OSTs are 
still in the mix... so that implies to me that it would be possible, 
although less likely, to see a write failure if a write stream is opened 
on one of the 100% OSTs. I'd love to be able to quantify that "less likely".


Basically, I guess my question is: is taking an OST out of write mode 
the only (or best) way of preventing the fs from attempting to write to 
it when it is nearly full?


Thanks,
Jessica

--

*To test file allocation on your lustre system, you can use this 
one-liner from a lustre client. USE IT IN ITS OWN, NEW DIRECTORY!


touch t.{1..2000}; lfs getstripe t.*|fgrep -A1 obdidx|fgrep -v 
obdidx|fgrep -v -- --|awk '{ print $1 }'|sort|uniq -c; rm -f t.*



--
Jessica Otey
System Administrator II
North American ALMA Science Center (NAASC)
National Radio Astronomy Observatory (NRAO)
Charlottesville, Virginia (USA)

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] syntax/guidance on lfsck

2016-12-13 Thread Jessica Otey

All,
If there is anyone who has run as lfsck on a 2.5.3 system, I would 
appreciate some guidance. The manual has only been of limited help.


The situation: We were draining an OST, unaware that a bug in 2.5.3 
causes the inodes to remain--thus rendering the df readout inaccurate 
(and therefore not useful).


The solution: This should be fixable via lfsck--it would simply compare 
the inodes and the files and bring them back into the correct alignment 
(i.e., delete the inodes that no longer correspond to data on the OST).


The specific issue: I don't know:

1) What exact lfsck command to issue
2) Where to issue it (mds, oss)
3) What to expect as far as output/how to interact with it

What I have done:

I had issued (on the oss) the following command:

[root@naasc-oss-6 ~]# lctl lfsck_start --device naaschpc-OST0014
Started LFSCK on the device naaschpc-OST0014.

But the prompt just returns, as if it is either done or doing 
something in the background?


I don't understand how to tell what it is doing. When I tried to stop 
it, I got:


[root@naasc-oss-6 ~]# lctl lfsck_stop --device naaschpc-OST0014
Fail to stop LFSCK: Operation already in progress
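For what it's worth, the only progress indicator I have found to watch 
is the OI scrub state under the osd proc tree on the OSS; the parameter 
path below is my best guess for 2.5.3 and may not be right:

   lctl get_param osd-ldiskfs.naaschpc-OST0014.oi_scrub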

Any help interpreting these messages as well as coming up with the 
proper command to run to correct our inode issue would be helpful. 
Please keep in mind that many features described in the 2.x manual 
aren't available because we are only using 2.5.3. (For instance, the 
--type layout option, which seems to be what we want (check and repair 
MDT-OST inconsistency), is not available until 2.6.)


Thanks,
Jessica



--
Jessica Otey
System Administrator II
North American ALMA Science Center (NAASC)
National Radio Astronomy Observatory (NRAO)
Charlottesville, Virginia (USA)

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] FOLLOW UP: MDT filling up with 4 MB files

2016-10-14 Thread Jessica Otey

All,
My colleagues in Chile now believe that both of their 2.5.3 file systems 
are experiencing this same problem with the MDTs filling up with files. 
We have also come across a report from another user from early 2015 
denoting the same issue, also with a 2.5.3 system.


See: 
https://www.mail-archive.com/search?l=lustre-discuss@lists.lustre.org=subject:%22Re%5C%3A+%5C%5Blustre%5C-discuss%5C%5D+MDT+partition+getting+full%22=newest


We are confident that these files are not related to the changelog feature.

Does anyone have any other suggestions as to what the cause of this 
problem could be?


I'm intrigued that the Lustre version involved in all 3 reports is 
2.5.3. Could this be a bug?


Thanks,
Jessica


On Thu, Sep 29, 2016 at 8:58 AM, Jessica Otey <jo...@nrao.edu 
<mailto:jo...@nrao.edu>> wrote:


Hello all,
I write on behalf of my colleagues in Chile, who are experiencing
a bizarre problem with their MDT, namely, it is filling up with 4
MB files. There is no issue with the number of inodes, of which
there are hundreds of millions unused.

[root@jaopost-mds ~]# tune2fs -l /dev/sdb2 | grep -i inode
device /dev/sdb2 mounted by lustre
Filesystem features:      has_journal ext_attr resize_inode
dir_index filetype needs_recovery flex_bg dirdata sparse_super
large_file huge_file uninit_bg dir_nlink quota
Inode count:              239730688
Free inodes:              223553405
Inodes per group:         32768
Inode blocks per group:   4096
First inode:              11
Inode size:               512
Journal inode:            8
Journal backup:           inode blocks
User quota inode:         3
Group quota inode:        4

Has anyone ever encountered such a problem? The only thing unusual
about this cluster is that it is using 2.5.3 MDS/OSSes while still
using 1.8.9 clients—something I didn't actually believe was
possible, as I thought the last version to work effectively with
1.8.9 clients was 2.4.3. However, for all I know, the version gap
may have nothing to do with this phenomenon.

Any and all advice is appreciated. Any general information on the
structure of the MDT also welcome, as such info is in short supply
on the internet.

Thanks,
Jessica

Below is a look inside the O folder at the root of the MDT, where
there are about 48,000 4MB files:

[root@jaopost-mds O]# pwd
/lustrebackup/O
[root@jaopost-mds O]# tree -L 1
.
├── 1
├── 10
└── 20003

3 directories, 0 files

[root@jaopost-mds O]# ls -l 1
total 2240
drwx-- 2 root root 69632 sep 16 16:25 d0
drwx-- 2 root root 69632 sep 16 16:25 d1
drwx-- 2 root root 61440 sep 16 17:46 d10
drwx-- 2 root root 69632 sep 16 17:46 d11
drwx-- 2 root root 69632 sep 16 18:04 d12
drwx-- 2 root root 65536 sep 16 18:04 d13
drwx-- 2 root root 65536 sep 16 18:04 d14
drwx-- 2 root root 69632 sep 16 18:04 d15
drwx-- 2 root root 61440 sep 16 18:04 d16
drwx-- 2 root root 61440 sep 16 18:04 d17
drwx-- 2 root root 69632 sep 16 18:04 d18
drwx-- 2 root root 69632 sep 16 18:04 d19
drwx-- 2 root root 65536 sep 16 16:25 d2
drwx-- 2 root root 69632 sep 16 18:04 d20
drwx-- 2 root root 69632 sep 16 18:04 d21
drwx-- 2 root root 61440 sep 16 18:04 d22
drwx-- 2 root root 69632 sep 16 18:04 d23
drwx-- 2 root root 61440 sep 16 16:11 d24
drwx-- 2 root root 69632 sep 16 16:11 d25
drwx-- 2 root root 69632 sep 16 16:11 d26
drwx-- 2 root root 69632 sep 16 16:11 d27
drwx-- 2 root root 69632 sep 16 16:25 d28
drwx-- 2 root root 69632 sep 16 16:25 d29
drwx-- 2 root root 69632 sep 16 16:25 d3
drwx-- 2 root root 65536 sep 16 16:25 d30
drwx-- 2 root root 65536 sep 16 16:25 d31
drwx-- 2 root root 69632 sep 16 16:25 d4
drwx-- 2 root root 61440 sep 16 16:25 d5
drwx-- 2 root root 69632 sep 16 16:25 d6
drwx-- 2 root root 73728 sep 16 16:25 d7
drwx-- 2 root root 65536 sep 16 17:46 d8
drwx-- 2 root root 69632 sep 16 17:46 d9
-rw-r--r-- 1 root root 8 ene  4  2016 LAST_ID

[root@jaopost-mds d0]# ls -ltr | more
total 5865240
-rw-r--r-- 1 root root  252544 ene  4  2016 32
-rw-r--r-- 1 root root 2396224 ene  9  2016 2720
-rw-r--r-- 1 root root 4153280 ene  9  2016 2752
-rw-r--r-- 1 root root 4153280 ene 10  2016 2784
-rw-r--r-- 1 root root 4153280 ene 10  2016 2816
-rw-r--r-- 1 root root 4153280 ene 10  2016 2848
-rw-r--r-- 1 root root 4153280 ene 10  2016 2880
-rw-r--r-- 1 root root 4153280 ene 10  2016 2944
-rw-r--r-- 1 root root 4153280 ene 10  2016 2976
-rw-r--r-- 1 root root 4153280 ene 10  2016 3008
-rw-r--r-- 1 root root 4153280 ene 10  2016 3040

[lustre-discuss] Fwd: Re: MDT filling up with 4 MB files

2016-09-29 Thread Jessica Otey


[Sent on behalf of maxs.simmo...@alma.cl]

Colin,

We cleared the changelogs on the MDT, but see no space clearance.
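Concretely, the clear was along these lines, with the MDT name and 
reader id below standing in for the ones registered on our MDS:

   lfs changelog_clear <fsname>-MDT0000 cl1 0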

Any idea how the 4MB files are produced?

Thanks.


On 29/09/16 13:25, Colin Faber wrote:
Yes, if you're not consuming the records, you're going to see them eat 
up space on the MDT.


On Thu, Sep 29, 2016 at 10:04 AM, Jessica Otey <jo...@nrao.edu 
<mailto:jo...@nrao.edu>> wrote:




On 9/29/16 12:36 PM, Colin Faber wrote:

Is the changelogs feature enabled?


Yes, and.. the output of lfs changelogs gives us 360,000 lines...
Do you think that is the source of all the 'extra' data?


On Thu, Sep 29, 2016 at 8:58 AM, Jessica Otey <jo...@nrao.edu
<mailto:jo...@nrao.edu>> wrote:

Hello all,
I write on behalf of my colleagues in Chile, who are
experiencing a bizarre problem with their MDT, namely, it is
filling up with 4 MB files. There is no issue with the number
of inodes, of which there are hundreds of millions unused.

[root@jaopost-mds ~]# tune2fs -l /dev/sdb2 | grep -i inode
device /dev/sdb2 mounted by lustre
Filesystem features:  has_journal ext_attr resize_inode
dir_index filetype needs_recovery flex_bg dirdata
sparse_super large_file huge_file uninit_bg dir_nlink quota
Inode count:  239730688
Free inodes:  223553405
Inodes per group: 32768
Inode blocks per group:   4096
First inode:  11
Inode size:   512
Journal inode:8
Journal backup:   inode blocks
User quota inode: 3
Group quota inode:4

Has anyone ever encountered such a problem? The only thing
unusual about this cluster is that it is using 2.5.3
MDS/OSSes while still using 1.8.9 clients—something I didn't
actually believe was possible, as I thought the last version
to work effectively with 1.8.9 clients was 2.4.3. However,
for all I know, the version gap may have nothing to do with
this phenomenon.

Any and all advice is appreciated. Any general information on
the structure of the MDT also welcome, as such info is in
short supply on the internet.

Thanks,
Jessica

Below is a look inside the O folder at the root of the MDT,
where there are about 48,000 4MB files:

[root@jaopost-mds O]# pwd
/lustrebackup/O
[root@jaopost-mds O]# tree -L 1
.
├── 1
├── 10
└── 20003

3 directories, 0 files

[root@jaopost-mds O]# ls -l 1
total 2240
drwx-- 2 root root 69632 sep 16 16:25 d0
drwx-- 2 root root 69632 sep 16 16:25 d1
drwx-- 2 root root 61440 sep 16 17:46 d10
drwx-- 2 root root 69632 sep 16 17:46 d11
drwx-- 2 root root 69632 sep 16 18:04 d12
drwx-- 2 root root 65536 sep 16 18:04 d13
drwx-- 2 root root 65536 sep 16 18:04 d14
drwx-- 2 root root 69632 sep 16 18:04 d15
drwx-- 2 root root 61440 sep 16 18:04 d16
drwx-- 2 root root 61440 sep 16 18:04 d17
drwx-- 2 root root 69632 sep 16 18:04 d18
drwx-- 2 root root 69632 sep 16 18:04 d19
drwx-- 2 root root 65536 sep 16 16:25 d2
drwx-- 2 root root 69632 sep 16 18:04 d20
drwx-- 2 root root 69632 sep 16 18:04 d21
drwx-- 2 root root 61440 sep 16 18:04 d22
drwx-- 2 root root 69632 sep 16 18:04 d23
drwx-- 2 root root 61440 sep 16 16:11 d24
drwx-- 2 root root 69632 sep 16 16:11 d25
drwx-- 2 root root 69632 sep 16 16:11 d26
drwx-- 2 root root 69632 sep 16 16:11 d27
drwx-- 2 root root 69632 sep 16 16:25 d28
drwx-- 2 root root 69632 sep 16 16:25 d29
drwx-- 2 root root 69632 sep 16 16:25 d3
drwx-- 2 root root 65536 sep 16 16:25 d30
drwx-- 2 root root 65536 sep 16 16:25 d31
drwx-- 2 root root 69632 sep 16 16:25 d4
drwx-- 2 root root 61440 sep 16 16:25 d5
drwx-- 2 root root 69632 sep 16 16:25 d6
drwx-- 2 root root 73728 sep 16 16:25 d7
drwx-- 2 root root 65536 sep 16 17:46 d8
drwx-- 2 root root 69632 sep 16 17:46 d9
-rw-r--r-- 1 root root 8 ene  4  2016 LAST_ID

[root@jaopost-mds d0]# ls -ltr | more
total 5865240
-rw-r--r-- 1 root root  252544 ene  4  2016 32
-rw-r--r-- 1 root root 2396224 ene  9  2016 2720
-rw-r--r-- 1 root root 4153280 ene  9  2016 2752
-rw-r--r-- 1 root root 4153280 ene 10  2016 2784
-rw-r--r-- 1 root root 4153280 ene 10  2016 2816
-rw-r--r-- 1 root root 4153280 ene 10  2016 2848
-rw-r--r-- 1 root root 4153280 ene 10  2016 2880
-rw-r--r-- 1 root root 4153280 ene 10  20

Re: [lustre-discuss] MDT filling up with 4 MB files

2016-09-29 Thread Jessica Otey



On 9/29/16 12:36 PM, Colin Faber wrote:

Is the changelogs feature enabled?

Yes, and.. the output of lfs changelogs gives us 360,000 lines... Do you 
think that is the source of all the 'extra' data?
On Thu, Sep 29, 2016 at 8:58 AM, Jessica Otey <jo...@nrao.edu 
<mailto:jo...@nrao.edu>> wrote:


Hello all,
I write on behalf of my colleagues in Chile, who are experiencing
a bizarre problem with their MDT, namely, it is filling up with 4
MB files. There is no issue with the number of inodes, of which
there are hundreds of millions unused.

[root@jaopost-mds ~]# tune2fs -l /dev/sdb2 | grep -i inode
device /dev/sdb2 mounted by lustre
Filesystem features:  has_journal ext_attr resize_inode
dir_index filetype needs_recovery flex_bg dirdata sparse_super
large_file huge_file uninit_bg dir_nlink quota
Inode count:  239730688
Free inodes:  223553405
Inodes per group: 32768
Inode blocks per group:   4096
First inode:  11
Inode size: 512
Journal inode:8
Journal backup:   inode blocks
User quota inode: 3
Group quota inode:4

Has anyone ever encountered such a problem? The only thing unusual
about this cluster is that it is using 2.5.3 MDS/OSSes while still
using 1.8.9 clients—something I didn't actually believe was
possible, as I thought the last version to work effectively with
1.8.9 clients was 2.4.3. However, for all I know, the version gap
may have nothing to do with this phenomenon.

Any and all advice is appreciated. Any general information on the
structure of the MDT also welcome, as such info is in short supply
on the internet.

Thanks,
Jessica

Below is a look inside the O folder at the root of the MDT, where
there are about 48,000 4MB files:

[root@jaopost-mds O]# pwd
/lustrebackup/O
[root@jaopost-mds O]# tree -L 1
.
├── 1
├── 10
└── 20003

3 directories, 0 files

[root@jaopost-mds O]# ls -l 1
total 2240
drwx-- 2 root root 69632 sep 16 16:25 d0
drwx-- 2 root root 69632 sep 16 16:25 d1
drwx-- 2 root root 61440 sep 16 17:46 d10
drwx-- 2 root root 69632 sep 16 17:46 d11
drwx-- 2 root root 69632 sep 16 18:04 d12
drwx-- 2 root root 65536 sep 16 18:04 d13
drwx-- 2 root root 65536 sep 16 18:04 d14
drwx-- 2 root root 69632 sep 16 18:04 d15
drwx-- 2 root root 61440 sep 16 18:04 d16
drwx-- 2 root root 61440 sep 16 18:04 d17
drwx-- 2 root root 69632 sep 16 18:04 d18
drwx-- 2 root root 69632 sep 16 18:04 d19
drwx-- 2 root root 65536 sep 16 16:25 d2
drwx-- 2 root root 69632 sep 16 18:04 d20
drwx-- 2 root root 69632 sep 16 18:04 d21
drwx-- 2 root root 61440 sep 16 18:04 d22
drwx-- 2 root root 69632 sep 16 18:04 d23
drwx-- 2 root root 61440 sep 16 16:11 d24
drwx-- 2 root root 69632 sep 16 16:11 d25
drwx-- 2 root root 69632 sep 16 16:11 d26
drwx-- 2 root root 69632 sep 16 16:11 d27
drwx-- 2 root root 69632 sep 16 16:25 d28
drwx-- 2 root root 69632 sep 16 16:25 d29
drwx-- 2 root root 69632 sep 16 16:25 d3
drwx-- 2 root root 65536 sep 16 16:25 d30
drwx-- 2 root root 65536 sep 16 16:25 d31
drwx-- 2 root root 69632 sep 16 16:25 d4
drwx-- 2 root root 61440 sep 16 16:25 d5
drwx-- 2 root root 69632 sep 16 16:25 d6
drwx-- 2 root root 73728 sep 16 16:25 d7
drwx-- 2 root root 65536 sep 16 17:46 d8
drwx-- 2 root root 69632 sep 16 17:46 d9
-rw-r--r-- 1 root root 8 ene  4  2016 LAST_ID

[root@jaopost-mds d0]# ls -ltr | more
total 5865240
-rw-r--r-- 1 root root  252544 ene  4  2016 32
-rw-r--r-- 1 root root 2396224 ene  9  2016 2720
-rw-r--r-- 1 root root 4153280 ene  9  2016 2752
-rw-r--r-- 1 root root 4153280 ene 10  2016 2784
-rw-r--r-- 1 root root 4153280 ene 10  2016 2816
-rw-r--r-- 1 root root 4153280 ene 10  2016 2848
-rw-r--r-- 1 root root 4153280 ene 10  2016 2880
-rw-r--r-- 1 root root 4153280 ene 10  2016 2944
-rw-r--r-- 1 root root 4153280 ene 10  2016 2976
-rw-r--r-- 1 root root 4153280 ene 10  2016 3008
-rw-r--r-- 1 root root 4153280 ene 10  2016 3040
-rw-r--r-- 1 root root 4153280 ene 10  2016 3072
-rw-r--r-- 1 root root 4153280 ene 10  2016 3104
-rw-r--r-- 1 root root 4153280 ene 10  2016 3136
-rw-r--r-- 1 root root 4153280 ene 10  2016 3168
-rw-r--r-- 1 root root 4153280 ene 10  2016 3200
-rw-r--r-- 1 root root 4153280 ene 10  2016 3232
-rw-r--r-- 1 root root 4153280 ene 10  2016 3264
-rw-r--r-- 1 root root 4153280 ene 10  2016 3296
-rw-r--r-- 1 root root 4153280 ene 10  2016 3328



___
lustre-discuss mailing list
lustre-discuss@list

[lustre-discuss] MDT filling up with 4 MB files

2016-09-29 Thread Jessica Otey

Hello all,
I write on behalf of my colleagues in Chile, who are experiencing a 
bizarre problem with their MDT, namely, it is filling up with 4 MB 
files. There is no issue with the number of inodes, of which there are 
hundreds of millions unused.


[root@jaopost-mds ~]# tune2fs -l /dev/sdb2 | grep -i inode
device /dev/sdb2 mounted by lustre
Filesystem features:  has_journal ext_attr resize_inode dir_index 
filetype needs_recovery flex_bg dirdata sparse_super large_file 
huge_file uninit_bg dir_nlink quota

Inode count:  239730688
Free inodes:  223553405
Inodes per group: 32768
Inode blocks per group:   4096
First inode:  11
Inode size: 512
Journal inode:8
Journal backup:   inode blocks
User quota inode: 3
Group quota inode:4

Has anyone ever encountered such a problem? The only thing unusual about 
this cluster is that it is using 2.5.3 MDS/OSSes while still using 1.8.9 
clients—something I didn't actually believe was possible, as I thought 
the last version to work effectively with 1.8.9 clients was 2.4.3. 
However, for all I know, the version gap may have nothing to do with 
this phenomenon.


Any and all advice is appreciated. Any general information on the 
structure of the MDT also welcome, as such info is in short supply on 
the internet.


Thanks,
Jessica

Below is a look inside the O folder at the root of the MDT, where there 
are about 48,000 4MB files:


[root@jaopost-mds O]# pwd
/lustrebackup/O
[root@jaopost-mds O]# tree -L 1
.
├── 1
├── 10
└── 20003

3 directories, 0 files

[root@jaopost-mds O]# ls -l 1
total 2240
drwx-- 2 root root 69632 sep 16 16:25 d0
drwx-- 2 root root 69632 sep 16 16:25 d1
drwx-- 2 root root 61440 sep 16 17:46 d10
drwx-- 2 root root 69632 sep 16 17:46 d11
drwx-- 2 root root 69632 sep 16 18:04 d12
drwx-- 2 root root 65536 sep 16 18:04 d13
drwx-- 2 root root 65536 sep 16 18:04 d14
drwx-- 2 root root 69632 sep 16 18:04 d15
drwx-- 2 root root 61440 sep 16 18:04 d16
drwx-- 2 root root 61440 sep 16 18:04 d17
drwx-- 2 root root 69632 sep 16 18:04 d18
drwx-- 2 root root 69632 sep 16 18:04 d19
drwx-- 2 root root 65536 sep 16 16:25 d2
drwx-- 2 root root 69632 sep 16 18:04 d20
drwx-- 2 root root 69632 sep 16 18:04 d21
drwx-- 2 root root 61440 sep 16 18:04 d22
drwx-- 2 root root 69632 sep 16 18:04 d23
drwx-- 2 root root 61440 sep 16 16:11 d24
drwx-- 2 root root 69632 sep 16 16:11 d25
drwx-- 2 root root 69632 sep 16 16:11 d26
drwx-- 2 root root 69632 sep 16 16:11 d27
drwx-- 2 root root 69632 sep 16 16:25 d28
drwx-- 2 root root 69632 sep 16 16:25 d29
drwx-- 2 root root 69632 sep 16 16:25 d3
drwx-- 2 root root 65536 sep 16 16:25 d30
drwx-- 2 root root 65536 sep 16 16:25 d31
drwx-- 2 root root 69632 sep 16 16:25 d4
drwx-- 2 root root 61440 sep 16 16:25 d5
drwx-- 2 root root 69632 sep 16 16:25 d6
drwx-- 2 root root 73728 sep 16 16:25 d7
drwx-- 2 root root 65536 sep 16 17:46 d8
drwx-- 2 root root 69632 sep 16 17:46 d9
-rw-r--r-- 1 root root 8 ene  4  2016 LAST_ID

[root@jaopost-mds d0]# ls -ltr | more
total 5865240
-rw-r--r-- 1 root root  252544 ene  4  2016 32
-rw-r--r-- 1 root root 2396224 ene  9  2016 2720
-rw-r--r-- 1 root root 4153280 ene  9  2016 2752
-rw-r--r-- 1 root root 4153280 ene 10  2016 2784
-rw-r--r-- 1 root root 4153280 ene 10  2016 2816
-rw-r--r-- 1 root root 4153280 ene 10  2016 2848
-rw-r--r-- 1 root root 4153280 ene 10  2016 2880
-rw-r--r-- 1 root root 4153280 ene 10  2016 2944
-rw-r--r-- 1 root root 4153280 ene 10  2016 2976
-rw-r--r-- 1 root root 4153280 ene 10  2016 3008
-rw-r--r-- 1 root root 4153280 ene 10  2016 3040
-rw-r--r-- 1 root root 4153280 ene 10  2016 3072
-rw-r--r-- 1 root root 4153280 ene 10  2016 3104
-rw-r--r-- 1 root root 4153280 ene 10  2016 3136
-rw-r--r-- 1 root root 4153280 ene 10  2016 3168
-rw-r--r-- 1 root root 4153280 ene 10  2016 3200
-rw-r--r-- 1 root root 4153280 ene 10  2016 3232
-rw-r--r-- 1 root root 4153280 ene 10  2016 3264
-rw-r--r-- 1 root root 4153280 ene 10  2016 3296
-rw-r--r-- 1 root root 4153280 ene 10  2016 3328
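In case it helps, the individual objects can be examined read-only with 
debugfs against the MDT device (the object path below is just one entry 
from the listing above):

   debugfs -c -R 'stat O/1/d0/3328' /dev/sdb2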


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] resolution of LU-4397 [Permanently disabled OST causes clients to hang on df (statfs)]

2016-08-23 Thread Jessica Otey

All,
This is a bit of a complex scenario but I am hoping that someone out 
there can provide some relevant experience.


We have a production lustre system whose servers we have just recently 
upgraded from 1.8.9 to 2.4.3.


In testing a few clients (before upgrading them all), we encountered 
this (known) bug:


https://jira.hpdd.intel.com/browse/LU-4397

This bug was actually discovered by one of my NRAO colleagues, Wolfgang, 
who works in Green Bank, WV (whereas I work in Charlottesville, VA).


There are two things with which I would appreciate the list's help:

1) Identifying a version where this bug is FOR SURE fixed. If you read 
the ticket, it appears that the change was landed for 2.5.2 and 2.6, but 
that users have reported the bug existing (still?/again?) in 2.5.3. We 
absolutely need to upgrade beyond 2.4.3, but it would be nice to know how 
far we need to go in order to have functional clients, ideally out of 
the box. (In addition to the df command, we have critical software that 
uses statfs).


2) Identifying another person who has experienced this bug IN A LEGACY 
ENVIRONMENT, i.e., in a system that started at 1.8.x, in which OSTs were 
made permanently inactive, and then the system upgraded to 2.5.3. In 
this case, I'd be curious as to whether the workaround described in the 
ticket by Wolfgang (who used it on 2.5.3) works for you, too. (It DOES 
NOT work for us on 2.4.3.) At least this way, if the bug still exists in 
2.5.3, we'll be more confident that we can use the workaround successfully.
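For clarity, by "permanently inactive" I mean OSTs that were taken out 
of service on the MGS in the usual way, along these lines (the fsname 
and index are placeholders for ours):

   lctl conf_param <fsname>-OST0009.osc.active=0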


Thanks for your time,
Jessica

--
Jessica Otey
System Administrator II
North American ALMA Science Center (NAASC)
National Radio Astronomy Observatory (NRAO)
Charlottesville, Virginia (USA)

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] difficulties mounting client via an lnet router--SOLVED!

2016-07-11 Thread Jessica Otey

All,

Thanks to the repliers who contributed to the solution. Here's a rundown:

First, here's a way to see if you have connectivity via the router 
between client and mdt, etc., using NIDs (to list NIDs, use lctl list_nids):


lctl ping <NID>

If the ping between client and the mdt works, you have connectivity. And 
indeed, in our case, the problem wasn't router configs or connectivity, 
but rather our lustre filesystem, which needed a --writeconf because the 
client had previously connected without the router (meaning directly via 
tcp rather than o2ib).


On 07/11/2016 12:11 PM, Oucharek, Doug S wrote:
The router is not the issue and would be working fine “if” the 
file system were using the correct NID.  So, for example, before 
having an IB network for the servers, I suspect you had an MDT 
accessed via NID: 10.7.29.130@tcp.  When moving it to the IB network, 
it should become something like: 10.7.129.130@o2ib0.  The file system 
configuration will still try to get the clients to use 10.7.29.130@tcp 
until it is updated to the new NID.  But I can say that your LNet 
configurations are correct and will work once the file system starts 
using the correct NIDs. 


Here's how to do that (from this link: 
http://wiki.old.lustre.org/manual/LustreManual20_HTML/LustreMaintenance.html#50438199_31353) 



Section 14.5 Changing a Server NID
To change a server NID:

1. Update the LNET configuration in the /etc/modprobe.conf file so 
the list of server NIDs (lctl list_nids) is correct. The lctl list_nids 
command indicates which network(s) are configured to work with Lustre.

2. Shut down the file system in this order:
a. Unmount the clients.
b. Unmount the MDT.
c. Unmount all OSTs.
3. Run the writeconf command on all servers.
Run writeconf on the MDT first, and then the OSTs.
a. On the MDT, run:
$ tunefs.lustre --writeconf <device>
b. On each OST, run:
$ tunefs.lustre --writeconf <device>
c. If the NID on the MGS was changed, communicate the new MGS location 
to each server. Run:

tunefs.lustre --erase-param --mgsnode=<new_nid(s)> --writeconf /dev/..
4. Restart the file system in this order:
a. Mount the MGS (or the combined MGS/MDT).
b. Mount the MDT.
c. Mount the OSTs.
d. Mount the clients.
After the writeconf command is run, the configuration logs are 
re-generated as servers restart, and server NIDs in the updated 
list_nids file are used.


This worked for us, and we were at last able to mount the client via the 
router!
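For completeness, the client side ends up looking roughly like this; 
the interface name and mount point are from memory, and the NIDs are 
our test router's tcp NID and the MDS's o2ib NID:

   # /etc/modprobe.d/lustre.conf on the client
   options lnet networks="tcp0(em1)" routes="o2ib 10.7.29.131@tcp"

   # mount the file system through the router
   mount -t lustre 10.7.129.130@o2ib:/tlustre /mnt/tlustre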


Thanks lustre experts for being there!!!

Jessica

On 07/11/2016 10:34 AM, Jessica Otey wrote:

All,
I am, as before, working on a small test lustre setup (RHEL 6.8, 
lustre v. 2.4.3) to prepare for upgrading a 1.8.9 lustre production 
system to 2.4.3 (first the servers and lnet routers, then at a 
subsequent time, the clients). Lustre servers have IB connections, but 
the clients are 1G ethernet only.


For the life of me, I cannot get the client to mount via the router on 
this test system. (Client will mount fine when router is taken out of 
the equation.) This is the error I am seeing in the syslog from the 
mount attempt:


Jul 11 10:15:37 tlclient kernel: Lustre: 
3605:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent 
has timed out for slow reply: [sent 1468246532/real 1468246532]  
req@88032a3f9400 x1539566484848752/t0(0) 
o38->tlustre-MDT-mdc-88032ad20400@10.7.29.130@tcp:12/10 lens 
400/544 e 0 to 1 dl 1468246537 ref 1 fl Rpc:XN/0/ rc 0/-1
Jul 11 10:16:07 tlclient kernel: Lustre: 
3605:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent 
has timed out for slow reply: [sent 1468246557/real 1468246557]  
req@880629819000 x1539566484848764/t0(0) 
o38->tlustre-MDT-mdc-88032ad20400@10.7.29.130@tcp:12/10 lens 
400/544 e 0 to 1 dl 1468246567 ref 1 fl Rpc:XN/0/ rc 0/-1
Jul 11 10:16:37 tlclient kernel: Lustre: 
3605:0:(client.c:1868:ptlrpc_expire_one_request()) @@@ Request sent 
has timed out for slow reply: [sent 1468246582/real 1468246582]  
req@88062a371000 x1539566484848772/t0(0) 
o38->tlustre-MDT-mdc-88032ad20400@10.7.29.130@tcp:12/10 lens 
400/544 e 0 to 1 dl 1468246597 ref 1 fl Rpc:XN/0/ rc 0/-1
Jul 11 10:16:44 tlclient kernel: LustreError: 
2511:0:(lov_obd.c:937:lov_cleanup()) lov tgt 0 not cleaned! 
deathrow=0, lovrc=1

Jul 11 10:16:44 tlclient kernel: Lustre: Unmounted tlustre-client
Jul 11 10:16:44 tlclient kernel: LustreError: 
4881:0:(obd_mount.c:1289:lustre_fill_super()) Unable to mount (-4)


More than one pair of eyes has looked at the configs and confirmed 
they look okay. But frankly we've got to be missing something since 
this should (like lustre on a good day) 'just work'.


If anyone has seen this issue before and could give some advice, it'd 
be appreciated. One major question I have is whether the problem is a 
configuration issue or a procedure issue--perhaps the order in which I 
am doing things is causing the failure? The order I'm following 
currently is:


1)

[lustre-discuss] difficulties mounting client via an lnet router

2016-07-11 Thread Jessica Otey
ve_router_check_interval="60" dead_router_check_interval="60"


tloss ifconfig
[root@tloss ~]# ifconfig #lo omitted
em1   Link encap:Ethernet  HWaddr 78:2B:CB:4A:7A:F8
  inet addr:10.7.29.131  Bcast:10.7.29.255 Mask:255.255.255.0
  UP BROADCAST RUNNING MULTICAST  MTU:1500 Metric:1
  RX packets:7939328 errors:0 dropped:0 overruns:0 frame:0
  TX packets:4920595 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:7016088640 (6.5 GiB)  TX bytes:447490407 (426.7 MiB)
ib0   Link encap:InfiniBand  HWaddr 
80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00

  inet addr:10.7.129.131  Bcast:10.7.129.255 Mask:255.255.255.0
  UP BROADCAST RUNNING MULTICAST  MTU:2044 Metric:1
  RX packets:484688 errors:0 dropped:0 overruns:0 frame:0
  TX packets:62465 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:256
  RX bytes:845062706 (805.9 MiB)  TX bytes:919378780 (876.7 MiB)

tlmds ifconfig
[root@tlmds ~]# ifconfig #lo omitted
em1   Link encap:Ethernet  HWaddr 78:2B:CB:28:1D:00
  inet addr:10.7.29.130  Bcast:10.7.29.255 Mask:255.255.255.0
  UP BROADCAST RUNNING MULTICAST  MTU:1500 Metric:1
  RX packets:7849519 errors:0 dropped:0 overruns:0 frame:0
  TX packets:4847566 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:7049031324 (6.5 GiB)  TX bytes:484594569 (462.1 MiB)

ib0   Link encap:InfiniBand  HWaddr 
80:00:00:48:FE:80:00:00:00:00:00:00:00:00:00:00:00:00:00:00

  inet addr:10.7.129.130  Bcast:10.7.129.255 Mask:255.255.255.0
  UP BROADCAST RUNNING MULTICAST  MTU:2044 Metric:1
  RX packets:532171 errors:0 dropped:0 overruns:0 frame:0
  TX packets:64114 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:256
  RX bytes:946230130 (902.3 MiB)  TX bytes:821297144 (783.2 MiB)

--
Jessica Otey
System Administrator II
North American ALMA Science Center (NAASC)
National Radio Astronomy Observatory (NRAO)
Charlottesville, Virginia (USA)

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org