[lustre-discuss] Fw: [lwg] LUG 2023 Call for Presentations

2023-01-31 Thread Faaland, Olaf P. via lustre-discuss
Hi All,

The Lustre User Group meeting this year is in San Diego, during the first week 
of May.  For details, see:

https://na.eventscloud.com/website/48957/

If you would like to present, information on how to submit an abstract is 
below.  I'm happy to help you navigate the process, answer questions, give 
feedback, etc.

If you're interested in attending, registration opens March 1st.  I'll attend 
in-person.  

If you have any questions, let me know.

-Olaf

Sent: Thursday, January 19, 2023 12:51 PM
To: lustre-annou...@lists.lustre.org

Hello Lustre Community,

The Lustre User Group (LUG) conference is the high performance computing 
industry’s primary venue for discussion on the open source Lustre file system 
and other technologies. The conference focuses on the latest Lustre 
developments and allows attendees to network with peers.  Please consider 
submitting a technical presentation for LUG 2023.

The 2023 Lustre User Group (LUG) conference will be held the week of May 1-4, 
2023 at UC San Diego.

Call for Presentations

The LUG Program Committee is particularly seeking presentations on:

  *   Experiences running the newer community releases (2.15+)
  *   Experiences using the latest Lustre features (Client Data Encryption, OST 
Pool Quotas, DNEv3, PCC, SSK, DoM, Multi-Rail LNet, etc.)
  *   Best practices and practical experiences in deploying, monitoring, and 
operating Lustre (on-prem and in the cloud)
  *   Pushing the boundaries with non-traditional deployments
  *   Performance & feature comparison with other filesystems

Submission Guidelines

You only need an abstract for the submission process; we will request 
presentation materials once abstracts are reviewed and selected. Abstracts 
should be a minimum of 250 words and should provide a clear description of the 
planned presentation and its goals. All LUG presentations will be 30 minutes 
(including questions).  All presentations must be original and not 
simultaneously submitted to another conference.

The abstract submission deadline is Tuesday, February 28, 2023, 23:59 AoE 
("Anywhere on Earth").

The submission details web page for LUG 2023 is available at:
https://easychair.org/cfp/LUG2023

You will need to create a user account on EasyChair if you don't already have 
one.
https://easychair.org/account/signin?l=hcMNev4sI1COjDxz66gEFd

We look forward to seeing you at LUG 2023!

The LUG 2023 Program Committee

___
lwg mailing list
l...@lists.opensfs.org
http://lists.opensfs.org/listinfo.cgi/lwg-opensfs.org
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] intermittently can't start ll_agl thread and can't start ll_sa thread, and sysctl kernel.pid_max

2022-06-08 Thread Faaland, Olaf P. via lustre-discuss
Hi All,

This is not a Lustre problem proper, but others might run into it with a 64-bit 
Lustre client on RHEL 7, and I hope to save others the time it took us to nail 
it down.  We saw it on a node running the "Starfish" policy engine, which reads 
through the entire file system tree repeatedly and consumes changelogs.  
Starfish itself creates and destroys processes frequently, and the workload 
causes Lustre to create and destroy threads as well, by triggering statahead 
thread creation and changelog thread creation.

For the impatient, the fix was to increase pid_max.  We used:
kernel.pid_max=524288

The symptoms are:

1) console log messages like
LustreError: 10525:0:(statahead.c:970:ll_start_agl()) can't start ll_agl 
thread, rc: -12
LustreError: 15881:0:(statahead.c:1614:start_statahead_thread()) can't start 
ll_sa thread, rc: -12
LustreError: 15881:0:(statahead.c:1614:start_statahead_thread()) Skipped 45 
previous similar messages
LustreError: 15878:0:(statahead.c:1614:start_statahead_thread()) can't start 
ll_sa thread, rc: -12
LustreError: 15878:0:(statahead.c:1614:start_statahead_thread()) Skipped 17 
previous similar messages 

Note the return codes are -12, which is -ENOMEM.

Attempts to create new user space processes are also intermittently failing.

sf_lustre.liblustreCmds 10983 'MainThread' : ("can't start new thread",) 
[liblustreCmds.py:216]

and

[faaland1@solfish2 lustre]$git fetch llnlstash
Enter passphrase for key '/g/g0/faaland1/.ssh/swdev': 
Enter passphrase for key '/g/g0/faaland1/.ssh/swdev': 
remote: Enumerating objects: 1377, done.
remote: Counting objects: 100% (1236/1236), done.
remote: Compressing objects: 100% (271/271), done.
error: cannot fork() for index-pack: Cannot allocate memory
fatal: fetch-pack: unable to fork off index-pack

We wasted a lot of time chasing the idea that this was in fact due to 
insufficient free memory on the node, but the actual problem was that sysctl 
kernel.pid_max was too low.

When a new process must be created via fork() or kthread_create(), or similar, 
the kernel has to allocate a PID.  It has a data structure for keeping track of 
which PIDs are available, and there is some delay after a process is destroyed 
before its PID may be reused.
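As a rough check (not from the original investigation, just a generic way to 
compare load against the ceiling), you can snapshot the current task count 
next to the limit; a single snapshot can easily miss the peak:

ps -eLf | wc -l                  # count all tasks, threads included
cat /proc/sys/kernel/pid_max     # the current ceiling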

We found that on this node, the kernel would occasionally find no PIDs 
available when creating a new process.  Specifically, copy_process() would 
call alloc_pidmap(), which would return -1.  This tended to happen when the 
system was processing a large number of changes on the file system, so both 
Lustre and Starfish were suddenly doing a lot of work and both would have been 
creating new threads in response to the load.  This node has about 700-800 
processes running normally, according to top(1).  At the time these errors 
occurred, I don't know how many processes were running or how quickly they 
were being created and destroyed.

Ftrace showed this:

|copy_namespaces();
|copy_thread();
|alloc_pid() {
|  kmem_cache_alloc() {
|__might_sleep();
|_cond_resched();
|  }
|  kmem_cache_free();
|}
|exit_task_namespaces() {
|  switch_task_namespaces() {

On this particular node, with 32 cores, running RHEL 7, arch x86_64, pid_max 
was 36K.  We added
kernel.pid_max=524288
to our sysctl.conf, which resolved the issue.
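For anyone applying the same change, a minimal sketch (the drop-in file name 
below is just an example):

sysctl kernel.pid_max                                    # check the current value
echo 'kernel.pid_max = 524288' > /etc/sysctl.d/90-pid-max.conf
sysctl --system                                          # reload sysctl settings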

I don't expect this to be an issue under RHEL 8 (or clone of your choice), 
because in RHEL 8.2 systemd puts a config file in place that sets pid_max to 
2^22.

-Olaf
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] ZFS w/Lustre problem

2020-11-18 Thread Faaland, Olaf P.
Hi Steve,

You mentioned you may have a fix for zfs_send.c in ZFS.   Although Lustre 
tickles the bug, it's not likely that is the only way to tickle it.

Is there already a bug report for your issue at 
https://github.com/openzfs/zfs/issues?  If not, can you create one, even if 
your patch isn't successful?  That's the place to get your patch landed, and/or 
get help with the issue.

thanks,
-Olaf


From: lustre-discuss  on behalf of 
Steve Thompson 
Sent: Tuesday, November 10, 2020 5:06 AM
To: Hans Henrik Happe
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] ZFS w/Lustre problem

On Mon, 9 Nov 2020, Hans Henrik Happe wrote:

> It sounds like this issue, but I'm not sure what your dnodesize is:
>
> https://github.com/openzfs/zfs/issues/8458
>
> ZFS 0.8.1+ on the receiving side should fix it. Then again ZFS 0.8 is
> not supported in Lustre 2.12, so it's a bit hard to restore, without
> copying the underlying devices.

Hans Henrik,

Many thanks for your input. I had in fact known about the dnodesize issue,
and tested a workaround. Unfortunately, it turned out not to be this.
Instead, I have tested a patch to zfs_send.c, which does appear to have
solved the issue. The zfs send/recv is still running, however; if it
completes successfully, I will post again with details of the patch.

Steve
--

Steve Thompson E-mail:  smt AT vgersoft DOT com
Voyager Software LLC   Web: http://www DOT vgersoft DOT com
3901 N Charles St  VSW Support: support AT vgersoft DOT com
Baltimore MD 21218
   "186,282 miles per second: it's not just a good idea, it's the law"

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] definition of a peer

2020-08-27 Thread Faaland, Olaf P.
Hi Amir,

Thanks for the explanation.  I noticed an address I didn't expect in a peer 
list, and so was trying to understand how it got there.  This helps.

It's no problem for me that peer blocks are persistent.  I saw that I can 
remove one for the case of those created by erroneous configurations, etc.

-Olaf


From: Amir Shehata 
Sent: Thursday, August 27, 2020 9:12 AM
To: Faaland, Olaf P.
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] definition of a peer

Hi Olaf,
A peer block on a local node is always created whenever you try to 
communicate with a peer or that peer tries to communicate with you. So let's 
look at the active/passive cases.

Active case
1. You try to send a message to a peer's NID
2. LNet creates a peer block to identify that peer
3. OPTION 1: The message succeeds and you receive a response. The peer block 
remains.
4. OPTION 2: The message fails. Although you have not received a response, the 
peer block remains.

Passive case
1. You receive a message from a peer's NID
2. LNet creates a peer block to identify that peer
3. OPTION 1: The response succeeds. The peer block remains.
4. OPTION 2: The response fails. In both 3 & 4 the peer block remains.

When you do a "lnetctl peer show" on the local node, what you're seeing is the 
set of peers which, since LNet started, you either tried to communicate with 
or which tried to communicate with you, regardless of success.
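(As an aside, a minimal sketch of inspecting and, if needed, dropping a stale 
peer block; the NID below is made up and the exact flag names can vary a bit 
between releases:)

lnetctl peer show                             # list all known peer blocks
lnetctl peer del --prim_nid 10.0.0.5@tcp      # remove a stale/erroneous peer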

Question: Are you having problems with the fact that peers remain even after 
the physical node has gone away (or potentially peer blocks which have been 
created due to erroneous operations - bad config, etc)?

thanks
amir


On Wed, 26 Aug 2020 at 16:00, Faaland, Olaf P. <faala...@llnl.gov> wrote:
FWIW, I now think an lnet peer is just "a node I've communicated with at some 
time".

-Olaf


From: lustre-discuss  on behalf of 
Faaland, Olaf P. 
Sent: Wednesday, August 26, 2020 11:53 AM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] definition of a peer

Hi All,

What is the definition of an lnet peer (in Lustre 2.12 and in master, if the 
definition has changed over time)?

I had believed it was a node that could be communicated with directly from 
LNet's perspective, e.g. without requiring transit through a lustre router.   
But that seems not to be true on a node I'm looking at.

thanks,
-Olaf
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Lustre 2.12 routing with MR and discovery off

2020-08-26 Thread Faaland, Olaf P.
Does Lustre 2.12 require that routes for every intermediate network be 
defined on every node on a path?

For example, given this Lustre network, where:
  A-D are nodes and 1-6 are addresses
  network tcp2 has only routers, no clients and no servers

A(1) -tcp1- (2)B(3) -tcp2- (4)C(5) -tcp3- (6)D

And configured routes:

A: options lnet routes="tcp3 2@tcp1"
B: options lnet routes="tcp3 4@tcp2"
C: options lnet routes="tcp1 3@tcp2"
D: options lnet routes="tcp1 5@tcp3"

With Lustre <= 2.10 we configured only these routes.  The only nodes that need 
to know tcp2 exists are attached to it, and so there are no routes to tcp2 
defined anywhere.

It looks to me like Lustre 2.12 attempts to send error notifications back to 
the original sender, and so nodes A and D may end up receiving messages from 
nids on tcp2.  This then requires nodes A and D to have routes to tcp2 defined, 
so they can reply to the messages.
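If so, the additional entries would look something like this (a sketch derived 
from the topology above, not a verified configuration):

A: options lnet routes="tcp3 2@tcp1; tcp2 2@tcp1"
D: options lnet routes="tcp1 5@tcp3; tcp2 5@tcp3"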

Is that correct?

thanks,
-Olaf
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] definition of a peer

2020-08-26 Thread Faaland, Olaf P.
Hi All,

What is the definition of an lnet peer (in Lustre 2.12 and in master, if the 
definition has changed over time)?

I had believed it was a node that could be communicated with directly from 
LNet's perspective, e.g. without requiring transit through a lustre router.   
But that seems not to be true on a node I'm looking at.

thanks,
-Olaf
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Not able to start changelog in MDS

2019-10-10 Thread Faaland, Olaf P.
Hi Arnab,

Are you running "lfs changelog" on the MDS or on the client?  It needs to be 
run on the client.
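For example, a minimal sequence (a sketch; it assumes the full MDT name is 
scratch0-MDT0000, since the trailing index appears to have been stripped from 
the quoted output below):

# on a client with the filesystem mounted:
lfs changelog scratch0-MDT0000                 # dump the changelog records
lfs changelog_clear scratch0-MDT0000 cl1 0     # tell the MDT that cl1 no longer needs them (0 = all so far)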

-Olaf


From: lustre-discuss  on behalf of 
Arnab Kumar Paul 
Sent: Thursday, October 10, 2019 10:42 AM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Not able to start changelog in MDS

Hello,

I have a Lustre system with version 2.10.3 with 1 MDS and 8 OSS

On the client system: lfs df -h gives this output:

UUID                       bytes        Used   Available Use% Mounted on
scratch0-MDT_UUID   27.8G   61.0M   25.2G   0% 
/mnt/lustre[MDT:0]
scratch0-OST0001_UUID9.2G   37.2M8.7G   0% 
/mnt/lustre[OST:1]
scratch0-OST0002_UUID9.2G   37.2M8.7G   0% 
/mnt/lustre[OST:2]
.
.
.

On MDS:
$ lctl set_param mdt.scratch0-MDT.hsm_control=enabled
mdt.scratch0-MDT.hsm_control=enabled

$ lctl --device scratch0-MDT changelog_register
scratch0-MDT: Registered changelog userid 'cl1'

$ lctl get_param mdd.scratch0-MDT.changelog_users 
mdd.scratch0-MDT.changelog_size
mdd.scratch0-MDT.changelog_users=
current index: 1
IDindex (idle seconds)
cl1   0 (56886)
mdd.scratch0-MDT.changelog_size=33112

$ lfs changelog scratch0-MDT
Can't start changelog: No such file or directory

Why does this happen? Any suggestions how to start changelog?

- Arnab Paul
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] LMT 3.2.6

2019-05-06 Thread Faaland, Olaf P.
Hello,

LMT 3.2.6 has been released.  The only change from LMT 3.2.5 is support for 
Lustre 2.12.

https://github.com/LLNL/lmt/releases/tag/3.2.6

Happy Travels to those going to LUG.
-Olaf
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] inotify

2019-04-23 Thread Faaland, Olaf P.
Hi,

Does inotify work on Lustre and if so, are there any caveats (performance, 
functionality, or otherwise)?

Thanks,
-Olaf
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre stability problem (2.12.0)

2019-04-10 Thread Faaland, Olaf P.
Hello Bernd,

I've seen this behavior as well, in our case with Lustre 2.10.6+patches and 
zfs-0.7.11+patches.  I'm looking into it, but do not yet have any more 
information.

-Olaf


From: lustre-discuss  on behalf of 
Bernd Melchers 
Sent: Tuesday, April 9, 2019 4:41 AM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] lustre stability problem (2.12.0)

Hi All,
we have stability problems with our lustre on zfs installation:
CentOS 7.6, kernel 3.10.0-957.5.1.el7.x86_64, lustre 2.12.0, zfs 0.7.12

The OSS servers have hanging kernel threads ll_ost_io. The threads lock
up the CPU and were killed after 22 sec; please have a look at the
attachment.
Is this a known problem and is this a zfs problem?


Mit freundlichen Grüßen
Bernd Melchers

--
Archiv- und Backup-Service | fab-serv...@zedat.fu-berlin.de
Freie Universität Berlin   | Tel. +49-30-838-55905
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] lustre vs. lustre-client

2018-08-09 Thread Faaland, Olaf P.
Hi,

What is the reason for naming the package "lustre" if it includes both client 
and server binaries, but "lustre-client" if it includes only the client?

= (from the spec file):
# Set the package name prefix
%if %{undefined lustre_name}
%if %{with servers}
%global lustre_name lustre
%else
%global lustre_name lustre-client
%endif
%endif
=

Are there sites that build both with and without servers, and need to keep 
track of which is installed on a given machine?  The size of the RPMs isn't 
that different, so it's not obvious to me why one would do that.
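For what it's worth, a sketch of how each flavor is typically selected at 
build time, assuming the standard rpmbuild bcond switches that the 
%{with servers} test implies (the SRPM name below is illustrative):

rpmbuild --rebuild lustre-*.src.rpm                    # servers + client -> "lustre"
rpmbuild --rebuild --without servers lustre-*.src.rpm  # client only -> "lustre-client"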

thanks,

Olaf P. Faaland
Livermore Computing
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre 2.11 lnet troubleshooting

2018-04-19 Thread Faaland, Olaf P.
I haven't tested 2.10 yet, but I may get a chance to today.  I created ticket

https://jira.hpdd.intel.com/browse/LU-10930

thanks,

Olaf P. Faaland
Livermore Computing


From: Dilger, Andreas <andreas.dil...@intel.com>
Sent: Wednesday, April 18, 2018 8:37:43 PM
To: Faaland, Olaf P.
Cc: lustre-discuss@lists.lustre.org; Shehata, Amir
Subject: Re: [lustre-discuss] Lustre 2.11 lnet troubleshooting

On Apr 17, 2018, at 19:00, Faaland, Olaf P. <faala...@llnl.gov> wrote:
>
> So the problem was indeed that "routing" was disabled on the router node.  I 
> added "routing: 1" to the lnet.conf file for the routers and lctl ping works 
> as expected.
>
> The question about the lnet module option "forwarding" still stands.  The 
> lnet module still accepts a parameter, "forwarding", but it doesn't do what 
> it used to.   Is that just a leftover that needs to be cleaned up?

I would say that the module parameter should continue to work, and be 
equivalent to the "routing: 1" YAML parameter.  This facilitates upgrades.

Did you try this with 2.10 (which also has LNet Multi-Rail), or are you coming 
from 2.7 or 2.8?

I'd recommend to file a ticket in Jira for this.  I suspect it might also be 
broken in 2.10, and the fix should be backported there as well.

Cheers, Andreas

> 
> From: Faaland, Olaf P.
> Sent: Tuesday, April 17, 2018 5:05 PM
> To: lustre-discuss@lists.lustre.org
> Subject: Re: Lustre 2.11 lnet troubleshooting
>
> Update:
>
> Joe pointed out "lnetctl set routing 1".  After invoking that on the router 
> node, the compute node reports the route as up:
>
> [root@ulna66:lustre-211]# lnetctl route show -v
> route:
>- net: o2ib100
>  gateway: 192.168.128.4@o2ib33
>  hop: -1
>  priority: 0
>  state: up
>
> Does this replace the lnet module parameter "forwarding"?
>
> Olaf P. Faaland
> Livermore Computing
>
>
> 
> From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of 
> Faaland, Olaf P. <faala...@llnl.gov>
> Sent: Tuesday, April 17, 2018 4:34:22 PM
> To: lustre-discuss@lists.lustre.org
> Subject: [lustre-discuss] Lustre 2.11 lnet troubleshooting
>
> Hi,
>
> I've got a cluster running 2.11 with 2 routers and 68  compute nodes.  It's 
> the first time I've used a post-multi-rail version of Lustre.
>
> The problem I'm trying to troubleshoot is that my sample compute node 
> (ulna66) seems to think the router I configured (ulna4) is down, and so an 
> attempt to ping outside the cluster results in failure and "no route to XXX" 
> on the console.  I can lctl ping the router from the compute node and 
> vice-versa.   Forwarding is enabled on the router node via modprobe argument.
>
> lnetctl route show reports that the route is down.  Where I'm stuck is 
> figuring out what in userspace (e.g. lnetctl or lctl) can tell me why.
>
> The compute node's lnet configuration is:
>
> [root@ulna66:lustre-211]# cat /etc/lnet.conf
> ip2nets:
>  - net-spec: o2ib33
>interfaces:
> 0: hsi0
>ip-range:
> 0: 192.168.128.*
> route:
>- net: o2ib100
>  gateway: 192.168.128.4@o2ib33
>
> After I start lnet, systemctl reports success and the state is as follows:
>
> [root@ulna66:lustre-211]# lnetctl net show
> net:
>- net type: lo
>  local NI(s):
>- nid: 0@lo
>  status: up
>- net type: o2ib33
>  local NI(s):
>- nid: 192.168.128.66@o2ib33
>  status: up
>  interfaces:
>  0: hsi0
>
> [root@ulna66:lustre-211]# lnetctl peer show --verbose
> peer:
>- primary nid: 192.168.128.4@o2ib33
>  Multi-Rail: False
>  peer ni:
>- nid: 192.168.128.4@o2ib33
>  state: up
>  max_ni_tx_credits: 8
>  available_tx_credits: 8
>  min_tx_credits: 7
>  tx_q_num_of_buf: 0
>  available_rtr_credits: 8
>  min_rtr_credits: 8
>  refcount: 4
>  statistics:
>  send_count: 2
>  recv_count: 2
>  drop_count: 0
>
> [root@ulna66:lustre-211]# lnetctl route show --verbose
> route:
>- net: o2ib100
>  gateway: 192.168.128.4@o2ib33
>  hop: -1
>  priority: 0
>  state: down
>
> I can instrument the code, but I figure there must be someplace available to 
> normal users to look, that I'm unaware of.
>
> thanks,
>
> Olaf P. Faaland
> Livermore Computing

Re: [lustre-discuss] Lustre 2.11 lnet troubleshooting

2018-04-17 Thread Faaland, Olaf P.
So the problem was indeed that "routing" was disabled on the router node.  I 
added "routing: 1" to the lnet.conf file for the routers and lctl ping works as 
expected.
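For reference, the runtime form next to the verification step (both commands 
appear elsewhere in this thread; the persistent form is just the "routing: 1" 
line in the router's lnet.conf):

lnetctl set routing 1       # on the router: enable forwarding at runtime
lnetctl route show -v       # on a client: the route state should now be "up"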

The question about the lnet module option "forwarding" still stands.  The lnet 
module still accepts a parameter, "forwarding", but it doesn't do what it used 
to.   Is that just a leftover that needs to be cleaned up?

thanks,

Olaf P. Faaland
Livermore Computing

____________
From: Faaland, Olaf P.
Sent: Tuesday, April 17, 2018 5:05 PM
To: lustre-discuss@lists.lustre.org
Subject: Re: Lustre 2.11 lnet troubleshooting

Update:

Joe pointed out "lnetctl set routing 1".  After invoking that on the router 
node, the compute node reports the route as up:

[root@ulna66:lustre-211]# lnetctl route show -v
route:
- net: o2ib100
  gateway: 192.168.128.4@o2ib33
  hop: -1
  priority: 0
  state: up

Does this replace the lnet module parameter "forwarding"?

Olaf P. Faaland
Livermore Computing



From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of 
Faaland, Olaf P. <faala...@llnl.gov>
Sent: Tuesday, April 17, 2018 4:34:22 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Lustre 2.11 lnet troubleshooting

Hi,

I've got a cluster running 2.11 with 2 routers and 68  compute nodes.  It's the 
first time I've used a post-multi-rail version of Lustre.

The problem I'm trying to troubleshoot is that my sample compute node (ulna66) 
seems to think the router I configured (ulna4) is down, and so an attempt to 
ping outside the cluster results in failure and "no route to XXX" on the 
console.  I can lctl ping the router from the compute node and vice-versa.   
Forwarding is enabled on the router node via modprobe argument.

lnetctl route show reports that the route is down.  Where I'm stuck is figuring 
out what in userspace (e.g. lnetctl or lctl) can tell me why.

The compute node's lnet configuration is:

[root@ulna66:lustre-211]# cat /etc/lnet.conf
ip2nets:
  - net-spec: o2ib33
interfaces:
 0: hsi0
ip-range:
 0: 192.168.128.*
route:
- net: o2ib100
  gateway: 192.168.128.4@o2ib33

After I start lnet, systemctl reports success and the state is as follows:

[root@ulna66:lustre-211]# lnetctl net show
net:
- net type: lo
  local NI(s):
- nid: 0@lo
  status: up
- net type: o2ib33
  local NI(s):
- nid: 192.168.128.66@o2ib33
  status: up
  interfaces:
  0: hsi0

[root@ulna66:lustre-211]# lnetctl peer show --verbose
peer:
- primary nid: 192.168.128.4@o2ib33
  Multi-Rail: False
  peer ni:
- nid: 192.168.128.4@o2ib33
  state: up
  max_ni_tx_credits: 8
  available_tx_credits: 8
  min_tx_credits: 7
  tx_q_num_of_buf: 0
  available_rtr_credits: 8
  min_rtr_credits: 8
  refcount: 4
  statistics:
  send_count: 2
  recv_count: 2
  drop_count: 0

[root@ulna66:lustre-211]# lnetctl route show --verbose
route:
- net: o2ib100
  gateway: 192.168.128.4@o2ib33
  hop: -1
  priority: 0
  state: down

I can instrument the code, but I figure there must be someplace available to 
normal users to look, that I'm unaware of.

thanks,

Olaf P. Faaland
Livermore Computing
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre 2.11 lnet troubleshooting

2018-04-17 Thread Faaland, Olaf P.
Update:

Joe pointed out "lnetctl set routing 1".  After invoking that on the router 
node, the compute node reports the route as up:

[root@ulna66:lustre-211]# lnetctl route show -v
route:
- net: o2ib100
  gateway: 192.168.128.4@o2ib33
  hop: -1
  priority: 0
  state: up

Does this replace the lnet module parameter "forwarding"?

Olaf P. Faaland
Livermore Computing



From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of 
Faaland, Olaf P. <faala...@llnl.gov>
Sent: Tuesday, April 17, 2018 4:34:22 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Lustre 2.11 lnet troubleshooting

Hi,

I've got a cluster running 2.11 with 2 routers and 68  compute nodes.  It's the 
first time I've used a post-multi-rail version of Lustre.

The problem I'm trying to troubleshoot is that my sample compute node (ulna66) 
seems to think the router I configured (ulna4) is down, and so an attempt to 
ping outside the cluster results in failure and "no route to XXX" on the 
console.  I can lctl ping the router from the compute node and vice-versa.   
Forwarding is enabled on the router node via modprobe argument.

lnetctl route show reports that the route is down.  Where I'm stuck is figuring 
out what in userspace (e.g. lnetctl or lctl) can tell me why.

The compute node's lnet configuration is:

[root@ulna66:lustre-211]# cat /etc/lnet.conf
ip2nets:
  - net-spec: o2ib33
interfaces:
 0: hsi0
ip-range:
 0: 192.168.128.*
route:
- net: o2ib100
  gateway: 192.168.128.4@o2ib33

After I start lnet, systemctl reports success and the state is as follows:

[root@ulna66:lustre-211]# lnetctl net show
net:
- net type: lo
  local NI(s):
- nid: 0@lo
  status: up
- net type: o2ib33
  local NI(s):
- nid: 192.168.128.66@o2ib33
  status: up
  interfaces:
  0: hsi0

[root@ulna66:lustre-211]# lnetctl peer show --verbose
peer:
- primary nid: 192.168.128.4@o2ib33
  Multi-Rail: False
  peer ni:
- nid: 192.168.128.4@o2ib33
  state: up
  max_ni_tx_credits: 8
  available_tx_credits: 8
  min_tx_credits: 7
  tx_q_num_of_buf: 0
  available_rtr_credits: 8
  min_rtr_credits: 8
  refcount: 4
  statistics:
  send_count: 2
  recv_count: 2
  drop_count: 0

[root@ulna66:lustre-211]# lnetctl route show --verbose
route:
- net: o2ib100
  gateway: 192.168.128.4@o2ib33
  hop: -1
  priority: 0
  state: down

I can instrument the code, but I figure there must be someplace available to 
normal users to look, that I'm unaware of.

thanks,

Olaf P. Faaland
Livermore Computing
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Lustre 2.11 lnet troubleshooting

2018-04-17 Thread Faaland, Olaf P.
Hi,

I've got a cluster running 2.11 with 2 routers and 68  compute nodes.  It's the 
first time I've used a post-multi-rail version of Lustre.  

The problem I'm trying to troubleshoot is that my sample compute node (ulna66) 
seems to think the router I configured (ulna4) is down, and so an attempt to 
ping outside the cluster results in failure and "no route to XXX" on the 
console.  I can lctl ping the router from the compute node and vice-versa.   
Forwarding is enabled on the router node via modprobe argument.

lnetctl route show reports that the route is down.  Where I'm stuck is figuring 
out what in userspace (e.g. lnetctl or lctl) can tell me why.

The compute node's lnet configuration is:

[root@ulna66:lustre-211]# cat /etc/lnet.conf
ip2nets:
  - net-spec: o2ib33
interfaces:
 0: hsi0
ip-range:
 0: 192.168.128.*
route:
- net: o2ib100
  gateway: 192.168.128.4@o2ib33

After I start lnet, systemctl reports success and the state is as follows:

[root@ulna66:lustre-211]# lnetctl net show
net:
- net type: lo
  local NI(s):
- nid: 0@lo
  status: up
- net type: o2ib33
  local NI(s):
- nid: 192.168.128.66@o2ib33
  status: up
  interfaces:
  0: hsi0

[root@ulna66:lustre-211]# lnetctl peer show --verbose
peer:
- primary nid: 192.168.128.4@o2ib33
  Multi-Rail: False
  peer ni:
- nid: 192.168.128.4@o2ib33
  state: up
  max_ni_tx_credits: 8
  available_tx_credits: 8
  min_tx_credits: 7
  tx_q_num_of_buf: 0
  available_rtr_credits: 8
  min_rtr_credits: 8
  refcount: 4
  statistics:
  send_count: 2
  recv_count: 2
  drop_count: 0

[root@ulna66:lustre-211]# lnetctl route show --verbose
route:
- net: o2ib100
  gateway: 192.168.128.4@o2ib33
  hop: -1
  priority: 0
  state: down

I can instrument the code, but I figure there must be someplace available to 
normal users to look, that I'm unaware of.

thanks,

Olaf P. Faaland
Livermore Computing
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] llapi_changelog_start()

2018-04-13 Thread Faaland, Olaf P.
I'm asking WRT Lustre 2.8, as well as later versions if the mechanisms have changed.

thanks


From: lustre-discuss <lustre-discuss-boun...@lists.lustre.org> on behalf of 
Faaland, Olaf P. <faala...@llnl.gov>
Sent: Friday, April 13, 2018 5:26 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] llapi_changelog_start()

How is an application supposed to determine the appropriate value for

long long startrec

Do you just start at 0 normally, and possibly get a set of changelog records 
whose IDs are much higher?

thanks,

Olaf P. Faaland
Livermore Computing
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] llapi_changelog_start()

2018-04-13 Thread Faaland, Olaf P.
How is an application supposed to determine the appropriate value for 

long long startrec

Do you just start at 0 normally, and possibly get a set of changelog records 
whose IDs are much higher?

thanks,

Olaf P. Faaland
Livermore Computing
phone : 925-422-2263
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] /proc/fs/lustre/llite/*/stats

2018-04-05 Thread Faaland, Olaf P.
Hi,

I have a couple of questions about these stats.  If these are documented 
somewhere, by all means point me to them.  What I found in the operations 
manual and on the web did not answer my questions.

What do

read_bytes        25673 samples [bytes] 1 3366225 145121869
write_bytes       13641 samples [bytes] 1 3366225 468230469

mean in more detail?  I understand that the last three values are MIN/MAX/SUM, 
and that their units are bytes, and that they reflect activity since the file 
system was mounted or since the stats were last cleared.  But more specifically:

samples:  Is this the number of requests issued to servers, e.g. RPC issued 
with opcode OST_READ?  

So if the user called read() 200 times on the same 1K file, which didn't ever 
change and remained cached by the lustre client, and all the data was fetched 
in a single RPC in the first place, then samples would be 1?  

And in that case, would the sum be 1K rather than 200K?
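(For reference, a minimal way to sample and reset these counters from a 
client; a sketch, and note that writing to the stats file is what clears it:)

lctl get_param llite.*.stats            # read the counters for each mount
lctl set_param llite.*.stats=clear      # reset them before a measurement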

Thanks,

Olaf P. Faaland
Livermore Computing

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre OSS and clients on same physical server

2016-07-15 Thread Faaland, Olaf P.
Cory,

For what it's worth, the existing tests and framework run in the
single-node configuration without any special steps (or at least did
within the last year or so).  You just build lustre, run llmount to get
servers up and client mounted, and then run tests/sanity.sh.
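A minimal sketch of that flow, assuming a built source tree (the scripts named
here are the ones shipped in lustre/tests):

cd lustre/tests
sh llmount.sh           # format and mount a local MGS/MDT/OST plus a client
sh sanity.sh            # run the sanity suite against the local mount
sh llmountcleanup.sh    # unmount and tear down the test filesystem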

You then get varying results each time you do this.  Some tests are
themselves flawed (i.e. racy); other tests are themselves fine but fail
intermittently because of some more general problem like memory management
issues.  The issues that arise typically aren't easy to diagnose in my
experience.

The problem is resources - using resources to investigate this behavior
instead of better testing in the typical multi-node configuration, or
implementing new features, or doing code cleanup, etc.  In other words,
sadly, the usual problem.

-Olaf

On 7/15/16, 1:38 PM, "lustre-discuss on behalf of Cory Spitz" wrote:

>Good input, Chris.  Thanks.
>
>It sounds like we might need to move this over to lustre-devel.
>
>Someday, I'd like to see us address some of these things and then add
>some test framework tests that co-locate clients with servers.  Not
>necessarily because we expect co-located services, but because it could
>be a useful driver of keeping Lustre a good memory manager.
>
>-Cory
>
>-- 
>
>
>On 7/15/16, 3:17 PM, "Christopher J. Morrone"  wrote:
>
>On 07/15/2016 12:11 PM, Cory Spitz wrote:
>> Chris,
>> 
>> On 7/13/16, 2:00 PM, "lustre-discuss on behalf of Christopher J.
>>Morrone" <morro...@llnl.gov> wrote:
>> 
>>> If you put both the client and server code on the same node and do any
>>> serious amount of IO, it has been pretty easy in the past to get that
>>> node to go completely out to lunch thrashing on memory issues
>> 
>> Chris, you wrote "in the past."  How current is your experience?  I'm
>>sure it is still a good word of caution, but I'd venture that modern
>>Lustre (on a modern kernel) might fare a tad bit better.  Does anyone
>>have experience on current releases?
>
>Pretty recent.
>
>We have had memory management issues with servers and clients
>independently at pretty much all periods of time, recent history
>included.  Putting the components together only exacerbates the issues.
>
>Lustre still has too many of its own caches with fixed, or nearly fixed,
>cache sizes, and places where it does not play well with the kernel
>memory reclaim mechanisms.  There are too many places where lustre
>ignores the kernel's requests for memory reclaim, and often goes on to
>use even more memory.  That significantly impedes the kernel's ability
>to keep things responsive when memory contention arises.
>
>> I understand that it isn't a design goal for us, but perhaps we should
>>pay some attention to this possibility?  Perhaps we'll have interest in
>>co-locating clients on servers in the near future as part of a
>>replication, network striping, or archiving capability?
>
>There is going to need to be a lot of work to have Lustre's memory usage
>be more dynamic, more aware of changing conditions on the system, and
>more responsive to the kernel's requests to free memory.  I imagine it
>won't be terribly easy, especially in areas such as dirty and unstable
>data which cannot be freed until it is safe on disk.  But even for that,
>there are no doubt ways to make things better.
>
>Chris
>
>
>
>___
>lustre-discuss mailing list
>lustre-discuss@lists.lustre.org
>http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] problem loading zfs properties for a lustre/zfs fs

2016-07-12 Thread Faaland, Olaf P.
You don't need to modify the initramfs as you describe, although that would 
likely work.

It should be sufficient to rebuild the initramfs.  It's built on your system, 
for example after a kernel installation.  Most likely, at the time the 
initramfs was last built on your system, the zfs-dracut package was installed, 
and that caused the initramfs file to be built with zfs.  Rebuilding now should 
remove the zfs module. 

Redhat has an article that says how to rebuild the initramfs:
https://access.redhat.com/solutions/1958

I can't see their instructions, as I don't have an account for their portal.  I 
suspect it just comes down to running mkinitrd.  Or you can reinstall your 
kernel and let it rebuild the initramfs as a side-effect:

rpm -i --replacepkgs /path/to/your/kernel/rpm

To answer your question about why this is necessary: the kernel first boots and 
loads some drivers etc., using the initramfs, before your normal root 
filesystem is mounted.  During this first boot stage, your hand-edited zfs.conf 
is not present (it's in the root filesystem that hasn't yet been mounted).  
Therefore the zfs module does not get those options passed to it during load.
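(For completeness, a sketch of rebuilding the initramfs directly on RHEL 7, as 
an alternative to reinstalling the kernel RPM:)

dracut --force /boot/initramfs-$(uname -r).img $(uname -r)
lsinitrd | grep zfs     # verify whether the zfs module is still included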

Olaf P. Faaland
Livermore Computing


From: Riccardo Veraldi [riccardo.vera...@cnaf.infn.it]
Sent: Tuesday, July 12, 2016 1:03 PM
To: Faaland, Olaf P.; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] problem loading zfs properties for a lustre/zfs fs

Hello,

lsinitrd | grep zfs
-rwxr--r--   1 root root   493080 Jul 12 12:56
usr/lib/modules/3.10.0-327.18.2.el7.x86_64/extra/kernel/fs/lustre/osd_zfs.ko
-rw-r--r--   1 root root  2245128 Jul 12 12:56
usr/lib/modules/3.10.0-327.18.2.el7.x86_64/extra/zfs.ko
-rw-r--r--   1 root root4 May 12 20:18
usr/lib/modules-load.d/zfs.conf


if I do

lsinitrd -f usr/lib/modules-load.d/zfs.conf
zfs

so looks like zfs is loaded by initramfs

do I have to add a etc/modprobe.d/zfs.conf inside the initramfs ?

I mean /etc/modprobe.d/zfs.conf  from the root partition should be
loaded anyway.

thanks






On 12/07/16 11:51, Faaland, Olaf P. wrote:
> Actually, looks like you can use simpler syntax:
>
> lsinitrd | grep zfs
>
> Olaf P. Faaland
> Livermore Computing
>
> 
> From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of 
> Faaland, Olaf P. [faala...@llnl.gov]
> Sent: Tuesday, July 12, 2016 11:48 AM
> To: Riccardo Veraldi; lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] problem loading zfs properties for a lustre/zfs 
> fs
>
> Riccardo,
>
> Does your initramfs still include zfs-related files?  What is the output of
>
> lsinitrd /boot/init*$(uname -r)* | grep zfs
>
> Olaf P. Faaland
> Livermore Computing
>
> 
> From: Riccardo Veraldi [riccardo.vera...@cnaf.infn.it]
> Sent: Monday, July 11, 2016 5:25 PM
> To: Faaland, Olaf P.; lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] problem loading zfs properties for a lustre/zfs 
> fs
>
> On 11/07/16 17:02, Faaland, Olaf P. wrote:
>> Riccardo,
>>
>> If you are not booting from a zpool, you do not need the "zfs-dracut" 
>> package.  This package causes ZFS to be loaded very early in the boot 
>> process, most likely before your /etc/modprobe.d/zfs.conf files is visible 
>> to the kernel.
>>
>> As long as you are not booting from a zpool, remove that package (rpm -e) 
>> and boot again to see if your problem is fixed.
> I did it I removed zfs-dracut since I do not boot from ZFS.
>
>cat /etc/modprobe.d/zfs.conf
> options zfs zfs_prefetch_disable=1
> options zfs zfs_txg_history=120
> options zfs metaslab_debug_unload=1
>
> after reboot
>
> cat /sys/module/zfs/parameters/zfs_prefetch_disable
> 0
>
> cat /sys/module/zfs/parameters/zfs_txg_history
> 0
>
> so it looks like the parameters inside zfs.conf are ignored
>
> ls -la /etc/modprobe.d/zfs.conf
> -rw-r--r-- 1 root root 103 Jul 11 17:18 /etc/modprobe.d/zfs.conf
>
>
>
>> -Olaf
>>
>> 
>> From: Riccardo Veraldi [riccardo.vera...@cnaf.infn.it]
>> Sent: Monday, July 11, 2016 4:55 PM
>> To: Faaland, Olaf P.; lustre-discuss@lists.lustre.org
>> Subject: Re: [lustre-discuss] problem loading zfs properties for a 
>> lustre/zfs fs
>>
>> On 11/07/16 16:15, Faaland, Olaf P. wrote:
>>> 1) What is the output of:
>>>
>>> rpm -qa | grep zfs
>> libzfs2-0.6.5.7-1.el7.centos.x86_64
>> zfs-dkms-0.6.5.7-1.el7.centos.noarch
>> lustre-osd-zfs-mount-2.8.0-3.10.0_327.18.2.el7.x86_64.x86_64
>> zfs-0.6.5.7-1.el7.centos.x86_64
>> zfs-dracut-0.6.5.7-1.e

Re: [lustre-discuss] problem loading zfs properties for a lustre/zfs fs

2016-07-12 Thread Faaland, Olaf P.
Actually, looks like you can use simpler syntax:

lsinitrd | grep zfs

Olaf P. Faaland
Livermore Computing


From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of 
Faaland, Olaf P. [faala...@llnl.gov]
Sent: Tuesday, July 12, 2016 11:48 AM
To: Riccardo Veraldi; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] problem loading zfs properties for a lustre/zfs fs

Riccardo,

Does your initramfs still include zfs-related files?  What is the output of

lsinitrd /boot/init*$(uname -r)* | grep zfs

Olaf P. Faaland
Livermore Computing


From: Riccardo Veraldi [riccardo.vera...@cnaf.infn.it]
Sent: Monday, July 11, 2016 5:25 PM
To: Faaland, Olaf P.; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] problem loading zfs properties for a lustre/zfs fs

On 11/07/16 17:02, Faaland, Olaf P. wrote:
> Riccardo,
>
> If you are not booting from a zpool, you do not need the "zfs-dracut" 
> package.  This package causes ZFS to be loaded very early in the boot 
> process, most likely before your /etc/modprobe.d/zfs.conf files is visible to 
> the kernel.
>
> As long as you are not booting from a zpool, remove that package (rpm -e) and 
> boot again to see if your problem is fixed.
I did it I removed zfs-dracut since I do not boot from ZFS.

  cat /etc/modprobe.d/zfs.conf
options zfs zfs_prefetch_disable=1
options zfs zfs_txg_history=120
options zfs metaslab_debug_unload=1

after reboot

cat /sys/module/zfs/parameters/zfs_prefetch_disable
0

cat /sys/module/zfs/parameters/zfs_txg_history
0

so it looks like the parameters inside zfs.conf are ignored

ls -la /etc/modprobe.d/zfs.conf
-rw-r--r-- 1 root root 103 Jul 11 17:18 /etc/modprobe.d/zfs.conf



> -Olaf
>
> 
> From: Riccardo Veraldi [riccardo.vera...@cnaf.infn.it]
> Sent: Monday, July 11, 2016 4:55 PM
> To: Faaland, Olaf P.; lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] problem loading zfs properties for a lustre/zfs 
> fs
>
> On 11/07/16 16:15, Faaland, Olaf P. wrote:
>> 1) What is the output of:
>>
>> rpm -qa | grep zfs
> libzfs2-0.6.5.7-1.el7.centos.x86_64
> zfs-dkms-0.6.5.7-1.el7.centos.noarch
> lustre-osd-zfs-mount-2.8.0-3.10.0_327.18.2.el7.x86_64.x86_64
> zfs-0.6.5.7-1.el7.centos.x86_64
> zfs-dracut-0.6.5.7-1.el7.centos.x86_64
> lustre-osd-zfs-2.8.0-3.10.0_327.18.2.el7.x86_64.x86_64
>> from that system after it boots?
>>
>> 2) How do those values get into the /etc/modprobe.d/zfs.conf file?  Are they 
>> there before the node boots, or are modifying that file somehow during the 
>> boot process?
> I edited those values myself inside zfs.conf then rebooted the system,
> but they are not loaded
>
> thank you
>
> Riccardo
>
>
>> Olaf P. Faaland
>> Livermore Computing
>> 
>> From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of 
>> Riccardo Veraldi [riccardo.vera...@cnaf.infn.it]
>> Sent: Monday, July 11, 2016 3:53 PM
>> To: lustre-discuss@lists.lustre.org
>> Subject: [lustre-discuss] problem loading zfs properties for a lustre/zfs fs
>>
>> Hello,
>> I am tailoring my system for lustre on ZFS and I am not able to set
>> these parameters
>> writing the config file /etc/modprobe.d/zfs.conf with the following options
>>
>> options zfs zfs_prefetch_disable=1
>> options zfs zfs_txg_history=120
>> options zfs metaslab_debug:unload=1
>>
>> when I check the parameters after boot in /sys/module/zfs/parameters
>> they are still the defaults, and my modification are not taken.
>>
>> zfs.conf is being ignored at boot time I do not know why.
>>
>> I had to do a
>>
>> echo 1 > /sys/module/zfs/parameters/options zfs zfs_prefetch_disable
>>
>> inside rc.local
>>
>> to have it working
>>
>> I am running RHEL7 with latest kernel  and zfs 0.6.5.7-1
>>
>> any hints ?
>>
>> thank you
>>
>> Riccardo
>>
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] problem loading zfs properties for a lustre/zfs fs

2016-07-12 Thread Faaland, Olaf P.
Riccardo,

Does your initramfs still include zfs-related files?  What is the output of

lsinitrd /boot/init*$(uname -r)* | grep zfs

Olaf P. Faaland
Livermore Computing


From: Riccardo Veraldi [riccardo.vera...@cnaf.infn.it]
Sent: Monday, July 11, 2016 5:25 PM
To: Faaland, Olaf P.; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] problem loading zfs properties for a lustre/zfs fs

On 11/07/16 17:02, Faaland, Olaf P. wrote:
> Riccardo,
>
> If you are not booting from a zpool, you do not need the "zfs-dracut" 
> package.  This package causes ZFS to be loaded very early in the boot 
> process, most likely before your /etc/modprobe.d/zfs.conf files is visible to 
> the kernel.
>
> As long as you are not booting from a zpool, remove that package (rpm -e) and 
> boot again to see if your problem is fixed.
I did it I removed zfs-dracut since I do not boot from ZFS.

  cat /etc/modprobe.d/zfs.conf
options zfs zfs_prefetch_disable=1
options zfs zfs_txg_history=120
options zfs metaslab_debug_unload=1

after reboot

cat /sys/module/zfs/parameters/zfs_prefetch_disable
0

cat /sys/module/zfs/parameters/zfs_txg_history
0

so it looks like the parameters inside zfs.conf are ignored

ls -la /etc/modprobe.d/zfs.conf
-rw-r--r-- 1 root root 103 Jul 11 17:18 /etc/modprobe.d/zfs.conf



> -Olaf
>
> 
> From: Riccardo Veraldi [riccardo.vera...@cnaf.infn.it]
> Sent: Monday, July 11, 2016 4:55 PM
> To: Faaland, Olaf P.; lustre-discuss@lists.lustre.org
> Subject: Re: [lustre-discuss] problem loading zfs properties for a lustre/zfs 
> fs
>
> On 11/07/16 16:15, Faaland, Olaf P. wrote:
>> 1) What is the output of:
>>
>> rpm -qa | grep zfs
> libzfs2-0.6.5.7-1.el7.centos.x86_64
> zfs-dkms-0.6.5.7-1.el7.centos.noarch
> lustre-osd-zfs-mount-2.8.0-3.10.0_327.18.2.el7.x86_64.x86_64
> zfs-0.6.5.7-1.el7.centos.x86_64
> zfs-dracut-0.6.5.7-1.el7.centos.x86_64
> lustre-osd-zfs-2.8.0-3.10.0_327.18.2.el7.x86_64.x86_64
>> from that system after it boots?
>>
>> 2) How do those values get into the /etc/modprobe.d/zfs.conf file?  Are they 
>> there before the node boots, or are modifying that file somehow during the 
>> boot process?
> I edited those values myself inside zfs.conf then rebooted the system,
> but they are not loaded
>
> thank you
>
> Riccardo
>
>
>> Olaf P. Faaland
>> Livermore Computing
>> 
>> From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of 
>> Riccardo Veraldi [riccardo.vera...@cnaf.infn.it]
>> Sent: Monday, July 11, 2016 3:53 PM
>> To: lustre-discuss@lists.lustre.org
>> Subject: [lustre-discuss] problem loading zfs properties for a lustre/zfs fs
>>
>> Hello,
>> I am tailoring my system for lustre on ZFS and I am not able to set
>> these parameters
>> writing the config file /etc/modprobe.d/zfs.conf with the following options
>>
>> options zfs zfs_prefetch_disable=1
>> options zfs zfs_txg_history=120
>> options zfs metaslab_debug:unload=1
>>
>> when I check the parameters after boot in /sys/module/zfs/parameters
>> they are still the defaults, and my modification are not taken.
>>
>> zfs.conf is being ignored at boot time I do not know why.
>>
>> I had to do a
>>
>> echo 1 > /sys/module/zfs/parameters/options zfs zfs_prefetch_disable
>>
>> inside rc.local
>>
>> to have it working
>>
>> I am running RHEL7 with latest kernel  and zfs 0.6.5.7-1
>>
>> any hints ?
>>
>> thank you
>>
>> Riccardo
>>
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
>

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] problem loading zfs properties for a lustre/zfs fs

2016-07-11 Thread Faaland, Olaf P.
Riccardo,

If you are not booting from a zpool, you do not need the "zfs-dracut" package.  
This package causes ZFS to be loaded very early in the boot process, most 
likely before your /etc/modprobe.d/zfs.conf file is visible to the kernel.

As long as you are not booting from a zpool, remove that package (rpm -e) and 
boot again to see if your problem is fixed.

-Olaf


From: Riccardo Veraldi [riccardo.vera...@cnaf.infn.it]
Sent: Monday, July 11, 2016 4:55 PM
To: Faaland, Olaf P.; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] problem loading zfs properties for a lustre/zfs fs

On 11/07/16 16:15, Faaland, Olaf P. wrote:
> 1) What is the output of:
>
> rpm -qa | grep zfs
libzfs2-0.6.5.7-1.el7.centos.x86_64
zfs-dkms-0.6.5.7-1.el7.centos.noarch
lustre-osd-zfs-mount-2.8.0-3.10.0_327.18.2.el7.x86_64.x86_64
zfs-0.6.5.7-1.el7.centos.x86_64
zfs-dracut-0.6.5.7-1.el7.centos.x86_64
lustre-osd-zfs-2.8.0-3.10.0_327.18.2.el7.x86_64.x86_64
>
> from that system after it boots?
>
> 2) How do those values get into the /etc/modprobe.d/zfs.conf file?  Are they 
> there before the node boots, or are modifying that file somehow during the 
> boot process?
I edited those values myself inside zfs.conf then rebooted the system,
but they are not loaded

thank you

Riccardo


>
> Olaf P. Faaland
> Livermore Computing
> 
> From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of 
> Riccardo Veraldi [riccardo.vera...@cnaf.infn.it]
> Sent: Monday, July 11, 2016 3:53 PM
> To: lustre-discuss@lists.lustre.org
> Subject: [lustre-discuss] problem loading zfs properties for a lustre/zfs fs
>
> Hello,
> I am tailoring my system for lustre on ZFS and I am not able to set
> these parameters
> writing the config file /etc/modprobe.d/zfs.conf with the following options
>
> options zfs zfs_prefetch_disable=1
> options zfs zfs_txg_history=120
> options zfs metaslab_debug:unload=1
>
> when I check the parameters after boot in /sys/module/zfs/parameters
> they are still the defaults, and my modification are not taken.
>
> zfs.conf is being ignored at boot time I do not know why.
>
> I had to do a
>
> echo 1 > /sys/module/zfs/parameters/options zfs zfs_prefetch_disable
>
> inside rc.local
>
> to have it working
>
> I am running RHEL7 with latest kernel  and zfs 0.6.5.7-1
>
> any hints ?
>
> thank you
>
> Riccardo
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] problem loading zfs properties for a lustre/zfs fs

2016-07-11 Thread Faaland, Olaf P.
1) What is the output of:

rpm -qa | grep zfs

from that system after it boots?

2) How do those values get into the /etc/modprobe.d/zfs.conf file?  Are they 
there before the node boots, or are you modifying that file somehow during the 
boot process?

Olaf P. Faaland
Livermore Computing

From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of 
Riccardo Veraldi [riccardo.vera...@cnaf.infn.it]
Sent: Monday, July 11, 2016 3:53 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] problem loading zfs properties for a lustre/zfs fs

Hello,
I am tailoring my system for lustre on ZFS and I am not able to set
these parameters
writing the config file /etc/modprobe.d/zfs.conf with the following options

options zfs zfs_prefetch_disable=1
options zfs zfs_txg_history=120
options zfs metaslab_debug:unload=1

when I check the parameters after boot in /sys/module/zfs/parameters
they are still the defaults, and my modification are not taken.

zfs.conf is being ignored at boot time I do not know why.

I had to do a

echo 1 > /sys/module/zfs/parameters/options zfs zfs_prefetch_disable

inside rc.local

to have it working

I am running RHEL7 with latest kernel  and zfs 0.6.5.7-1

any hints ?

thank you

Riccardo

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] default directory striping with Lustre 2.8

2016-05-10 Thread Faaland, Olaf P.
Hello Juan,

No, I haven't seen the problem you describe.  Our testing configuration has 
only a single target per server - each MDS has a single MDT, and each OSS has a 
single OST.

We've encountered some issues, but so far stability has been good.  Our testing 
has been on relatively small scale, though.

Olaf P. Faaland
Livermore Computing


From: Juan PC [pier...@ditec.um.es]
Sent: Monday, May 09, 2016 4:25 AM
To: Faaland, Olaf P.; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] default directory striping with Lustre 2.8

Dear Olaf,

I would like to hear about your experience with striped directories in
Lustre 2.8. Mine is that this feature is still unstable and a lot of
"BUG: soft lockup..." errors start appearing. Maybe the problem is the
setup that I use: as many OSTs as MDTs, with a pair OST-MDT per server.

Have you faced the same problem?

Regards,

Juan


El 05/05/16 a las 01:04, Faaland, Olaf P. escribió:
> Hi,
>
> Suppose you have m MDTs in your filesystem, and create a new directory
> and set default directory striping using
>
> lfs mkdir --count=c --index=k <dir> && lfs setdirstripe --default
> --count=c <dir>
>
> Suppose that c < m and m > 2.
>
> Then you make subdirectories, like
>
> mkdir <dir>/child.{1,2,3,...}
>
> a) By design, do the child directories have the same starting index as
> <dir>?
> b) By design, are the child directories all striped across the same set
> of MDTs as <dir>?
>
> I didn't see that specified one way or the other in the DNE phase 2 high
> level design document at
> http://wiki.opensfs.org/DNE_StripedDirectories_HighLevelDesign_wiki_version.
> If I should look elsewhere, let me know.
>
> In a test I was doing today, I noticed that neither (a) nor (b) were
> true in practice.  I'm wondering whether that's a bug or a feature.
> Here's partial output from my test.
>
> $ lfs mkdir --count=6 --index=2 /p/lustre/faaland1/count6_index2
> $ lfs setdirstripe -D --count=6 /p/lustre/faaland1/count6_index2
> $ mkdir
> /p/lustre/faaland1/count6_index2/subdir.{1,2,3,4,5,6,7,8,9,10,11,12,13,14}
> $ lfs getdirstripe /p/lustre/faaland1/count6_index2
> /p/lustre/faaland1/count6_index2
> lmv_stripe_count: 6 lmv_stripe_offset: 2
> mdtidx   FID[seq:oid:ver]
>  2   [0x28400:0x33f3:0x0]
>  3   [0x2c404:0x33f3:0x0]
>  4   [0x30402:0x33f2:0x0]
>  5   [0x34407:0x33f1:0x0]
>  6   [0x38406:0x33f0:0x0]
>  7   [0x3c404:0x33ef:0x0]
> /p/lustre/faaland1/count6_index2/subdir.4
> lmv_stripe_count: 6 lmv_stripe_offset: 2
> mdtidx   FID[seq:oid:ver]
>  2   [0x28400:0x33f5:0x0]
>  3   [0x2c404:0x33f5:0x0]
>  4   [0x30402:0x33f4:0x0]
>  5   [0x34407:0x33f3:0x0]
>  6   [0x38406:0x33f2:0x0]
>  7   [0x3c404:0x33f1:0x0]
> /p/lustre/faaland1/count6_index2/subdir.9
> lmv_stripe_count: 6 lmv_stripe_offset: 5
> mdtidx   FID[seq:oid:ver]
>  5   [0x34400:0x37a1:0x0]
>  6   [0x38405:0x37a1:0x0]
>  7   [0x3c402:0x37a0:0x0]
>  8   [0x4040e:0x379f:0x0]
>  9   [0x44403:0x379e:0x0]
>  0   [0x20405:0x379d:0x0]
> /p/lustre/faaland1/count6_index2/subdir.3
> lmv_stripe_count: 6 lmv_stripe_offset: 5
> mdtidx   FID[seq:oid:ver]
>  5   [0x34400:0x37a0:0x0]
>  6   [0x38405:0x37a0:0x0]
>  7   [0x3c402:0x379f:0x0]
>  8   [0x4040e:0x379e:0x0]
>  9   [0x44403:0x379d:0x0]
>  0   [0x20405:0x379c:0x0]
> /p/lustre/faaland1/count6_index2/subdir.14
> lmv_stripe_count: 6 lmv_stripe_offset: 7
> mdtidx   FID[seq:oid:ver]
>  7   [0x3c400:0x30d4:0x0]
>  8   [0x40403:0x30d4:0x0]
>  9   [0x44405:0x30d3:0x0]
>  0   [0x20407:0x30d2:0x0]
>  1   [0x24407:0x30d1:0x0]
>  2   [0x28407:0x30d0:0x0]
> ...
>
>
> Olaf P. Faaland
> Livermore Computing
> phone : 925-422-2263
>
>
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

Re: [lustre-discuss] LMT 3.2 - MDT display

2016-04-13 Thread Faaland, Olaf P.
Reposting to the correct mailing list.

To: Crowe, Tom; lustre-de...@lists.lustre.org
Subject: Re: [lustre-devel] LMT 3.2 - MDT display

Hi Tom,

It sounds like maybe you see the summary and the OST list, but not the MDT list.

The ltop display has 3 different components:
[ summary ]
[ mdt list ]
[ ost list ]

The summary at the top includes all of the fields displayed in the MDT list, so
if you have only one MDT, ltop doesn't waste the screen space used by an MDT
list and just shows
[ summary ]
[ ost list ]

Does that explain it?

thanks,
Olaf P. Faaland
Livermore Computing
phone : 925-422-2263

From: lustre-devel [lustre-devel-boun...@lists.lustre.org] on behalf of Crowe, 
Tom [thcr...@iu.edu]
Sent: Wednesday, April 13, 2016 1:45 PM
To: lustre-de...@lists.lustre.org
Subject: [lustre-devel] LMT 3.2 - MDT display

Greetings,

After seeing a presentation at LUG16 
(http://cdn.opensfs.org/wp-content/uploads/2016/04/LUG2016D1_LMT-Only-a-Flesh-Wound_Faaland.pdf),
 I was excited to try the updated version of LMT (3.2), as it appeared that
detailed MDT/MDS information is included in the ltop output.

I have successfully built and installed LMT 3.2, but there is no display of MDT 
info.

I'm wondering how I might troubleshoot this further. Everything is working; it's
just not giving me the MDT/MDS info as I expected it might.

Thanks,
Tom Crowe
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] dne2: lfs setdirstripe

2015-09-03 Thread Faaland, Olaf P.
Patrick,

Perhaps I'm not seeing something in front of my face.  So far I've found only 
design documents discussing DNE phase I.

The places I have found any design docs at all are here:

http://wiki.old.lustre.org/index.php/Lustre_Design_Documents
https://wiki.hpdd.intel.com/display/PUB/Intel+Design+documents

Where else should I be looking?

Olaf P. Faaland
Livermore Computing
phone : 925-422-2263


From: Patrick Farrell [p...@cray.com]
Sent: Wednesday, September 02, 2015 7:50 PM
To: Faaland, Olaf P.; lustre-discuss@lists.lustre.org
Subject: RE: dne2: lfs setdirstripe

Olaf,

I can explain the rationale for the restrictions, though I have not verified if 
the root only one applies to striped as well as remote directories.  (It's a 
simple test, though.  I'm just not where I can reach a test system.)

Note, to be clear: DNE 2 does not replace DNE 1.  Remote directories and 
striped directories are different things and can coexist.

For enable_remote_dir, it applies only to remote directories - not striped 
directories.

As for the rationale: if enabled, it complicates things notably from an
administrative perspective.  If a path crosses between MDTs more than once, it
becomes harder to know what is where, and files on, for example, MDT2 or MDT0
can become unreachable if MDT1 is lost.  Also, if you think carefully, it
doesn't really enable any use cases that can't be achieved otherwise - at
least, none that we could find that seemed practical.

As for the root-only restriction:
Imagine you are trying to split the load between your MDTs by assigning
particular users to particular MDTs.  If your users can create their own remote
directories, they can escape this restriction.  Also, you can open the
permission up to everyone by setting enable_remote_dir_gid to -1.
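
For reference, here is a sketch of how both knobs can be loosened on the MDS
(the parameter names are the ones from the lfs man page quoted further down;
the values are only illustrative, and I haven't re-checked which of the two
applies to striped rather than remote directories):

  # allow remote/striped directories to be created below any directory,
  # not only directly under MDT0000
  lctl set_param mdt.*.enable_remote_dir=1

  # allow any group to create remote directories; a specific gid can be
  # given instead of -1
  lctl set_param mdt.*.enable_remote_dir_gid=-1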

I learned this by a mix of reading design docs, experimenting, and being at 
least tangentially involved via the PAC.
I'd suggest design docs as a good place to look for more.

- Patrick Farrell

From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of 
Faaland, Olaf P. [faala...@llnl.gov]
Sent: Wednesday, September 02, 2015 5:21 PM
To: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] dne2: lfs setdirstripe

The lustre we are testing with is built from commit

ea383222e031cdceffbdf2e3afab3b6d1fd53c8e

which is after tag 2.7.57 but before 2.7.59; so recent but not entirely current.

Olaf P. Faaland
Livermore Computing
phone : 925-422-2263

From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of 
Faaland, Olaf P. [faala...@llnl.gov]
Sent: Wednesday, September 02, 2015 3:17 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] dne2: lfs setdirstripe

Hi,

We have begun work on testing DNE with ZFS backend.  So far we've only done the 
installation of the filesystem and begun educating ourselves.

I see in the lfs man page that "lfs setdirstripe" has some restrictions by default
 - only executable by root unless "mdt.*.enable_remote_dir_gid" is set
 - only directories on MDT0000 can contain directories that are not on the same
MDT unless "mdt.*.enable_remote_dir" is enabled

1. Are those restrictions still current, or do they refer to DNE phase 1 
restrictions that no longer apply?

2. If the first restriction (only root may invoke "lfs setdirstripe") is still
current, what is the rationale?

3. Is there documentation, or a mailing list thread, that we should read prior 
to posting questions?

Thanks,

Olaf P. Faaland
Livermore Computing
phone : 925-422-2263
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] dne2: lfs setdirstripe

2015-09-03 Thread Faaland, Olaf P.
Ah!  Thank you.

I'll put links on lustre.org so others can find them more easily.

Olaf P. Faaland
Livermore Computing
phone : 925-422-2263


From: Henwood, Richard [richard.henw...@intel.com]
Sent: Thursday, September 03, 2015 10:08 AM
To: Faaland, Olaf P.
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] dne2: lfs setdirstripe

On Thu, 2015-09-03 at 16:47 +0000, Faaland, Olaf P. wrote:
> Patrick,
>
> Perhaps I'm not seeing something in front of my face.  So far I've found only 
> design documents discussing DNE phase I.
>
> The places I have found any design docs at all are here:
>
> http://wiki.old.lustre.org/index.php/Lustre_Design_Documents
> https://wiki.hpdd.intel.com/display/PUB/Intel+Design+documents
>
> Where else should I be looking?
>

There are DNE1 and DNE2 specific design docs here:

http://wiki.opensfs.org/Contract_SFS-DEV-001

r,


--
richard.henw...@intel.com
Intel High Performance Data Division
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] dne2: lfs setdirstripe

2015-09-02 Thread Faaland, Olaf P.
Hi,

We have begun work on testing DNE with ZFS backend.  So far we've only done the 
installation of the filesystem and begun educating ourselves.

I see in the lfs man page that "lfs setdirstripe" has some restrictions by default
 - only executable by root unless "mdt.*.enable_remote_dir_gid" is set
 - only directories on MDT0000 can contain directories that are not on the same
MDT unless "mdt.*.enable_remote_dir" is enabled

1. Are those restrictions still current, or do they refer to DNE phase 1 
restrictions that no longer apply?

2. If the first restriction (only root may invoke "lfs setdirstripe") is still
current, what is the rationale?

3. Is there documentation, or a mailing list thread, that we should read prior 
to posting questions?

Thanks,

Olaf P. Faaland
Livermore Computing
phone : 925-422-2263
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] dne2: lfs setdirstripe

2015-09-02 Thread Faaland, Olaf P.
The lustre we are testing with is built from commit

ea383222e031cdceffbdf2e3afab3b6d1fd53c8e

which is after tag 2.7.57 but before 2.7.59; so recent but not entirely current.
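
(As an aside, git can place a commit relative to the release tags directly; this
assumes the tags are present in your clone of the lustre tree:

  git describe --tags ea383222e031cdceffbdf2e3afab3b6d1fd53c8e

which prints the nearest preceding tag, the number of commits since that tag,
and the abbreviated commit hash.)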

Olaf P. Faaland
Livermore Computing
phone : 925-422-2263

From: lustre-discuss [lustre-discuss-boun...@lists.lustre.org] on behalf of 
Faaland, Olaf P. [faala...@llnl.gov]
Sent: Wednesday, September 02, 2015 3:17 PM
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] dne2: lfs setdirstripe

Hi,

We have begun work on testing DNE with ZFS backend.  So far we've only done the 
installation of the filesystem and begun educating ourselves.

I see in the lfs man page that "lfs setdirstripe" has some restrictions by default
 - only executable by root unless "mdt.*.enable_remote_dir_gid" is set
 - only directories on MDT0000 can contain directories that are not on the same
MDT unless "mdt.*.enable_remote_dir" is enabled

1. Are those restrictions still current, or do they refer to DNE phase 1 
restrictions that no longer apply?

2. If the first restriction (only root may invoke "lfs setdirstripe") is still
current, what is the rationale?

3. Is there documentation, or a mailing list thread, that we should read prior 
to posting questions?

Thanks,

Olaf P. Faaland
Livermore Computing
phone : 925-422-2263
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] lustre 2.5.3 changelog mask question

2015-05-29 Thread Faaland, Olaf P.
Kurt,

The 2.5.3 code appears to me to support the following list of changelog
mask values:

static const char *changelog_str[] = {
        "MARK",  "CREAT", "MKDIR", "HLINK", "SLINK", "MKNOD", "UNLNK",
        "RMDIR", "RENME", "RNMTO", "OPEN",  "CLOSE", "LYOUT", "TRUNC",
        "SATTR", "XATTR", "HSM",   "MTIME", "CTIME", "ATIME",
};


IOCTL went away between 2.3 and 2.4, see LU-3279
RNMFM went away between 2.2.90 and 2.2.91, see LU-1331
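
As a quick check, a mask without the removed record types should be accepted,
e.g. (a sketch; I'm using a wildcard rather than a specific MDT name, and the
value has to be quoted so the shell passes the whole list as a single argument):

  lctl set_param mdd.*.changelog_mask="MKDIR CREAT HLINK UNLNK RMDIR RENME RNMTO TRUNC SATTR"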

-Olaf

On 5/29/15, 6:45 AM, Kurt Strosahl stros...@jlab.org wrote:

Good Morning,

   I'm trying to set the changelog mask on a lustre 2.5.3 system and it
seems that the documentation isn't 100% accurate.

According to the documentation found at
https://build.hpdd.intel.com/job/lustre-manual/lastSuccessfulBuild/artifact/lustre_manual.pdf
one of the values is IOCTL.  However, when I try to
set the mask I get the following error...

lctl set_param mdd.lustre2-MDT0000.changelog_mask="MKDIR CREAT HLINK UNLNK RMDIR RENME RNMTO TRUNC SATTR IOCTL"
mdd.lustre2-MDT0000.changelog_mask=MKDIR CREAT HLINK UNLNK RMDIR RENME RNMTO TRUNC SATTR IOCTL
error: set_param: setting /proc/fs/lustre/mdd/lustre2-MDT0000/changelog_mask=MKDIR CREAT HLINK UNLNK RMDIR RENME RNMTO TRUNC SATTR IOCTL: Invalid argument

The same seems to be true for RNMFM.

What was that value changed to?

w/r,
Kurt
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] tag feature of ltop

2015-05-21 Thread Faaland, Olaf P.
OK, thanks.

Olaf P. Faaland
LLNL

From: Alexander I Kulyavtsev [a...@fnal.gov]
Sent: Thursday, May 21, 2015 4:25 PM
To: Faaland, Olaf P.
Cc: Alexander I Kulyavtsev; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] tag feature of ltop

As you said in an earlier mail, when I sort by IO rate, locks, etc., the
selected OSTs jump around or just sit in different rows, as seen below.

Anyway, this is not a strong desire in the long term: I may feed the cerebro
output to a web page.

Best regards, Alex.

On May 21, 2015, at 5:42 PM, Faaland, Olaf P. 
faala...@llnl.gov wrote:

Alexander,

Thanks for your reply.

ltop also lets you sort by OSS, so that the OSTs sharing an OSS are all next to 
each other.  Do you find tagging more helpful than that?

Olaf P. Faaland
LLNL

From: Alexander I Kulyavtsev [a...@fnal.gov]
Sent: Thursday, May 21, 2015 2:59 PM
To: Faaland, Olaf P.
Cc: Alexander I Kulyavtsev; 
lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] tag feature of ltop

It may make sense to keep tagging.

I marked OSS dslustre15 and then switched to the OST view.  All OSTs on the
marked OSS are highlighted:

005c F dslustre13  10160 1 0 16503001   95   80
005d F dslustre14  10160 0 0 07472001   95   79
005e F dslustre13  10160 0 0 08880001   95   75
005f F dslustre14  10160 0 0 07161001   95   77
0060 F dslustre15  10160 0 0 0747800   48   96   76
0061 F dslustre16  10160 0 0 07543001   96   78
0062 F dslustre15  10160 0 0 0690700   48   96   78
0063 F dslustre16  10160 0 0 07410001   96   75
0064 F dslustre15  10160 0 0 0611300   48   96   80
0065 F dslustre16  10160 0 0 06833001   96   78
0066 F dslustre15  10160 1 0 1654500   48   96   78
0067 F dslustre16  10160 1 0 17190001   96   78

I was about to say 'we do not use it' yesterday; today I am tracking down an issue.
Thanks, Alex.


On May 18, 2015, at 7:29 PM, Faaland, Olaf P. 
faala...@llnl.gov wrote:

Hello,

I am working on updating ltop, the text client within LMT 
(https://github.com/chaos/lmt/wiki).  I am adding support for DNE (multiple 
active MDTs within a single filesystem).

In the interest of keeping the tool free of cruft, I am asking the community
about their usage.

Currently, ltop allows the user to tag an OST or an OSS, which causes the
row(s) for that OSS (or OSTs on that OSS) to be underlined so that they stand
out visually.  Presumably this is so that one can follow an OST as it bounces 
around the table, when the table is sorted by something that changes 
dynamically like CPU usage or lock count.

Does anyone use this feature?  The first few people I polled do not use it, but 
if others use it I will extend it to the MDTs.  If no one uses it, then I'll
remove it entirely.

Thanks,

Olaf P. Faaland
Livermore Computing
Lawrence Livermore National Lab
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] tag feature of ltop

2015-05-21 Thread Faaland, Olaf P.
Alexander,

Thanks for your reply.

ltop also lets you sort by OSS, so that the OSTs sharing an OSS are all next to 
each other.  Do you find tagging more helpful than that?

Olaf P. Faaland
LLNL

From: Alexander I Kulyavtsev [a...@fnal.gov]
Sent: Thursday, May 21, 2015 2:59 PM
To: Faaland, Olaf P.
Cc: Alexander I Kulyavtsev; lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] tag feature of ltop

It may make sense to keep tagging.

I marked OSS dslustre15 and then switched to the OST view.  All OSTs on the
marked OSS are highlighted:

005c F dslustre13  10160 1 0 16503001   95   80
005d F dslustre14  10160 0 0 07472001   95   79
005e F dslustre13  10160 0 0 08880001   95   75
005f F dslustre14  10160 0 0 07161001   95   77
0060 F dslustre15  10160 0 0 0747800   48   96   76
0061 F dslustre16  10160 0 0 07543001   96   78
0062 F dslustre15  10160 0 0 0690700   48   96   78
0063 F dslustre16  10160 0 0 07410001   96   75
0064 F dslustre15  10160 0 0 0611300   48   96   80
0065 F dslustre16  10160 0 0 06833001   96   78
0066 F dslustre15  10160 1 0 1654500   48   96   78
0067 F dslustre16  10160 1 0 17190001   96   78

I was about to say 'we do not use it' yesterday; today I am tracking down an issue.
Thanks, Alex.


On May 18, 2015, at 7:29 PM, Faaland, Olaf P. 
faala...@llnl.gov wrote:

Hello,

I am working on updating ltop, the text client within LMT 
(https://github.com/chaos/lmt/wiki).  I am adding support for DNE (multiple 
active MDTs within a single filesystem).

In the interest of keeping the tool free of cruft, I am asking the community
about their usage.

Currently, ltop allows the user to tag an OST or an OSS, which causes the
row(s) for that OSS (or OSTs on that OSS) to be underlined so that they stand
out visually.  Presumably this is so that one can follow an OST as it bounces 
around the table, when the table is sorted by something that changes 
dynamically like CPU usage or lock count.

Does anyone use this feature?  The first few people I polled do not use it, but 
if others use it I will extend it to the MDTs.  If no one uses it, then I'll
remove it entirely.

Thanks,

Olaf P. Faaland
Livermore Computing
Lawrence Livermore National Lab
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] tag feature of ltop

2015-05-18 Thread Faaland, Olaf P.
Hello,

I am working on updating ltop, the text client within LMT 
(https://github.com/chaos/lmt/wiki).  I am adding support for DNE (multiple 
active MDTs within a single filesystem).

In the interest of keeping the tool free of cruft, I am asking the community
about their usage.

Currently, ltop allows the user to tag an OST or an OSS, which causes the
row(s) for that OSS (or OSTs on that OSS) to be underlined so that they stand
out visually.  Presumably this is so that one can follow an OST as it bounces 
around the table, when the table is sorted by something that changes 
dynamically like CPU usage or lock count.

Does anyone use this feature?  The first few people I polled do not use it, but 
if others use it I will extend it to the MDTs.  If no one uses it, then I'll
remove it entirely.

Thanks,

Olaf P. Faaland
Livermore Computing
Lawrence Livermore National Lab
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org