Re: [lustre-discuss] Build lustre on ubuntu 22.04

2024-02-08 Thread Åke Sandgren
Have you looked at the instructions here:
https://wiki.whamcloud.com/pages/viewpage.action?pageId=63968116
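
For reference, a minimal sketch of the usual Debian/Ubuntu server build flow
(repo URL, branch and kernel source path are placeholders; the wiki page above
is the authoritative source):
===
# Hedged sketch; adjust the kernel source path and configure options as needed
git clone https://git.whamcloud.com/fs/lustre-release.git
cd lustre-release
sh autogen.sh
./configure --with-linux=/path/to/linux-source-5.15.0
make debs    # builds the .deb packages
===
Note that the '-V' and '-qversion' messages shown in the original question
below are usually just autoconf's compiler-identification probes echoed from
config.log; the actual cause of a configure failure is normally reported
elsewhere in config.log.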


From: lustre-discuss  on behalf of Yao 
Weng via lustre-discuss 
Sent: Thursday, February 8, 2024 20:21
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Build lustre on ubuntu 22.04

Hi,
I have a question about installing Lustre on Ubuntu. There is no lustre-server
deb package in https://downloads.whamcloud.com/public/lustre, so we have to
build the deb package ourselves.

We use Ubuntu 22.04:

Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:        22.04
Codename:       jammy

The kernel is 5.15.0-56-generic.

We got a gcc error when running:

./configure 
--with-linux=/home//linux-source-5.15.0/linux-source-5.15.0/


gcc: error: unrecognized command-line option '-V'
gcc: fatal error: no input files
gcc: error: unrecognized command-line option '-qversion'; did you mean '--version'?

Our gcc is:

gcc -v

Using built-in specs.

COLLECT_GCC=gcc

COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper

OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa

OFFLOAD_TARGET_DEFAULT=1

Target: x86_64-linux-gnu

Configured with: ../src/configure -v --with-pkgversion='Ubuntu 
11.4.0-1ubuntu1~22.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs 
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr 
--with-gcc-major-version-only --program-suffix=-11 
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id 
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix 
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu 
--enable-libstdcxx-debug --enable-libstdcxx-time=yes 
--with-default-libstdcxx-abi=new --enable-gnu-unique-object 
--disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib 
--enable-libphobos-checking=release --with-target-system-zlib=auto 
--enable-objc-gc=auto --enable-multiarch --disable-werror --enable-cet 
--with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 
--enable-multilib --with-tune=generic 
--enable-offload-targets=nvptx-none=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-XeT9lY/gcc-11-11.4.0/debian/tmp-gcn/usr
 --without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu 
--host=x86_64-linux-gnu --target=x86_64-linux-gnu 
--with-build-config=bootstrap-lto-lean --enable-link-serialization=2

Thread model: posix

Supported LTO compression algorithms: zlib zstd

gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)


Could anyone suggest how to properly build the Lustre package?

Thank you !
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Problems pushing update to patch

2023-11-13 Thread Åke Sandgren
Apparently RSA keys are no longer accepted. I updated the key and now it works.
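
For anyone hitting the same thing, a sketch of the key swap (the file name is
just an example; the new public key goes in via the Gerrit web UI):
===
# Generate a non-RSA key and point the review.whamcloud.com host entry at it
ssh-keygen -t ed25519 -f ~/.ssh/lustre_ed25519
# upload ~/.ssh/lustre_ed25519.pub in the Gerrit settings page, update the
# IdentityFile line in ~/.ssh/config, then verify the connection:
ssh review.whamcloud.com gerrit version
===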


From: Åke Sandgren
Sent: Thursday, November 9, 2023 8:37
To: Kira Duwe via lustre-discuss
Subject: Problems pushing update to patch

Hi!

I've probably messed up somehow but I can't see how.
When doing git push I'm getting this:
===
skalman [lustre-release-whamcloud]$ git push review HEAD:refs/for/master
ak...@review.whamcloud.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
skalman [lustre-release-whamcloud]$
===

My ssh key for review.whamcloud.com matches what is available on review, and
this worked when I made the initial push a while back (LU-16819).

===
skalman [lustre-release-whamcloud]$ cat .git/config
[remote "review"]
url = ssh://review.whamcloud.com/fs/lustre-release
fetch = +refs/heads/*:refs/remotes/review/*
===
===
skalman [lustre-release-whamcloud]$ grep -A7 review ~/.ssh/config
Host review.whamcloud.com
PreferredAuthentications publickey
User ake_s
Port 29418
ForwardX11 no
ForwardAgent no
IdentityFile ~/.ssh/lustre
===
===
skalman [lustre-release-whamcloud]$ ssh-add -l | grep lustre
2048 SHA256:vpe/... /home/a/ake/.ssh/lustre (RSA)
===

Any ideas?

---
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se  Mobile: +46 70 7716134  Fax: +46 90-580 14
WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Problems pushing update to patch

2023-11-08 Thread Åke Sandgren
Hi!

I've probably messed up somehow but I can't see how.
When doing git push I'm getting this:
===
skalman [lustre-release-whamcloud]$ git push review HEAD:refs/for/master
ak...@review.whamcloud.com: Permission denied (publickey).
fatal: Could not read from remote repository.

Please make sure you have the correct access rights
and the repository exists.
skalman [lustre-release-whamcloud]$ 
===

My ssh key for review.whamcloud.com matches what is available on review, and
this worked when I made the initial push a while back (LU-16819).

===
skalman [lustre-release-whamcloud]$ cat .git/config 
[remote "review"]
url = ssh://review.whamcloud.com/fs/lustre-release
fetch = +refs/heads/*:refs/remotes/review/*
===
===
skalman [lustre-release-whamcloud]$ grep -A7 review ~/.ssh/config 
Host review.whamcloud.com
PreferredAuthentications publickey
User ake_s
Port 29418
ForwardX11 no
ForwardAgent no
IdentityFile ~/.ssh/lustre
===
===
skalman [lustre-release-whamcloud]$ ssh-add -l | grep lustre
2048 SHA256:vpe/... /home/a/ake/.ssh/lustre (RSA)
===

Any ideas?

---
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se  Mobile: +46 70 7716134  Fax: +46 90-580 14
WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Problem pushing PR to review

2023-05-11 Thread Åke Sandgren
This worked when done from another machine, so ignore.


From: lustre-discuss  on behalf of Åke 
Sandgren 
Sent: Thursday, May 11, 2023 9:19
To: lustre-discuss@lists.lustre.org
Subject: [lustre-discuss] Problem pushing PR to review

Hi!

I'm currently getting
Received disconnect from 138.197.209.17 port 29418:2: Too many authentication 
failures: 7
Disconnected from 138.197.209.17 port 29418
fatal: Could not read from remote repository.

when trying to push a PR for review for LU-16819
It's been a while but my ssh key is valid, so unless the instructions on
https://wiki.lustre.org/Using_Gerrit are out of date I can't currently figure
out what the problem is.

---
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se  Mobile: +46 70 7716134  Fax: +46 90-580 14
WWW: http://www.hpc2n.umu.se/
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Problem pushing PR to review

2023-05-11 Thread Åke Sandgren
Hi!

I'm currently getting
Received disconnect from 138.197.209.17 port 29418:2: Too many authentication 
failures: 7
Disconnected from 138.197.209.17 port 29418
fatal: Could not read from remote repository.

when trying to push a PR for review for LU-16819
It's been a while but my ssh key is valid, so unless the instructions on
https://wiki.lustre.org/Using_Gerrit are out of date I can't currently figure
out what the problem is.
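
In case someone else sees the same error: "Too many authentication failures"
usually means the ssh agent offered too many keys before the right one. A
sketch of the usual workaround (key path and user name are examples from my
setup, and this is not a verified diagnosis for this particular case):
===
# Offer only the intended key and test the Gerrit ssh connection
ssh -o IdentitiesOnly=yes -i ~/.ssh/lustre -p 29418 ake_s@review.whamcloud.com gerrit version
# If that works, add "IdentitiesOnly yes" to the review.whamcloud.com entry in ~/.ssh/config
===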

---
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se  Mobile: +46 70 7716134  Fax: +46 90-580 14
WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Question regarding LAD'22 talk on tuning

2022-10-06 Thread Åke Sandgren

Hi!

I'd like to get in contact with the people behind the "Tuning Lustre in 
a LNet routed environment" talk at LAD'22


We have some questions for them...

--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se  Mobile: +46 70 7716134  Fax: +46 90-580 14
WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] need info regarding TCP ports for lustre

2022-06-13 Thread Åke Sandgren



On 6/14/22 00:51, Andreas Dilger via lustre-discuss wrote:
*Most* of the port 988 connections will be client->server, but 
occasionally if there is a network problem and the client connection is 
dropped, then server->client connections may be initiated to cancel a 
lock or similar.  If this server->client connection cannot be 
established, then the client may be evicted.


How does that part work with LNet routers in between client and server?
Which IP will it try to talk to?


--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se  Mobile: +46 70 7716134  Fax: +46 90-580 14
WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Small bug in 2.14 debian/rules

2022-06-02 Thread Åke Sandgren

No, I didn't; I forgot to. Sebastien handled it, though.

On 6/3/22 00:16, Peter Jones wrote:

Ake

Have you opened a JIRA ticket about this?

Peter

On 2022-06-02, 2:01 AM, "lustre-discuss on behalf of Åke Sandgren" 
 wrote:

 Sorry, meant master not 2.14, and it's commit a5084c2f2ec

 On 6/2/22 10:33, Åke Sandgren wrote:
 > Hi!
 >
 > In 2.14.0 debian/rules there is a comment that breaks a command's backslash
 > line continuation, causing exported env vars to be dropped.
 >
 > Simple fix is (configure-stamp rule):
 > ===
 > diff --git a/debian/rules b/debian/rules
 > index df80c077ba..14b1892fbf 100755
 > --- a/debian/rules
 > +++ b/debian/rules
 > @@ -199,8 +203,6 @@ configure-stamp: autogen-stamp debian/control.main
 > debian/control.modules.in
 >  elif echo "$${DEB_BUILD_PROFILES}" | grep -qw "nocrypto"; then \
 >  export EXTRAFLAGS="$${EXTRAFLAGS} --disable-crypto"; \
 >  fi; \
 > -   # remove env variables from config cache built by initial
 > configure,
 > -   # and create dedicated cache in temporary build directory
 >  if [ -f "$${CONFIG_CACHE_FILE}" ]; then \
 >  export TMP_CACHE_FILE=$$(mktemp); \
 >  sed "/ac_cv_env/d" "$${CONFIG_CACHE_FILE}" >
 > $${TMP_CACHE_FILE}; \
 > ===
 >
 > Same bug exists in the kdist_config rule.
 >

 --
 Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
 Internet: a...@hpc2n.umu.se  Mobile: +46 70 7716134  Fax: +46 90-580 14
 WWW: http://www.hpc2n.umu.se
 ___
 lustre-discuss mailing list
 lustre-discuss@lists.lustre.org
 http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org



--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se  Mobile: +46 70 7716134  Fax: +46 90-580 14
WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Misplaced position for two glibc checks

2022-06-02 Thread Åke Sandgren

Hi!

The tests for LC_GLIBC_SUPPORT_FHANDLES and 
LC_GLIBC_SUPPORT_COPY_FILE_RANGE must be in the "core" set of configure 
tests, i.e. in the

===
AC_DEFUN([LC_CONFIGURE], [
AC_MSG_NOTICE([Lustre core checks
===
section. The reason for that is that they are required for the 
client/server utils code and not only for the kernel part.


This pops up if configuring with --disable-modules --enable-client and
building the client utilities only; think of a make dkms-debs that does
NOT produce the kernel modules but only the DKMS package and the utilities.
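
Concretely, a sketch of the build that exposes it (flags as mentioned above;
dkms-debs is the target referred to):
===
# Client-utilities-only configuration (sketch)
sh autogen.sh
./configure --disable-modules --enable-client
make dkms-debs    # DKMS package and utilities only, no kernel modules
===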


This probably won't pop up easily without another change I have 
regarding the setting of CPPFLAGS for uapi which are also needed for 
client utils only builds.



PS
Can you point me to the URL for how to correctly produce PRs again? I
lost that info some time ago. I seem to remember there being some
more steps to do than I'm used to.


--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se  Mobile: +46 70 7716134  Fax: +46 90-580 14
WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Small bug in 2.14 debian/rules

2022-06-02 Thread Åke Sandgren

Sorry, meant master not 2.14, and it's commit a5084c2f2ec

On 6/2/22 10:33, Åke Sandgren wrote:

Hi!

In 2.14.0 debian/rules there is a comment that breaks a command's backslash
line continuation, causing exported env vars to be dropped.


Simple fix is (configure-stamp rule):
===
diff --git a/debian/rules b/debian/rules
index df80c077ba..14b1892fbf 100755
--- a/debian/rules
+++ b/debian/rules
@@ -199,8 +203,6 @@ configure-stamp: autogen-stamp debian/control.main debian/control.modules.in
     elif echo "$${DEB_BUILD_PROFILES}" | grep -qw "nocrypto"; then \
     export EXTRAFLAGS="$${EXTRAFLAGS} --disable-crypto"; \
     fi; \
-   # remove env variables from config cache built by initial configure,
-   # and create dedicated cache in temporary build directory
     if [ -f "$${CONFIG_CACHE_FILE}" ]; then \
     export TMP_CACHE_FILE=$$(mktemp); \
     sed "/ac_cv_env/d" "$${CONFIG_CACHE_FILE}" > $${TMP_CACHE_FILE}; \

===

Same bug exists in the kdist_config rule.



--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se  Mobile: +46 70 7716134  Fax: +46 90-580 14
WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Small bug in 2.14 debian/rules

2022-06-02 Thread Åke Sandgren

Hi!

In 2.14.0 debian/rules there is a comment that breaks a command's backslash
line continuation, causing exported env vars to be dropped.


Simple fix is (configure-stamp rule):
===
diff --git a/debian/rules b/debian/rules
index df80c077ba..14b1892fbf 100755
--- a/debian/rules
+++ b/debian/rules
@@ -199,8 +203,6 @@ configure-stamp: autogen-stamp debian/control.main debian/control.modules.in
elif echo "$${DEB_BUILD_PROFILES}" | grep -qw "nocrypto"; then \
export EXTRAFLAGS="$${EXTRAFLAGS} --disable-crypto"; \
fi; \
-   # remove env variables from config cache built by initial configure,
-   # and create dedicated cache in temporary build directory
if [ -f "$${CONFIG_CACHE_FILE}" ]; then \
export TMP_CACHE_FILE=$$(mktemp); \
sed "/ac_cv_env/d" "$${CONFIG_CACHE_FILE}" > $${TMP_CACHE_FILE}; \

===

Same bug exists in the kdist_config rule.
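
To illustrate the failure mode with a standalone toy example (hypothetical
file name, not the actual debian/rules): a comment line without a trailing
backslash ends the continued recipe line, so the following command runs in a
fresh shell and loses the exported variable.
===
# Recipe lines need leading tabs, hence printf '\t...'
printf 'demo:\n' > /tmp/contdemo.mk
printf '\texport FOO=bar; \\\n' >> /tmp/contdemo.mk
printf '\t# a comment with no trailing backslash ends the continuation here\n' >> /tmp/contdemo.mk
printf '\techo "FOO is [$${FOO}]"\n' >> /tmp/contdemo.mk
make -f /tmp/contdemo.mk demo    # prints "FOO is []": the echo ran in a new shell
===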

--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se  Mobile: +46 70 7716134  Fax: +46 90-580 14
WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Avoiding system cache when using ssd pfl extent

2022-05-20 Thread Åke Sandgren




On 5/20/22 09:53, Andreas Dilger via lustre-discuss wrote:

To elaborate a bit on Patrick's answer, there is no mechanism to do this on the 
*client*, because the performance difference between client RAM and server 
storage is still fairly significant, especially if the application is doing 
sub-page read or write operations.

However, on the *server* the OSS and MDS will *not* put flash storage into the 
page cache, because using the kernel page cache has a measurable overhead, and 
(at least in our testing) the performance of NVMe IOPS is actually better 
*without* the page cache because more CPU is available to handle RPCs.  This is 
controlled on the server with 
osd-ldiskfs.*.{read_cache_enable,writethrough_cache_enable}, default to 0 if 
the block device is non-rotational, default to 1 if block device is rotational.


Then my question is, what is it checking to determine non-rotational?

On our systems the NVMe disks have read/writethrough_cache_enable = 1 
(DDN SFA400NVXE) with

===
/dev/sde on /lustre/stor10/ost (NVMe)
cat /sys/block/sde/queue/rotational
0
lctl get_param osd-ldiskfs.*.*cache*enable
osd-ldiskfs.stor10-OST.read_cache_enable=1
osd-ldiskfs.stor10-OST.writethrough_cache_enable=1

EXAScaler SFA CentOS 5.2.3-r5
kmod-lustre-2.12.6_ddn58-1.el7.x86_64
===
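
If the autodetection ends up with the wrong value, the caches can also be
switched off by hand on the OSS (a sketch; parameter names as quoted above,
and a persistent setting would need lctl set_param -P or conf_param):
===
# On the OSS, for flash OSTs (runtime setting only)
lctl set_param osd-ldiskfs.*.read_cache_enable=0
lctl set_param osd-ldiskfs.*.writethrough_cache_enable=0
# the rotational flag checked above, per block device:
cat /sys/block/sde/queue/rotational
===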

--
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se  Mobile: +46 70 7716134  Fax: +46 90-580 14
WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] IPoIB best practises

2022-01-16 Thread Åke Sandgren



On 1/17/22 2:36 AM, Angelos Ching via lustre-discuss wrote:
> Hi Eli,
> 
> Yes & no; part of my info is a bit rusty because I carried them from
> version around 2.10. MR is now turned on by default.
> 
> But you'll need to have an IP setup on each IPoIB interface, and for all
> ib0 & all ib1 interface, they should be in different subnet. Eg: all ib0
> on 192.168.100.0/24 and all ib1 on 192.168.101.0/24

The multirail setup we have is that both ib0 and ib1 are on the same
subnet; that's how DDN configured it for us.

ip a s ib0 | grep inet
inet 172.27.1.30/24 brd 172.27.1.255 scope global ib0
ip a s ib1 | grep inet
inet 172.27.1.50/24 brd 172.27.1.255 scope global ib1

and the modprobe config is

options lnet networks="o2ib1(ib0,ib1)"

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Jobstats Support with Singularity Container

2021-12-14 Thread Åke Sandgren



On 12/14/21 11:29 AM, Andreas Dilger via lustre-discuss wrote:
> The JobID is provided by the clients, the servers don't really care how
> it was generated.

It just has to be <= 32 chars (or is that < 32?)
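
For context, the JobID source is a client-side tunable; a sketch of the usual
setup (the environment variable name depends on the scheduler):
===
# On the clients: take the JobID from the scheduler's environment ...
lctl set_param jobid_var=SLURM_JOB_ID
# ... or derive it from the process name and UID
lctl set_param jobid_var=procname_uid
===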

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] OST -> MDT migration and MDT -> OST migration

2021-04-15 Thread Åke Sandgren



On 4/15/21 5:12 AM, Andreas Dilger via lustre-discuss wrote:
> On Apr 14, 2021, at 18:42, Bill Anderson via lustre-discuss
>  > wrote:
>>
>>
>>    Hi All,
>>
>>    I'm trying to figure out how to migrate files stored on an OST to
>> an MDT (that's using DoM) and to migrate files stored on an MDT to an
>> OST (e.g., if the MDT is getting full).  I can see how to migrate
>> between OSTs, but not between OSTs and MDTs.  I'm running Lustre 2.12.3.
>>
>>  Do you happen to know the syntax for migration between OSTs and MDTs?
>>
>>    Thanks for any help!
> 
> You just use "lfs migrate  " to migrate from DoM
> files to OST objects.  However, OST-to-DoM migration isn't available
> until 2.13.

Couldn't you just use lfs mirror extend with a PFL that points into the
DoM, and then a resync/split?
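
Purely as an untested sketch of that idea (component sizes, mirror IDs and the
exact options are guesses; check lfs-mirror-extend(1) and lfs-mirror-split(1)):
===
# Add a DoM-first mirror, copy the data into it, then drop the original mirror
lfs mirror extend -N -E 64K -L mdt -E eof -c 1 /mnt/lustre/somefile
lfs mirror resync /mnt/lustre/somefile
lfs mirror split --mirror-id 1 -d /mnt/lustre/somefile
===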


-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Speed of deleting small files on OST vs DoM

2020-02-17 Thread Åke Sandgren
Hi!

Is there a good reason why deleting lots of small files (io500 test
md_easy/hard_delete) with the files on OSTs is up to two times faster
than when using DoM with the whole file(s) on the MDT?

Using server/client 2.13.0
DoM up to 64k, test files < 4k

I can see that the actual data deletion with the data on OST is
asynchronous, but I see no reason for it to be almost two times faster.

Both MDTs and OSTs are SSDs.

The situation is basically the same for a single task and for multiple
clients with multiple tasks/client.

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Slow release of inodes on OST

2020-02-08 Thread Åke Sandgren
The filesystems are completely idle during this. It's a test setup where
I'm running io500 and doing nothing else.

I set
osp.rsos-OST-osc-MDT.max_rpcs_in_flight=512
osp.rsos-OST-osc-MDT.max_rpcs_in_progress=32768
which severely reduced my waiting time between runs.
The in_progress being the one that actually affected things.
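
For reference, the commands on the MDS look roughly like this (a sketch using
the values above; the full osp target names are shortened here to a glob):
===
# On the MDS
lctl set_param osp.*.max_rpcs_in_flight=512
lctl set_param osp.*.max_rpcs_in_progress=32768
# watch the pending destroys drain
lctl get_param osp.*.sync_changes osp.*.destroys_in_flight
===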

On 2/8/20 4:50 AM, Andreas Dilger wrote:
> I haven't looked at that code recently, but I suspect that it is waiting
> for journal commits to complete
> every 5s before sending another batch of destroys?  Is the filesystem
> otherwise idle or something?
> 
> 
>> On Feb 7, 2020, at 02:34, Åke Sandgren > <mailto:ake.sandg...@hpc2n.umu.se>> wrote:
>>
>> Looking at the osp.*.sync* values I see
>> osp.rsos-OST-osc-MDT.sync_changes=14174002
>> osp.rsos-OST-osc-MDT.sync_in_flight=0
>> osp.rsos-OST-osc-MDT.sync_in_progress=4096
>> osp.rsos-OST-osc-MDT.destroys_in_flight=14178098
>>
>> And it takes 10 sec between changes of those values.
>>
>> So is there any other tunable I can tweak on either OSS or MDS side?
>>
>> On 2/6/20 6:58 AM, Andreas Dilger wrote:
>>> On Feb 4, 2020, at 07:23, Åke Sandgren >> <mailto:ake.sandg...@hpc2n.umu.se>
>>> <mailto:ake.sandg...@hpc2n.umu.se>> wrote:
>>>>
>>>> When I create a large number of files on an OST and then remove them,
>>>> the used inode count on the OST decreases very slowly, it takes several
>>>> hours for it to go from 3M to the correct ~10k.
>>>>
>>>> (I'm running the io500 test suite)
>>>>
>>>> Is there something I can do to make it release them faster?
>>>> Right now it has gone from 3M to 1.5M in 6 hours, (lfs df -i).
>>>
>>> Is this the object count or the file count?  Are you possibly using a
>>> lot of
>>> stripes on the files being deleted that is multiplying the work needed?
>>>
>>>> These are SSD based OST's in case it matters.
>>>
>>> The MDS controls the destroy of the OST objects, so there is a rate
>>> limit, but ~700/s seems low to me, especially for SSD OSTs.
>>>
>>> You could check "lctl get_param osp.*.sync*" on the MDS to see how
>>> many destroys are pending.  Also, increasing osp.*.max_rpcs_in_flight
>>> on the MDS might speed this up?  It should default to 32 per OST on
>>> the MDS vs. default 8 for clients
>>>
>>> Cheers, Andreas
>>> --
>>> Andreas Dilger
>>> Principal Lustre Architect
>>> Whamcloud
>>>
>>>
>>>
>>>
>>>
>>>
>>
>> -- 
>> Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
>> Internet: a...@hpc2n.umu.se <mailto:a...@hpc2n.umu.se>   Phone: +46 90
>> 7866134 Fax: +46 90-580 14
>> Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
>> <http://www.hpc2n.umu.se/>
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org <mailto:lustre-discuss@lists.lustre.org>
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Lustre Architect
> Whamcloud
> 
> 
> 
> 
> 
> 

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Slow release of inodes on OST

2020-02-07 Thread Åke Sandgren
Looking at the osp.*.sync* values I see
osp.rsos-OST-osc-MDT.sync_changes=14174002
osp.rsos-OST-osc-MDT.sync_in_flight=0
osp.rsos-OST-osc-MDT.sync_in_progress=4096
osp.rsos-OST-osc-MDT.destroys_in_flight=14178098

And it takes 10 sec between changes of those values.

So is there any other tunable I can tweak on either OSS or MDS side?

On 2/6/20 6:58 AM, Andreas Dilger wrote:
> On Feb 4, 2020, at 07:23, Åke Sandgren  <mailto:ake.sandg...@hpc2n.umu.se>> wrote:
>>
>> When I create a large number of files on an OST and then remove them,
>> the used inode count on the OST decreases very slowly, it takes several
>> hours for it to go from 3M to the correct ~10k.
>>
>> (I'm running the io500 test suite)
>>
>> Is there something I can do to make it release them faster?
>> Right now it has gone from 3M to 1.5M in 6 hours, (lfs df -i).
> 
> Is this the object count or the file count?  Are you possibly using a lot of
> stripes on the files being deleted that is multiplying the work needed?
> 
>> These are SSD based OST's in case it matters.
> 
> The MDS controls the destroy of the OST objects, so there is a rate
> limit, but ~700/s seems low to me, especially for SSD OSTs.
> 
> You could check "lctl get_param osp.*.sync*" on the MDS to see how
> many destroys are pending.  Also, increasing osp.*.max_rpcs_in_flight
> on the MDS might speed this up?  It should default to 32 per OST on
> the MDS vs. default 8 for clients
> 
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Lustre Architect
> Whamcloud
> 
> 
> 
> 
> 
> 

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] LUSTRE - Installation on DEBIAN 10.x ????

2020-02-06 Thread Åke Sandgren
We're running 2.13.0 server on Ubuntu Bionic 18.04 with 4.15 kernel in
our test setup. Has been working without any real problems for a couple
of weeks. But it is a small test setup so...

For clients we have 2.10.7 from DDN on our production Ubuntu Xenial
systems (with HWE kernel so 4.15)

On 2/6/20 3:50 PM, Thomas Roth wrote:
> Hi Matthias,
> 
> last time we tried this (Debian Wheezy + Lustre 2.5) we had to use the
> Redhat kernel _in_ Debian.
> However, nowadays there is the patchless-ldiskfs-server, and of course
> the server based on ZFS, both of which would require only to compile the
> proper modules.
> My guess is that you have better chances with the newer Lustre versions...
> 
> Regards
> Thomas
> 
> 
> On 05.02.20 11:50, Matthias Krawutschke wrote:
>> Hello everybody,
>>
>>  
>> I am new here on this platform and I have a question about LUSTRE –
>> Installation on DEBIAN 10.x ….. 
>>
>> It is possible to install this on this LINUX – Version & if yes: which
>> version of LUSTRE does it work and how?
>>
>>  
>> Best regards….
>>
>>  
>>  
>>  
>> Matthias Krawutschke, Dipl. Inf.
>>
>>  
>> Universität Potsdam
>> ZIM - Zentrum für Informationstechnologie und Medienmanagement
>> Team High-Performance-Computing
>>
>> Campus Am Neuen Palais: Am Neuen Palais 10 | 14469 Potsdam
>> Tel: +49 331 977-, Fax: +49 331 977-1750
>>
>>  
>> Internet: 
>> 
>> https://www.uni-potsdam.de/de/zim/angebote-loesungen/hpc.html
>>
>>  
>>  
>>
>>
>> ___
>> lustre-discuss mailing list
>> lustre-discuss@lists.lustre.org
>> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>>
> 

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Slow release of inodes on OST

2020-02-05 Thread Åke Sandgren


On 2/6/20 6:58 AM, Andreas Dilger wrote:
> On Feb 4, 2020, at 07:23, Åke Sandgren  <mailto:ake.sandg...@hpc2n.umu.se>> wrote:
>>
>> When I create a large number of files on an OST and then remove them,
>> the used inode count on the OST decreases very slowly, it takes several
>> hours for it to go from 3M to the correct ~10k.
>>
>> (I'm running the io500 test suite)
>>
>> Is there something I can do to make it release them faster?
>> Right now it has gone from 3M to 1.5M in 6 hours, (lfs df -i).
> 
> Is this the object count or the file count?  Are you possibly using a lot of
> stripes on the files being deleted that is multiplying the work needed?
> 
>> These are SSD based OST's in case it matters.
> 
> The MDS controls the destroy of the OST objects, so there is a rate
> limit, but ~700/s seems low to me, especially for SSD OSTs.
> 
> You could check "lctl get_param osp.*.sync*" on the MDS to see how
> many destroys are pending.  Also, increasing osp.*.max_rpcs_in_flight
> on the MDS might speed this up?  It should default to 32 per OST on
> the MDS vs. default 8 for clients

Should have checked before sending first mail...
The default on the MDS is apparently 8. This is 2.13.0 and I did not
change any params.

lctl get_param osp.*.max_rpcs_in_flight
osp.rsos-OST-osc-MDT.max_rpcs_in_flight=8

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Slow release of inodes on OST

2020-02-05 Thread Åke Sandgren


On 2/6/20 6:58 AM, Andreas Dilger wrote:
> On Feb 4, 2020, at 07:23, Åke Sandgren  <mailto:ake.sandg...@hpc2n.umu.se>> wrote:
>>
>> When I create a large number of files on an OST and then remove them,
>> the used inode count on the OST decreases very slowly, it takes several
>> hours for it to go from 3M to the correct ~10k.
>>
>> (I'm running the io500 test suite)
>>
>> Is there something I can do to make it release them faster?
>> Right now it has gone from 3M to 1.5M in 6 hours, (lfs df -i).
> 
> Is this the object count or the file count?  Are you possibly using a lot of
> stripes on the files being deleted that is multiplying the work needed?

I'm checking lfs df -i for the OST value; no stripes, single OST.

>> These are SSD based OST's in case it matters.
> 
> The MDS controls the destroy of the OST objects, so there is a rate
> limit, but ~700/s seems low to me, especially for SSD OSTs.
> 
> You could check "lctl get_param osp.*.sync*" on the MDS to see how
> many destroys are pending.  Also, increasing osp.*.max_rpcs_in_flight
> on the MDS might speed this up?  It should default to 32 per OST on
> the MDS vs. default 8 for clients

Thanks, that gives me something to play with. Didn't manage to find
anything about this in the manual, but on the other hand I didn't really
know what to search for...

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Slow release of inodes on OST

2020-02-04 Thread Åke Sandgren
Forgot to mention that I'm running 2.13.0 from git on the servers.

On 2/4/20 3:23 PM, Åke Sandgren wrote:
> Hi!
> 
> When I create a large number of files on an OST and then remove them,
> the used inode count on the OST decreases very slowly, it takes several
> hours for it to go from 3M to the correct ~10k.
> 
> (I'm running the io500 test suite)
> 
> Is there something I can do to make it release them faster?
> Right now it has gone from 3M to 1.5M in 6 hours, (lfs df -i).
> 
> These are SSD based OST's in case it matters.
> 

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Slow mount on clients

2020-02-04 Thread Åke Sandgren
Which then of course means that if the MDS HA has done a failover it
will be slow to mount until it's back in its usual place...

Used to happen to us very often when the server side was somewhat unstable.

On 2/4/20 8:55 AM, Moreno Diego (ID SIS) wrote:
> Not sure if it's your case but the order of MGS' NIDs when mounting matters:
> 
> [root@my-ms-01xx-yy ~]# time mount -t lustre 
> 10.210.1.101@tcp:10.210.1.102@tcp:/fs2 /scratch
> 
> real0m0.215s
> user0m0.007s
> sys 0m0.059s
> 
> [root@my-ms-01xx-yy ~]# time mount -t lustre 
> 10.210.1.102@tcp:10.210.1.101@tcp:/fs2 /scratch
> 
> real0m25.196s
> user0m0.009s
> sys 0m0.033s
> 
> Since the MGS is running on the node having the IP "10.210.1.101", if we 
> first try with the other one there seems to be a timeout of 25s.
> 
> Diego
>  
> 
> On 03.02.20, 23:17, "lustre-discuss on behalf of Andrew Elwell" 
>  andrew.elw...@gmail.com> wrote:
> 
> Hi Folks,
> 
> One of our (recently built) 2.10.x filesystems is slow to mount on
> clients (~20 seconds) whereas the others are nigh on instantaneous.
> 
> We saw this before with a 2.7 filesystem that went away after doing
>  but we've no idea what.
> 
> Nothing obvious in the logs.
> 
> Does anyone have suggestions for what causes this, and how to make it
> faster? It's annoying me as "something" isn't right but I can't
> identify what.
> 
> 
> Many thanks
> 
> Andrew
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> 
> 
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> 

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] rsync not appropriate for lustre

2020-01-23 Thread Åke Sandgren
I think this might be of some interest
https://github.com/hpc/mpifileutils
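
For example, the parallel copy/sync tools in there can be run across several
ranks and use much larger I/O sizes than rsync (a sketch; tool names are from
the mpifileutils project, launcher flags depend on your MPI):
===
# Parallel copy and sync with mpifileutils (dcp / dsync), 16 ranks as an example
mpirun -np 16 dcp /lustre/project/dataset /lustre/scratch/dataset
mpirun -np 16 dsync /lustre/project/dataset /nfs/backup/dataset
===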

On 1/23/20 4:33 PM, Bernd Melchers wrote:
> Hi All,
> we are copying large data sets within our lustre filesystem and between
> lustre and an external nfs server. In both cases the performance is
> unexpected low and the reason seems that rsync is reading and writing in
> 32 kB Blocks, whereas our lustre would be more happy with 4 MB
> Blocks.
> rsync has an --block-size=SIZE Parameter but this adjusts only the
> checksum block size (and the maximum is 131072), not the i/o block size.
> Is there a solution to accelerate rsync? 
> 
> Mit freundlichen Grüßen
> Bernd Melchers
> 

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Lustre server build on Ubuntu

2020-01-20 Thread Åke Sandgren
Hi!

I'm looking at building the server part on Ubuntu.
I can see that there are patches for ldiskfs on Ubuntu18 so someone has
clearly done some work here.

What are the pre-requisites for doing this?
Or even better, does anyone have a working recipy?

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] project quotas

2020-01-07 Thread Åke Sandgren
The manual has good examples; see Chapter 25.2, "Enabling Disk Quotas", for instance.
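
A condensed sketch of the steps from that chapter (project ID, limits and
paths are examples; the targets also need the project quota feature enabled,
see the manual):
===
# Tag a directory tree with a project ID, then set and check limits for it
lfs project -p 1001 -s -r /mnt/lustre/projects/foo
lfs setquota -p 1001 -b 307200 -B 309200 -i 10000 -I 11000 /mnt/lustre   # block limits in kB
lfs quota -p 1001 /mnt/lustre
===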

On 1/7/20 9:07 PM, Peeples, Heath wrote:
> I am needing to set up project quotas, but am having a difficult time
> finding documentation/examples of how to do this.  Could someone point
> me to a good source of information for this?  Thanks.
> 
>  
> 
> Heath
> 
> 
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
> 

-- 
Ake Sandgren, HPC2N, Umea University, S-90187 Umea, Sweden
Internet: a...@hpc2n.umu.se   Phone: +46 90 7866134 Fax: +46 90-580 14
Mobile: +46 70 7716134 WWW: http://www.hpc2n.umu.se
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org