Hi all,
does anybody know what ptlrpcd is for and what it might be doing? I've
seen it eating 100% CPU on an OSS where I reformatted an OST, but also in
other circumstances.
Regards,
Thomas
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
Hi all,
after installing a Lustre test file system consisting of 34 OSTs, I
encountered a strange error when trying to set up quotas:
lfs quotacheck gave me an Input/Output error, while in
/var/log/kern.log I found a Lustre error
LustreError: 20807:0:(quota_check.c:227:lov_quota_check()) lov
Thanks for your advice!
Indeed, it was the lack of NIS on the MDS that caused that error.
Cheers,
Thomas
--
Thomas Roth
Department: Informationstechnologie
Location: SB3 1.262
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986
Gesellschaft für Schwerionenforschung mbH
Planckstraße 1
D-64291 Darmstadt
www.gsi.de
Gesellschaft mit beschränkter Haftung
Hi all,
on our test cluster I observe a rather large difference between df and
lfs df:
lfs df /lustre
UUID  1K-blocks  Used  Available  Use%  Mounted on
...
filesystem summary: 137454163496 7056126648
Hi all,
I'm still in trouble with numbers: the available, used and necessary
space on my MDT:
According to lfs df, I have now filled my file system with 115.3 TB.
All of these files are sized 5 MB. That should be roughly 24 million files.
For the MDT, lfs df reports 28.2 GB used.
Now I believed
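The arithmetic can be cross-checked directly; a quick sketch (all input figures taken from the numbers above, the per-file cost is derived, not measured):

```shell
# Cross-check the MDT bookkeeping quoted above: 115.3 TB of 5 MB files,
# 28.2 GB used on the MDT.
files=$(awk 'BEGIN { printf "%d", 115.3*1024*1024/5 }')
echo "files: $files"                                    # ~24.2 million
awk -v n="$files" \
    'BEGIN { printf "per-file MDT usage: %.1f KB\n", 28.2*1024*1024/n }'
```

That comes out to about 1.2 KB per file, which is at least in the ballpark of the per-file inode-plus-EA cost usually assumed for an ldiskfs MDT.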
Hi all,
on a Lustre FS v1.6.5 with Debian Etch kernel 2.6.22 that is 94% full, I
can't create any more files.
Each OST still has room for ~2 GB:
# lfs df
UUID 1K-blocks Used Available Use% Mounted on
MDT_UUID 495497804669376 4665119240% /lustre[MDT:0]
Oh, that was one of the (many) things I skipped when reading the manual.
Seems to have done the trick, thanks a lot.
Thomas
Guy Coates wrote:
Thomas Roth wrote:
Hi all,
on a Lustre FS v1.6.5 with Debian Etch kernel 2.6.22 that is 94% full, I
can't create any more files.
Each OST has still
Hi all,
I've experienced reproducible OSS crashes with 1.6.5 but also
1.6.4.3/1.6.4.2. The cluster is running Debian Etch64, kernel 2.6.22.
The OSS are file servers with two OSTs.
I'm now testing it by just using one OSS in the system (but encountered
the problem first with 9 OSS), mounting
Hi,
Brian J. Murrell wrote:
On Wed, 2008-07-23 at 13:56 +0200, Thomas Roth wrote:
Hi all,
Hi,
I've experienced reproducible OSS crashes with 1.6.5 but also
1.6.4.3/1.6.4.2. The cluster is running Debian Etch64, kernel 2.6.22.
The OSS are file servers with two OSTs.
I'm now testing
Hi all,
I've encountered a LustreError that might have triggered an unwanted
failover of an MGS/MDS HA pair of servers. I'm not sure about the
latter, but at least I have not found a trace of that error via Google,
so it might be worth considering.
And it occurred in this form only the two
Hi all,
I'm still successful in bringing my OSSs to a standstill if not crashing
them.
Having reduced the number of stress jobs writing to Lustre (stress -d 2
--hdd-noclean --hdd-bytes 5M) to four, and having reduced the number of
OSS threads (options ost oss_num_threads=256 in
Hi all,
for some time already I keep getting error messages on my MGS/MDS about
operations on unconnected MGS or MDS.
Aug 1 18:08:53 kernel: LustreError:
14182:0:(mgs_handler.c:538:mgs_handle()) lustre_mgs: operation 400 on
unconnected MGS
Aug 1 18:09:19 kernel: LustreError:
Hi all,
since our users have managed to write several TBs to Lustre by now, they
sometimes would like to know what and how much there is in their
directories. Is there any smarter way to find out than to do a du -hs
dirname and wait for 30min for the 12TB-answer ?
I've already told them to
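Short of a database-backed tool, the scan itself can at least be parallelized. A minimal sketch, using a made-up /tmp tree in place of the real Lustre directory (paths and sizes are invented for the demo):

```shell
# Toy stand-in for a Lustre directory tree (paths and sizes invented).
mkdir -p /tmp/dudemo/a /tmp/dudemo/b
dd if=/dev/zero of=/tmp/dudemo/a/f1 bs=1024 count=4 2>/dev/null
dd if=/dev/zero of=/tmp/dudemo/b/f2 bs=1024 count=8 2>/dev/null
# Size every file with up to 4 parallel workers, then sum the results.
find /tmp/dudemo -type f -print0 \
  | xargs -0 -P4 -n1 wc -c \
  | awk '{ sum += $1 } END { print sum " bytes" }'
```

If quotas are enabled, `lfs quota -u <user> /lustre` answers the per-user half of the question without any tree walk at all.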
Hi,
read your instructions - that's pretty much the setup we are using, too.
And it works very well, drbd 0.8 notwithstanding, but on a hardware RAID.
I do not quite understand your remark about not using an extra net for
drbd - have you tried putting the name that's in your drbd.conf together
Hi all,
I just ran into a LBUG on an MDS still running Lustre Version 1.6.3 with
kernel 2.6.18, Debian Etch.
kern.log c.f. below. You will probably tell me that is a known BUG
already fixed/ to be fixed (I'm unsure how to search for such a thing in
bugzilla).
But my main question concerns the
Hi all,
on an empty and unused Lustre 1.6.5.1 system I cannot reset or set the
quota:
~# lfs quota -u troth /lustre
Disk quotas for user troth:
Filesystem  kbytes  quota    limit   grace  files  quota  limit  grace
/lustre     4       3072000  309200         1      11000
Brian J. Murrell wrote:
On Thu, 2008-11-27 at 19:18 +0100, Thomas Roth wrote:
Nov 27 17:57:41 lustre kernel: LustreError:
3974:0:(ldlm_request.c:64:ldlm_expired_completion_wait()) ### lock timed
out (enqueued
at 1227804060, 1001s ago); not entering recovery in server code, just
going
with alt-sysrq-t provided
sysctl variable kernel.sysrq equals 1)?
Andrew.
On Friday 28 November 2008 17:50:51 Thomas Roth wrote:
Hi all,
on an empty and unused Lustre 1.6.5.1 system I cannot reset or set the
quota:
~# lfs quota -u troth /lustre
Disk quotas for user troth
or
mds_quota_recovery as suggested by Andrew yet found nothing.
So, what should I do? Unfortunately the system is already in use by
other people, so just starting fresh with a global mkfs.lustre is not
an option ;-)
Regards,
Thomas
Thomas Roth wrote:
Hi,
I'm still having these problems
Hi all,
we somehow lost our quota info - a check of a user's quotas gave me
'user quotas are not enabled'
This system, running Lustre 1.6.5.1, was set up end of October, by which
time I had also enabled quotas there. Of course I had also run some
tests then which showed that quotas were
Hi all,
in a cluster with 375 clients, for a 12 hour period I get about 500
messages of the type
Connection to service MGS via nid a.b@tcp was lost; in progress
operations using this service will fail.
and about 800 messages of the type
Connection to service MDT via nid
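For a first overview it helps to tally such messages per service; a sketch with invented sample lines (on a real node, one would grep /var/log/kern.log instead):

```shell
# Count "Connection to service ... was lost" messages per service.
# The sample log lines below are invented for the demo.
cat > /tmp/sample.log <<'EOF'
Lustre: Connection to service MGS via nid 10.0.0.1@tcp was lost; in progress operations using this service will fail.
Lustre: Connection to service MDT via nid 10.0.0.2@tcp was lost; in progress operations using this service will fail.
Lustre: Connection to service MDT via nid 10.0.0.2@tcp was lost; in progress operations using this service will fail.
EOF
grep -o 'Connection to service [A-Z]*' /tmp/sample.log \
  | awk '{ count[$4]++ } END { for (s in count) print s, count[s] }' \
  | sort
```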
.
Regards,
Thomas
cheers
Wojciech
Thomas Roth wrote:
Hi all,
in a cluster with 375 clients, for a 12 hour period I get about 500
messages of the type
Connection to service MGS via nid a.b@tcp was lost; in progress
operations using this service will fail.
and about 800
Hi all,
I've just upgraded a 1 MDT - 2 OST - 2 Clients - test cluster from
Lustre version 1.6.5.1 to 1.6.6
However, I did not follow the manual (
http://manual.lustre.org/manual/LustreManual16_HTML/UpgradingLustre.html#50548855_pgfId-1289726):
I did not use the tunefs.lustre command on MGS/MDT
Hi Wojciech,
thanks for your explanation of writeconf.
Of course, the man pages as well as the Lustre manual state that
writeconf is a potentially dangerous operation.
I'm always afraid to manipulate the MDT: might go wrong and I end up
with 100's of TB of junk (as a restore of backups never
Hi all,
on our production cluster we have for a surprisingly long time ( 1 day)
only the following two error messages (and no visible problems),
although the system is under heavy load right now:
Jan 14 10:44:33 server1 kernel: LustreError:
5118:0:(ldlm_lib.c:1536:target_send_reply_msg()) @@@
Lustre talks about an unconnected MGS.
Thomas
Cliff White wrote:
Thomas Roth wrote:
Hi all,
on our production cluster we have for a surprisingly long time ( 1 day)
only the following two error messages (and no visible problems),
although the system is under heavy load right now:
Jan 14 10
things that might cause operation 101 problems.
101 = LDLM_ENQUEUE, so this is just a lock enqueue.
Cheers, Andreas
--
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.
Hi all,
from some prior mail here I took it that you can have only one client
non-root-squashed.
Now in the manual ,
http://manual.lustre.org/manual/LustreManual16_HTML/LustreSecurity.html#50548870_pgfId-1292749
there is talk of multiple NIDs to be used here:
lctl conf_param
Hi all,
we have a problem with our production system (v. 1.6.5.1). It is in
recovery, but recovery never finishes.
The background are some unknown problems with the MDT, attempts to
restart the MDS etc. The MDT would start recovery, at some point during
recovery lose connection to its OSTs,
entries on my MGS/MDT server: mgs, mgc, mdt, lov,
mds. The correct device name for the lctl command is the one after mds.
Regards,
Thomas
Thomas Roth wrote:
Hi all,
we have a problem with our production system (v. 1.6.5.1). It is in
recovery, but recovery never finishes.
The background are some
Hi all,
just to repeat my question without further surrounding facts and doubts;
How can I find out which client is currently being recovered? (If the
MDS is in recovery at that moment)
How to find out which client is not recoverable (if recovery gets stuck)?
The MDS seems to know, because
Hi all,
after running for days without any problems, our MDS is refusing
cooperation for two hours now.
The log files show nothing until
Mar 5 16:46:24 mds1 kernel: Lustre:
17841:0:(ldlm_lib.c:525:target_handle_reconnect()) MDT:
481fa70b-590d-31b6-f621-c6125a54bfff reconnecting
Mar 5
Hi all,
after the problem with our MDS I reported earlier this evening, I did indeed a
complete restart of
the server, so afterwards the MDS was in recovery.
There were 386 clients to be recovered. After 100 min, only one of these was
left, but obviously it
never came back. So I aborted
Thanks Brian.
Brian J. Murrell wrote:
On Thu, 2009-03-05 at 22:19 +0100, Thomas Roth wrote:
My question: what happens to the one client that was not recovered?
It, and all of the clients that have transactions that need to be
replayed after the AWOL client's transactions are all evicted
Brian J. Murrell wrote:
On Fri, 2009-03-06 at 10:45 +0100, Thomas Roth wrote:
Thanks Brian.
NP.
What I meant: the average batch job that wants to read from or write to
Lustre will abort if a file cannot be accessed. The reason doesn't
matter to the jobs or the user.
That may be so
Brian J. Murrell wrote:
On Fri, 2009-03-06 at 20:09 +0100, Thomas Roth wrote:
But this is not what our users observe. Even on an otherwise perfectly
working system, they report I/O errors on access to some files.
EIO == eviction.
I can usually see something happening in the logs of OST
Hi all,
we are suffering from an increasing unusability of our cluster due to
refused connections, with typical log entries on the MDS:
ldlm_lib.c:target_handle_connect() lustre-MDT: refuse reconnection from
77cbd453-ee72-fe75-cb06-c49179e0a...@lustre-client@tcp to
0x810111341000; still
Hi all,
in a recent shutdown of our Lustre cluster (net reconfig, Version
upgrade to 1.6.7_patched), I decided to try to switch on quotas - this
had failed when the cluster went operational last year.
Again, I suffered from the same error as last year - failure, and
device/resource busy. This
Hi all,
since the upgrade to version 1.6.7_patched, our MDT prints huge amounts of:
0100:0400:4:1240598372.080847:0:4842:0:(service.c:753:ptlrpc_at_send_early_reply())
@@@ Couldn't add any time (42/30), not sending early reply
r...@8107faf6f450 x18776/t0
Hi all,
after several attempts to enable quotas on our 1.6.7.1 cluster - it always
blows up the MDS, I've complained earlier on this list - the MDT now
keeps sending these messages
lustre kernel: LustreError:
3825:0:(ldlm_lib.c:1840:target_handle_dqacq_callback()) dqacq failed!
(rc:-5)
Hi all,
our MDT gets stuck and unresponsive with very high loads (Lustre
1.6.7.1, Kernel 2.6.22, 8 Core, 32GB RAM). The only thing calling
attention is one ll_mt_?? process running with 100% cpu. Nothing unusual
happening on the cluster before that.
After reboot as well as after moving the
and
possibly do a writeconf everywhere.
I see that a similar problem was reported by Mag in March this year, but
no clues or solutions appeared.
Any ideas?
Yours,
Thomas
Hi all,
I want to move a MDT from one server to another. After studying some
mails concerning MDT backup, I've just tried (successfully, it seems) to
do that on a small test system with rsync:
- Stop Lustre, umount all servers.
- Format a suitable disk partition on the new hardware, using the
Andreas Dilger wrote:
On Jul 15, 2009 18:35 +0200, Thomas Roth wrote:
...
The traditional backup method of getting the EAs and tar-ing the MDT
doesn't finish in finite time. It did before, and the filesystem has
since grown by a mere 40GB of data, so it shouldn't take that much
longer
Hi all,
recently, many of our clients have been reporting errors of the type:
Jul 21 14:40:41 kernel: LustreError:
26871:0:(file.c:3024:ll_inode_revalidate_fini()) failure -2 inode 72692692
-2 is no such file or directory
Am I right that this means some inode info got lost on the MDT?
Also,
' on the MDT as
any practical implication?
Hi Andreas,
Andreas Dilger wrote:
On Jul 27, 2009 14:24 +0200, Thomas Roth wrote:
I'm copying around data between 2 MDTs in a test system. Having mounted
the partitione as 'ldiskfs', I had a look in MDT/ROOT. I found all my
test data there, but I'm puzzled by the indicated file sizes
Hi,
these ll_inode_revalidate_fini errors are unfortunately quite familiar to us.
So what would you guess if that happens again and again, on a number of
clients - MDT softly dying away?
Because we haven't seen any mass evictions (and no reasons for that) in
connection with these errors.
Or could the
correlated with this fsck ;-)).
So I'm still not reassured concerning the health of this MDT.
We are running Lustre v 1.6.7.2 on the servers, the clients mainly still
on 1.6.5.1.
Regards,
Thomas
Oleg Drokin wrote:
Hello!
On Aug 6, 2009, at 12:57 PM, Thomas Roth wrote:
Hi
what's going on here?
TIA,
Thomas
Hi all,
while trying to fix the recent kernel vulnerability (CVE-2009-2692) we
found that in most cases, our Lustre 1.6.5.1, 1.6.6 and 1.6.7.2 clients
seemed to be quite well protected, at least against the published
exploit: wunderbar_emporium seems to work, but then the root shell never
Peter Kjellstrom wrote:
On Friday 21 August 2009, Thomas Roth wrote:
Hi all,
while trying to fix the recent kernel vulnerability (CVE-2009-2692) we
found that in most cases, our Lustre 1.6.5.1, 1.6.6 and 1.6.7.2 clients
seemed to be quite well protected, at least against the published
Hi all,
on our 1.6.7.2 system, the MDT is quite busy writing the following type
of messages to the log, and I would just like to ask if somebody has an
idea what they mean and if they mean harm:
Sep 21 19:50:30 mds1 kernel: LustreError:
6009:0:(pack_generic.c:566:lustre_msg_buf_v2()) msg
Hi all,
in our 196-OST cluster, the previously perfect distribution of files
among the OSTs is not working anymore, since ~ 2 weeks.
The filling for most OSTs is between 57% and 62%, but some (~10) have
risen up to 94%. I'm trying to fix that by having these OSTs deactivated
on the MDT and
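Spotting the outliers can be scripted; a hedged sketch that flags OSTs at or above 90% (the `lfs df`-style sample below is invented; on the real system one would pipe `lfs df /lustre` into the awk instead):

```shell
# Flag over-full OSTs in lfs-df-style output. Sample values are made up.
cat > /tmp/lfsdf.txt <<'EOF'
lustre-OST0000_UUID 1000000 580000 420000 58% /lustre[OST:0]
lustre-OST0001_UUID 1000000 940000  60000 94% /lustre[OST:1]
lustre-OST0002_UUID 1000000 610000 390000 61% /lustre[OST:2]
EOF
# Field 5 is the Use% column; "$5+0" strips the trailing '%'.
awk '$5+0 >= 90 { print $1, $5 }' /tmp/lfsdf.txt
```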
quite a number of lists
of users' data from running lfs find --obd OST... /lustre/..., I just
haven't run these lists
through an ls -lh yet. Too busy moving the files instead of measuring them ;-)
Regards,
Thomas
Andreas Dilger wrote:
On 2009-10-30, at 12:07, Thomas Roth wrote:
in our 196 OST
levels of
the OSTs.
Regards,
Thomas
Andreas Dilger wrote:
On 2009-10-30, at 12:07, Thomas Roth wrote:
in our 196 OST - Cluster, the previously perfect distribution of files
among the OSTs is not working anymore, since ~ 2 weeks.
The filling for most OSTs is between 57% and 62%, but some (~10
Hi all,
in our 1.6.7.2 / Debian / kernel 2.6.22 cluster, 2 servers with 2 and 3
OSTs have become somewhat blocking in the sense that commands like lfs
df will have to wait for ca. 30s when reaching these OSTs in the list.
Some of our clients do not have this problem, some have these contact(?)
: LustreError:
13531:0:(mds_lov.c:960:__mds_lov_synchronize()) gsilust-OST00b3_UUID
sync failed -2, deactivating
Hi all,
I'm getting my feet wet in the infiniband lake and of course I run into
some problems.
It would seem I got the compilation part of sles11 kernel 2.6.27 +
Lustre 1.8.3 + ofed 1.4.2 right, because it allows me to see and use the
infiniband fabric, and because ko2iblnd loads without any
on distinct networks.
lctl list_nids will show you the lustre nids of the node you're logged
into only.
lctl route_list will show you the lustre routers and the networks that
they bridge.
I hope this was helpful.
Erik
On Tue, Jun 22, 2010 at 10:19 AM, Thomas Roth t.r...@gsi.de wrote:
Hi
On 22.06.2010 16:19, Thomas Roth wrote:
Hi all,
I'm getting my feet wet in the infiniband lake and of course I run into
some problems.
It would seem I got the compilation part of sles11 kernel 2.6.27 +
Lustre 1.8.3 + ofed 1.4.2 right, because it allows me to see and use the
infiniband fabric
Hi all,
on a 1.8.4 test system, I tried to prepare for lfsck and got an error from
e2fsck:
mds:~# e2fsck -n -v --mdsdb /tmp/mdsdb /dev/sdb2
e2fsck 1.41.10.sun2 (24-Feb-2010)
lustre-MDT lustre database creation, check forced.
Pass 1: Checking inodes, blocks, and sizes
MDS: ost_idx 0 max_id
Thanks, Daniel.
I have tried on another test system without pools, and there it worked
indeed.
Regards,
Thomas
On 09/10/2010 08:48 PM, Daniel Kobras wrote:
Hi Thomas!
On Fri, Sep 10, 2010 at 08:16:57PM +0200, Thomas Roth wrote:
on a 1.8.4 test system, I tried prepare for lfsck and got
Hi all,
I'm trying to understand MDT logs and adaptive timeouts. After upgrade
to 1.8.4 and while users believed Lustre to be still in maintenance (=
no activity), the MDT log just shows
Lustre: 19823:0:(service.c:808:ptlrpc_at_send_early_reply()) @@@
Couldn't add any time (42/30), not sending
?
Cheers,
Thomas
Subject:
[Lustre-discuss] robinhood error messages
From:
Thomas Roth t.r...@gsi.de
Date:
Tue, 23 Nov 2010 20:20:33 +0100
To:
lustre-discuss@lists.lustre.org
On 24.11.2010 15:17, LEIBOVICI Thomas wrote:
Thomas Roth wrote:
ListMgr | DB query failed in ListMgr_Insert line 340...
and assorted messages, which seem to indicate that the new robinhood
scan tries to put something into the DB that is already there, and
stumbles on this. Or maybe
Hi all,
we have gotten new MDS hardware, and I've got two questions:
What are the recommendations for the RAID configuration and formatting
options?
I was following the recent discussion about these aspects on an OST:
chunk size, stripe size, stride-size, stripe-width etc. in the light of
the
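The stride/stripe-width arithmetic from that discussion, sketched with made-up numbers (64 KiB RAID chunk, 4 data disks, 4 KiB filesystem blocks; the mkfs invocation in the comment is hypothetical):

```shell
# stride       = RAID chunk size expressed in filesystem blocks
# stripe-width = stride * number of data disks
chunk_kb=64; data_disks=4; block_kb=4
stride=$((chunk_kb / block_kb))
stripe_width=$((stride * data_disks))
echo "stride=$stride stripe-width=$stripe_width"   # stride=16 stripe-width=64
# These values would then feed into something like:
#   mkfs.lustre --mdt ... --mkfsoptions="-E stride=16,stripe-width=64" ...
```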
Hi all,
I have run llverfs (lustre-utils 1.8.4) on an OST partition as llverfs
-w -v /srv/OST0002.
That went smoothly until all 9759209724 kB were written, terminating with:
write File name: /srv/OST0002/dir00072/file022
write complete
llverfs: writing /srv/OST0002/llverfs.filecount failed: No
partitions..
Cheers,
Thomas
On 27.01.2011 20:06, Andreas Dilger wrote:
On 2011-01-27, at 04:56, Thomas Roth wrote:
I have run llverfs (lustre-utils 1.8.4) on an OST partition as llverfs
-w -v /srv/OST0002.
That went smoothly until all 9759209724 kB were written, terminating
Hi all,
we are suffering from a severe metadata performance degradation on our 1.8.4
cluster and are pretty clueless.
- We moved the MDT to a new hardware, since the old one was failing
- We increased the size of the MDT with 'resize2fs' (+ mounted it and saw all
the files)
- We found the
change
when you switched hardware?
The 'still busy' message is a bug, may be fixed in 1.8.5
cliffw
On Sat, Apr 2, 2011 at 1:01 AM, Thomas Roth t.r...@gsi.de
mailto:t.r...@gsi.de wrote:
Hi all,
we are suffering from a severe metadata performance degradation on our
1.8.4
Hi all,
a recent posting here (which I can't find atm) has pointed me to
http://jira.whamcloud.com/browse/LU-15, where an issue is discussed that
we seem to see as well: some OSS really get overloaded, and the log says
slow journal start 36s due to heavy IO load
slow commitrw commit 36s due to
Hi all,
I'd like to mount two Lustre filesystems on one client. Issues with more than
one MGS set aside,
the point here is that one of them is an Infiniband-cluster, the other is
ethernet-based.
And my client is on the ethernet.
I have managed to mount the o2ib-fs by setting up an LNET router,
@tcp
Cheers,
Thomas
On 06/14/2011 07:00 PM, Michael Shuey wrote:
Is your ethernet FS in tcp1, or tcp0? Your config bits indicate the
client is in tcp1 - do the servers agree?
--
Mike Shuey
On Tue, Jun 14, 2011 at 12:23 PM, Thomas Roth t.r...@gsi.de wrote:
Hi all,
I'd like
every 300 seconds (and
re-enables it if found).
Hope this helps.
--
Mike Shuey
On Tue, Jun 14, 2011 at 1:26 PM, Thomas Roth t.r...@gsi.de wrote:
Hm, the ethernet FS is in tcp0 - MGS says its nids are MGS-IP@tcp.
So not surprising it refuses that connection.
On the other hand
=statement seems to say: If you have data for tcp, use the
Default-Router-IP and go
via the interace that is on network tcp1.
Oh well, I should probably take some networking lectures...
Regards,
Thomas
On 06/14/2011 06:23 PM, Thomas Roth wrote:
Hi all,
I'd like to mount two Lustre filesystems
Hi all,
I am currently moving off files of a number of OSTs - some in a machine with a
predicted hardware
failure, some for decommissioning old hardware etc. I'm deactivating the OSTs
on the MDS, then lfs
find --obd OST_UUID /dir to create a list of files to migrate.
When finished, the
Your clients have both ib and tcp nids? Because I encountered a strange
behavior trying to mount an ib
based FS and a tcp based FS on the same (ethernet-only) client. To connect to
the ib MDS it had to go
through a lnet router, of course.
Experimentally, I found
options lnet
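The truncated `options lnet` line above presumably held the routes setting. For an ethernet-only client reaching an o2ib filesystem through an LNET router, the module options generally take this shape (interface name and gateway NID are placeholders, not the actual values from this thread):

```
# /etc/modprobe.d/lustre.conf -- placeholder values, adjust to the real networks
options lnet networks="tcp0(eth0)"
options lnet routes="o2ib 192.168.1.10@tcp0"
```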
?
Hi Marinho,
no problem for Lustre 1.8. All the necessary packages are here:
http://pkg-lustre.alioth.debian.org/backports/lustre-1.8.7-wc1-squeeze/
We've been running Lustre on Debian since version 1.5.9 (aka beta for
1.6, on Sarge! ;-)). Now we are at 3.5 PB, on 200+ servers. No
Debian-specific
8 -419 0
(The last line, the only peer that is "up", is an LNET-router)
Something to worry about?
Cheers,
Thomas
shark et al.?
Cheers,
Thomas
I think I have seen this behavior before, and the "df" result shrank to an expected value after the
server had been rebooted. So this seems more like an overly persistent caching effect?
Cheers,
Thomas
-support (using 'tune2fs -O ^quota' and a subsequent 'tunefs.lustre --quota')
on the MDT, we were able to repair it.
Maybe that helps in your case, too...
On Tue, 12 Jul 2016, Thomas Roth wrote:
Hi all,
we are running Lustre 2.5.3 on our servers. OSTs are on ZFS, MDS is on ldiskfs