Re: [Lustre-discuss] clients gets EINTR from time to time

2011-03-04 Thread Francois Chassaing
Dear list,
I am still investigating this issue, and I am now struggling with the debugging itself.
The issue arose once more yesterday, so I started to look at it more closely and 
decided that the trace debug should be written to disk using debug_daemon.
Alas, debugging with only the trace flag active produces more than 100 MB/s 
of log (yes, these are busy clients).
I've tried several strategies, such as running debug_kernel from a cron job or while 
watching my product's error log, but even then dk would dump 70 MB of data 
representing less than one second of debug log.
So my chances of catching the signal seem low.
Is there a less verbose debug flag that would still include the signal I'm 
looking for?

Given John's answer, I could perhaps use /proc/sys/lustre/dump_on_timeout to dump 
the log only when a timeout happens, but this will work only if my problem 
matches what John can reproduce.

Please also note that I've looked around for abnormal threads_started numbers: 
it is everywhere equal to threads_min, except for one MDT entry 
which is at threads_min+1.


Regards

François Chassaing, Technical Director - CTO, Weborama

- Original Message -
From: John Hammond jhamm...@tacc.utexas.edu
To: Andreas Dilger adil...@whamcloud.com
Cc: lustre-discuss@lists.lustre.org
Sent: Friday, 25 February 2011 21:16:36 GMT +01:00 Amsterdam / Berlin / Bern 
/ Rome / Stockholm / Vienna
Subject: Re: [Lustre-discuss] clients gets EINTR from time to time

On 02/25/2011 11:39 AM, Andreas Dilger wrote:
 On 2011-02-25, at 6:28, Brian J. Murrell br...@whamcloud.com wrote:
 On 11-02-25 06:18 AM, Francois  wrote:

 I continue to parse debug logs and keep them posted.

 I don't understand why you don't just fix your application to handle a
 perfectly valid and expected condition (that it's currently not
 handling) instead of wasting time trying to find the cause of the
 expected condition.  Even if you find it, it's likely not a bug and not
 something that can/will be fixed.  It's your application that needs to
 be fixed.
 
 In all fairness Brian, it isn't always possible to fix an application like 
 you suggest. It might be commercial (binary only), it might be complex code 
 using 3rd party libraries to do the IO that would lose support if modified, 
 etc. 
 
 I think the first action to debug this is to run on the client with lctl 
 set_param debug=+trace or =~0 which will enable function entry/exit 
 tracing in Lustre. Then when the problem is hit, run lctl dk /tmp/debug to 
 dump the Lustre debug log, and search for -4 (which is -EINTR) to see where 
 this error is first appearing. 
 
 At that point we can make a determination where the source of the error is, 
 and if it is Lustre's fault. I know at one time there was a related problem 
 in the l_wait_event() macro that was improperly masking signals, but I 
 thought it was fixed by 1.8.5. 

Setting aside the moral question of which calls should be interruptible,
I think that the handling of the LUSTRE_FATAL_SIGS (defined in
lustre_lib.h to be SIGKILL, SIGINT, SIGTERM, SIGQUIT, SIGALRM) is
slightly broken.  Under certain situations, Lustre will return -EINTR
although no signals were delivered.  That's probably not the end of the
world for most applications, but OTOH I don't think anybody assumes that
-EINTR will be delivered spuriously.

Consider the following sequence:

1) Process P has a Lustre file F open.

2) P has SIGALRM pending (but blocked).

3) P starts writing to F and ends up sleeping in (something like):

  sys_write()
   ...
    ll_extent_lock()
     ...
      osc_enqueue()
       ...
        ptlrpc_queue_wait().

4) The OST does not respond to the request before the deadline, so
l_wait_event() replaces the signal mask of P with the LUSTRE_FATAL_SIGS,
notices that SIGALRM is now deliverable, restores the signal mask of P,
and ptlrpc_queue_wait() returns -EINTR.

5) P is exiting from sys_write(), SIGALRM is blocked (but still pending)
so it doesn't get delivered.

6) P spuriously returns -EINTR from sys_write().
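
Steps 2 and 5 rely on ordinary POSIX behaviour: a signal that is blocked stays pending, and its handler does not run until the signal is unblocked. The user-space sketch below illustrates only that part. It does not touch Lustre and does not reproduce the spurious -EINTR. The names run_pending_demo, struct pending_demo and on_alarm are made up for this illustration, and a single-threaded process is assumed.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <stddef.h>

/* Set by the handler so we can tell when SIGALRM is actually delivered. */
static volatile sig_atomic_t alarm_delivered = 0;

static void on_alarm(int signo)
{
    (void)signo;
    alarm_delivered = 1;
}

/* One field per observation made by run_pending_demo(). */
struct pending_demo {
    int pending_while_blocked;   /* 1 if SIGALRM was pending while blocked */
    int delivered_while_blocked; /* 1 if the handler ran while blocked */
    int delivered_after_unblock; /* 1 if the handler ran once unblocked */
};

/*
 * Hypothetical helper (illustration only): block SIGALRM, raise it, and
 * record what happens before and after unblocking it.
 * Assumes a single-threaded process. Returns 0 on success, -1 on error.
 */
int run_pending_demo(struct pending_demo *out)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, pending;

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, &old_sa) != 0)
        return -1;

    alarm_delivered = 0;
    sigemptyset(&block);
    sigaddset(&block, SIGALRM);
    if (sigprocmask(SIG_BLOCK, &block, &old_mask) != 0)
        return -1;

    /* Like step 2: SIGALRM is now pending but blocked. */
    if (raise(SIGALRM) != 0 || sigpending(&pending) != 0)
        return -1;
    out->pending_while_blocked = (sigismember(&pending, SIGALRM) == 1);
    out->delivered_while_blocked = alarm_delivered;

    /* Unblocking lets the pending signal be delivered before we continue. */
    if (sigprocmask(SIG_UNBLOCK, &block, NULL) != 0)
        return -1;
    out->delivered_after_unblock = alarm_delivered;

    /* Put the caller's signal mask and SIGALRM disposition back. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGALRM, &old_sa, NULL);
    return 0;
}
```

In the sequence above, P never reaches the unblocking step, which is why no handler runs even though sys_write() has already reported an interruption.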

I can reproduce this on 1.8.5/RHEL 5.5.  If the goal is to emulate NFS's
interruptibility during congestion then returning -ERESTARTSYS would be
more appropriate.  Also, it might be worthwhile to make this extra
interruptibility a mount flag, as NFS does.
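
For applications that can be modified, a common user-space workaround is to retry a write that fails with EINTR. The sketch below is only an illustration of that idiom under stated assumptions: write_all_retry is a made-up name (it is not part of Lustre or libc), and it assumes the caller is happy to retry unconditionally, whatever caused the interruption.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of Lustre or libc): write all `len` bytes
 * of `buf` to `fd`, retrying when write() fails with EINTR and continuing
 * after short writes. Returns `len` on success, or -1 with errno set by
 * the failing write().
 */
ssize_t write_all_retry(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data was written */
            return -1;      /* any other error is reported to the caller */
        }
        p += n;             /* short write: advance and write the rest */
        left -= (size_t)n;
    }
    return (ssize_t)len;
}
```

An application that uses signals on purpose to cancel I/O would need to check its own cancellation flag before retrying, since this loop hides every EINTR from the caller.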

Best,

John

-- 
John L. Hammond, Ph.D.
TACC, The University of Texas at Austin
jhamm...@tacc.utexas.edu
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] OST problem

2011-03-04 Thread Lucius
Hi Larry,

Thank you for your answer, but I do not have the option of using InfiniBand.
This description also starts with formatting the fs. I don't want to format 
the node that is already in use. I would like to extend it online.
Is it possible to fully synchronize the data between the two nodes (the 
existing one, and the new one on the server about to be attached)? The old 
node is at 50% of its capacity; the new node is completely empty.

So the question is: how do I add a failnode to an online system, and how do I 
get the data in sync?
I hope someone can help.

thank you,
Lucius

- Original Message - 
From: Larry
Sent: Tuesday, March 01, 2011 6:25 AM
To: Lucius
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [Lustre-discuss] OST problem

Hi Lucius,
Chapter 15 of the Lustre manual tells you how to do it.


On Tue, Mar 1, 2011 at 1:05 PM, Lucius lucius...@hotmail.com wrote:
 Hello everyone,

 I would like to extend an OSS which is still in use. I would like to
 extend it with a server which has exactly the same HW configuration, and I’d
 like to extend it in an active/active mode.
 I couldn’t find any documentation about this, as most of the examples show
 how to use failnode during formatting. However, I need to extend the
 currently working system without losing data.
 Also, tunefs.lustre examples show only the parameter configuration, but they
 won’t tell if you need to synchronize the file system before setting the
 How would the system know that on the given server identified by its unique IP,
 which OST mirrors should run?

 Thank you in advance,
 Viktor


Re: [Lustre-discuss] Lustre 2.1 Release

2011-03-04 Thread Mag Gam
This is really great, and thanks for keeping this open! For aspiring
software engineers at my school it would be valuable to dial in to the
calls and hear professionals speak.



On Fri, Feb 25, 2011 at 8:18 AM, Diego Moreno
diego.moreno-laz...@bull.net wrote:
 Hi Peter,

 That's great news! It's really interesting to know about the Lustre 2.1
 release with all the community involved in it.

 I imagine it's still too soon but is there any roadmap or any date for
 the end of development and testing?

 Is there a feature list, or is it still to be defined on the mailing list?

 Regards,

 Diego

 On 24/02/2011 23:23, Peter Jones wrote:


 Hi there

 There has been much discussion within the Lustre community about the future 
 of the Lustre 2.x codeline with the following outcome.

   Roles
   - I have taken on the role of Release Manager for the Lustre 2.1 release 
 and Oleg Drokin (gr...@whamcloud.com) will be the Technical Lead for this 
 release.

   Issue Tracking
   - Issues relating to this release will be tracked in Whamcloud's JIRA 
 system - http://jira.whamcloud.com. Signup is open and free.
   - To see the present list of blockers, please use the filter "Lustre 2.1 
 Blockers". This can be conveniently accessed by selecting "Manage Filters" and 
 then "Popular".

   Source Control
   - The code for the release will be made from Whamcloud's git instance 
 - http://git.whamcloud.com/
   - Patches contributed by engineers from third party organizations will be 
 handled according to an arrangement similar to the kernel's 
 (see http://wiki.whamcloud.com/display/PUB/Submitting+Changes
 for details). The outcome will be that no single organization will own the 
 copyright to this release.

   Testing
   - The latest build can be downloaded from http://build.whamcloud.com/
   - Testing results from both Whamcloud and third party organizations will be 
 stored in Maloo, the Whamcloud test database - http://maloo.whamcloud.com/. 
 See http://wiki.whamcloud.com/display/PUB/Using+Maloo for details on how to 
 use Maloo either to view progress or to upload your own testing results.

   Weekly Call
   - A weekly status call will take place Tuesday at 1:30pm PT. This call is 
 open to any interested parties. 866-914-3976 534986#

 This was considered the most expedient plan for the Lustre 2.1 release, but 
 a different approach may be taken for ongoing Lustre 2.x releases. This is 
 still under consideration within the Lustre community.

 The Lustre community organizations - EOFS, HPCFS, and OpenSFS - have all 
 expressed support for these plans and we look forward to collaborating with 
 the community for this release.

 A Lustre 2.1 Google group has been set up as a forum to discuss this release 
 - http://groups.google.com/group/lustre-21. Please feel free to sign up for 
 this mailing list whether you are interested in collaborating in this 
 release or just observing the progress.

   Regards

   Peter


 NB: Lustre is a trademark of the Oracle Corporation

 --
 Peter Jones
 Whamcloud, Inc.
 www.whamcloud.com



 --
 Diego Moreno
 http://www.bull-world.com/


Re: [Lustre-discuss] Update of PDSI filesystem stats data

2011-03-04 Thread Mag Gam
I hope one of the new features you are going to implement is SNS :-)



On Wed, Feb 23, 2011 at 8:37 PM, Andreas Dilger adil...@whamcloud.com wrote:
 When looking at how to implement features for Lustre (which I'm doing a lot 
 of recently :-) I sometimes consult the PDSI filesystem statistics data at 
 http://www.pdsi-scidac.org/fsstats/ in order to see how these large 
 filesystems are used in real life.  Information like the length of filenames, 
 how many files have hard links, the age of files in the filesystem, etc. is 
 useful in deciding where to optimize the implementation.

 Unfortunately, the filesystem surveys there are starting to get a bit dated 
 (the most recent one is almost 3 years old, and the largest filesystems are 
 only ~300TB in size).

 I want to solicit the Lustre user community to contribute some updated 
 statistics, and have confirmed with Garth Gibson (leader of the PDSI 
 workshops and maintainer of that site) that it is still worthwhile to send 
 updated statistics using the 
 http://www.pdsi-scidac.org/fsstats/questionnaire.html form.

 Garth will look at getting some grad students to compile the submitted data, 
 and is particularly interested if anyone has updated data for any filesystem 
 they previously submitted (PNNL, PSC).  In the meantime, I would also 
 appreciate an email with the results.

 Thanks in advance for any contributions.

 Cheers, Andreas
 --
 Andreas Dilger
 Principal Engineer
 Whamcloud, Inc.





Re: [Lustre-discuss] OST problem

2011-03-04 Thread Wojciech Turek
Hi Lucius,

I am not exactly sure what you are trying to do here. Do you have two OSS
servers: an old one currently in production (OST 50% full) and a new
one (OST empty) that you want to attach to your Lustre filesystem?
Do you want to create a mirror of the old OST on the new OST?

Adding a new failnode in Lustre is simple and can be done while the filesystem
is running. For that you use tunefs.lustre or the lctl API; examples are in the
Lustre manual.
If you would like to change the current failnode configuration, for example
to change an IP address or network type, you will need to stop the Lustre
filesystem and amend the configuration while Lustre is stopped; please see
the examples in the manual.

Cheers

Wojciech



Re: [Lustre-discuss] mixed oss/ost performance question

2011-03-04 Thread Andreas Dilger
On 2011-03-04, at 3:14 PM, Samuel Aparicio wrote:
 I have a general question about mixing OSTs with slower or faster backing 
 storage.  We have a fair number of slower legacy disk pools and a bunch of 
 newer, faster ones.  The fast and slow pools are aggregated separately to 
 provide OSTs with a uniform speed characteristic (slow or fast).
 
 My question is whether it would be better to make two separate filesystems 
 (say lustre1 and lustre2) with the slow and fast OSTs respectively,
 or whether it is reasonable to have these all under one filesystem.

It depends today on the sophistication of your users.  It is possible to split 
different storage classes with OST pools (see the commands lctl pool_add and
lfs setstripe -p), but these are optional separations today.  If users 
don't specify any pool then the default is to use all OSTs (mixing fast and 
slow storage).  You CAN specify default pools on a per-directory basis, but 
this only applies to newly-created files.

Cheers, Andreas
--
Andreas Dilger 
Principal Engineer
Whamcloud, Inc.





Re: [Lustre-discuss] OST problem

2011-03-04 Thread Cliff White
For clarity, Lustre does not replicate data. If you add an OST, it is
unique.
If you wish to do failover, this requires shared storage between two nodes.
We do not replicate storage.

If you wish to increase the size of your filesystem, you can add OSTs.
cliffw






-- 
cliffw
Support Guy
WhamCloud, Inc.
www.whamcloud.com