Re: [Lustre-discuss] Landing and tracking tools improvements

2011-05-24 Thread sebastien.buisson
Hi,

In the Bugzilla system I found it useful to be able to 'reply' to a previous 
comment, with a pre-filled text box in email quoting style.

I also miss the ability to very simply generate a link to a specific bug,
or even to a specific comment in a bug, when writing a comment. For instance, when I 
wrote 'as explained in bug 1 comment 3' in Bugzilla, my comment 
was processed and 'bug 1 comment 3' was turned into a clickable link. 
To be usable, this would require the comments to be numbered.

Apart from that, I like being able to edit or delete my own comments.
Keep up the good work!

Cheers,
Sebastien.

lustre-discuss-boun...@lists.lustre.org wrote on 23/05/2011 19:06:40:

 From: Chris Gearing ch...@whamcloud.com
 To: lustre-discuss@lists.lustre.org
 Date: 23/05/2011 19:06
 Subject: [Lustre-discuss] Landing and tracking tools improvements
 Sent by: lustre-discuss-boun...@lists.lustre.org
 
 We now have a whole kit of tools [Jira, Gerrit, Jenkins and Maloo] for 
 tracking, reviewing and testing the code being developed for Lustre. A lot 
 of time has been spent integrating and connecting them appropriately, but 
 as with anything, the key is to continuously look for ways to improve what 
 we have and how it works.
 
 So my question is: what do people think of the tools as they stand today, 
 and how can we improve them moving forwards? If people can respond to 
 lustre-discuss then I'll correlate the outcome of any discussions and 
 then create a Wiki page that can form a plan for improvement.
 
 Please be as descriptive as possible in your replies, and take into 
 account that I and others have no experience of Lustre's past, so if you 
 liked something about the previous tools you'll need to help me and 
 others understand the details.
 
 Thanks
 
 Chris
 
 ---
 Chris Gearing
 Snr Engineer
 Whamcloud, Inc.
 
 
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] [Lustre-community] Poor multithreaded I/O performance

2011-05-24 Thread Kevin Van Maren
[Moved to Lustre-discuss]


However, if I spawn 8 threads such that all of them write to the same 
file (non-overlapping locations), without explicitly synchronizing the 
writes (i.e. I don't lock the file handle)


How exactly does your multi-threaded application write the data?  Are 
you using pwrite to ensure non-overlapping regions or are they all just 
doing unlocked write() operations on the same fd for each write (each 
just transferring size/8)?  If it divides the file into N pieces, and 
each thread does pwrite on its piece, then what each OST sees are 
multiple streams at wide offsets to the same object, which could impact 
performance.

If on the other hand the file is written sequentially, where each thread 
grabs the next piece to be written (locking normally used for the 
current_offset value, so you know where each chunk is actually going), 
then you get a more sequential pattern at the OST.

If the number of threads maps to the number of OSTs (or some modulo, 
like in your case 6 OSTs per thread), and each thread owns the piece 
of the file that belongs to an OST (ie: for (offset = thread_num * 6MB; 
offset < size; offset += 48MB) pwrite(fd, buf, 6MB, offset); ), then 
you've eliminated the need for application locks (assuming the use of 
pwrite) and ensured each OST object is being written sequentially.
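
Spelled out as a rough, compilable sketch (the file path and the lack of 
error handling are mine; the 8 threads, 1 MB stripes, 48 OSTs and 16 GB 
file size are just the numbers already mentioned in this thread), that 
pattern would look something like:

    /* build: cc -O2 -D_FILE_OFFSET_BITS=64 -pthread ost_aligned.c */
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define NTHREADS  8
    #define CHUNK     (6ULL << 20)    /* 6 MB: one thread's slice = 6 OSTs x 1 MB stripe */
    #define STRIDE    (48ULL << 20)   /* 48 MB: one full stripe round across 48 OSTs */
    #define FILE_SIZE (16ULL << 30)   /* 16 GB test file */

    static int fd;                    /* shared fd; pwrite needs no application lock */

    static void *writer(void *arg)
    {
        long tnum = (long)arg;
        char *buf = malloc(CHUNK);
        memset(buf, 'x', CHUNK);

        /* each thread only ever touches its own 6 MB slice of every 48 MB round */
        for (off_t offset = tnum * CHUNK; offset < FILE_SIZE; offset += STRIDE)
            pwrite(fd, buf, CHUNK, offset);

        free(buf);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];

        fd = open("/mnt/lustre/testfile", O_WRONLY | O_CREAT, 0644);
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&tid[i], NULL, writer, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);
        close(fd);
        return 0;
    }

Provided the file really is striped 1 MB across 48 OSTs, each OST object is 
then written in strictly increasing offset order, which is the point of the 
exercise.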

It's quite possible there is some bottleneck on the shared fd.  So 
perhaps the question is not why you aren't scaling with more threads, 
but why the single file is not able to saturate the client, or why the 
file BW is not scaling with more OSTs.  It is somewhat common for 
multiple processes (on different nodes) to write non-overlapping regions 
of the same file; does performance improve if each thread opens its own 
file descriptor?

Kevin


Wojciech Turek wrote:
 Ok, so it looks like you have 64 OSTs in total and your output file is 
 striped across 48 of them. May I suggest that you limit the number of 
 stripes; a good number to start with would be 8. Also, for best results, 
 use the OST pools feature to arrange that each stripe goes to an OST 
 owned by a different OSS.

 regards,

 Wojciech

 On 23 May 2011 23:09, kme...@cs.uh.edu wrote:

 Actually, 'lfs check servers' returns 64 entries as well, so I
 presume the
 system documentation is out of date.

 Again, I am sorry the basic information had been incorrect.

 - Kshitij

  Run lfs getstripe your_output_file and paste the output of
 that command
  to
  the mailing list.
  Stripe count of 48 is not possible if you have max 11 OSTs (the
 max stripe
  count will be 11)
  If your striping is correct, the bottleneck can be your client
 network.
 
  regards,
 
  Wojciech
 
 
 
  On 23 May 2011 22:35, kme...@cs.uh.edu wrote:
 
  The stripe count is 48.
 
  Just fyi, this is what my application does:
  A simple I/O test where threads continually write blocks of size
  64Kbytes
  or 1Mbyte (decided at compile time) till a large file of say,
 16Gbytes
  is
  created.
 
  Thanks,
  Kshitij
 
   What is your stripe count on the file,  if your default is 1,
 you are
  only
   writing to one of the OST's.  you can check with the lfs
 getstripe
   command, you can set the stripe bigger, and hopefully your
   wide-striped
   file with threaded writes will be faster.
  
   Evan
  
    -----Original Message-----
   From: lustre-community-boun...@lists.lustre.org
    [mailto:lustre-community-boun...@lists.lustre.org] On Behalf Of
    kme...@cs.uh.edu
   Sent: Monday, May 23, 2011 2:28 PM
   To: lustre-commun...@lists.lustre.org
   Subject: [Lustre-community] Poor multithreaded I/O performance
  
   Hello,
   I am running a multithreaded application that writes to a common
  shared
   file on lustre fs, and this is what I see:
  
   If I have a single thread in my application, I get a bandwidth of
  approx.
   250 MBytes/sec. (11 OSTs, 1MByte stripe size) However, if I
 spawn 8
   threads such that all of them write to the same file
 (non-overlapping
   locations), without explicitly synchronizing the writes (i.e.
  I don't
  lock
   the file handle), I still get the same bandwidth.
  
   Now, instead of writing to a shared file, if these threads
 write to
   separate files, the bandwidth obtained is approx. 700 Mbytes/sec.
  
   I would ideally like my multithreaded application to see similar
  scaling.
   Any ideas why the performance is limited and any workarounds?
  
   Thank you,
   Kshitij
  
  

Re: [Lustre-discuss] Poor multithreaded I/O performance

2011-05-24 Thread kmehta
This is what my application does:

Each thread has its own file descriptor to the file.
I use pwrite to ensure non-overlapping regions, as follows:

Thread 0, data_size: 1MB, offset: 0
Thread 1, data_size: 1MB, offset: 1MB
Thread 2, data_size: 1MB, offset: 2MB
Thread 3, data_size: 1MB, offset: 3MB

repeat cycle
Thread 0, data_size: 1MB, offset: 4MB
and so on. (This happens in parallel; I don't wait for one cycle to end
before the next one begins.)
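
For reference, here is a stripped-down sketch of what each thread currently 
does (the mount path is made up for this example; the 1 MB block size, 8 
threads, per-thread descriptors and 16 GB file size are as described above):

    /* build: cc -O2 -D_FILE_OFFSET_BITS=64 -pthread round_robin.c */
    #include <fcntl.h>
    #include <pthread.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define NTHREADS  8
    #define BLOCK     (1ULL << 20)     /* 1 MB per pwrite */
    #define FILE_SIZE (16ULL << 30)    /* 16 GB file */

    static void *writer(void *arg)
    {
        long tnum = (long)arg;
        int fd = open("/mnt/lustre/testfile", O_WRONLY);  /* per-thread descriptor */
        char *buf = malloc(BLOCK);
        memset(buf, 'x', BLOCK);

        /* round-robin: in cycle c, thread t writes the 1 MB block at (c*NTHREADS + t) MB */
        for (off_t offset = tnum * BLOCK; offset < FILE_SIZE; offset += NTHREADS * BLOCK)
            pwrite(fd, buf, BLOCK, offset);

        free(buf);
        close(fd);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NTHREADS];
        int fd = open("/mnt/lustre/testfile", O_WRONLY | O_CREAT, 0644);
        close(fd);                                        /* just create the file */

        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&tid[i], NULL, writer, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }

The experiments below only change how offsets are assigned inside that loop.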

I am going to try the following:
a)
Instead of a round-robin distribution of offsets, test with sequential
offsets:
Thread 0, data_size: 1MB, offset:0
Thread 0, data_size: 1MB, offset:1MB
Thread 0, data_size: 1MB, offset:2MB
Thread 0, data_size: 1MB, offset:3MB

Thread 1, data_size: 1MB, offset:4MB
and so on. (I am going to keep these as separate pwrite I/O requests instead
of merging them or using writev.)

b)
Map the threads to the number of OSTs using some modulo, as suggested in the
email below.

c)
Experiment with a smaller number of OSTs (I currently have 48).

I shall report back with my findings.

Thanks,
Kshitij

 [Moved to Lustre-discuss]


 However, if I spawn 8 threads such that all of them write to the same
 file (non-overlapping locations), without explicitly synchronizing the
 writes (i.e. I don't lock the file handle)


 How exactly does your multi-threaded application write the data?  Are
 you using pwrite to ensure non-overlapping regions or are they all just
 doing unlocked write() operations on the same fd for each write (each
 just transferring size/8)?  If it divides the file into N pieces, and
 each thread does pwrite on its piece, then what each OST sees are
 multiple streams at wide offsets to the same object, which could impact
 performance.

 If on the other hand the file is written sequentially, where each thread
 grabs the next piece to be written (locking normally used for the
 current_offset value, so you know where each chunk is actually going),
 then you get a more sequential pattern at the OST.

 If the number of threads maps to the number of OSTs (or some modulo,
 like in your case 6 OSTs per thread), and each thread owns the piece
 of the file that belongs to an OST (ie: for (offset = thread_num * 6MB;
 offset < size; offset += 48MB) pwrite(fd, buf, 6MB, offset); ), then
 you've eliminated the need for application locks (assuming the use of
 pwrite) and ensured each OST object is being written sequentially.

 It's quite possible there is some bottleneck on the shared fd.  So
 perhaps the question is not why you aren't scaling with more threads,
 but why the single file is not able to saturate the client, or why the
 file BW is not scaling with more OSTs.  It is somewhat common for
 multiple processes (on different nodes) to write non-overlapping regions
 of the same file; does performance improve if each thread opens its own
 file descriptor?

 Kevin


 Wojciech Turek wrote:
 Ok, so it looks like you have 64 OSTs in total and your output file is
 striped across 48 of them. May I suggest that you limit the number of
 stripes; a good number to start with would be 8. Also, for best results,
 use the OST pools feature to arrange that each stripe goes to an OST
 owned by a different OSS.

 regards,

 Wojciech

 On 23 May 2011 23:09, kme...@cs.uh.edu wrote:

 Actually, 'lfs check servers' returns 64 entries as well, so I
 presume the
 system documentation is out of date.

 Again, I am sorry the basic information had been incorrect.

 - Kshitij

  Run lfs getstripe your_output_file and paste the output of
 that command
  to
  the mailing list.
  Stripe count of 48 is not possible if you have max 11 OSTs (the
 max stripe
  count will be 11)
  If your striping is correct, the bottleneck can be your client
 network.
 
  regards,
 
  Wojciech
 
 
 
  On 23 May 2011 22:35, kme...@cs.uh.edu wrote:
 
  The stripe count is 48.
 
  Just fyi, this is what my application does:
  A simple I/O test where threads continually write blocks of size
  64Kbytes
  or 1Mbyte (decided at compile time) till a large file of say,
 16Gbytes
  is
  created.
 
  Thanks,
  Kshitij
 
   What is your stripe count on the file,  if your default is 1,
 you are
  only
   writing to one of the OST's.  you can check with the lfs
 getstripe
   command, you can set the stripe bigger, and hopefully your
   wide-striped
   file with threaded writes will be faster.
  
   Evan
  
    -----Original Message-----
   From: lustre-community-boun...@lists.lustre.org
    [mailto:lustre-community-boun...@lists.lustre.org] On Behalf Of
    kme...@cs.uh.edu
   Sent: Monday, May 23, 2011 2:28 PM
   To: 

Re: [Lustre-discuss] Landing and tracking tools improvements

2011-05-24 Thread Andreas Dilger
I have to echo some of the comments of Chris Morrone. 

The in-line comments in Gerrit do not appear in the expanded view, and need 
to be found individually in the various patches. Having 3 lines of context plus 
the comments is enough for most cases, and if not then a URL to the actual 
comment would be great. 

While the linking from Gerrit back to Jira is good (due to the embedded LU-nnn 
link in the patch summary line), it is not so easy to find which changes in Gerrit 
are open from the Jira ticket. Sometimes there are multiple changes open for a 
single bug, either by accident (forgetting the Change-Id) or on purpose.  
Having a single comment in Jira for each open change would be good. 

I also agree that allowing an entire change to be visible on a single page 
would be helpful. I used to pre-load a bunch of patches from bugzilla into my 
browser before a flight, but with Gerrit that isn't really practical due to the 
number of tabs it would create. 

That said, I like that Gerrit is a gatekeeper and ensures that what is 
inspected is also what is tested and landed, even if it means that patches 
sometimes have to go through multiple review cycles for trivial changes.

Being able to compare patches against previous versions in Gerrit speeds up the 
process of reviewing new versions of a change, but it is complicated if the 
base version of the patch is changed. At that point it will also show what 
has changed between the base versions as if it were part of the patch, which is 
confusing. It would be better to limit the output to only the code that was 
modified in the two changes. 

Cheers, Andreas

On 2011-05-23, at 11:06 AM, Chris Gearing ch...@whamcloud.com wrote:

 We now have a whole kit of tools [Jira, Gerrit, Jenkins and Maloo] for 
 tracking, reviewing and testing the code being developed for Lustre. A lot 
 of time has been spent integrating and connecting them appropriately, but 
 as with anything, the key is to continuously look for ways to improve what 
 we have and how it works.
 
 So my question is: what do people think of the tools as they stand today, 
 and how can we improve them moving forwards? If people can respond to 
 lustre-discuss then I'll correlate the outcome of any discussions and 
 then create a Wiki page that can form a plan for improvement.
 
 Please be as descriptive as possible in your replies, and take into 
 account that I and others have no experience of Lustre's past, so if you 
 liked something about the previous tools you'll need to help me and 
 others understand the details.
 
 Thanks
 
 Chris
 
 ---
 Chris Gearing
 Snr Engineer
 Whamcloud, Inc.
 
 
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] SLES 11 SP1 Client rpms built but not working

2011-05-24 Thread Rick Mohr
Peter,

Sorry for the late response.  I don't know if this will help you or not,
but below are the commands I ran to build the lustre client rpms on one
of our SLES systems:


nautilus:~ # cat /etc/SuSE-release 
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 1

nautilus:~ # uname -a
Linux nautilus 2.6.32.29-0.3.1.2687.3.PTF.607050.iommu-default #1 SMP 
2011-02-25 13:36:59 +0100 x86_64 x86_64 x86_64 GNU/Linux

nautilus:~ # cd /usr/src/linux-2.6.32.29-0.3.1.2687.3.PTF.607050.iommu

nautilus:/usr/src/linux-2.6.32.29-0.3.1.2687.3.PTF.607050.iommu # make 
cloneconfig
Cloning configuration file /proc/config.gz
...

nautilus:/usr/src/linux-2.6.32.29-0.3.1.2687.3.PTF.607050.iommu # make prepare
scripts/kconfig/conf -s arch/x86/Kconfig
  CHK include/linux/version.h
  UPD include/linux/version.h


nautilus:/usr/src/linux-2.6.32.29-0.3.1.2687.3.PTF.607050.iommu # make scripts
  HOSTCC  scripts/genksyms/genksyms.o
  SHIPPED scripts/genksyms/lex.c


nautilus:/usr/src/linux-2.6.32.29-0.3.1.2687.3.PTF.607050.iommu # cd 
/root/lustre-1.8.5

nautilus:~/lustre-1.8.5 # ./configure --disable-server 
--with-linux=/usr/src/linux-2.6.32.29-0.3.1.2687.3.PTF.607050.iommu \
--with-linux-obj=/usr/src/linux-2.6.32.29-0.3.1.2687.3.PTF.607050.iommu-obj/x86_64/default
 \
--with-linux-config=/boot/config-2.6.32.29-0.3.1.2687.3.PTF.607050.iommu-default
checking build system type... x86_64-unknown-linux-gnu
checking host system type... x86_64-unknown-linux-gnu


nautilus:~/lustre-1.8.5 # make rpms

-- 
Rick Mohr
HPC Systems Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu/

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Client Kernel panic - not syncing. Lustre 1.8.5

2011-05-24 Thread Mag Gam
Stick with 1.6.6, it's a great release! BTW, why did you decide to
upgrade to 1.8.x? Is there a feature you are looking for?


On Fri, May 20, 2011 at 2:48 PM, Aaron Everett aever...@forteds.com wrote:
 Thanks for the tip. I've already updated with the LU-286 patch, but I'll
 build new rpms with both patches and roll that out too. Since updating with
 the LU-286 patch Lustre has been running cleanly. Thanks for the support and
 the work!
 Aaron

 On Fri, May 20, 2011 at 4:40 AM, Johann Lombardi joh...@whamcloud.com
 wrote:

 On Thu, May 19, 2011 at 01:57:33PM -0400, Aaron Everett wrote:
  Sorry for the noise. I cleaned everything up, untarred a fresh copy of

 np. BTW, while you are patching the lustre client, you might also want to
 apply the following patch http://review.whamcloud.com/#change,457 which
 fixes a memory leak in the same part of the code.

 Johann
 --
 Johann Lombardi
 Whamcloud, Inc.
 www.whamcloud.com




___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre HA Experiences

2011-05-24 Thread Mag Gam
What was your conclusion? What is a good HA solution with Lustre? I am
hoping SNS will be a big push in the next year.


On Wed, May 4, 2011 at 5:16 PM, Jason Rappleye jason.rappl...@nasa.gov wrote:

 On May 4, 2011, at 10:05 AM, Charles Taylor wrote:


 We are dipping our toes into the waters of Lustre HA using
 pacemaker. We have 16 x 7.2 TB OSTs across 4 OSSs (4 OSTs each).
 The four OSSs are broken out into two dual-active pairs running Lustre
 1.8.5.    Mostly, the water is fine but we've encountered a few
 surprises.

 1. An 8-client  iozone write test in which we write 64 files of 1.7
 TB  each seems to go well - until the end at which point iozone seems
 to finish successfully and begins its cleanup.   That is to say it
 starts to remove all 64 large files.    At this point, the ll_ost
 threads go bananas - consuming all available cpu cycles on all 8 cores
 of each server.   This seems to block the corosync totem exchange
 long enough to initiate a stonith request.

 Running oprofile or profile.pl (possibly only included in SGI's respin of 
 perfsuite, original is at http://perfsuite.ncsa.illinois.edu/) is useful in 
 situations where you have one or more threads consuming a lot of CPU. It 
 should point to what function(s) the offending thread(s) are spending time 
 in. From there, bugzilla/jira or the mailing list should be able to help 
 further.

 2. We have found that re-mounting the OSTs, either via the HA agent or
 manually, often can take a *very* long time - on the order of four or
 five minutes.   We have not figured out why yet.   An strace of the
 mount process has not yielded much.    The mount seems to just be
 waiting for something but we can't tell what.

 Could be bz 18456.

 Jason

 --
 Jason Rappleye
 System Administrator
 NASA Advanced Supercomputing Division
 NASA Ames Research Center
 Moffett Field, CA 94035






___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss