[Lustre-discuss] Information regarding the FILE HANDLE

2011-05-26 Thread vilobh meshram
Hi,

I was wondering in which part/structure of the Lustre code the file handle
information is kept. Can somebody please point me to it?

I tried looking through the code but couldn't quite work it out.

Is the file handle part of the EAs, i.e. the extended attributes?

Thanks,
Vilobh
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] mv_sata module for rhel5 and write through patch

2011-05-26 Thread Brock Palen
We are (finally) updating our x4500s to rhel5 and Lustre 1.8.5 from rhel4 and 
1.6.7.

On rhel4 we had used the patch from:
https://bugzilla.lustre.org/show_bug.cgi?id=14040

for the mv_sata  module.

Is this still recommended on rhel5, i.e. using the mv_sata module instead of 
the stock Red Hat sata_mv driver, and applying this patch?  That patch is 
quite old; is there a newer one?

What are other x4500/thumper users running?

Also, I will do some digging on the list, but why is Lustre 2.0 not the 
'production' version? We are planning on 1.8.x for now, but if 2.0 is stable we 
would install that instead.

Can we upgrade directly from 1.6 to 2.0 if we did this?

Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
bro...@umich.edu
(734)936-1985



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] mv_sata module for rhel5 and write through patch

2011-05-26 Thread Kevin Van Maren
Brock Palen wrote:
 We are (finally) updating our x4500s to rhel5 and Lustre 1.8.5 from rhel4 
 and 1.6.7.

 On rhel4 we had used the patch from:
 https://bugzilla.lustre.org/show_bug.cgi?id=14040

 for the mv_sata  module.

 Is this still recommended on rhel5, i.e. using the mv_sata module instead of 
 the stock Red Hat sata_mv driver, and applying this patch?  That patch is 
 quite old; is there a newer one?
   

I don't know: the last I heard was that the upcoming rhel 5.3 was to 
have an in-tree Marvell driver that worked.  If your system is still 
under support, I'd contact Oracle support for information about running 
RHEL5 on the x4500.

You do want to ensure the write-back cache is disabled on the drive, but 
you may be able to do that with udev scripts.  See Bug 17462 for an 
example for the J4400.
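
(As a rough illustration only -- the actual J4400 rules in Bug 17462 are the
ones to copy -- a udev rule to turn off the on-drive write cache might look
something like the following, assuming the disks show up as plain sd* devices
and that hdparm can reach them through the controller; the file name is made
up:

  # /etc/udev/rules.d/99-disable-write-cache.rules
  ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd[a-z]", RUN+="/sbin/hdparm -W0 /dev/%k"

You can check the current setting on a single drive with "hdparm -W /dev/sdX".)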

 What are other x4500/thumper users running?

 Also, I will do some digging on the list, but why is Lustre 2.0 not the 
 'production' version? We are planning on 1.8.x for now, but if 2.0 is stable 
 we would install that instead.
   

Lustre 2.0 is not being widely used, and would not be covered by an 
Oracle support contract.  It is strongly recommended to run production 
systems on 1.8.x rather than 2.0.  If you really want to try Lustre 2.x, 
you will want to use something newer than 2.0: maybe check with 
lustre...@googlegroups.com for the current status of the whamcloud git 
repository?

 Can we upgrade directly from 1.6 to 2.0 if we did this?

 Brock Palen
 www.umich.edu/~brockp
 Center for Advanced Computing
 bro...@umich.edu
 (734)936-1985

   

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] mount mdt/mgs - file exists -17

2011-05-26 Thread Dan
Hi,

After my MDS crashed I was unable to mount the MDT/MGS.  The dmesg
output is below.  I'm unable to remove the Lustre modules (lustre_rmmod);
the target is still listed under /proc/fs/lustre/devices but not mounted.
Rebooting the system to try again results in a kernel panic.  After a
reset I ran fsck, which revealed no problems, so I tried a --writeconf
and deleted CATALOGS, but I still received -17 and was unable to reboot
cleanly or unload the modules.

Fortunately this is my test system but I'd like to understand what
happened!  Running Lustre 1.8.5 on RHEL 5.5.
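
(For reference, a writeconf is normally redone on every target, not just the
MDT, with everything unmounted first -- roughly the sequence below, with the
device names as placeholders; the targets then re-register with the MGS on the
next mount:

  # on the MDS:
  tunefs.lustre --writeconf /dev/sda1
  # on every OSS, for every OST device:
  tunefs.lustre --writeconf /dev/<ost_device>
  # then remount the MGS/MDT first, then the OSTs, then the clients
)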

cat /proc/fs/lustre/devices
7 AT osc test-OST-osc test-mdtlov_UUID 1

Lustre: MGS MGS started
Lustre: MGC192.168.5.100@o2ib: Reactivating import
Lustre: MGC192.168.5.100@o2ib: Reactivating import
Lustre: Enabling user_xattr
Lustre: test-MDT: Now serving test-MDT on /dev/sda1 with
recovery enabled
Lustre: 5590:0:(lproc_mds.c:271:lprocfs_wr_group_upcall()) test-MDT:
group upcall set to /usr/sbin/l_getgroups
Lustre: test-MDT.mdt: set parameter
group_upcall=/usr/sbin/l_getgroups
LustreError: 5590:0:(ldlm_lib.c:331:client_obd_setup()) can't add
initial connection
LustreError: 5590:0:(obd_config.c:372:class_setup()) setup
test-OST-osc failed (-2)
LustreError: 5590:0:(obd_config.c:1199:class_config_llog_handler()) Err
-2 on cfg command:
Lustre:cmd=cf003 0:test-OST-osc  1:test-OST_UUID
2:128.174.5.100@tcp  
LustreError: 15c-8: MGC192.168.5.100@o2ib: The configuration from log
'test-MDT' failed (-2). This may be the result of communication
errors between this node and the MGS, a bad configuration, or other
errors. See the syslog for more information.
LustreError: 5453:0:(obd_mount.c:1126:server_start_targets()) failed to
start server test-MDT: -2
LustreError: 5453:0:(obd_mount.c:1655:server_fill_super()) Unable to
start targets: -2
Lustre: Failing over test-MDT
Lustre: Failing over test-mdtlov
Lustre: test-MDT: shutting down for failover; client state will be
preserved.
Lustre: MDT test-MDT has stopped.
Lustre: MGS has stopped.
Lustre: server umount test-MDT complete
LustreError: 5453:0:(obd_mount.c:2050:lustre_fill_super()) Unable to
mount  (-2)

Thanks,

Dan
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] df and du difference on lustre fs

2011-05-26 Thread Ekaterina Popova
Greetings,
I've run into a strange situation:

We've installed lustre-client (2.0.0) and xen (4.0) on Debian squeeze. 
Then we put img files with virtual machine images on Lustre (the images 
were striped over 2 OSTs). The VM started but then crashed. After that I 
tried to remove the img files from Lustre; the client hung and the xen 
server hung too. I then removed the img files from another Lustre client 
without any problems and rebooted the xen server. Now when I run df -h it 
looks as if the img files are still on Lustre, but ls doesn't show the 
deleted files and du -sm shows the correct used space.

df -h /lustre/
FilesystemSize  Used Avail Use% Mounted on
10.165.175.201@tcp0:/lihep
   5.4T   62G  5.1T   2% /lustre

du -sm /lustre/
41630   /lustre/

On OSTs :

# df -h
FilesystemSize  Used Avail Use% Mounted on
...
/dev/md10 2.7T   32G  2.6T   2% /ost1

# df -h
FilesystemSize  Used Avail Use% Mounted on
...
/dev/md13 2.7T   30G  2.6T   2% /ost2

How can I fix this?

With regards,
Ekaterina Popova
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Poor multithreaded I/O performance

2011-05-26 Thread kmehta
Ok I ran the following tests:

[1]
The application spawns 8 threads; I write to a Lustre file system with 8 OSTs.
Each thread writes data in blocks of 1 MByte in a round-robin fashion, i.e.

T0 writes to offsets 0, 8MB, 16MB, etc.
T1 writes to offsets 1MB, 9MB, 17MB, etc.
Since the stripe size is 1 MByte, every thread ends up writing to only one OST.

I see a bandwidth of 280 Mbytes/sec, similar to the single thread
performance.

[2]
I also ran the same test with every thread writing data in blocks of 8
MBytes, with the stripe size unchanged. (Thus, every thread writes to every
OST.) I still get similar performance, ~280 MBytes/sec, so essentially I
see no difference between each thread writing to a single OST vs. each
thread writing to all OSTs.

And as I said before, if all threads write to their own separate file, the
resulting bandwidth is ~700Mbytes/sec.

I have attached my C file (simple_io_test.c) herewith. Maybe you could run
it and see where the bottleneck is. Comments and instructions for
compilation have been included in the file. Do let me know if you need any
clarification on that.

Your help is appreciated,
Kshitij

 This is what my application does:

 Each thread has its own file descriptor to the file.
 I use pwrite to ensure non-overlapping regions, as follows:

 Thread 0, data_size: 1MB, offset: 0
 Thread 1, data_size: 1MB, offset: 1MB
 Thread 2, data_size: 1MB, offset: 2MB
 Thread 3, data_size: 1MB, offset: 3MB

 repeat cycle
 Thread 0, data_size: 1MB, offset: 4MB
 and so on (This happens in parallel, I don't wait for one cycle to end
 before the next one begins).

 I am gonna try the following:
 a)
 Instead of a round-robin distribution of offsets, test with sequential
 offsets:
 Thread 0, data_size: 1MB, offset:0
 Thread 0, data_size: 1MB, offset:1MB
 Thread 0, data_size: 1MB, offset:2MB
 Thread 0, data_size: 1MB, offset:3MB

 Thread 1, data_size: 1MB, offset:4MB
 and so on. (I am gonna keep these separate pwrite I/O requests instead of
 merging them or using writev)

 b)
 Map the threads to the no. of OSTs using some modulo, as suggested in the
 email below.

 c)
 Experiment with fewer no. of OSTs (I currently have 48).

 I shall report back with my findings.

 Thanks,
 Kshitij

 [Moved to Lustre-discuss]


 However, if I spawn 8 threads such that all of them write to the same
 file (non-overlapping locations), without explicitly synchronizing the
 writes (i.e. I don't lock the file handle)


 How exactly does your multi-threaded application write the data?  Are
 you using pwrite to ensure non-overlapping regions or are they all just
 doing unlocked write() operations on the same fd to each write (each
 just transferring size/8)?  If it divides the file into N pieces, and
 each thread does pwrite on its piece, then what each OST sees are
 multiple streams at wide offsets to the same object, which could impact
 performance.

 If on the other hand the file is written sequentially, where each thread
 grabs the next piece to be written (locking normally used for the
 current_offset value, so you know where each chunk is actually going),
 then you get a more sequential pattern at the OST.

 If the number of threads maps to the number of OSTs (or some modulo,
 like in your case 6 OSTs per thread), and each thread owns the piece
 of the file that belongs to an OST (ie: for (offset = thread_num * 6MB;
 offset < size; offset += 48MB) pwrite(fd, buf, 6MB, offset); ), then
 you've eliminated the need for application locks (assuming the use of
 pwrite) and ensured each OST object is being written sequentially.
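
 (Spelled out as a minimal, compilable sketch of that pattern for the
 8-thread/8-OST case, where each per-thread chunk equals the 1 MB stripe so
 each thread owns exactly one OST object.  Sizes, thread count and file name
 are made-up placeholders, and this is not the attached simple_io_test.c:

#define _GNU_SOURCE
#include <fcntl.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NTHREADS  8                    /* one thread per OST in this sketch */
#define CHUNK     (1UL << 20)          /* assumed 1 MB stripe size          */
#define FILESIZE  (1UL << 30)          /* 1 GB total, for illustration      */

static int fd;

static void *writer(void *arg)
{
    long tid = (long)arg;
    off_t off;
    char *buf = malloc(CHUNK);

    memset(buf, 'x', CHUNK);
    /* Thread `tid` owns every NTHREADS-th stripe-sized chunk, so each OST
     * object is written sequentially and no application locking is needed. */
    for (off = (off_t)tid * CHUNK; off < (off_t)FILESIZE;
         off += (off_t)NTHREADS * CHUNK) {
        if (pwrite(fd, buf, CHUNK, off) != (ssize_t)CHUNK) {
            perror("pwrite");
            break;
        }
    }
    free(buf);
    return NULL;
}

int main(void)
{
    pthread_t th[NTHREADS];
    long i;

    fd = open("/lustre/testfile", O_WRONLY | O_CREAT, 0644); /* path made up */
    if (fd < 0) { perror("open"); return 1; }

    for (i = 0; i < NTHREADS; i++)
        pthread_create(&th[i], NULL, writer, (void *)i);
    for (i = 0; i < NTHREADS; i++)
        pthread_join(th[i], NULL);

    close(fd);
    return 0;
}

 Build with something like: gcc -O2 -pthread sketch.c -o sketch )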

 It's quite possible there is some bottleneck on the shared fd.  So
 perhaps the question is not why you aren't scaling with more threads,
 but why the single file is not able to saturate the client, or why the
 file BW is not scaling with more OSTs.  It is somewhat common for
 multiple processes (on different nodes) to write non-overlapping regions
 of the same file; does performance improve if each thread opens its own
 file descriptor?

 Kevin


 Wojciech Turek wrote:
 Ok, so it looks like you have 64 OSTs in total and your output file is
 striped across 48 of them. May I suggest that you limit the number of
 stripes; a good number to start with would be 8, and for best results
 use the OST pools feature to arrange that each stripe goes to an OST
 owned by a different OSS.
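
 (Roughly, assuming the filesystem is called "lustre" and with the pool name
 and OST indices made up:

   # on the MGS:
   lctl pool_new lustre.fast
   lctl pool_add lustre.fast OST[0-7]
   # on a client, on the output directory:
   lfs setstripe -c 8 -p fast /path/to/output_dir
   lfs getstripe /path/to/output_dir

 The pool only controls which OSTs are eligible; picking OSTs that sit on
 distinct OSSes is still up to how you populate it.)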

 regards,

 Wojciech

 On 23 May 2011 23:09, kme...@cs.uh.edu
 wrote:

 Actually, 'lfs check servers' returns 64 entries as well, so I presume
 the system documentation is out of date.

 Again, I am sorry the basic information was incorrect.

 - Kshitij

  Run lfs getstripe your_output_file and paste the output of that
  command to the mailing list.
  A stripe count of 48 is not possible if you have at most 11 OSTs (the
  max stripe count would be 11).
  If your striping is correct, the bottleneck may be your client
  network.
 
  regards,
 
  

Re: [Lustre-discuss] df and du difference on lustre fs

2011-05-26 Thread Oleg Drokin
Hello!

On May 26, 2011, at 8:19 AM, Ekaterina Popova wrote:

 We've installed lustre-client(2.0.0) and xen(4.0) on Debian squeeze. 
 Then we loaded img files with virtual machine images on lustre (images 
 were striped on 2 OSTs). VM started but then crashed. After that I tried 
 to remove img files from lustre. Client hangs and xen server hangs too. 

Kernel messages from the hung instance would be interesting to see.

 After that I removed img files from another lustre client without any 
 problems and reboot xen server. When I make df -h it seems that img 
 files are on the lustre, but ls command doesn't show deleted files and 
 du -sm shows correct used space on lustre.

I assume that the hung client was holding the file open, so when you
unlinked the files from another client, only the name was removed on the MDS
and the objects on the OSTs remained.

 # df -h
 FilesystemSize  Used Avail Use% Mounted on
 ...
 /dev/md13 2.7T   30G  2.6T   2% /ost2
 How can I fix it ?

Since you have already killed the client that was holding the file open,
once the MDS registers the death of that client due to a ping timeout,
it will close the descriptor and proceed with destroying the
now-orphaned objects.

Alternatively, the next time your MDS restarts, such orphaned objects should
also be destroyed.
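
(To watch the space actually come back, comparing the client's per-OST view
with du should be enough, e.g.:

  lfs df -h /lustre    # per-OST used/free as seen through the filesystem
  du -sm /lustre       # space accounted to files that still have names

Once the orphans are cleaned up the two should agree again, modulo overhead.)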

Bye,
Oleg
--
Oleg Drokin
Senior Software Engineer
Whamcloud, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Information regarding the FILE HANDLE

2011-05-26 Thread Oleg Drokin
Hello!

On May 26, 2011, at 3:13 AM, vilobh meshram wrote:

 I was wondering in which part/structure of the Lustre code the file handle 
 information is kept. Can somebody please point me to it?
 
 I tried looking through the code but couldn't quite work it out.
 
 Is the file handle part of the EAs, i.e. the extended attributes?

There are multiple things that could be called a file handle, so it would be 
great if you could explain a little bit about what it is you are actually
looking for.

Bye,
Oleg
--
Oleg Drokin
Senior Software Engineer
Whamcloud, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] mount mdt/mgs - file exists -17

2011-05-26 Thread Johann Lombardi
On Thu, May 26, 2011 at 10:27:26AM -0700, Dan wrote:
 Lustre: MGS MGS started
 Lustre: MGC192.168.5.100@o2ib: Reactivating import
 Lustre: MGC192.168.5.100@o2ib: Reactivating import

So you use infiniband ...

[...]
 LustreError: 5590:0:(ldlm_lib.c:331:client_obd_setup()) can't add
 initial connection
 LustreError: 5590:0:(obd_config.c:372:class_setup()) setup
 test-OST-osc failed (-2)
 LustreError: 5590:0:(obd_config.c:1199:class_config_llog_handler()) Err
 -2 on cfg command:
 Lustre:cmd=cf003 0:test-OST-osc  1:test-OST_UUID
 2:128.174.5.100@tcp

But a TCP NID is registered for the OST. Is this intended?
If so, have you configured LNET on the MDS to use tcp?
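
(If both networks are intended, the MDS would need LNET configured for both,
e.g. something like the following in /etc/modprobe.conf (or modprobe.d) --
the interface names here are just placeholders:

  options lnet networks="o2ib0(ib0),tcp0(eth0)"

followed by a writeconf so the targets re-register their NIDs with the MGS.)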

Cheers,
Johann

-- 
Johann Lombardi
Whamcloud, Inc.
www.whamcloud.com
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Information regarding the FILE HANDLE

2011-05-26 Thread vilobh meshram
Hi Oleg,

Thanks for the reply.

What is the equivalent of the NFS file handle in Lustre?

Can you give me examples of a few of the multiple things that could be
called a file handle?

Thanks,
Vilobh

On Thu, May 26, 2011 at 4:30 PM, Oleg Drokin gr...@whamcloud.com wrote:

 Hello!

 On May 26, 2011, at 3:13 AM, vilobh meshram wrote:

  I was wondering in which part/structure of the Lustre code the file handle
  information is kept. Can somebody please point me to it?
 
  I tried looking through the code but couldn't quite work it out.
 
  Is the file handle part of the EAs, i.e. the extended attributes?

  There are multiple things that could be called a file handle, so it would be
 great if you could explain a little bit about what it is you are actually
 looking for.

 Bye,
Oleg
 --
 Oleg Drokin
 Senior Software Engineer
 Whamcloud, Inc.


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Information regarding the FILE HANDLE

2011-05-26 Thread Oleg Drokin
Hello!

   Well, the closest thing to an NFS fh is probably the Lustre FID (the inode 
number and generation in 1.8).

   The multiple things are: the open handle returned by the MDS on open, the 
FIDs, the NFS file handle constructed by the lustre_nfs layer, and the file 
descriptors returned from open(2), ...
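
   (For reference, the 2.x FID is just the small triple below -- roughly as
declared in lustre/include/lustre/lustre_idl.h, with the field comments mine:

#include <linux/types.h>   /* __u64, __u32 */

struct lu_fid {
        __u64 f_seq;   /* sequence the FID was allocated from */
        __u32 f_oid;   /* object id within that sequence */
        __u32 f_ver;   /* version; typically 0 for regular files */
};

whereas the NFS-style handle exported by the lustre_nfs code is built around
the FID, or the inode number + generation on 1.8.)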

Bye,
Oleg
On May 26, 2011, at 7:31 PM, vilobh meshram wrote:

 Hi Oleg,
 
 Thanks for the reply.
 
 What is the equivalent of the NFS file handle in Lustre?
 
 Can you give me examples of a few of the multiple things that could be called
 a file handle?
 
 Thanks,
 Vilobh
 
 On Thu, May 26, 2011 at 4:30 PM, Oleg Drokin gr...@whamcloud.com wrote:
 Hello!
 
 On May 26, 2011, at 3:13 AM, vilobh meshram wrote:
 
  I was wondering in which part/structure of the Lustre code the file handle 
  information is kept. Can somebody please point me to it?
 
  I tried looking through the code but couldn't quite work it out.
 
  Is the file handle part of the EAs, i.e. the extended attributes?
 
 There are multiple things that could be called a file handle, so it would be 
 great if you could explain a little bit about what it is you are actually
 looking for.
 
 Bye,
Oleg
 --
 Oleg Drokin
 Senior Software Engineer
 Whamcloud, Inc.
 
 

--
Oleg Drokin
Senior Software Engineer
Whamcloud, Inc.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss