Re: [Lustre-discuss] [wc-discuss] The ost_connect operation failed with -16

2012-05-30 Thread Liang Zhen
Hi, I think you might be hitting this: http://jira.whamcloud.com/browse/LU-952. You can find the patch in that ticket.
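
A generic sketch of applying the patch from that ticket to a Lustre source
tree before rebuilding (the patch filename and directory names below are
placeholders, not taken from the ticket):

  # apply a downloaded patch file to an unpacked Lustre source tree
  cd lustre-release
  patch -p1 < /tmp/LU-952.patch
  # or, for a git checkout of the same tree:
  # git apply /tmp/LU-952.patch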

Regards
Liang

On May 30, 2012, at 11:21 AM, huangql wrote:

 Dear all,
 
 Recently we found a problem on the OSS: some service threads may hang when the
 server is under heavy I/O load. When this happens, some clients are evicted or
 refused by some OSTs and see error messages like the following:
 
 Server side:
 
 May 30 11:06:31 boss07 kernel: Lustre: Service thread pid 8011 was inactive for 200.00s. The thread might be hung, or it might only be slow and will resume later. Dumping the stack trace for debugging purposes:
 May 30 11:06:31 boss07 kernel: Lustre: Skipped 1 previous similar message
 May 30 11:06:31 boss07 kernel: Pid: 8011, comm: ll_ost_71
 May 30 11:06:31 boss07 kernel:
 May 30 11:06:31 boss07 kernel: Call Trace:
 May 30 11:06:31 boss07 kernel:  [886f5d0e] start_this_handle+0x301/0x3cb [jbd2]
 May 30 11:06:31 boss07 kernel:  [800a09ca] autoremove_wake_function+0x0/0x2e
 May 30 11:06:31 boss07 kernel:  [886f5e83] jbd2_journal_start+0xab/0xdf [jbd2]
 May 30 11:06:31 boss07 kernel:  [888ce9b2] fsfilt_ldiskfs_start+0x4c2/0x590 [fsfilt_ldiskfs]
 May 30 11:06:31 boss07 kernel:  [88920551] filter_version_get_check+0x91/0x2a0 [obdfilter]
 May 30 11:06:31 boss07 kernel:  [80036cf4] __lookup_hash+0x61/0x12f
 May 30 11:06:31 boss07 kernel:  [8893108d] filter_setattr_internal+0x90d/0x1de0 [obdfilter]
 May 30 11:06:31 boss07 kernel:  [800e859b] lookup_one_len+0x53/0x61
 May 30 11:06:31 boss07 kernel:  [88925452] filter_fid2dentry+0x512/0x740 [obdfilter]
 May 30 11:06:31 boss07 kernel:  [88924e27] filter_fmd_get+0x2b7/0x320 [obdfilter]
 May 30 11:06:31 boss07 kernel:  [8003027b] __up_write+0x27/0xf2
 May 30 11:06:31 boss07 kernel:  [88932721] filter_setattr+0x1c1/0x3b0 [obdfilter]
 May 30 11:06:31 boss07 kernel:  [8882677a] lustre_pack_reply_flags+0x86a/0x950 [ptlrpc]
 May 30 11:06:31 boss07 kernel:  [8881e658] ptlrpc_send_reply+0x5c8/0x5e0 [ptlrpc]
 May 30 11:06:31 boss07 kernel:  [88822b05] lustre_msg_get_version+0x35/0xf0 [ptlrpc]
 May 30 11:06:31 boss07 kernel:  [888b0abb] ost_handle+0x25db/0x55b0 [ost]
 May 30 11:06:31 boss07 kernel:  [80150d56] __next_cpu+0x19/0x28
 May 30 11:06:31 boss07 kernel:  [800767ae] smp_send_reschedule+0x4e/0x53
 May 30 11:06:31 boss07 kernel:  [8883215a] ptlrpc_server_handle_request+0x97a/0xdf0 [ptlrpc]
 May 30 11:06:31 boss07 kernel:  [888328a8] ptlrpc_wait_event+0x2d8/0x310 [ptlrpc]
 May 30 11:06:31 boss07 kernel:  [8008b3bd] __wake_up_common+0x3e/0x68
 May 30 11:06:31 boss07 kernel:  [88833817] ptlrpc_main+0xf37/0x10f0 [ptlrpc]
 May 30 11:06:31 boss07 kernel:  [8005dfb1] child_rip+0xa/0x11
 May 30 11:06:31 boss07 kernel:  [888328e0] ptlrpc_main+0x0/0x10f0 [ptlrpc]
 May 30 11:06:31 boss07 kernel:  [8005dfa7] child_rip+0x0/0x11
 May 30 11:06:31 boss07 kernel:
 May 30 11:06:31 boss07 kernel: LustreError: dumping log to /tmp/lustre-log.1338347191.8011
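 
 The /tmp/lustre-log.* file named above is a binary Lustre debug dump; if it is
 still present on the OSS, it can be converted to readable text with lctl's
 debug_file command, for example (the output filename is arbitrary):
 
   lctl debug_file /tmp/lustre-log.1338347191.8011 /tmp/lustre-log.1338347191.8011.txt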
 
 
 Client side:
 
 May 30 09:58:36 ccopt kernel: LustreError: 11-0: an error occurred while communicating with 192.168.50.123@tcp. The ost_connect operation failed with -16
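 
 For reference, return code -16 corresponds to EBUSY on Linux, i.e. the OST is
 refusing new connections while it is busy, which is consistent with the stuck
 service threads above. The mapping can be confirmed on any node, e.g. (the
 header path may differ by distribution):
 
   $ grep -w 16 /usr/include/asm-generic/errno-base.h
   #define EBUSY           16      /* Device or resource busy */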
 
 When this error message appears, commands such as ls, df, vi, and touch fail,
 which prevents us from doing anything in the file system.
 I think an ost_connect failure should report an error message to users instead
 of leaving interactive commands stuck.
 
 Could someone give us some advice or suggestions for solving this problem?
 
 Thank you very much in advance.
 
 
 Best Regards
 Qiulan Huang
 2012-05-30
 
 Computing center,the Institute of High Energy Physics, China
 Huang, Qiulan   Tel: (+86) 10 8823 6010-105
 P.O. Box 918-7   Fax: (+86) 10 8823 6839
 Beijing 100049  P.R. China   Email: huan...@ihep.ac.cn
 ===   
 
 
 

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Tar backup of MDT runs extremely slow, tar pauses on pointers to very large files

2012-05-30 Thread Peter Grandi
[ ... ]

 The tar backup of the MDT is taking a very long time. So far it has
 backed up 1.6GB of the 5.0GB used in nine hours. In watching the tar
 process, pointers to small or average-size files are backed up quickly
 and at a consistent pace. When tar encounters a pointer/inode
 belonging to a very large file (100GB+), the tar process stalls on that
 file for a very long time, as if it were trying to archive the real
 file-size worth of data rather than just the pointer/inode.

If you have striping on, a 100GiB file will have roughly 100,000 1MiB
stripes, and each one requires a chunk of metadata. The descriptor for
that file will therefore have a potentially very large number of
extents, scattered around the MDT block device, depending on how
slowly the file grew, etc.
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] ofed with FDR14 support Lustre

2012-05-30 Thread Michael Shuey
We're using 1.8.7.80-wc1 here.  It's basically 1.8.7-wc1, but with a
few fixes pulled in from git a few months back to build on rhel6.2.
It's built on top of Mellanox's OFED 1.5.3-3.0.0, and is working just
fine on our FDR14 cluster.

--
Mike Shuey


On Wed, May 30, 2012 at 3:41 PM, John White jwh...@lbl.gov wrote:
 Does anyone know of a Lustre version that can build against an OFED that
 supports FDR14 (1.5.4+, by my understanding)?  Or is this still in the pipeline?

 The compat matrix on the Whamcloud site only talks about support up to 1.5.3.1
 (confirmed to build, but it doesn't support FDR14).
 
 John White
 HPC Systems Engineer
 (510) 486-7307
 One Cyclotron Rd, MS: 50C-3209C
 Lawrence Berkeley National Lab
 Berkeley, CA 94720

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] ofed with FDR14 support Lustre

2012-05-30 Thread Ashley Pittman

We have a small pair of patches to lbuild for building against Mellanox
OFED 1.5.3-3.  One patch allows specifying OFED build trees by name rather
than by version; the second builds against the Mellanox OFED.

These are from Lustre 2.1.2 on RHEL6, but they apply elsewhere as well.

Ashley.

On Wed, 2012-05-30 at 12:54 -0700, Michael Shuey wrote:
 We're using 1.8.7.80-wc1 here.  It's basically 1.8.7-wc1, but with a
 few fixes pulled in from git a few months back to build on rhel6.2.
 It's built on top of Mellanox's OFED 1.5.3-3.0.0, and is working just
 fine on our FDR14 cluster.
 
 --
 Mike Shuey
 
 
 On Wed, May 30, 2012 at 3:41 PM, John White jwh...@lbl.gov wrote:
  Does anyone know of a Lustre version that can build against an OFED that
  supports FDR14 (1.5.4+, by my understanding)?  Or is this still in the
  pipeline?

  The compat matrix on the Whamcloud site only talks about support up to 1.5.3.1
  (confirmed to build, but it doesn't support FDR14).
  
  John White
  HPC Systems Engineer
  (510) 486-7307
  One Cyclotron Rd, MS: 50C-3209C
  Lawrence Berkeley National Lab
  Berkeley, CA 94720
 

diff -r 7d5dec13571e lustre/kernel_patches/targets/2.6-rhel6.target.in
--- a/lustre/kernel_patches/targets/2.6-rhel6.target.in	Tue May 22 12:32:58 2012 +0100
+++ b/lustre/kernel_patches/targets/2.6-rhel6.target.in	Tue May 22 12:37:20 2012 +0100
@@ -7,7 +7,8 @@
 LUSTRE_VERSION=@VERSION@
 
 DEVEL_PATH_ARCH_DELIMETER=.
-OFED_VERSION=inkernel
+OFED_VERSION=1.5.3
+OFED_TARBALL=MLNX_OFED_SRC-1.5.3-3.0.0.tgz
 
 BASE_ARCHS=i686 x86_64 ia64 ppc64
 BIGMEM_ARCHS=

diff -r 7121b6da363f build/lbuild
--- a/build/lbuild	Tue May 22 12:46:07 2012 +0100
+++ b/build/lbuild	Tue May 22 12:47:05 2012 +0100
@@ -531,6 +531,15 @@
         return 0
     fi
 
+    # If a full filename has been provided instead of just a version
+    # then use that.
+    if [ -n "${OFED_TARBALL}" ]; then
+        if [ -f "${KERNELDIR}/${OFED_TARBALL}" ]; then
+            return 0
+        fi
+        fatal 1 "${OFED_TARBALL} not found in ${KERNELDIR}"
+    fi
+
     local OFED_BASE_VERSION=$OFED_VERSION
     if [[ $OFED_VERSION = *.*.*.* ]]; then
         OFED_BASE_VERSION=${OFED_VERSION%.*}
@@ -692,10 +701,17 @@
 
 unpack_ofed() {
 
-    if ! untar $KERNELDIR/OFED-${OFED_VERSION}.tgz; then
-        return 1
+    if [ -n "${OFED_TARBALL}" ]; then
+        if ! untar $KERNELDIR/${OFED_TARBALL}; then
+            return 1
+        fi
+    else
+        if ! untar $KERNELDIR/OFED-${OFED_VERSION}.tgz; then
+            return 1
+        fi
     fi
     [ -d OFED ] || ln -sf OFED-[0-9].[0-9]* OFED
+    [ -d OFED ] || ln -sf *OFED_SRC-[0-9].[0-9]* OFED
 
 }
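
With these applied, usage is just a matter of dropping the Mellanox source
tarball into the directory lbuild searches for OFED tarballs (a hypothetical
example; the exact location depends on how KERNELDIR is set for your lbuild
invocation):

  # make the tarball named by OFED_TARBALL visible to lbuild
  cp MLNX_OFED_SRC-1.5.3-3.0.0.tgz "$KERNELDIR"/
  # lbuild will then unpack ${OFED_TARBALL} instead of OFED-${OFED_VERSION}.tgz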
 

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Tar backup of MDT runs extremely slow, tar pauses on pointers to very large files

2012-05-30 Thread Andreas Dilger
On 2012-05-29, at 1:28 PM, Peter Grandi wrote:
 The tar backup of the MDT is taking a very long time. So far it has
 backed up 1.6GB of the 5.0GB used in nine hours. In watching the tar
 process, pointers to small or average-size files are backed up quickly
 and at a consistent pace. When tar encounters a pointer/inode
 belonging to a very large file (100GB+), the tar process stalls on that
 file for a very long time, as if it were trying to archive the real
 file-size worth of data rather than just the pointer/inode.

 If you have striping on, a 100GiB file will have roughly 100,000 1MiB
 stripes, and each one requires a chunk of metadata. The descriptor for
 that file will therefore have a potentially very large number of
 extents, scattered around the MDT block device, depending on how
 slowly the file grew, etc.

While that may be true for other distributed filesystems, that is
not true for Lustre at all.  The size of a Lustre object is not
fixed to a chunk size like 32MB or similar, but rather is
variable depending on the size of the file itself.  The number of
stripes (== objects) on a file is currently fixed at file
creation time, and the MDS only needs to store the location of
each stripe (at most one per OST).  The actual blocks/extents of
the objects are managed inside the OST itself and are never seen
by the client or the MDS.
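
As an illustration, the layout the MDS stores for a file can be inspected from
a client with lfs getstripe; the output below is abridged and the object IDs
are invented, but a 4-stripe file lists exactly four objects (one per OST) no
matter how large the file is:

  $ lfs getstripe /mnt/lustre/bigfile
  /mnt/lustre/bigfile
  lmm_stripe_count:   4
  lmm_stripe_size:    1048576
  lmm_stripe_offset:  0
        obdidx           objid           objid            group
             0         3145782        0x300036                0
             1         3145771        0x30002b                0
             2         3145790        0x30003e                0
             3         3145769        0x300029                0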

Cheers, Andreas
--
Andreas Dilger   Whamcloud, Inc.
Principal Lustre Engineer    http://www.whamcloud.com/




___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Tar backup of MDT runs extremely slow, tar pauses on pointers to very large files

2012-05-30 Thread Alex Kulyavtsev

Is this the same issue as in the backup MDT question (and follow-up) at
http://lists.lustre.org/pipermail/lustre-discuss/2009-April/010151.html,
due to sparse files on the MDT?  Does tar take a lot of CPU?
Alex.

On May 30, 2012, at 5:02 PM, Andreas Dilger wrote:


The tar backup of the MDT is taking a very long time. So far it has
backed up 1.6GB of the 5.0GB used in nine hours. In watching the tar
process, pointers to small or average-size files are backed up quickly
and at a consistent pace. When tar encounters a pointer/inode
belonging to a very large file (100GB+), the tar process stalls on that
file for a very long time, as if it were trying to archive the real
file-size worth of data rather than just the pointer/inode.


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Tar backup of MDT runs extremely slow, tar pauses on pointers to very large files

2012-05-30 Thread Jeff Johnson
Following up on my original post: I switched from the /bin/tar that ships with
RHEL/CentOS 5.x to the Whamcloud-patched tar utility. The entire backup was
successful and took only 12 hours to complete. CPU utilization was in the high
90 percent range, but only on one core. The process was much faster than the
standard tar shipped with RHEL/CentOS, and the only slowdowns were on file
pointers to very large files (100TB+) with large stripe counts. The files that
were going very slowly when I reported the initial problem were backed up
instantly with the Whamcloud version of tar.
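
For anyone following along, the file-level MDT backup being discussed looks
roughly like the procedure in the Lustre manual (the device name and paths
below are placeholders; the getfattr step and --sparse flag follow the manual,
and using the patched tar mentioned above avoids the slowdown on large files):

  # mount the MDT backing filesystem directly and save extended attributes
  mount -t ldiskfs /dev/mdtdev /mnt/mdt
  cd /mnt/mdt
  getfattr -R -d -m '.*' -e hex -P . > /backup/ea-$(date +%Y%m%d).bak
  # back up the metadata; --sparse keeps tar from storing the holes of the
  # MDT's sparse inodes
  tar czf /backup/mdt-$(date +%Y%m%d).tgz --sparse .
  cd / && umount /mnt/mdt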

Best part, the MDT was saved and the 4PB filesystem is in production again.

--Jeff



On 5/30/12 3:02 PM, Andreas Dilger wrote:
 On 2012-05-29, at 1:28 PM, Peter Grandi wrote:
 The tar backup of the MDT is taking a very long time. So far it has
 backed up 1.6GB of the 5.0GB used in nine hours. In watching the tar
 process, pointers to small or average-size files are backed up quickly
 and at a consistent pace. When tar encounters a pointer/inode
 belonging to a very large file (100GB+), the tar process stalls on that
 file for a very long time, as if it were trying to archive the real
 file-size worth of data rather than just the pointer/inode.
 If you have striping on, a 100GiB file will have roughly 100,000 1MiB
 stripes, and each one requires a chunk of metadata. The descriptor for
 that file will therefore have a potentially very large number of
 extents, scattered around the MDT block device, depending on how
 slowly the file grew, etc.
 While that may be true for other distributed filesystems, that is
 not true for Lustre at all.  The size of a Lustre object is not
 fixed to a chunk size like 32MB or similar, but rather is
 variable depending on the size of the file itself.  The number of
 stripes (== objects) on a file is currently fixed at file
 creation time, and the MDS only needs to store the location of
 each stripe (at most one per OST).  The actual blocks/extents of
 the objects are managed inside the OST itself and are never seen
 by the client or the MDS.

 Cheers, Andreas
 --
 Andreas Dilger   Whamcloud, Inc.
 Principal Lustre Engineer    http://www.whamcloud.com/






-- 
--
Jeff Johnson
Manager
Aeon Computing

jeff.john...@aeoncomputing.com
www.aeoncomputing.com
t: 858-412-3810 x101   f: 858-412-3845
m: 619-204-9061

4905 Morena Boulevard, Suite 1313 - San Diego, CA 92117

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss