Re: [Lustre-discuss] Xyratex News Regarding Lustre - Press Release

2013-02-19 Thread Larry
Congratulations!
On Feb 20, 2013 7:11 AM, Kevin Canady kevin_can...@xyratex.com wrote:

 Greetings Community!
 Today we are very excited to announce that Xyratex has purchased
Lustre® and its assets from Oracle. We intend for Lustre to remain an
open-source, community-driven file system to be promoted by our community
organizations.  We undertook the acquisition because we realize its
importance to the entire community and we want to help ensure that it will
continue to deliver for all of us over the long term.  This is critically
important to the growth and vitality of Lustre; it’s how it became what it
is today, and it’s how it will deliver the most value in the future.
Several members of the Lustre community have endorsed these plans and
voiced their support of our purchase, and contributed quotes which were
included in our announcement today.

 Special thanks to our two community leaders Hugo Falter with EOFS and
Norm Morse with OpenSFS for their support to date and into the future as we
work to further the collaboration around Lustre.  Expect to hear more soon
as we head into the 2013 Lustre User Group meeting in San Diego.
http://www.opensfs.org/events/lug13/  It should be an exciting event!

 Best regards,
 Kevin

 P. Kevin Canady
 Director, Business Development Lustre and HPC Services
 kevin_can...@xyratex.com
 O: 510-687-5475
 C: 415.505.7701

 Xyratex Advances Lustre® Initiative, Assumes Ownership of Related Assets

 Xyratex plans to offer Lustre community and ClusterStor™ users
significant value

 Havant, UK – Feb. 19, 2013 – Xyratex Ltd (Nasdaq: XRTX), a leading
provider of data storage technology, today announced it plans to advance
the global Lustre® portfolio by supporting the community-oriented
development of Lustre as an open source file system and continuing to work
in conjunction with the broader community to help chart the best path
forward for this key technology. Xyratex has recently acquired the original
Lustre trademark, logo, website and associated intellectual property from
Oracle, and will assume responsibility for providing support to Lustre
customers going forward.

 “Lustre is a powerful open source file system, and Xyratex strongly
believes that all members of the Lustre community need to continue to play
a part in the evolution of the code and the benefits it delivers over the
long term,” said Steve Barber, CEO of Xyratex. “We want to ensure that
current Lustre customers get the best possible feature roadmap and support,
and we intend to engage the entire community to advance the Lustre
technology. We also appreciate Oracle’s support of Lustre, and their
efforts to ensure the long-term success of the technology.”

 The Lustre file system, which was first released in 2003, is a
client/server based, distributed architecture designed for large-scale
compute and I/O-intensive, performance-sensitive applications. The Lustre
architecture currently powers six of the top 10 high-performance computing
(HPC) clusters in the world and more than 60 of the 100 largest HPC
installations. It has emerged as a particularly popular choice in the
meteorology, simulation, oil and gas, life science, rich media and finance
sectors.

 This purchase also gives Xyratex the opportunity to continue to leverage
Lustre and provide more value through its best-of-breed ClusterStor™ family
of scale-out HPC data storage solutions. ClusterStor delivers a new
standard in file system performance, scalability and efficiency, and brings
together what were previously discrete server, network and storage
platforms with their own separate software layers. The results are
integrated, modular, scale-out storage building blocks that enable systems
to scale both performance and capacity while aggressively reducing space,
power and administrative overhead.

 “Cray has been using Lustre as our primary parallel file system for the
past 10 years, and has deployed some of the largest and most successful
Lustre installations in the world with a variety of storage products,” said
Barry Bolding, vice president of storage and data management at Cray. “We
have recently worked with Xyratex to deploy successful Lustre installations
in the government, energy, manufacturing and academic markets with the Cray
Sonexion storage system, including the record-breaking NCSA installation
running Lustre at over 1TB/sec. This announcement is another important step
for Lustre and the OpenSFS community, and shows the promising future of the
Lustre file system in supercomputing and Big Data.”

 “Xyratex’ deep knowledge of Lustre, and ability to deploy and support it,
has been critical in helping NCSA bring the Blue Waters system into
production and making a new class of computational and data focused
petascale system usable for our scientific and engineering teams,” said Dr.
William Kramer, Blue Waters Deputy Director at the University of Illinois'
National Center for Supercomputing Applications, whose Blue Waters
supercomputer is amongst the fastest and 

Re: [Lustre-discuss] [wc-discuss] Re: Lustre 2.2 production experience

2012-06-09 Thread Larry
We have deployed 2.1.1 on several clusters, each of which has hundreds of
nodes.
On Jun 10, 2012 6:05 AM, Wojciech Turek wj...@cam.ac.uk wrote:

 Thanks for the quick reply, Andreas. I slightly misunderstood the Lustre
 release process and thought that the next stable/production version is
 2.2.

 I am then interested in the experience of people running Lustre 2.1

 Cheers

 Wojciech

 On 9 June 2012 21:52, Andreas Dilger adil...@whamcloud.com wrote:
  I think you'll find that there are not yet (m)any production deployments
 of 2.2. There are a number of production 2.1 deployments, and this is the
 current maintenance stream from Whamcloud.
 
  Cheers, Andreas
 
  On 2012-06-09, at 14:33, Wojciech Turek wj...@cam.ac.uk wrote:
 
  I am building a 1.5PB storage system which will employ Lustre as the
  main file system. The storage system will be extended at a later
  stage beyond 2PB.  I am considering using Lustre 2.2 for the production
  environment. This Lustre storage system will replace our older 300TB
  system which is currently running Lustre 1.8.8. I am quite happy with
  Lustre 1.8.8; however, for the new system Lustre 2.2 seems to be a better
  match.  The storage system will be attached to a university-wide
  cluster (800 nodes), hence there will be quite a large range of
  applications using the filesystem. Could people with production
  deployments of Lustre 2.2 share their experience please?
 
 
  --
  Wojciech Turek
  ___
  Lustre-discuss mailing list
  Lustre-discuss@lists.lustre.org
  http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] OSS1 Node issue

2012-02-21 Thread Larry
I have checked your logs. There seem to be several OSTs on your oss1, and
at least one of them is read-only; this has nothing to do with permissions.
Running e2fsck on your OST device is recommended to resolve the rc = -30
problem.
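
A minimal sketch of that repair, assuming the affected OST sits on a
hypothetical device /dev/sdX mounted at /mnt/ost0 on the OSS (use the
Lustre-patched e2fsprogs):

oss1:~ # umount /mnt/ost0       # take the OST offline first
oss1:~ # e2fsck -fp /dev/sdX    # force a full check, fix safe problems automatically
oss1:~ # mount -t lustre /dev/sdX /mnt/ost0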

On Tue, Feb 21, 2012 at 4:00 PM, VIJESH EK ekvij...@gmail.com wrote:
 Dear Sir,

 Thanks for your immediate response...

 I have checked the OST permissions; they are in read-write mode, and no
 hard disk has failed: the storage console shows all disks online and working.
 Herewith I have attached the detailed log information; kindly go
 through the logs and get back to me.

 Thanks & Regards

 VIJESH



 On Tue, Feb 21, 2012 at 1:13 PM, Larry tsr...@gmail.com wrote:

 Hi,

 Your OST has become read-only; that is the reason. Generally this is
 related to your hardware: for example, your storage array is broken,
 or your ldiskfs file system is corrupted.
 You should check your storage and run e2fsck on the OST.

 On Tue, Feb 21, 2012 at 2:52 PM, VIJESH EK ekvij...@gmail.com wrote:
  Dear All,
 
  We have made the following changes on the exec nodes, but we are still
  getting the same errors in /var/log/messages.
 
  1. We changed the exec nodes' spool directory to a local directory by
  editing the file /home/appl/sge-root/default/common/configuration and
  changing the parameter execd_spool_dir.
 
  After making this change the same error, i.e. the error shown below, still
  appears on the OSS1 node. This error is generated only on the OSS1 node.
 
  Feb  6 18:32:10 oss1 kernel: LustreError:
  9362:0:(filter_io_26.c:693:filter_commitrw_write()) error starting
  transaction: rc = -30
  Feb  6 18:32:05 oss1 kernel: LustreError:
  9422:0:(filter_io_26.c:693:filter_commitrw_write()) error starting
  transaction: rc = -30
  Feb  6 18:32:06 oss1 kernel: LustreError:
  9432:0:(filter_io_26.c:693:filter_commitrw_write()) error starting
  transaction: rc = -30
  Feb  6 18:32:07 oss1 kernel: LustreError:
  9369:0:(filter_io_26.c:693:filter_commitrw_write()) error starting
  transaction: rc = -30
  Feb  6 18:32:10 oss1 kernel: LustreError:
  9362:0:(filter_io_26.c:693:filter_commitrw_write()) error starting
  transaction: rc = -30
 
 
  Can you tell me how to change the master spool directory?
  Is it possible to change the directory while the system is live?
 
  Kindly explain briefly, so that we can proceed to the next step.
 
 
  Thanks and Regards
 
  VIJESH
 
 
 
 
 
 
 
  On Fri, Feb 10, 2012 at 1:19 PM, Carlos Thomaz ctho...@ddn.com wrote:
 
  Hi vijesh.
 
  Are you running the SGE master spooling on lustre?!?! What about the
  exec
  nodes spooling?!
 
  I strongly recommend that you do not run the master spooling on Lustre,
  and if possible use local spooling on a local disk for the exec nodes.
 
  SGE (at least until version 6.2u7) is known to get unstable when running
  the spooling on Lustre.
 
  Carlos
 
  On Feb 10, 2012, at 1:18 AM, VIJESH EK ekvij...@gmail.com wrote:
 
  Dear All,
 
  Kindly provide a solution for the issue below...
 
  Thanks & Regards
 
  VIJESH E K
 
 
 
  On Thu, Feb 9, 2012 at 3:26 PM, VIJESH EK ekvij...@gmail.com wrote:
 
  Dear Sir,
 
  I am continuously getting the error messages below on the OSS1 node;
  they cause the SGE service to stop running intermittently...
 
 
  Feb  5 04:03:37 oss1 kernel: LustreError:
  9193:0:(filter_io_26.c:693:filter_commitrw_write()) error starting
  transaction: rc = -30
  Feb  5 04:03:47 oss1 kernel: LustreError:
  9164:0:(filter_io_26.c:693:filter_commitrw_write()) error starting
  transaction: rc = -30
  Feb  5 04:03:47 oss1 kernel: LustreError:
  28420:0:(filter_io_26.c:693:filter_commitrw_write()) error starting
  transaction: rc = -30
  Feb  5 04:03:48 oss1 kernel: LustreError:
  9266:0:(filter_io_26.c:693:filter_commitrw_write()) error starting
  transaction: rc = -30
  Feb  5 04:03:50 oss1 kernel: LustreError:
  9200:0:(filter_io_26.c:693:filter_commitrw_write()) error starting
  transaction: rc = -30
  Feb  5 04:03:53 oss1 kernel: LustreError:
  9230:0:(filter_io_26.c:693:filter_commitrw_write()) error starting
  transaction: rc = -30
  Feb  5 04:03:57 oss1 kernel: LustreError:
  9212:0:(filter_io_26.c:693:filter_commitrw_write()) error starting
  transaction: rc = -30
  Feb  5 04:04:03 oss1 kernel: LustreError:
  9262:0:(filter_io_26.c:693:filter_commitrw_write()) error starting
  transaction: rc = -30
  Feb  5 04:04:08 oss1 kernel: LustreError:
  9162:0:(filter_io_26.c:693:filter_commitrw_write()) error starting
  transaction: rc = -30
  Feb  5 04:04:15 oss1 kernel: LustreError:
  9271:0:(filter_io_26.c:693:filter_commitrw_write()) error starting
  transaction: rc = -30
  Feb  5 04:04:23 oss1 kernel: LustreError:
  9191:0:(filter_io_26.c:693:filter_commitrw_write()) error starting
  transaction: rc = -30
  Feb  5 04:04:32 oss1 kernel: LustreError:
  9242:0:(filter_io_26.c:693:filter_commitrw_write()) error starting
  transaction: rc = -30
 
 
  I have attached the detailed log information.

[Lustre-discuss] problem in lustre lnet routing

2011-11-23 Thread Larry
Hi all,

I have a problem setting up LNET routing. The MDS and OSSes have both IB
and GigE networks: 30.9.100.* for IB and 20.9.100.* for GigE. Most of
the clients have IB too, but a few of them do not, so I chose one
client as an LNET router. Below are the configurations:

On the MDS and OSSes,

IB: 30.9.100.*
GigE: 20.9.100.*
modprobe.conf: options lnet networks=o2ib0(ib0) routes="tcp0 30.9.0.5@o2ib0"

On the router,

IB 30.9.0.5
GigE: 20.9.0.5
modprobe.conf: options lnet networks=o2ib0(ib0),tcp0(eth1)
forwarding=enabled

On the GigE client,

GigE: 20.9.0.2
modprobe.conf: options lnet networks=tcp0(eth1) routes="o2ib0 20.9.0.5@tcp0"
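
As a quick sanity check of the setup above (a sketch using only standard lctl
subcommands; the NIDs are the ones from this mail):

router:~ # lctl list_nids              # should show both 30.9.0.5@o2ib and 20.9.0.5@tcp
client:~ # lctl ping 20.9.0.5@tcp      # router reachable directly over GigE
client:~ # lctl ping 30.9.100.31@o2ib  # MDS reachable through the router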


After LNET is configured, the client can lctl ping the MDS and every OSS.
For example,

client:~ # lctl ping 30.9.100.31@o2ib
12345-0@lo
12345-30.9.100.31@o2ib

where 30.9.100.31 is MDS.

But mount -t lustre 30.9.100.31@o2ib0:30.9.100.32@o2ib0:/fnfs /mnt
failed, the log says,


Nov 24 10:36:37 cn-fn02 kernel: [502743.285050] Lustre: OBD class
driver, http://wiki.whamcloud.com/
Nov 24 10:36:37 cn-fn02 kernel: [502743.285056] Lustre: Lustre
Version: 2.1.0
Nov 24 10:36:37 cn-fn02 kernel: [502743.285060] Lustre: Build
Version: RC2-g9d71fe8-PRISTINE-2.6.32.12-0.7-default
Nov 24 10:36:37 cn-fn02 kernel: [502743.287057] Lustre: Lustre LU
module (a17f6d00).
Nov 24 10:36:37 cn-fn02 kernel: [502743.358095] Lustre: Added LNI
20.9.0.2@tcp [8/256/0/180]
Nov 24 10:36:37 cn-fn02 kernel: [502743.358153] Lustre: Accept secure, port 988
Nov 24 10:36:37 cn-fn02 kernel: [502743.423409] Lustre: Lustre OSC
module (a1a9b800).
Nov 24 10:36:37 cn-fn02 kernel: [502743.438668] Lustre: Lustre LOV
module (a1b09500).
Nov 24 10:36:37 cn-fn02 kernel: [502743.460108] Lustre: Lustre client
module (a1ba9a40).
Nov 24 10:36:37 cn-fn02 kernel: [502743.480266] Lustre:
4329:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import
MGC30.9.100.31@o2ib-MGC30.9.100.31@o2ib_0 neti
d 2: select flavor null
Nov 24 10:36:37 cn-fn02 kernel: [502743.485938] Lustre:
MGC30.9.100.31@o2ib: Reactivating import
Nov 24 10:36:37 cn-fn02 kernel: [502743.517528] Lustre:
4329:0:(sec.c:1474:sptlrpc_import_sec_adapt()) import
fnfs-MDT-mdc-8801b79afc00-30.9.100.31@
o2ib netid 2: select flavor null
Nov 24 10:36:42 cn-fn02 kernel: [502748.508709] Lustre:
4401:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request
x1386324633321488 sent from fnfs-MDT00
00-mdc-8801b79afc00 to NID 20.9.100.31@tcp has timed out for sent
delay: [sent 1322102197] [real_sent 0] [current 1322102202] [deadline
5s] [delay 0s]  r
eq@88019c603c00 x1386324633321488/t0(0)
o-1-fnfs-MDT_UUID@30.9.100.31@o2ib:12/10 lens 368/512 e 0 to 1 dl
1322102202 ref 2 fl Rpc:XN//ff
ff rc 0/-1
Nov 24 10:37:07 cn-fn02 kernel: [502773.472069] Lustre:
4401:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request
x1386324633321491 sent from fnfs-MDT00
00-mdc-8801b79afc00 to NID 30.9.100.32@o2ib has timed out for slow
reply: [sent 132210] [real_sent 132210] [current 1322102227]
[deadline 5s] [de
lay 0s]  req@88019b092400 x1386324633321491/t0(0)
o-1-fnfs-MDT_UUID@30.9.100.32@o2ib:12/10 lens 368/512 e 0 to 1 dl
1322102227 ref 1 fl Rpc:XN/f
fff/ rc 0/-1
Nov 24 10:37:27 cn-fn02 kernel: [502793.442762] Lustre:
4402:0:(import.c:526:import_select_connection())
fnfs-MDT-mdc-8801b79afc00: tried all connect
ions, increasing latency to 5s
Nov 24 10:37:27 cn-fn02 kernel: [502793.442802] Lustre:
4401:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request
x1386324633321493 sent from fnfs-MDT00
00-mdc-8801b79afc00 to NID 20.9.100.31@tcp has failed due to
network error: [sent 1322102247] [real_sent 1322102247] [current
1322102247] [deadline 10s]
[delay -10s]  req@8801b68ebc00 x1386324633321493/t0(0)
o-1-fnfs-MDT_UUID@30.9.100.31@o2ib:12/10 lens 368/512 e 0 to 1 dl
1322102257 ref 1 fl Rpc:XN/
/ rc 0/-1
Nov 24 10:38:02 cn-fn02 kernel: [502828.392144] Lustre:
4401:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request
x1386324633321495 sent from fnfs-MDT00
00-mdc-8801b79afc00 to NID 30.9.100.32@o2ib has timed out for slow
reply: [sent 1322102272] [real_sent 1322102272] [current 1322102282]
[deadline 10s] [d
elay 0s]  req@88019c603c00 x1386324633321495/t0(0)
o-1-fnfs-MDT_UUID@30.9.100.32@o2ib:12/10 lens 368/512 e 0 to 1 dl
1322102282 ref 1 fl Rpc:XN/
/ rc 0/-1
Nov 24 10:38:17 cn-fn02 kernel: [502843.369501] Lustre:
4402:0:(import.c:526:import_select_connection())
fnfs-MDT-mdc-8801b79afc00: tried all connect
ions, increasing latency to 10s
Nov 24 10:38:17 cn-fn02 kernel: [502843.369561] Lustre:
4401:0:(client.c:1778:ptlrpc_expire_one_request()) @@@ Request
x1386324633321497 sent from fnfs-MDT00
00-mdc-8801b79afc00 to NID 20.9.100.31@tcp has failed due to
network error: [sent 1322102297] [real_sent 1322102297] [current
1322102297] [deadline 15s]
[delay -15s]  req@88019b082000 x1386324633321497/t0(0)

Re: [Lustre-discuss] help for endless e2fsck

2011-09-19 Thread Larry
An 8TB LUN is a big device. I once had an error on a 3TB OST device whose
e2fsck took about 10 hours, with Lustre 1.8.1.1 and
e2fsprogs-1.41.10.sun2. Maybe you can first upgrade your e2fsprogs?
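
For reference, a rough way to confirm which e2fsprogs is actually in use before
re-running the check (the device path is a placeholder; e2fsck also prints its
version on the first line of its output):

# rpm -q e2fsprogs
# e2fsck -fy /dev/sdX    # after upgrading: -f forces a full check, -y answers yes to fixes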

2011/9/19 enqiang zhou eqz...@gmail.com:
 It's an 8TB LUN and e2fsck has been running for about 30 hours. I'm not sure
 whether I should wait longer for it to finish. Below is part of
 e2fsck's log.

 ... ...
 Illegal block number passed to ext2fs_test_block_bitmap #4243581855
 for multiply claimed block map
 Illegal block number passed to ext2fs_test_block_bitmap #2363489791
 for multiply claimed block map
 Illegal block number passed to ext2fs_test_block_bitmap #4091539423
 for multiply claimed block map
 Illegal block number passed to ext2fs_test_block_bitmap #398961
 for multiply claimed block map
 Illegal block number passed to ext2fs_test_block_bitmap #1339682798
 for multiply claimed block map
 Pass 1C: Scanning directories for inodes with multiply-claimed blocks
 19:06
 Pass 1D: Reconciling multiply-claimed blocks

 ... (inode #423776, mod time Tue Oct  6 18:10:49 1981)
... (inode #405328, mod time Tue Oct  6 18:10:49 1981)
... (inode #366432, mod time Tue Oct  6 18:10:49 1981)
... (inode #349536, mod time Tue Oct  6 18:10:49 1981)
... (inode #329824, mod time Tue Oct  6 18:10:49 1981)
... (inode #312928, mod time Tue Oct  6 18:10:49 1981)
... (inode #275296, mod time Tue Oct  6 18:10:49 1981)
... (inode #238688, mod time Tue Oct  6 18:10:49 1981)
... (inode #223056, mod time Tue Oct  6 18:10:49 1981)
... (inode #220768, mod time Tue Oct  6 18:10:49 1981)
... (inode #201056, mod time Tue Oct  6 18:10:49 1981)
... (inode #184160, mod time Tue Oct  6 18:10:49 1981)
... (inode #164448, mod time Tue Oct  6 18:10:49 1981)
... (inode #146528, mod time Tue Oct  6 18:10:49 1981)
... (inode #126816, mod time Tue Oct  6 18:10:49 1981)
... (inode #109920, mod time Tue Oct  6 18:10:49 1981)
... (inode #90208, mod time Tue Oct  6 18:10:49 1981)
... (inode #74576, mod time Tue Oct  6 18:10:49 1981)
... (inode #72288, mod time Tue Oct  6 18:10:49 1981)
... (inode #35680, mod time Tue Oct  6 18:10:49 1981)
 Clone multiply-claimed blocks? yes

 Illegal block number passed to ext2fs_test_block_bitmap #3449154175
 for multiply claimed block map
 Clone multiply-claimed blocks? yes

 Illegal block number passed to ext2fs_test_block_bitmap #3449154175
 for multiply claimed block map

 I'd appreciate any suggestion anyone could give me!

 2011/9/19, Larry tsr...@gmail.com:
 you say the e2fsck enters an endless loop; maybe you haven't given it enough
 time. By the way, you'd better attach some logs.

 On 9/18/11, enqiang zhou eqz...@gmail.com wrote:
 hi,all

 We experienced a serious RAID problem and the OST on the RAID was corrupted;
 it could not be mounted.
 Dmesg showed the message below when I tried to mount it as ldiskfs:

 LDISKFS-fs error (device sdd): ldiskfs_check_descriptors: Checksum for
 group
 14208 failed (51136!=40578)
 LDISKFS-fs: group descriptors corrupted!

 Then I tried to repair it using e2fsck but it entered an endless loop; e2fsck
 never stopped! And I couldn't mount it as ldiskfs after I sent a kill signal to
 e2fsck.
 I also tried some advice found on the list, like tune2fs -O uninit_bg
 /dev/xxx,
 then e2fsck, but none of it was helpful. Our Lustre version is 1.8.1.1
 with e2fsprogs-1.41.10.sun2.
 Can Mr Andreas Dilger give me some advice?

 Any help will be greatly appreciated. Thanks!

 Best Regards



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Where to download Lustre from since 01 Aug?

2011-08-12 Thread Larry
I believe there is no official version, just the Oracle version and the
Whamcloud version.

On Tue, Aug 9, 2011 at 8:19 AM, Mathew Eis m...@usgs.gov wrote:
 Hi List,

 We too, got lost looking for downloads in the Oracle etherland...

 Are there any differences between the official version and the
 whamcloud version? Also, I don't see previous releases such as 1.8.4 or
 1.8.5 available, are 1.8.6 and 2.0.0 the only versions available through
 whamcloud?

 Will OpenSFS be taking over the maintenance/hosting of Lustre now that
 Oracle seems to have dropped support?

 Thanks in advance!

 --
 Mathew I Eis
 IT Specialist
 US Geological Survey

 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Does the Whamcloud's lustre 1.8.6 rc1 support sles?

2011-06-11 Thread Larry
ok, I'll report it soon, thanks

On Sat, Jun 11, 2011 at 1:20 AM, Andreas Dilger adil...@whamcloud.com wrote:
 On 2011-06-10, at 9:42 AM, Larry wrote:
 I build lustre 1.8.6 rc1 on sles 10 sp2, kernel
 2.6.16.60-0.69.1_x86_64-smp today,  and get a lot of errors in
 applying ldiskfs' kernel patches. Now I'm trying to update these
 kernel patches one by one. So does this version support sles 10 sp2?

 The specific version of each supported kernel is in lustre/ChangeLog.
 It reports 2.6.16.60-0.42.8 (SLES 10) as the supported SLES kernel,
 so it shouldn't be very different than the one you have, but one never
 knows what SLES is up to (they bumped SLES 11 from 2.6.27 to 2.6.32 for SP1).

 If you need to make any serious patch changes, you should file a bugzilla
 and/or Jira bug with the updates using a clear topic like Updated ldiskfs
 patches for SLES 10 2.6.16.60-0.69.1 so that others don't need to do this
 work again.

 Cheers, Andreas
 --
 Andreas Dilger
 Principal Engineer
 Whamcloud, Inc.




___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Does the Whamcloud's lustre 1.8.6 rc1 support sles?

2011-06-11 Thread Larry
Thanks, Peter
I have learned from Oracle's changelog that they still support SLES.
Maybe I'll do some work to port these kernel patches to SLES and
test them in the future.

On Sat, Jun 11, 2011 at 10:01 PM, Peter Jones pjo...@whamcloud.com wrote:
 Sorry, I was in transit yesterday when this was posted or I would have
 replied sooner. You should note for the 1.8.6-wc release that Whamcloud
 is only supporting RHEL/CentOS5 servers and clients and RHEL6 clients -
 we are not routinely building and testing SLES. We have done some
 exploratory work on extending to include SLES and may add this in future
 1.8.x releases. SLES11 clients are already supported for the upcoming
 2.1 community release. In the meantime, SLES users can still use the
 equivalent Oracle 1.8.6 release (though someone from Oracle would need
 to comment on the availability of this).

 On 11-06-11 4:21 AM, Larry wrote:
 ok, I'll report it soon, thanks

 On Sat, Jun 11, 2011 at 1:20 AM, Andreas Dilgeradil...@whamcloud.com  
 wrote:
 On 2011-06-10, at 9:42 AM, Larry wrote:
 I build lustre 1.8.6 rc1 on sles 10 sp2, kernel
 2.6.16.60-0.69.1_x86_64-smp today,  and get a lot of errors in
 applying ldiskfs' kernel patches. Now I'm trying to update these
 kernel patches one by one. So does this version support sles 10 sp2?
 The specific version of each supported kernel is in lustre/ChangeLog.
 It reports 2.6.16.60-0.42.8 (SLES 10) as the supported SLES kernel,
 so it shouldn't be very different than the one you have, but one never
 knows what SLES is up to (they bumped SLES 11 from 2.6.27 to 2.6.32 for 
 SP1).

 If you need to make any serious patch changes, you should file a bugzilla
 and/or Jira bug with the updates using a clear topic like Updated ldiskfs
 patches for SLES 10 2.6.16.60-0.69.1 so that others don't need to do this
 work again.

 Cheers, Andreas
 --
 Andreas Dilger
 Principal Engineer
 Whamcloud, Inc.




 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


 --
 Peter Jones
 Whamcloud, Inc.
 www.whamcloud.com

 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Does the Whamcloud's lustre 1.8.6 rc1 support sles?

2011-06-10 Thread Larry
Hi, everyone

I built Lustre 1.8.6 rc1 on SLES 10 SP2, kernel
2.6.16.60-0.69.1_x86_64-smp, today and got a lot of errors while
applying the ldiskfs kernel patches. Now I'm trying to update these
kernel patches one by one. So does this version support SLES 10 SP2?

Thanks a lot

Best Regards,

Larry
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] problem reading HDF files on 1.8.5 filesystem

2011-05-04 Thread Larry
Try mounting the Lustre filesystem with -o flock or -o localflock.
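
For example, something like the following (the MGS NID, filesystem name and
mount point are placeholders, not taken from this thread):

client# mount -t lustre -o flock mgsnode@tcp0:/lustre /mnt/lustre
# or, if lock coherency within a single node is enough:
client# mount -t lustre -o localflock mgsnode@tcp0:/lustre /mnt/lustre

With -o flock, flock() calls are coherent across all clients; -o localflock is
cheaper but only consistent within one node.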

On Thu, May 5, 2011 at 4:47 AM, Christopher Walker
cwal...@fas.harvard.edu wrote:
 Hello,

 We have a user who is trying to post-process HDF files in R.  Her script
 goes through a number (~2500) of files in a directory, opening and
 reading the contents.  This usually goes fine, but occasionally the
 script dies with:


 HDF5-DIAG: Error detected in HDF5 (1.9.4) thread 46944713368080:
   #000: H5F.c line 1560 in H5Fopen(): unable to open file
     major: File accessability
     minor: Unable to open file
   #001: H5F.c line 1337 in H5F_open(): unable to read superblock
     major: File accessability
     minor: Read failed
   #002: H5Fsuper.c line 542 in H5F_super_read(): truncated file
     major: File accessability
     minor: File has been truncated
 Error in hdf5load(file = myfile, load = FALSE, verbosity = 0, tidy =
 TRUE) :
   unable to open HDF file:
 /n/scratch2/moorcroft_lab/nlevine/Moore_sites_final/met/LT_spinup/ms67/analy/s67-E-1628-04-00-00-g01.h5
 HDF5-DIAG: Error detected in HDF5 (1.9.4) thread 46944713368080:
   #000: H5F.c line 2012 in H5Fclose(): decrementing file ID failed
     major: Object atom
     minor: Unable to close file
   #001: H5I.c line 1340 in H5I_dec_ref(): can't locate ID
     major: Object atom
     minor: Unable to find atom information (already closed?)
 Error in hdf5cleanup(16778754L) : unable to close HDF file


 But this file definitely does exist -- any stat or ls command shows it
 without a problem.  Further, once I 'ls' this file, if I rerun the same
 script, it successfully reads this file, but then dies on the next one
 with the same error.  If I 'ls' the entire directory, the script runs to
 completion without a problem.  strace output shows:

 open(/n/scratch2/moorcroft_lab/nlevine/Moore_sites_final/met/LT_spinup/ms67/analy/s67-E-1628-04-00-00-g01.h5,
 O_RDONLY) = 3
 fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
 lseek(3, 0, SEEK_SET)                   = 0
 read(3, \211HDF\r\n\32\n, 8)          = 8
 read(3, \0, 1)                        = 1
 read(3,
 \0\0\0\0\10\10\0\4\0\20\0\0\0\0\0\0\0\0\0\0\0\0\0\377\377\377\377\377\377\377\377@...,
 87) = 87
 close(3)                                = 0
 write(2, HDF5-DIAG: Error detected in HDF..., 42) = 42
 etc

 which initially looks fine to me, followed by an abrupt close.

 NFS filesystems and our 1.6.7.2 filesystem have no such problems -- any
 suggestions?

 Thanks very much,
 Chris
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] e2fsck issue

2011-04-13 Thread Larry
You'd better unmount the clients and the OST/MDT first, then fsck them.
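
Roughly, the sequence on the MDS would be something like this (the mount point
is a placeholder; the device path is the one from the mail below):

mds:~ # umount /mnt/mdt      # stop the MDT so MMP releases the device
mds:~ # e2fsck -n -v --mdsdb /tmp/mdsdb /dev/msavg/lv001
mds:~ # mount -t lustre /dev/msavg/lv001 /mnt/mdt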

On Wed, Apr 13, 2011 at 4:29 PM, Christos Theodosiou
ctheo...@grid.auth.gr wrote:
 Hi all,

 I am trying to perform a file-system check on a mounted lustre file-system.

 The e2fsck fails with the following message:

   e2fsck -n -v --mdsdb /tmp/mdsdb /dev/msavg/lv001
 e2fsck 1.41.10.sun2 (24-Feb-2010)
 device /dev/mapper/msavg-lv001 mounted by lustre per
 /proc/fs/lustre/mds/lustrefs-MDT/mntdev
 Warning!  /dev/msavg/lv001 is mounted.
 e2fsck: MMP: device currently active while trying to open /dev/msavg/lv001

 The superblock could not be read or does not describe a correct ext2
 filesystem.  If the device is valid and it really contains an ext2
 filesystem (and not swap or ufs or something else), then the superblock
 is corrupt, and you might try running e2fsck with an alternate superblock:
     e2fsck -b 32744 device

 I tried setting -b argument but the message persists.

 Do you have any suggestions on how to proceed.

 Best regards
 Christos

 --
 Christos Theodosiou
 Scientific Computational Center
 Aristotle University
 54 124 Thessaloniki, Greece
 Tel: +30 2310 99 8988
 Fax: +30 2310 99 4309
 http://www.grid.auth.gr

 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] e2fsck and related errors during recovering

2011-04-06 Thread Larry
Would it help to update e2fsprogs to the newest version? I once had a
problem during e2fsck, and after updating e2fsprogs it was fine.

On Thu, Apr 7, 2011 at 2:29 AM, Andreas Dilger adil...@whamcloud.com wrote:
 Having the actual error messages makes this kind of problem much easier to 
 solve.

 At a guess, if the journal was removed by e2fsck you can re-add it with 
 tune2fs -J size=400 /dev/{mdsdev}.

 As for lfsck, if you still need to run it, you need to make sure the same 
 version of e2fsprogs is on all OSTs and MDS.

 Cheers, Andreas

 On 2011-04-06, at 1:26 AM, Werner Dilling dill...@zdv.uni-tuebingen.de 
 wrote:

 Hello,
 after a crash of our lustre system (1.6.4) we have problems repairing the 
 filesystem. Running the 1.6.4 e2fsck failed on the mds filesystem so we 
 tried with the latest 1.8 version which succeeded. But trying to mount mds 
 as ldiskfs filesystem failed with the standard error message: bad superblock 
 on 
 We tried to get more info and the file command
 file -s -L /dev/ produced ext2 filesystem instead of ext3 filesystem 
 which we got from all ost-filesystems.
 We were able to produce the mds-database which is needed to get info for lfs 
 fsck. But using this database to create the ost databases failed with the 
 error message: error getting mds_hdr (large number:8) in /tmp/msdb: Cannot 
 allocate memory ..
 So I assume the msdb is in bad shape and my question is how we can proceed. 
 I assume we have to create a correct version of the mds-filesystem and how 
 to do this is unknown. Any help and info is appreciated.

 Thanks
 w.dilling



 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] persistent client re-connect failure

2011-03-21 Thread Larry
If you *only* deactivate it on the MDS, then you can still see the OST on
the client; it just will not be written to anymore.
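
For context, deactivating an OST on the MDS in 1.8 is usually done along these
lines (the device number 7 is only an example; take it from your own lctl dl
output):

mds# lctl dl | grep osc          # find the OSC device that corresponds to OST0003
mds# lctl --device 7 deactivate  # stop new object allocation on that OST
mds# lctl --device 7 activate    # later, to bring it back

Clients keep the OST in their device list; they simply stop getting new
objects allocated on it.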

On Mon, Mar 21, 2011 at 11:49 AM, Samuel Aparicio sapari...@bccrc.ca wrote:
 Follow up to this posting. I notice on the client that lctl device_list
 reports the following:

  0 UP mgc MGC10.9.89.51@tcp 5a76b5b6-82bf-2053-8c17-e68ffe552edc 5
   1 UP lov lustre-clilov-8100459a9c00
 6775de4c-6c29-9316-a715-3472233477d1 4
   2 UP mdc lustre-MDT-mdc-8100459a9c00
 6775de4c-6c29-9316-a715-3472233477d1 5
   3 UP osc lustre-OST-osc-8100459a9c00
 6775de4c-6c29-9316-a715-3472233477d1 5
   4 UP osc lustre-OST0001-osc-8100459a9c00
 6775de4c-6c29-9316-a715-3472233477d1 5
   5 UP osc lustre-OST0002-osc-8100459a9c00
 6775de4c-6c29-9316-a715-3472233477d1 5
   6 UP osc lustre-OST0003-osc-8100459a9c00
 6775de4c-6c29-9316-a715-3472233477d1 4
   7 UP osc lustre-OST0004-osc-8100459a9c00
 6775de4c-6c29-9316-a715-3472233477d1 5
   8 UP osc lustre-OST0005-osc-8100459a9c00
 6775de4c-6c29-9316-a715-3472233477d1 5
   9 UP osc lustre-OST0006-osc-8100459a9c00
 6775de4c-6c29-9316-a715-3472233477d1 5
  10 UP lov lustre-clilov-810c92f2b800
 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 4
  11 UP mdc lustre-MDT-mdc-810c92f2b800
 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 5
  12 UP osc lustre-OST-osc-810c92f2b800
 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 5
  13 UP osc lustre-OST0001-osc-810c92f2b800
 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 5
  14 UP osc lustre-OST0002-osc-810c92f2b800
 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 5
  15 UP osc lustre-OST0003-osc-810c92f2b800
 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 4
  16 UP osc lustre-OST0004-osc-810c92f2b800
 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 5
  17 UP osc lustre-OST0005-osc-810c92f2b800
 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 5
  18 UP osc lustre-OST0006-osc-810c92f2b800
 0ecd69f5-6793-fcb1-0e05-8851c99e5dc5 5
  19 UP lov lustre-clilov-81047a45c000
 6a3d5815-4851-31b0-9400-c8892e11dae4 4
  20 UP mdc lustre-MDT-mdc-81047a45c000
 6a3d5815-4851-31b0-9400-c8892e11dae4 5
  21 UP osc lustre-OST-osc-81047a45c000
 6a3d5815-4851-31b0-9400-c8892e11dae4 5
  22 UP osc lustre-OST0001-osc-81047a45c000
 6a3d5815-4851-31b0-9400-c8892e11dae4 5
  23 UP osc lustre-OST0002-osc-81047a45c000
 6a3d5815-4851-31b0-9400-c8892e11dae4 5
  24 UP osc lustre-OST0003-osc-81047a45c000
 6a3d5815-4851-31b0-9400-c8892e11dae4 4
  25 UP osc lustre-OST0004-osc-81047a45c000
 6a3d5815-4851-31b0-9400-c8892e11dae4 5
  26 UP osc lustre-OST0005-osc-81047a45c000
 6a3d5815-4851-31b0-9400-c8892e11dae4 5
  27 UP osc lustre-OST0006-osc-81047a45c000
 6a3d5815-4851-31b0-9400-c8892e11dae4 5

 However, OST3 is non-existent; it was deactivated on the MDS. Why would the
 clients think it exists?















 Professor Samuel Aparicio BM BCh PhD FRCPath
 Nan and Lorraine Robertson Chair UBC/BC Cancer Agency
 675 West 10th, Vancouver V5Z 1L3, Canada.
 office: +1 604 675 8200 lab website http://molonc.bccrc.ca

 PLEASE SUPPORT MY FUNDRAISING FOR THE RIDE TO SEATTLE AND
 THE WEEKEND TO END WOMENS CANCERS. YOU CAN DONATE AT THE LINKS BELOW
 Ride to Seattle Fundraiser
 Weekend to End Womens Cancers




 On Mar 20, 2011, at 8:41 PM, Samuel Aparicio wrote:

 I am stuck with the following issue on a client attached to a lustre system.
 we are running lustre 1.8.5
 somehow connectivity to the OST failed at some point and the mount hung.
 after unmounting and re-mounting the client attempts to reconnect.
 lctl ping shows the client to be connected and normal ping to the OSS/MGS
 servers shows connectivity.
 remounting the filesystem results in only some files being visible.
 the kernel messages are as follows:
 -
 Lustre: setting import lustre-OST0003_UUID INACTIVE by administrator request
 Lustre: lustre-OST0003-osc-8110238c7400.osc: set parameter active=0
 Lustre: Skipped 3 previous similar messages
 LustreError: 14114:0:(lov_obd.c:315:lov_connect_obd()) not connecting OSC
 ^\; administratively disabled
 Lustre: Client lustre-client has started
 LustreError: 14207:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc
 -5, returning -EIO
 LustreError: 14207:0:(file.c:995:ll_glimpse_size()) Skipped 1 previous
 similar message
 LustreError: 14207:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc
 -5, returning -EIO
 LustreError: 14686:0:(file.c:995:ll_glimpse_size()) obd_enqueue returned rc
 -5, returning -EIO
 Lustre: 22218:0:(client.c:1476:ptlrpc_expire_one_request()) @@@ Request
 x1363662012007464 sent from lustre-OST-osc-8110238c7400 to NID
 10.9.89.21@tcp 16s ago has timed out (16s prior to deadline).
   req@810459ce4c00 x1363662012007464/t0
 o8-lustre-OST_UUID@10.9.89.21@tcp:28/4 lens 368/584 e 0 to 1 dl
 1300678232 ref 1 fl Rpc:N/0/0 rc 0/0
 Lustre: 22218:0:(client.c:1476:ptlrpc_expire_one_request()) Skipped 182
 previous similar messages
 Lustre: 22219:0:(import.c:517:import_select_connection())

[Lustre-discuss] Does lustre 1.8 stop update and maintenance?

2011-03-01 Thread Larry
Hi, all

Has Lustre 1.8 stopped receiving updates and maintenance? I have not seen any
updates for a long time; only Whamcloud has released Lustre 2.1. Does
that mean Oracle has frozen the development of Lustre?
___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Does lustre 1.8 stop update and maintenance?

2011-03-01 Thread Larry
Whamcloud is a great company!
I think I should do something for lustre...

On Wed, Mar 2, 2011 at 1:48 AM, Robert Read rr...@whamcloud.com wrote:
 Hi,

 I cannot comment about Oracle's plans regarding Lustre, but Whamcloud does 
 intend to continue supporting 1.8.x for some time.  You can see activity 
 related to 1.8.x (as well as 2.1) in http://jira.whamcloud.com.

 cheers,
 robert read
 Whamcloud, Inc



 On Mar 1, 2011, at 4:48 , Larry wrote:

 Hi, all

 Does lustre 1.8 stop update and maintenance? I have not seen any
 updates for a long time. Only Whamcloud releases the Lustre 2.1. Does
 it mean Oracle freeze the development of lustre?
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] OST problem

2011-02-28 Thread Larry
Hi Lucius,
Chapter 15 of the Lustre manual tells you how to do it.
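
For a rough idea of what the manual describes: adding a failover NID to an
already formatted OST looks like the sketch below (the device path and NID are
placeholders, and the OST must be unmounted while tunefs.lustre runs):

oss2:~ # umount /mnt/ost0
oss2:~ # tunefs.lustre --failnode=192.168.1.12@tcp /dev/sdb
oss2:~ # mount -t lustre /dev/sdb /mnt/ost0

Once each OST lists the partner server as a failnode, either OSS in the pair
can mount it, which is what makes the active/active layout possible (each
server is primary for half of the OSTs and failover for the other half).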


On Tue, Mar 1, 2011 at 1:05 PM, Lucius lucius...@hotmail.com wrote:
 Hello everyone,

 I would like to extend an OSS which is still in current use. I would like to
 extend it with a server which has exactly the same HW configuration, and I’d
 like to extend it in an active/active mode.
 I couldn’t find any documentation about this, as most of the examples show
 how to use failnode during formatting. However, I need to extend the
 currently working system without losing data.
 Also, the tunefs.lustre examples show only the parameter configuration, but they
 won’t say whether you need to synchronize the file system before setting it. How
 would the system know, on the given server identified by its unique IP,
 which OST mirrors should run?

 Thank you in advance,
 Viktor
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] An odd problem in my lustre 1.8.0

2010-12-19 Thread Larry
Dear all,

I have an odd problem today in my Lustre 1.8.0 setup. All of the OSSes and
the MDS appear fine, but one of the clients has a problem. When I create a
file on OST5 (one of my OSTs) and dd or echo something to this file,
the process hangs and never succeeds. For example,

client1:/home # lfs setstripe -o 5 test.txt

client1:/home # lfs getstripe test.txt
OBDS:
0: lustre-OST_UUID ACTIVE
1: lustre-OST0001_UUID ACTIVE
2: lustre-OST0002_UUID ACTIVE
3: lustre-OST0003_UUID ACTIVE
4: lustre-OST0004_UUID ACTIVE
5: lustre-OST0005_UUID ACTIVE
6: lustre-OST0006_UUID ACTIVE
7: lustre-OST0007_UUID ACTIVE
8: lustre-OST0008_UUID ACTIVE
9: lustre-OST0009_UUID ACTIVE
10: lustre-OST000a_UUID ACTIVE
11: lustre-OST000b_UUID ACTIVE
12: lustre-OST000c_UUID ACTIVE
13: lustre-OST000d_UUID ACTIVE
14: lustre-OST000e_UUID ACTIVE
15: lustre-OST000f_UUID ACTIVE
16: lustre-OST0010_UUID ACTIVE
test.txt
obdidx       objid        objid        group
     5    158029029    0x96b54e5        0

client1:/home # dd if=/dev/zero of=test.txt bs=1M count=100

Then the dd process hangs and never returns. If I edit and save the file,
its location changes to another OST, not OST5. For example,

client1:/home # dd if=/dev/zero of=test.txt bs=1M count=100 #(ctrl-C)
1+0 records in
0+0 records out
0 bytes (0 B) copied, 173.488 seconds, 0.0 kB/s

client1:/home # vi test.txt #add something

client1:/home # lfs getstripe test.txt
OBDS:
0: lustre-OST_UUID ACTIVE
1: lustre-OST0001_UUID ACTIVE
2: lustre-OST0002_UUID ACTIVE
3: lustre-OST0003_UUID ACTIVE
4: lustre-OST0004_UUID ACTIVE
5: lustre-OST0005_UUID ACTIVE
6: lustre-OST0006_UUID ACTIVE
7: lustre-OST0007_UUID ACTIVE
8: lustre-OST0008_UUID ACTIVE
9: lustre-OST0009_UUID ACTIVE
10: lustre-OST000a_UUID ACTIVE
11: lustre-OST000b_UUID ACTIVE
12: lustre-OST000c_UUID ACTIVE
13: lustre-OST000d_UUID ACTIVE
14: lustre-OST000e_UUID ACTIVE
15: lustre-OST000f_UUID ACTIVE
16: lustre-OST0010_UUID ACTIVE
test.txt
obdidx       objid        objid        group
     6    159122026    0x97c026a        0

But both the client and the OSS seem fine. By the way, other clients
and OSSes do not have this problem.

client1:/home # lfs check servers
lustre-MDT-mdc-810438d12c00 active.
lustre-OST000a-osc-810438d12c00 active.
lustre-OST000f-osc-810438d12c00 active.
lustre-OST000c-osc-810438d12c00 active.
lustre-OST0006-osc-810438d12c00 active.
lustre-OST000e-osc-810438d12c00 active.
lustre-OST0009-osc-810438d12c00 active.
lustre-OST-osc-810438d12c00 active.
lustre-OST000d-osc-810438d12c00 active.
lustre-OST0003-osc-810438d12c00 active.
lustre-OST0002-osc-810438d12c00 active.
lustre-OST0008-osc-810438d12c00 active.
lustre-OST000b-osc-810438d12c00 active.
lustre-OST0004-osc-810438d12c00 active.
lustre-OST0007-osc-810438d12c00 active.
lustre-OST0005-osc-810438d12c00 active.
lustre-OST0010-osc-810438d12c00 active.
lustre-OST0001-osc-810438d12c00 active.

I tried it many times. The log reported some error messages only once.

On client:

Dec 19 18:28:57 client1 kernel: LustreError: 11-0: an error occurred
while communicating with 12.12.71@o2ib. The ost_punch operation
failed with -107
Dec 19 18:28:57 client1 kernel: LustreError: Skipped 1 previous similar message
Dec 19 18:28:57 client1 kernel: Lustre:
lustre-OST0005-osc-810438d12c00: Connection to service
lustre-OST0005 via nid 12.12.71@o2ib was lost; in prog
ress operations using this service will wait for recovery to complete.
Dec 19 18:28:57 client1 kernel: LustreError:
4570:0:(import.c:909:ptlrpc_connect_interpret()) lustre-OST0005_UUID
went back in time (transno 189979771521 was
 previously committed, server now claims 0)!  See
https://bugzilla.lustre.org/show_bug.cgi?id=9646
Dec 19 18:28:57 client1 kernel: LustreError: 167-0: This client was
evicted by lustre-OST0005; in progress operations using this service
will fail.
Dec 19 18:28:57 client1 kernel: LustreError:
7128:0:(rw.c:192:ll_file_punch()) obd_truncate fails (-5) ino 41729130
Dec 19 18:28:57 client1 kernel: Lustre:
lustre-OST0005-osc-810438d12c00: Connection restored to service
lustre-OST0005 using nid 12.12.71@o2ib.

On OSS:

Dec 19 18:27:52 os6 kernel: LustreError:
0:0:(ldlm_lockd.c:305:waiting_locks_callback()) ### lock callback
timer expired after 101s: evicting client at 12
.12.12...@o2ib  ns: filter-lustre-OST0005_UUID lock:
810087d66200/0xae56b014db6d6d0a lrc: 3/0,0 mode: PR/PR res:
158015656/0 rrc: 2 type: EXT [0-1844
6744073709551615] (req 0-18446744073709551615) flags: 0x10020 remote:
0xe02336632642c5fc expref: 27 pid: 5333 timeout 7284896273
Dec 19 18:28:57 os6 kernel: LustreError:
5407:0:(ldlm_lib.c:1826:target_send_reply_msg()) @@@ processing error
(-107)  r...@8103dd91b400 x1343016412725286/t0 o10-?@?:0/0
lens 400/0 e 0 to 0 dl 1292754580 ref 1 fl Interpret:/0/0 rc -107/0

The MDS has no messages related with this.

I don't know 

Re: [Lustre-discuss] fsck.ext4 for device ... exited with signal 11.

2010-12-01 Thread Larry
Old versions of e2fsprogs do have bugs like this; the newer the
better, I think.

On Thu, Dec 2, 2010 at 7:11 AM, Craig Prescott presc...@hpc.ufl.edu wrote:
 Andreas Dilger wrote:
 Do you have enough RAM to run e2fsck on this node?  Have you tried running 
 it under gdb to see if it can catch the sig11 and print a stack trace?

 Yup, plenty of RAM - we've got 32GB in this node.

 We've already started up fsck again using Colin's suggestion of
 e2fsprogs-1.41.12.2.  So far so good.  But if we need to fire it up
 under gdb, I guess that's what we'll do.

 Thanks,
 Craig Prescott
 UF HPC Center
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] lmt version 3 release

2010-11-02 Thread Larry
Congratulations!
Let's give it a try.

On Wed, Nov 3, 2010 at 4:12 AM, Jim Garlick garl...@llnl.gov wrote:
 Version 3 of the Lustre Monitoring Tool (LMT) is now available on gcode:

 http://code.google.com/p/lmt/

 This is a major release that hopefully will improve LMT usability.
 It has been tested with Lustre 1.8.3.  A few highlights:

  * New ltop that works directly with Cerebro and has an expanded display.
  * Auto-configuration of MySQL database (lustre config is determined on the 
 fly)
  * Improved error handling and logging (configurable)
  * New config file
  * Code improvements for maintainability

 For those upgrading from Version 2, the LMT schema has not changed,
 and the new monitor module is backwards compatible with the Version 2
 metric modules.  Upgrading consists of:

 1. Setting up the new /etc/lmt/lmt.conf config file on the LMT server
 2. Updating the lmt-server package on the LMT server and restarting cerebrod
 3. Updating the lmt-server-agent package on the Lustre servers and restarting
 cerebrod

 Please refer to the Installation wiki page on the above gcode site,
 and direct any issues to lmt-disc...@googlegroups.com and/or
 the LMT gcode issue tracker.

 Jim Garlick
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] MPI-IO / ROMIO support for Lustre

2010-11-01 Thread Larry
We use "localflock" in order to work with MPI-IO; flock may
consume more additional resources than localflock.

On Mon, Nov 1, 2010 at 10:35 PM, Mark Dixon m.c.di...@leeds.ac.uk wrote:
 Hi,

 I'm trying to get the MPI-IO/ROMIO shipped with OpenMPI and MVAPICH2
 working with our Lustre 1.8 filesystem. Looking back at the list archives,
 3 different solutions have been offered:

 1) Disable data sieving         (change default library behaviour)
 2) Mount Lustre with localflock (flock consistent only within a node)
 3) Mount Lustre with flock      (flock consistent across cluster)

 However, it is not entirely clear which of these was considered the
 best. Could anyone who is using MPI-IO on Lustre comment which they
 picked, please?

 I *think* the May 2008 list archive indicates I should be using (3), but
 I'd feel a whole lot better about it if I knew I wasn't alone :)

 Cheers,

 Mark
 --
 -
 Mark Dixon                       Email    : m.c.di...@leeds.ac.uk
 HPC/Grid Systems Support         Tel (int): 35429
 Information Systems Services     Tel (ext): +44(0)113 343 5429
 University of Leeds, LS2 9JT, UK
 -
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Compiling lustre with snmp feature

2010-10-19 Thread Larry
Fortunately we don't need the checksum, so we haven't seen 20560 so
far. But 19528 happened several days ago, so we have two choices:
1. update to a higher version, e.g. 1.8.1 or 1.8.4; I'd like to
update to 1.8.4, but does that mean I should update the OFED driver
as well? We try to avoid changing the OFED.
2. patch the current version with attachments 23648 and 23751 for bz 19528.

Considering the OFED driver, we may have to patch 1.8.0 instead of updating it.

On Tue, Oct 19, 2010 at 9:03 PM, Peter Jones peter.x.jo...@oracle.com wrote:
 Hmm. My experience is that 20560 was the most disruptive issue in early
 1.8.x releases, but that was fixed in 1.8.1.1. Larry, 19528 was fixed in
 1.8.1. You can verify this by looking at the patch and noting that the first release
 with a landed+ flag is 1.8.1. HTH

 Larry wrote:

 which critical bug does 1.8.0 have and fixed in 1.8.0.1? I know 1.8.0
 has bug #19528, but I don't know whether it fixed or not in 1.8.0.1




___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Compiling lustre with snmp feature

2010-10-18 Thread Larry
Which critical bug does 1.8.0 have that was fixed in 1.8.0.1? I know 1.8.0
has bug #19528, but I don't know whether it was fixed in 1.8.0.1.

On Mon, Oct 18, 2010 at 10:41 PM, Brian J. Murrell
brian.murr...@oracle.com wrote:
 On Mon, 2010-10-18 at 16:37 +0200, Alfonso Pardo wrote:
 Hello,

 Hi,

 I try to compile with command:

 ./configure --with-linux=/usr/src/kernels/2.6.18-92.el5-x86_64/
 --enable-snmp

 But I get the error in some point:

 checking for register_mib... no

 You need to look in config.log and see why it's failing to find that.

 I have Centos 5.2 and lustre version 1.8.0

 1.8.0 had a subsequent 1.8.0.1 release which means that it fixed a
 critical bug.  I would strongly advise upgrading, and since you are
 going to upgrade, it might as well be to 1.8.4, the latest release where
 you will likely get more people's attention with questions and bug
 reports/fixes.

 Any package to install?

 I don't know off-hand, which is why I gave you instructions to discover
 what the problem is exactly.

 b.


 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] How do you monitor your lustre?

2010-09-30 Thread Larry
The latest LMT (lmt-2.6.4-2) was updated on Sep 17, 2010. Why say moribund?

On Thu, Sep 30, 2010 at 5:46 PM, Andreas Davour dav...@pdc.kth.se wrote:

 I ask because the lmt project seem to be quite moribund. Anyone else out there
 doing something?

 /andreas
 --
 Systems Engineer
 PDC Center for High Performance Computing
 CSC School of Computer Science and Communication
 KTH Royal Institute of Technology
 SE-100 44 Stockholm, Sweden
 Phone: 087906658
 A satellite, an earring, and a dust bunny are what made America great!
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre 1.8.4 ETA

2010-08-18 Thread Larry
Just git-pull from the Lustre repository and check out the 1.8.4 tag.
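
Something along these lines, as a sketch (the repository URL and tag name are
from memory and may differ; check the project's current hosting and run
git tag -l to confirm):

$ git clone git://git.whamcloud.com/fs/lustre-release.git
$ cd lustre-release
$ git checkout v1_8_4
$ sh autogen.sh && ./configure && make rpms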

On Tue, Aug 17, 2010 at 9:50 PM, Wojciech Turek wj...@cam.ac.uk wrote:
 Any idea when 1.8.4 will be released? Is there a source code available
 somewhere so I can try to build it myself?

 Many thanks

 Wojciech

 On 3 August 2010 17:55, Johann Lombardi johann.lomba...@oracle.com wrote:

 Hi James,

 On Tue, Aug 03, 2010 at 10:42:00AM -0600, James Robnett wrote:
  Wonderful news.  On a related topic.  Can the build scripts be
  made available (or a cleansed variant).  It's not that cumbersome
  to write one's own but if they already exist it'd be handy to re-use
  them rather than recreating them,  or at least use them as a reference.

 Our build scripts are - and have always been - available under the build
 directory (build/{lbuild,lbuild-rhel5,...}).

 Cheers,
 Johann
 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss



 --
 Wojciech Turek

 Senior System Architect

 High Performance Computing Service
 University of Cambridge
 Email: wj...@cam.ac.uk
 Tel: (+)44 1223 763517

 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre::LFS + Lustre::Info (inc. lustre-info.pl) available on the CPAN

2010-07-29 Thread Larry
Good job! I'll download them and learn from them.

On Wed, Jul 28, 2010 at 9:38 PM, Adrian Ulrich adr...@blinkenlights.ch wrote:
 First: Sorry for the shameless self advertising, but...

 I uploaded two lustre-related modules to the CPAN:

 #1: Lustre::Info provides easy access to information located
    at /proc/fs/lustre, it also comes with a 'performance monitoring'
    script called 'lustre-info.pl'

 #2 Lustre::LFS offers IO::Dir and IO::File-like filehandles but
   with additional lustre-specific features ($dir_fh-set_stripe...)


 Examples and details:

 Lustre::Info and lustre-info.pl
 ---

 Lustre::Info provides a Perl-OO interface to lustres procfs information.

 (confusing) example code to get the blockdevice of all OSTs:

  #
  my $l = Lustre::Info->new;
  print join("\n", map( { $l->get_ost($_)->get_name.": ".$l->get_ost($_)->get_blockdevice } \
                       @{$l->get_ost_list}), '' ) if $l->is_ost;
  #

 ..output:
  $ perl test.pl
  lustre1-OST001e: /dev/md17
  lustre1-OST0016: /dev/md15
  lustre1-OST000e: /dev/md13
  lustre1-OST0006: /dev/md11

 The module also includes a script called 'lustre-info.pl' that can
 be used to gather some live performance statistics:

 Use `--ost-stats' to get a quick overview on what's going on:
 $ lustre-info.pl --ost-stats
  lustre1-OST0006 (@ /dev/md11) :  write=   5.594 MB/s, read=   0.000 MB/s, 
 create=  0.0 R/s, destroy=  0.0 R/s, setattr=  0.0 R/s, preprw=  6.0 R/s
  lustre1-OST000e (@ /dev/md13) :  write=   3.997 MB/s, read=   0.000 MB/s, 
 create=  0.0 R/s, destroy=  0.0 R/s, setattr=  0.0 R/s, preprw=  4.0 R/s
  lustre1-OST0016 (@ /dev/md15) :  write=   5.502 MB/s, read=   0.000 MB/s, 
 create=  0.0 R/s, destroy=  0.0 R/s, setattr=  0.0 R/s, preprw=  6.0 R/s
  lustre1-OST001e (@ /dev/md17) :  write=   5.905 MB/s, read=   0.000 MB/s, 
 create=  0.0 R/s, destroy=  0.0 R/s, setattr=  0.0 R/s, preprw=  6.7 R/s


 You can also get client-ost details via `--monitor=MODE'

 $ lustre-info.pl --monitor=ost --as-list  # this will only show clients where 
 read+write = 1MB/s
 client nid       | lustre1-OST0006    | lustre1-OST000e    | lustre1-OST0016 
    | lustre1-OST001e    | +++ TOTALS +++ (MB/s)
 10.201.46...@o2ib  | r=   0.0, w=   0.0 | r=   0.0, w=   0.0 | r=   0.0, w=   
 0.0 | r=   0.0, w=   1.1 | read=   0.0, write=   1.1
 10.201.47...@o2ib  | r=   0.0, w=   0.0 | r=   0.0, w=   1.2 | r=   0.0, w=   
 2.0 | r=   0.0, w=   0.0 | read=   0.0, write=   3.2


 There are many more options, checkout `lustre-info.pl --help' for details!


 Lustre::LFS::Dir and Lustre::LFS::File
 ---

 This two packages behave like IO::File and IO::Dir but both of
 them add some lustre-only features to the returned filehandle.

 Quick example:
  my $fh = Lustre::LFS::File->new; # $fh is a normal IO::File-like FH
  $fh->open("> test") or die;
  print $fh "Foo Bar!\n";
  my $stripe_info = $fh->get_stripe or die "Not on a lustre filesystem?!\n";



 Keep in mind that both Lustre modules are far from being complete:
 Lustre::Info really needs some MDT support and Lustre::LFS is just a
 wrapper for /usr/bin/lfs: An XS-Version would be much better.

 But i'd love to hear some feedback if someone decides to play around
 with this modules + lustre-info.pl :-)


 Cheers,
  Adrian


 ___
 Lustre-discuss mailing list
 Lustre-discuss@lists.lustre.org
 http://lists.lustre.org/mailman/listinfo/lustre-discuss

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] I/O errors with NAMD

2010-07-23 Thread Larry
We sometimes have the same problem when running NAMD on Lustre; the
console log suggests the file lock expired, but I don't know why.

On Fri, Jul 23, 2010 at 8:12 AM, Wojciech Turek wj...@cam.ac.uk wrote:
 Hi Richard,

 If the cause of the I/O errors is Lustre, there will be some message in the
 logs. I am seeing a similar problem with some applications that run on our
 cluster. The symptoms are always the same: just before the application
 crashes with an I/O error, the node gets evicted with a message like this:
  LustreError: 167-0: This client was evicted by ddn_data-OST000f; in
 progress operations using this service will fail.

 The OSS that mounts the OST from the above message has the following line in
 the log:
 LustreError: 0:0:(ldlm_lockd.c:305:waiting_locks_callback()) ### lock callback timer expired after 101s: evicting client at 10.143@tcp  ns: filter-ddn_data-OST000f_UUID lock: 81021a84ba00/0x744b1dd4481e38b2 lrc: 3/0,0 mode: PR/PR res: 34959884/0 rrc: 2 type: EXT [0-18446744073709551615] (req 0-18446744073709551615) flags: 0x20 remote: 0x1d34b900a905375d expref: 9 pid: 1506 timeout 8374258376

 Can you please check your logs for similar messages?

 Best regards

 Wojciech

 On 22 July 2010 23:43, Andreas Dilger andreas.dil...@oracle.com wrote:

 On 2010-07-22, at 14:59, Richard Lefebvre wrote:
  I have a problem with the Scalable molecular dynamics software NAMD. It
  writes restart files once in a while, but sometimes the binary write
  crashes. When it crashes is not constant; the only constant is that it
  happens when it writes to our Lustre file system. When it writes to
  something else, it is fine. I can't seem to find any errors in any of the
  /var/log/messages. Has anyone had any problems with NAMD?

 Rarely has anyone complained about Lustre not providing error messages
 when there is a problem, so if there is nothing in /var/log/messages on
 either the client or the server then it is hard to know whether it is a
 Lustre problem or not...

 If possible, you could try running the application under strace (limited
 to the IO calls, or it would produce far too much data) to see which system
 call the error is coming from.
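
 Something along these lines, for example (the output file name and the PID
 placeholder are only illustrative):

  $ strace -f -tt -e trace=open,read,write,close,fsync,lseek \
      -o namd.strace -p <pid of the NAMD process>
  $ grep EIO namd.strace    # find the system call that fails near the crash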

 Cheers, Andreas
 --
 Andreas Dilger
 Lustre Technical Lead
 Oracle Corporation Canada Inc.


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] I/O errors with NAMD

2010-07-23 Thread Larry
There are many kinds of reasons that a server evicts a client -- maybe a
network error, maybe a ptlrpcd bug -- but in my experience, the only case
where I see the I/O error is when running NAMD on a Lustre filesystem.
I can see some other eviction events sometimes, but none of them
results in an I/O error. So besides the client eviction, there may be
something else causing the I/O error.

On Fri, Jul 23, 2010 at 6:54 PM, Wojciech Turek wj...@cam.ac.uk wrote:
 There is a similar thread on this mailing list:
 http://groups.google.com/group/lustre-discuss-list/browse_thread/thread/afe24159554cd3ff/8b37bababf848123?lnk=gstq=I%2FO+error+on+clients#
 Also there is a bug open which reports similar problem:
 https://bugzilla.lustre.org/show_bug.cgi?id=23190




___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] NFS Export Issues (RESOLVED)

2010-07-22 Thread Larry
After installing the OS, the first thing I do is turn off SELinux; it
seems to be of no use and often causes a lot of trouble...

On Wed, Jul 21, 2010 at 10:24 PM, William Olson
lustre_ad...@reachone.com wrote:

 When it comes to inexplicable permission problems, have you checked if
 SELinux is turned off on the NFS server?


 I knew if I was patient somebody would point out the simple answer and
 make me look like an idiot!! hahahaha

 THANK YOU!!

 So, I set SELinux to permissive mode, adjusted iptables (it wasn't part of
 the original problem, but I didn't save my rules before rebooting) and
 guess what?.. It works. :)
 YAY!
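
 For anyone hitting the same thing, the quick way on a RHEL-style box is
 roughly this (the config path is the stock one):

  $ setenforce 0      # permissive immediately, until the next reboot
  $ sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config
  $ getenforce
  Permissive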

 I think my sysadmin badge needs to be revoked for a day...
 Regards,

 Daniel.

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Permanently delete OST

2010-01-29 Thread Larry Brown

 https://bugzilla.lustre.org/show_bug.cgi?id=18329
 
 According to dev team resolved and closed 
 
 Regards
 Heiko
 ___

That shows the version as 1.6.6, but also that it was resolved on 1/22/10,
which is very recent.  This bug must also be in the 1.8 tree (as that is my
current version) and was not noticed?  Would this fix automatically be
applied to the 1.8 tree, or does a separate bug have to be reported?  Also,
if it has been fixed in 1.8, is there a scheduled release?

Thanks; it will be nice to be able to tell what disk capacity is
available.

Larry



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Permanently delete OST

2010-01-29 Thread Larry Brown

  https://bugzilla.lustre.org/show_bug.cgi?id=18329
  
  According to dev team resolved and closed 
  
  Regards
  Heiko

By the way, I got hold of the source code and the patch listed for that
bug in Heiko's message.  I applied the changes to the lfs.c file and now I
no longer get the error.  That patch does work, but it has not been applied
to 1.8.1.1 in the released RPMs.
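
Roughly the steps, in case anyone wants to do the same (the patch file name
is just whatever you save the bugzilla attachment as; adjust -p to match):

 $ cd lustre-1.8.1.1
 $ patch -p0 < bz18329-lfs.patch     # attachment from the bug report
 $ ./configure --disable-modules     # userspace utilities only
 $ make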

Still no option for permanent removal, but at least lfs df works again.

Larry




___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Permanently delete OST

2010-01-28 Thread Larry Brown
On Tue, 2010-01-26 at 11:29 -0500, Brian J. Murrell wrote:

 Yes, I looked at that bug yesterday.  I don't see anything in there that
 provides any sort of --perm argument to completely purge an OST from
 the configuration.
 
 b.

What is the latest on this?  Also, after a system reboot, at the point where
the first permanently inactive OST would be listed in `lfs df', the output
stops with the line `error: llapi_obd_statfs failed: Bad address (-14)'.
This looks like the bug mentioned earlier in the 1.6.6 version of
Lustre.  I am running 1.8.1.1.

Larry

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


[Lustre-discuss] Permanently delete OST

2010-01-21 Thread Larry Brown
For some reason this eludes my searches through the archives.  I keep
seeing documentation on how to deactivate the OST and copy files off it.
The manual states that permanent removal of the OST can be accomplished
with:

lctl conf_param <OST name>.osc.active=0

I have run this, and in the proc info I now see that active is set to 0.
However, it still exists in proc, and when running `lfs df' it shows up
there as an inactive device.
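
For concreteness, with a made-up filesystem name, OST index and client mount
point, what I ran was along these lines:

 $ lctl conf_param lustre1-OST0003.osc.active=0   # on the MGS
 $ lfs df /mnt/lustre                             # OST still listed, marked inactive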

I want it removed from existence.  The OST no longer physically
exists, yet I am haunted by its persistence on the MGS/MDT.

TIA,

Larry

___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] stata_mv mv_stata which is better?

2008-08-07 Thread Larry McIntosh

Yes Brock -- as Mike has mentioned, we also took this doc and provided
it for our TACC customer:

http://www.tacc.utexas.edu/resources/hpcsystems/

where we put 72 x4500s in place with this configuration.

In addition, Sun's recent Linux HPC Software

http://www.sun.com/software/products/hpcsoftware/index.xml

has the mv_sata driver and the SW configurations needed to put the
x4500 together as an OSS, which one can build upon and further configure
with the SW RAID patches that are also included.

HTH

Mike Berg wrote On 08/07/08 11:11,:

Brock,

It is recommended that mv_sata be used on the x4500.

It has been a while since I built this up myself -- a few Lustre releases
back -- but I do understand the pain. I hope that with Lustre 1.6.5.1 on
RHEL 4.5 you can just build mv_sata against the provided Lustre kernel,
alias it accordingly in modprobe.conf, create a new initrd, and then update
grub. I don't have gear handy to give it a try, unfortunately. Please let me
know your experiences with this if you pursue it.
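
The last few steps would look roughly like this on RHEL 4 (the kernel
version string is a placeholder for whatever the Lustre kernel RPM installs):

 $ echo "alias scsi_hostadapter mv_sata" >> /etc/modprobe.conf
 $ mkinitrd -f /boot/initrd-<lustre-kernel-version>.img <lustre-kernel-version>
 # then point the matching entry in /boot/grub/grub.conf at the new initrd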

Enclosed is a somewhat dated document on what we have found to be the  
best configuration of the x4500 for use with Lustre. Ignore the N1SM  
parts. We optimized for performance and RAS with some sacrifices on  
capacity. Hopefully this is a useful reference.


Regards,
Mike Berg
Sr. Lustre Solutions Engineer
Sun Microsystems, Inc.
Office/Fax: (303) 547-3491
E-mail:  [EMAIL PROTECTED]

  





On Aug 6, 2008, at 1:48 PM, Brock Palen wrote:

  

Is it still worth the effort to try and build mv_sata when working
with an x4500?
sata_mv from RHEL4 does not appear to show some of the stability
problems discussed online before.

I am curious because the build system Sun provides with the driver
does not play nicely with the Lustre kernel source packaging.

If it is worth all the pain, have others already figured it out?
Any help would be appreciated.


Brock Palen
www.umich.edu/~brockp
Center for Advanced Computing
[EMAIL PROTECTED]
(734)936-1985



___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss


Re: [Lustre-discuss] Lustre Solution Delivery

2008-03-27 Thread Larry McIntosh

Brennan,

One needs a SAM-QFS Linux client on a given node that is also a Lustre
client.  There can be multiples of these within a cluster.

Such a client can then move (copy) data between SAM and Lustre.
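
For example, from a node that mounts both file systems (the mount points
below are only illustrative):

 $ cp -a /sam/archive/run42 /lustre/scratch/run42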

This has been put in place at a number of sites for Sun customers, such
as DKRZ in Germany.

This is a very basic solution today, and we would like to see tighter
integration between the Lustre and SAM development efforts to make this a
more robust offering, along with SAM software support for newer Linux
kernels.

It is my understanding that this is underway.  However, for the time being
it will require the aforementioned type of client.

Larry

Peter Bojanic wrote On 03/26/08 15:48,:

Hi Brennan,

Larry McIntosh of our Linux HPC team can advise you regarding our  
Lustre/SAM integration options.

Cheers,
Bojanic

On 26-Mar-08, at 17:47, Brennan [EMAIL PROTECTED] wrote:

  

What is the process for integrating a Lustre+SAMFS solution into an
existing customer environment?  The plan is to have CRS build the
Lustre component, but Lustre and SAMFS will need to be configured and
integrated into the customer's computing environment.  I am very
familiar with the SAMFS integration, but not the Lustre integration.

Do we have resources in PS to provide the integration?  Is this done
by the CFS organization?

Also, a small-scale benchmark of the solution may be required.  Which
benchmark center could provide Lustre support?

Thanks,

Jim Brennan
Digital Media Systems
Sun Systems Group
Universal City, CA
(310) 901-86777


___
Lustre-discuss mailing list
Lustre-discuss@lists.lustre.org
http://lists.lustre.org/mailman/listinfo/lustre-discuss

