Re: [lustre-discuss] Expanding a zfsonlinux OST pool

2015-12-07 Thread Bob Ball
Found this URL that outlines the procedure, and specifically mentions 
some details in regard to zfsonlinux.

http://blog.ociru.net/2013/09/25/let-your-zfs-extend

bob


On 11/23/2015 4:35 PM, Faaland, Olaf P. wrote:

Hello Bob,

We did something similar: our MDSes used zpools built on spinning 
disks in JBODs, and we switched to SSDs without bringing the filesystem 
down, using ZFS to replicate the data.  It worked great for us.


How are your pools organized (i.e., what does zpool status show)?  There 
might be options that are more or less risky, or that take more or less 
time, depending on how ZFS is using the disks.


Also, how often are disks failing and how long does a replacement take 
to resilver, with your current disks?


-Olaf


*From:* Bob Ball [b...@umich.edu]
*Sent:* Monday, November 23, 2015 12:22 PM
*To:* Faaland, Olaf P.; Morrone, Chris
*Cc:* Bob Ball
*Subject:* Expanding a zfsonlinux OST pool

Hi,

We have some zfsonlinux pools in use with Lustre 2.7 that are built on 
older disks, and we are rapidly running out of spares for them.  What 
we would _like_ to do, if possible, is replace all of the 750GB disks 
in an OST with 1TB disks, one at a time with a resilver between each, 
and then expand the OST once the last replacement completes to take 
advantage of the larger, more reliable disks.


Is this going to work?  One of us here found the following:

According to the Oracle docs, a pool can autoexpand if you set it to
do so.  The default appears to be off, since the one pool I checked
has it off (which at least confirms that the Linux release also
supports the property).

http://docs.oracle.com/cd/E19253-01/819-5461/githb/index.html

[root@umdist02 ~]# zpool get autoexpand ost-006
NAME     PROPERTY    VALUE   SOURCE
ost-006  autoexpand  off     default

We are using zfsonlinux version 0.6.4.2.  Can we follow the procedure outlined 
in the Oracle doc using zfsonlinux?
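
For reference, the per-disk loop described in the Oracle doc would look
roughly like this on zfsonlinux (a sketch only; ost-006 is the pool from
above, and the sdX/sdY device names are hypothetical placeholders):

# zpool set autoexpand=on ost-006
# zpool replace ost-006 sdX sdY   # swap one 750GB disk for a 1TB disk
# zpool status ost-006            # wait here until the resilver completes
  ...repeat the replace/status pair for each remaining 750GB disk...
# zpool list ost-006              # SIZE should grow after the last resilver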

I had initially assumed the expansion would not happen until the last disk 
was replaced and resilvered, but the document suggests this waiting is not 
actually necessary?

Thanks,
bob






[lustre-discuss] Lustre 2.5.x on Ubuntu 14.04

2015-12-07 Thread Anjana Kar

The compile fails on a 3.13 kernel with errors in libcfs.
Are patches available to fix these errors?

~/lustre-release/libcfs/include/libcfs/linux/linux-prim.h:100:1:
error: unknown type name ‘read_proc_t’
 typedef read_proc_t cfs_read_proc_t;

~/lustre-release/libcfs/include/libcfs/params_tree.h:85:17:
error: dereferencing pointer to incomplete type
  spin_lock(&(dp)->pde_unload_lock);


Thanks,
-Anjana Kar
 Pittsburgh Supercomputing Center
 k...@psc.edu




Re: [lustre-discuss] RE No free catalog slots for log ( Lustre 2.5.3 & Robinhood 2.5.3 )

2015-12-07 Thread Alexander Boyko
>
> By the way, are the llog files you mentioned virtual or real?  If they are
> real, where are they located?  Do I need to clean them manually?

They are real; the location is O/1/...
 lustre/utils/llog_reader ./changelog_catalog.dmp
rec #1 type=1064553b len=64
Header size : 8192
Time : Mon Dec  7 15:44:37 2015
Number of records: 1
Target uuid :
---
#01 (064)ogen=0 name=0x8:1
...
I've dumped and checked the file; the location is based on the name from the record.
debugfs:  dump O/1/d8/8 plain.llog
lustre/utils/llog_reader ./plain.llog
rec #1 type=1066 len=96 offset 8192
Header size : 8192
Time : Mon Dec  7 15:46:40 2015
Number of records: 1
Target uuid :
---
#01 (096)changelog record id:0x0 cr_flags:0x1000 cr_type:CREAT(0x1)
It looks like O/1/ is used for llog files only.
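
For example, to browse those objects directly (a sketch; /dev/md55 is a
hypothetical stand-in for your MDT device, and d8 is just the hash
directory from the record above):

debugfs -R "ls -l O/1/d8" /dev/md55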


On Mon, Dec 7, 2015 at 4:55 AM, wanglu  wrote:

> Hi Alexander,
>
> Before I received this reply, I deregistered the cl1 user.  It took a very
> long time, and I am not sure whether it finished successfully, since the
> server crashed once the next morning.
> Then I moved the old changelog_catalog file away and created a zero-length
> changelog_user file in its place.
> This is what I got from the old changelog_catalog file.
> # ls -l /tmp/changelog.dmp
> -rw-r--r-- 1 root root 4153280 Dec  6 06:54 /tmp/changelog.dmp
> # llog_reader changelog.dmp |grep "type=1064553b" |wc -l
> 63432
> This number is smaller than 64768; I am not sure whether it is related to
> the unfinished deregistration.
>
> The first record number is 1 and the last is 64767, so I think there may
> be some skipped record numbers:
> # llog_reader changelog.dmp |grep "type=1064553b" |head -n 1
> rec #1 type=1064553b len=64
> # llog_reader changelog.dmp |grep "type=1064553b" |tail -n 1
> rec #64767 type=1064553b len=64
> # llog_reader changelog.dmp |grep "^rec" | grep -v "type=1064553b"
> returns 0 lines.
>
> By the way, are the llog files you mentioned virtual or real?  If they are
> real, where are they located?  Do I need to clean them manually?
>
> Thanks,
> Lu,Wang
> *From:* Alexander Boyko 
> *Date:* 2015-12-04 21:36
> *To:* wanglu ; lustre-discuss
> 
> *Subject:* RE [lustre-discuss] No free catalog slots for log ( Lustre
> 2.5.3 & Robinhood 2.5.3 )
>
>> Here are 4 questions whose answers we cannot find in LU-1586:
>>
>> 1.   According to Andres's reply, there should be some unconsumed
>> changelog files on our MDT, and these files have taken all the space (file
>> quotas?) that Lustre gives to the changelog.  With Lustre 2.1, these files
>> are under the OBJECTS directory and can be listed in ldiskfs mode.  In our
>> case, with Lustre 2.5.3, no OBJECTS directory can be found.  In this case,
>> how can we monitor the situation before the unconsumed changelogs take up
>> all the disk space?
>>
> The changelog is built on one catalog file plus a set of plain llog files.
> The catalog stores a limited number of records, about 64768; each catalog
> record is 64 bytes and describes one plain llog file.  A plain llog file
> stores records about IO operations and holds about 64768 records of
> varying size.  So the changelog can store roughly 64768^2 IO operations,
> and it occupies filesystem space.  The error "no free catalog slots"
> happens when the changelog catalog has no slot left to store a record for
> a new plain llog: either all slots really are filled, or the internal
> changelog markers have gone wrong and the internal logic no longer works.
> To get closer to the root cause, you need to dump the changelog catalog
> and check the bitmap.  Are there free slots?  Something like:
>
> debugfs -R "dump changelog_catalog changelog_catalog.dmp" /dev/md55 &&
> used=`llog_reader changelog_catalog.dmp | grep "type=1064553b" | wc -l`
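>
> To make the comparison explicit (a trivial sketch, assuming the command
> above has set $used; the ~64768 limit is approximate):
>
> echo "catalog slots used: $used of ~64768"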
>
>> 2.   Why are there so many unconsumed changelogs?  Could it be related
>> to our frequent remounts of the MDT (abort_recovery mode)?
>>
> The umount operation creates a half-empty plain llog file, and
> changelog_clear can't remove it even when all of its slots are freed;
> only a new mount can remove that file.  So it could be related, or not.
>
>
>
>> 3.   When we remount the MDT, robinhood is still running.  Why can
>> robinhood not consume those old changelogs after the MDT service has
>> recovered?
>> 4.   Why is there such a huge difference between the current index
>> (4199610352) and the cl1 index (49035933)?
>>
>> Thank you for your time and help!
>>
>> Wang,Lu
>>
>
> --
> Alexander Boyko
> Seagate
> www.seagate.com
>



-- 
Alexander Boyko
Seagate
www.seagate.com


Re: [lustre-discuss] Lustre 2.7 deployment issues

2015-12-07 Thread Christopher J. Morrone

On 12/04/2015 10:46 AM, Mohr Jr, Richard Frank (Rick Mohr) wrote:


The balance between features and stability is a complex topic, but it is one 
that Lustre developers care about.  For a recent thread on this topic, take a 
look at:

http://lists.lustre.org/pipermail/lustre-devel-lustre.org/2015-November/002767.html


Another link that might be helpful:

http://wiki.lustre.org/Retired_Release_Terminology

Chris



Re: [lustre-discuss] Lustre 2.7 deployment issues

2015-12-07 Thread Mohr Jr, Richard Frank (Rick Mohr)

> On Dec 7, 2015, at 2:25 PM, Christopher J. Morrone  wrote:
> 
> Another link that might be helpful:
> 
> http://wiki.lustre.org/Retired_Release_Terminology

A perfect explanation.  Wish I had known about that link :-)

--
Rick Mohr
Senior HPC System Administrator
National Institute for Computational Sciences
http://www.nics.tennessee.edu
