Re: [lustre-discuss] Draining and replacing OSTs with larger volumes

2019-02-28 Thread Patrick Farrell
Scott,


This sounds great.  Slower, but safer.  You might want to integrate the pool 
suggestion Jongwoo made in the other recent thread in order to control 
allocations to your new OSTs (assuming you're trying to stay live during most 
of this).
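
Roughly, that would mean something like the following on the MGS and a client
(filesystem name, pool name and OST indices below are just placeholders):

mgs#    lctl pool_new fsname.old_osts
mgs#    lctl pool_add fsname.old_osts fsname-OST[0000-003b]
client# lfs setstripe -p old_osts /mnt/fsname

so new files keep landing only on the existing OSTs until you deliberately add
the new ones to the pool.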


- Patrick


From: Scott Wood 
Sent: Thursday, February 28, 2019 6:15:54 PM
To: Patrick Farrell; Jongwoo Han
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Draining and replacing OSTs with larger volumes

My thanks to both Jongwoo and Patrick for your responses.

Great advice to do a practice run in a virtual environment, but I'm lucky enough
to have a physical one. I have a testbed that runs the same versions of all the
software, but with iSCSI targets as the OSTs rather than physical arrays, and
not so many OSTs (8 in the testbed versus 60 in production). I do use it for
test upgrades and fully intend to do a dry run there.

Jongwoo, to address your point: yes, the rolling migration is forced, as we only
have two new arrays and 10 existing arrays in which we can upgrade the drives.
You asked about OST sizes.  OSTs are 29TB, six per array, two arrays per OSS
pair, 5 OSS pairs.  I also expect the migrate-replace-migrate-replace to be
painfully slow, but with the hardware at hand it's the only option.  I was
figuring it may take a few weeks to drain each pair of arrays.  As for the
rolling upgrade, based on your and Patrick's responses, we'll skip that to
keep things cleaner.

Taking your points into consideration, the amended plan will be:

1) Deploy a new HA pair of OSSs with arrays populated with OSTs that are twice 
the size of our current ones, but stick with the existing v2.10.3
2) Remove the 12 OSTs that are connected to my oldest HA pair of OSSs as
described in 14.9.3, using 12 parallel migrate processes across 12 clients (a
rough sketch of that follows below this list)
3) Repopulate those arrays with the larger drives, make 12 new OSTs from
scratch with fresh indices, and bring them online
4) Repeat steps 2 and 3 for the four remaining original HA pairs of OSSs
5) Take a break and let the dust settle
6) At a later date, have a scheduled outage and upgrade from 2.10.3 to whatever 
the current maintenance release is
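
For what it's worth, the per-OST drain in step 2 would look roughly like the
following, one process per OST spread across the clients (the filesystem name,
mount point and OST indices are just placeholders), after first stopping new
object creation on each of those OSTs as per 14.9.3:

mds#      lctl set_param osp.fsname-OST0000-osc-MDT0000.max_create_count=0
client01# lfs find /mnt/fsname --ost fsname-OST0000_UUID -type f | lfs_migrate -y
client02# lfs find /mnt/fsname --ost fsname-OST0001_UUID -type f | lfs_migrate -y

...and so on for the remaining OSTs on that pair of arrays, each on its own client.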

Again, your feedback is appreciated.

Cheers
Scott

From: Patrick Farrell 
Sent: Thursday, 28 February 2019 11:06 PM
To: Jongwoo Han; Scott Wood
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Draining and replacing OSTs with larger volumes

Scott,

I’d like to strongly second all of Jongwoo’s advice, particularly that about 
adding new OSTs rather than replacing existing ones, if possible.  That 
procedure is so much simpler and involves a lot less messing around “under the 
hood”.  It takes you from a complex procedure with many steps to, essentially, 
copying a bunch of data around while your file system remains up, and adding 
and removing a few OSTs at either end.

It would also be non-destructive for your existing data.  One of the scary 
things about the original proposed process is that if something goes wrong 
partway through, the original data is already gone (or at least very hard to 
get).

Regards,
- Patrick

From: lustre-discuss  on behalf of 
Jongwoo Han 
Sent: Thursday, February 28, 2019 5:36:54 AM
To: Scott Wood
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Draining and replacing OSTs with larger volumes



On Thu, Feb 28, 2019 at 11:09 AM Scott Wood <woodystr...@hotmail.com> wrote:
Hi folks,

Big upgrade process in the works and I had some questions.  Our current 
infrastructure has 5 HA pairs of OSSs and arrays with an HA pair of management 
and metadata servers who also share an array, all running lustre 2.10.3.  
Pretty standard stuff.  Our upgrade plan is as follows:

1) Deploy a new HA pair of OSSs with arrays populated with OSTs that are twice 
the size of our originals.
2) Follow the process in section 14.9 of the lustre docs to drain all OSTs in
one of the existing HA pairs' arrays
3) Repopulate the first old pair of deactivated and drained arrays with new 
larger drives
4) Upgrade the offline OSSs from 2.10.3 to 2.10.latest?
5) Return them to service
6) Repeat steps 2-4 for the other 4 old HA pairs of OSSs and OSTs

I'd expect this would be doable without downtime as we'd only be taking arrays 
offline that have no objects on them, and we've added new arrays and OSSs 
before with no issues.  I have a few questions before we begin the process:

1) My interpretation of the docs is that we're OK to install them with 2.10.6 (or
2.10.7, if it's out), as rolling upgrades within X.Y are supported.  Is that
correct?

In theory, a rolling upgrade should work, but the generally recommended upgrade
procedure is to stop the filesystem, unmount all MDSs and OSSs, upgrade the
packages, and bring them back up. This will prevent human errors during repeated
per-server upgrades.
When done correctly, it will take no more than 2 hours.

Re: [lustre-discuss] Draining and replacing OSTs with larger volumes

2019-02-28 Thread Scott Wood
My thanks to both Jongwoo and Patrick for your responses.

Great advice to do a practice run in a virtual environment, but I'm lucky enough
to have a physical one. I have a testbed that runs the same versions of all the
software, but with iSCSI targets as the OSTs rather than physical arrays, and
not so many OSTs (8 in the testbed versus 60 in production). I do use it for
test upgrades and fully intend to do a dry run there.

Jongwoo, to address your point: yes, the rolling migration is forced, as we only
have two new arrays and 10 existing arrays in which we can upgrade the drives.
You asked about OST sizes.  OSTs are 29TB, six per array, two arrays per OSS
pair, 5 OSS pairs.  I also expect the migrate-replace-migrate-replace to be
painfully slow, but with the hardware at hand it's the only option.  I was
figuring it may take a few weeks to drain each pair of arrays.  As for the
rolling upgrade, based on your and Patrick's responses, we'll skip that to
keep things cleaner.

Taking your points into consideration, the amended plan will be:

1) Deploy a new HA pair of OSSs with arrays populated with OSTs that are twice 
the size of our current ones, but stick with the existing v2.10.3
2) Remove the 12 OSTs that are connected to my oldest HA pair of OSSs as 
described in 14.9.3, using 12 parallel migrate processes across 12 clients
3) Repopulate those arrays with the larger drives, make 12 new OSTs from
scratch with fresh indices, and bring them online
4) Repeat steps 2 and 3 for the four remaining original HA pairs of OSSs
5) Take a break and let the dust settle
6) At a later date, have a scheduled outage and upgrade from 2.10.3 to whatever 
the current maintenance release is

Again, your feedback is appreciated.

Cheers
Scott

From: Patrick Farrell 
Sent: Thursday, 28 February 2019 11:06 PM
To: Jongwoo Han; Scott Wood
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Draining and replacing OSTs with larger volumes

Scott,

I’d like to strongly second all of Jongwoo’s advice, particularly that about 
adding new OSTs rather than replacing existing ones, if possible.  That 
procedure is so much simpler and involves a lot less messing around “under the 
hood”.  It takes you from a complex procedure with many steps to, essentially, 
copying a bunch of data around while your file system remains up, and adding 
and removing a few OSTs at either end.

It would also be non-destructive for your existing data.  One of the scary 
things about the original proposed process is that if something goes wrong 
partway through, the original data is already gone (or at least very hard to 
get).

Regards,
- Patrick

From: lustre-discuss  on behalf of 
Jongwoo Han 
Sent: Thursday, February 28, 2019 5:36:54 AM
To: Scott Wood
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Draining and replacing OSTs with larger volumes



On Thu, Feb 28, 2019 at 11:09 AM Scott Wood <woodystr...@hotmail.com> wrote:
Hi folks,

Big upgrade process in the works and I had some questions.  Our current 
infrastructure has 5 HA pairs of OSSs and arrays with an HA pair of management 
and metadata servers who also share an array, all running lustre 2.10.3.  
Pretty standard stuff.  Our upgrade plan is as follows:

1) Deploy a new HA pair of OSSs with arrays populated with OSTs that are twice 
the size of our originals.
2) Follow the process in section 14.9 of the lustre docs to drain all OSTs in
one of the existing HA pairs' arrays
3) Repopulate the first old pair of deactivated and drained arrays with new 
larger drives
4) Upgrade the offline OSSs from 2.10.3 to 2.10.latest?
5) Return them to service
6) Repeat steps 2-4 for the other 4 old HA pairs of OSSs and OSTs

I'd expect this would be doable without downtime as we'd only be taking arrays 
offline that have no objects on them, and we've added new arrays and OSSs 
before with no issues.  I have a few questions before we begin the process:

1) My interpretation of the docs is that we're OK to install them with 2.10.6 (or
2.10.7, if it's out), as rolling upgrades within X.Y are supported.  Is that
correct?

In theory, a rolling upgrade should work, but the generally recommended upgrade
procedure is to stop the filesystem, unmount all MDSs and OSSs, upgrade the
packages, and bring them back up. This will prevent human errors during repeated
per-server upgrades.
When done correctly, it will take no more than 2 hours.

2) Until the whole process is complete, we'll have imbalanced OSTs.  I know
that's not ideal, but is it all that big an issue?

A rolling upgrade will cause imbalance, but over the long run, newly assigned
files will be evenly distributed. No need to worry about it in a one-shot
upgrade scenario.

3) When draining the OSTs of files, section 14.9.3, point 2.a. states that the
lfs find | lfs migrate can take multiple OSTs as args, but I thought it would be
better to run one instance of that per OST and distribute them across multiple
clients.  Is that reasonable (and faster)?

Re: [lustre-discuss] Suspended jobs and rebooting lustre servers

2019-02-28 Thread Andrew Elwell
On Tue, 26 Feb 2019 at 23:25, Andreas Dilger  wrote:
> I agree that having an option that creates the OSTs as inactive might be 
> helpful, though I wouldn't want that to be the default, as I'd imagine it 
> would also cause problems for the majority of users who wouldn't know that they 
> need to enable the OSTs after they are mounted.

> Could you file a feature request for this in Jira?
Done https://jira.whamcloud.com/browse/LU-12036
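
(In the meantime, the usual workaround is roughly this on the MDS, right after
the new OST mounts; the device name below is just a placeholder:

mds# lctl --device fsname-OST0004-osc-MDT0000 deactivate

which keeps the MDT from allocating new objects on that OST until it is
explicitly activated again.)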
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] upgrade from 2.10.5 to 2.12.0

2019-02-28 Thread Riccardo Veraldi

Hello,

I am planning a Lustre upgrade from 2.10.5/ZFS to 2.12.0/ZFS.

Are there any particular caveats with this procedure?

Can I simply upgrade the Lustre packages and mount the filesystem?

thank you

Rick


___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


[lustre-discuss] Due Today: Lustre Usage Survey

2019-02-28 Thread OpenSFS Administration
Dear Lustre Community,

 

Today, February 28th, is the deadline to provide your response to the Lustre
usage survey. We are looking for trends in Lustre usage to assist with
future planning on releases and will present the results at LUG.

 

Please complete this short survey (https://www.surveymonkey.com/r/P8P6QL3) to
make sure your organization's voice is heard!

 

Note that all questions are optional, so it is ok to submit a partially
completed survey if you prefer not to disclose some information.

 

Best regards,

OpenSFS Administration

__

OpenSFS Administration

3855 SW 153rd Drive Beaverton, OR 97003 USA

Phone: +1 503-619-0561 | Fax: +1 503-644-6708

Twitter: @OpenSFS
Email: ad...@opensfs.org | Website: www.opensfs.org

 

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Suspended jobs and rebooting lustre servers

2019-02-28 Thread Patrick Farrell
This is very good advice, and you can also vary it to aid in removing old OSTs 
(thinking of the previous message) - simply take the old ones you wish to 
remove out of the pool, then new files will not be created there.  Makes 
migration easier.

One thing though:
Setting a default layout everywhere may be prohibitively slow for a large fs.

If you set a default layout on the root of the file system, it is automatically 
used as the default for all directories that do not have another default set.

So if you have not previously set a default layout on any directories, there is 
no need to go through the fs changing them like this.  (And perhaps if you 
have, you can find those directories and handle them manually, rather than 
setting a pool on every directory.)
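
In other words, something like this once on the filesystem root should be enough
(pool name and mount point here are placeholders):

client# lfs setstripe -p old_osts /mnt/fsname
client# lfs getstripe -d /mnt/fsname     (confirm the default layout took)

and subdirectories without their own default layout will pick it up automatically.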

- Patrick

From: lustre-discuss  on behalf of 
Jongwoo Han 
Sent: Thursday, February 28, 2019 6:09:18 AM
To: Stephane Thiell
Cc: lustre-discuss
Subject: Re: [lustre-discuss] Suspended jobs and rebooting lustre servers


My strategy for adding new OSTs to a live filesystem is to define a pool with the
currently running OSTs and apply a pool stripe (lfs setstripe -p [live-ost-pool])
to all existing directories. It is best when this is done at initial filesystem
creation.

After that, you can safely add new OSTs without newly created files flooding in - 
the newly added OSTs will remain idle until you add them to the pool.

Try failover tests with the new OSTs and OSSes while they do not yet store files. 
After the failover/restart tests are done on the new OSSes and OSTs, you can add 
the new OSTs to the pool and they will start to store files shortly after.

If you did not create a pool, create one containing the old OSTs, and

# lfs find <mountpoint> -type d | while read DIR ; do echo "processing :" $DIR; 
lfs setstripe -p <poolname> "$DIR" ; done

will set the pool on all subdirectories, so newly added OSTs are safe from 
files coming in until these new OSTs are added to the pool.

I always expand a live filesystem in this manner, so I don't have to worry about 
heavily loaded situations.

On Thu, Feb 28, 2019 at 1:02 AM Stephane Thiell <sthi...@stanford.edu> wrote:
On one of our filesystems, we add a few new OSTs almost every month with no 
downtime, which is very convenient. The only thing that I would recommend is to 
avoid doing that during a peak of I/Os on your filesystem (we usually do it as 
early as possible in the morning), as the added OSTs will immediately see a heavy 
I/O load, likely because they are empty.

Best,

Stephane


> On Feb 22, 2019, at 2:03 PM, Andreas Dilger <adil...@whamcloud.com> wrote:
>
> This is not really correct.
>
> Lustre clients can handle the addition of OSTs to a running filesystem. The 
> MGS will register the new OSTs, and the clients will be notified by the MGS 
> that the OSTs have been added, so no need to unmount the clients during this 
> process.
>
>
> Cheers, Andreas
>
> On Feb 21, 2019, at 19:23, Raj <rajgau...@gmail.com> wrote:
>
>> Hello Raj,
>> It’s best and safest to unmount all the clients and then do the upgrade. 
>> Your FS is getting more OSTs and changing conf in the existing ones, so your 
>> clients need to get the new layout by remounting.
>> Also, you mentioned client eviction: during eviction the client has to 
>> drop its dirty pages, and all the open file descriptors in the FS will be 
>> gone.
>>
>> On Thu, Feb 21, 2019 at 12:25 PM Raj Ayyampalayam <ans...@gmail.com> wrote:
>> What can I expect to happen to the jobs that are suspended during the file 
>> system restart?
>> Will the processes holding an open file handle die when I unsuspend them 
>> after the filesystem restart?
>>
>> Thanks!
>> -Raj
>>
>>
>> On Thu, Feb 21, 2019 at 12:52 PM Colin Faber <cfa...@gmail.com> wrote:
>> Ah yes,
>>
>> If you're adding to an existing OSS, then you will need to reconfigure the 
>> file system, which requires a writeconf event.
>>
>> On Thu, Feb 21, 2019 at 10:00 AM Raj Ayyampalayam <ans...@gmail.com> wrote:
>> The new OST's will be added to the existing file system (the OSS nodes are 
>> already part of the filesystem), I will have to re-configure the current HA 
>> resource configuration to tell it about the 4 new OST's.
>> Our exascaler's HA monitors the individual OST and I need to re-configure 
>> the HA on the existing filesystem.
>>
>> Our vendor support has confirmed that we would have to restart the 
>> filesystem if we want to regenerate the HA configs to include the new OST's.
>>
>> Thanks,
>> -Raj
>>
>>
>> On Thu, Feb 21, 2019 at 11:23 AM Colin Faber <cfa...@gmail.com> wrote:
>> It seems to me that steps may still be missing?
>>
>> You're going to rack/stack and provision the OSS nodes with new OSTs.
>>
>> Then you're going to introduce failover options somewhere? new osts? 
>> existing system? etc?
>>
>> If you're introducing failover with the new OST's and leaving the existing 
>> system in place, you should be able to accomplish this without bringing the 
>> system offline.

Re: [lustre-discuss] Draining and replacing OSTs with larger volumes

2019-02-28 Thread Patrick Farrell
Scott,

I’d like to strongly second all of Jongwoo’s advice, particularly that about 
adding new OSTs rather than replacing existing ones, if possible.  That 
procedure is so much simpler and involves a lot less messing around “under the 
hood”.  It takes you from a complex procedure with many steps to, essentially, 
copying a bunch of data around while your file system remains up, and adding 
and removing a few OSTs at either end.

It would also be non-destructive for your existing data.  One of the scary 
things about the original proposed process is that if something goes wrong 
partway through, the original data is already gone (or at least very hard to 
get).

Regards,
- Patrick

From: lustre-discuss  on behalf of 
Jongwoo Han 
Sent: Thursday, February 28, 2019 5:36:54 AM
To: Scott Wood
Cc: lustre-discuss@lists.lustre.org
Subject: Re: [lustre-discuss] Draining and replacing OSTs with larger volumes



On Thu, Feb 28, 2019 at 11:09 AM Scott Wood <woodystr...@hotmail.com> wrote:
Hi folks,

Big upgrade process in the works and I had some questions.  Our current 
infrastructure has 5 HA pairs of OSSs and arrays with an HA pair of management 
and metadata servers who also share an array, all running lustre 2.10.3.  
Pretty standard stuff.  Our upgrade plan is as follows:

1) Deploy a new HA pair of OSSs with arrays populated with OSTs that are twice 
the size of our originals.
2) Follow the process in section 14.9 of the lustre docs to drain all OSTs in
one of the existing HA pairs' arrays
3) Repopulate the first old pair of deactivated and drained arrays with new 
larger drives
4) Upgrade the offline OSSs from 2.10.3 to 2.10.latest?
5) Return them to service
6) Repeat steps 2-4 for the other 4 old HA pairs of OSSs and OSTs

I'd expect this would be doable without downtime as we'd only be taking arrays 
offline that have no objects on them, and we've added new arrays and OSSs 
before with no issues.  I have a few questions before we begin the process:

1) My interpretation of the docs is that we're OK to install them with 2.10.6 (or
2.10.7, if it's out), as rolling upgrades within X.Y are supported.  Is that
correct?

In theory, a rolling upgrade should work, but the generally recommended upgrade
procedure is to stop the filesystem, unmount all MDSs and OSSs, upgrade the
packages, and bring them back up. This will prevent human errors during repeated
per-server upgrades.
When done correctly, it will take no more than 2 hours.

2) Until the whole process is complete, we'll have imbalanced OSTs.  I know
that's not ideal, but is it all that big an issue?

A rolling upgrade will cause imbalance, but over the long run, newly assigned
files will be evenly distributed. No need to worry about it in a one-shot
upgrade scenario.

3) When draining the OSTs of files, section 14.9.3, point 2.a. states that the
lfs find | lfs migrate can take multiple OSTs as args, but I thought it would be
better to run one instance of that per OST and distribute them across multiple
clients.  Is that reasonable (and faster)?

Parallel redistribution is generally faster than one-by-one. If the MDT can
endure the scanning load, run multiple migrate processes, each against one OST.

4) When the drives are replaced with bigger ones, can the original OST
configuration files be restored to them as described in Docs section 14.9.5, or,
due to the size mismatch, will that be bad?

Since this process treats objects as files, the configuration should carry over
the same.

5) What questions should I be asking that I haven't thought of?


I do not know the size of the OSTs you are dealing with, but I think
migrate(empty)-replace-migrate-replace is a really painful process, as it will
take a long time. If circumstances allow, I suggest adding all the new OST arrays
to the OSSes with new OST indices, migrating the OST objects, then deactivating
and removing the old OSTs.

If that all goes well, and we did upgrade the OSSs to a newer 2.10.x, we'd 
follow it up with a migration of the MGT and MDT to one of the management 
servers, upgrade the other, fail them back, upgrade the second, and rebalance 
the MDT and MGT services back across the two.  We'd expect the usual pause in 
services as those migrate but other than that, fingers crossed, should all be 
good.  Are we missing anything?


If this plan is forced, the rolling migrate and upgrade should be planned
carefully. It would be better to set up a correct procedure checklist by
practicing in a virtual environment with identical versions.

Cheers
Scott
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


--
Jongwoo Han
+82-505-227-6108
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Suspended jobs and rebooting lustre servers

2019-02-28 Thread Jongwoo Han
My strategy for adding new OSTs to a live filesystem is to define a pool with
the currently running OSTs and apply a pool stripe (lfs setstripe -p
[live-ost-pool]) to all existing directories. It is best when this is done
at initial filesystem creation.

After that, you can safely add new OSTs without newly created files flooding
in - the newly added OSTs will remain idle until you add them to the
pool.

Try failover tests with the new OSTs and OSSes while they do not yet store files.
After the failover/restart tests are done on the new OSSes and OSTs, you can add
the new OSTs to the pool and they will start to store files shortly after.

If you did not create a pool, create one containing the old OSTs, and

# lfs find <mountpoint> -type d | while read DIR ; do echo "processing :"
$DIR; lfs setstripe -p <poolname> "$DIR" ; done

will set the pool on all subdirectories, so newly added OSTs are safe from
files coming in until these new OSTs are added to the pool.

I always expand a live filesystem in this manner, so I don't have to worry about
heavily loaded situations.
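
When the new OSTs are ready to take files, adding them to the pool is a single
step on the MGS (pool name and OST indices below are just examples):

# lctl pool_add fsname.live-ost-pool fsname-OST[003c-0047]
# lctl pool_list fsname.live-ost-pool

and directories striped on that pool will start placing new files on them as well.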

On Thu, Feb 28, 2019 at 1:02 AM Stephane Thiell 
wrote:

> On one of our filesystems, we add a few new OSTs almost every month with no
> downtime, which is very convenient. The only thing that I would recommend is
> to avoid doing that during a peak of I/Os on your filesystem (we usually do
> it as early as possible in the morning), as the added OSTs will immediately
> see a heavy I/O load, likely because they are empty.
>
> Best,
>
> Stephane
>
>
> > On Feb 22, 2019, at 2:03 PM, Andreas Dilger 
> wrote:
> >
> > This is not really correct.
> >
> > Lustre clients can handle the addition of OSTs to a running filesystem.
> The MGS will register the new OSTs, and the clients will be notified by the
> MGS that the OSTs have been added, so no need to unmount the clients during
> this process.
> >
> >
> > Cheers, Andreas
> >
> > On Feb 21, 2019, at 19:23, Raj  wrote:
> >
> >> Hello Raj,
> >> It’s best and safest to unmount all the clients and then do the
> upgrade. Your FS is getting more OSTs and changing conf in the existing
> ones, so your clients need to get the new layout by remounting.
> >> Also, you mentioned client eviction: during eviction the client
> has to drop its dirty pages, and all the open file descriptors in the FS
> will be gone.
> >>
> >> On Thu, Feb 21, 2019 at 12:25 PM Raj Ayyampalayam 
> wrote:
> >> What can I expect to happen to the jobs that are suspended during the
> file system restart?
> >> Will the processes holding an open file handle die when I unsuspend
> them after the filesystem restart?
> >>
> >> Thanks!
> >> -Raj
> >>
> >>
> >> On Thu, Feb 21, 2019 at 12:52 PM Colin Faber  wrote:
> >> Ah yes,
> >>
> >> If you're adding to an existing OSS, then you will need to reconfigure
> the file system, which requires a writeconf event.
> >>
> >> On Thu, Feb 21, 2019 at 10:00 AM Raj Ayyampalayam 
> wrote:
> >> The new OST's will be added to the existing file system (the OSS nodes
> are already part of the filesystem), I will have to re-configure the
> current HA resource configuration to tell it about the 4 new OST's.
> >> Our exascaler's HA monitors the individual OST and I need to
> re-configure the HA on the existing filesystem.
> >>
> >> Our vendor support has confirmed that we would have to restart the
> filesystem if we want to regenerate the HA configs to include the new OST's.
> >>
> >> Thanks,
> >> -Raj
> >>
> >>
> >> On Thu, Feb 21, 2019 at 11:23 AM Colin Faber  wrote:
> >> It seems to me that steps may still be missing?
> >>
> >> You're going to rack/stack and provision the OSS nodes with new OSTs.
> >>
> >> Then you're going to introduce failover options somewhere? new osts?
> existing system? etc?
> >>
> >> If you're introducing failover with the new OST's and leaving the
> existing system in place, you should be able to accomplish this without
> bringing the system offline.
> >>
> >> If you're going to be introducing failover to your existing system then
> you will need to reconfigure the file system to accommodate the new
> failover settings (failover nodes, etc.)
> >>
> >> -cf
> >>
> >>
> >> On Thu, Feb 21, 2019 at 9:13 AM Raj Ayyampalayam 
> wrote:
> >> Our upgrade strategy is as follows:
> >>
> >> 1) Load all disks into the storage array.
> >> 2) Create RAID pools and virtual disks.
> >> 3) Create lustre file system using mkfs.lustre command. (I still have
> to figure out all the parameters used on the existing OSTs).
> >> 4) Create mount points on all OSSs.
> >> 5) Mount the lustre OSTs.
> >> 6) Maybe rebalance the filesystem.
> >> My understanding is that the above can be done without bringing the
> filesystem down. I want to create the HA configuration (corosync and
> pacemaker) for the new OSTs. This step requires the filesystem to be down.
> I want to know what would happen to the suspended processes across the
> cluster when I bring the filesystem down to re-generate the HA configs.
> >>
> >> Thanks,
> >> -Raj
> >>
> >> On Thu, Feb 21, 2019 at 12:59 AM Colin Faber  wrote:

Re: [lustre-discuss] Draining and replacing OSTs with larger volumes

2019-02-28 Thread Jongwoo Han
On Thu, Feb 28, 2019 at 11:09 AM Scott Wood  wrote:

> Hi folks,
>
> Big upgrade process in the works and I had some questions.  Our current
> infrastructure has 5 HA pairs of OSSs and arrays with an HA pair of
> management and metadata servers who also share an array, all running lustre
> 2.10.3.  Pretty standard stuff.  Our upgrade plan is as follows:
>
> 1) Deploy a new HA pair of OSSs with arrays populated with OSTs that are
> twice the size of our originals.
> 2) Follow the process in section 14.9 of the lustre docs to drain all OSTs
> in one of the existing HA pairs' arrays
> 3) Repopulate the first old pair of deactivated and drained arrays with
> new larger drives
> 4) Upgrade the offline OSSs from 2.10.3 to 2.10.latest?
> 5) Return them to service
> 6) Repeat steps 2-4 for the other 4 old HA pairs of OSSs and OSTs
>
> I'd expect this would be doable without downtime as we'd only be taking
> arrays offline that have no objects on them, and we've added new arrays and
> OSSs before with no issues.  I have a few questions before we begin the
> process:
>
> 1) My interpretation of the docs is that we're OK to install them with
> 2.10.6 (or 2.10.7, if it's out), as rolling upgrades within X.Y are
> supported.  Is that correct?
>

In theory, a rolling upgrade should work, but the generally recommended upgrade
procedure is to stop the filesystem, unmount all MDSs and OSSs, upgrade the
packages, and bring them back up. This will prevent human errors during repeated
per-server upgrades.
When done correctly, it will take no more than 2 hours.
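
A rough ordering for that full-stop upgrade (device paths and mount points below
are just placeholders):

client# umount /mnt/fsname                          (all clients first)
mds#    umount /mnt/mdt                             (then the MDT/MGT)
oss#    umount /mnt/ost*                            (then all the OSTs)
        ... upgrade the Lustre packages on every server ...
mds#    mount -t lustre /dev/mdtdev /mnt/mdt        (bring the MGS/MDT back first)
oss#    mount -t lustre /dev/ostdev /mnt/ostNN      (then the OSTs, then remount clients)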


> 2) Until the whole process is complete, we'll have imbalanced OSTs.  I
> know that's not ideal, but is it all that big an issue?
>

A rolling upgrade will cause imbalance, but over the long run, newly assigned
files will be evenly distributed. No need to worry about it in a one-shot
upgrade scenario.


> 3) When draining the OSTs of files, section 14.9.3, point 2.a. states that
> the lfs find | lfs migrate can take multiple OSTs as args, but I thought it
> would be better to run one instance of that per OST and distribute them
> across multiple clients.  Is that reasonable (and faster)?
>

Parallel redistribution is generally faster than one-by-one. If the MDT can
endure the scanning load, run multiple migrate processes, each against one
OST.

> 4) When the drives are replaced with bigger ones, can the original OST
> configuration files be restored to them as described in Docs section
> 14.9.5, or, due to the size mismatch, will that be bad?
>

Since this process treats objects as files, the configuration should carry over
the same.


> 5) What questions should I be asking that I haven't thought of?
>
>

I do not know the size of the OSTs you are dealing with, but I think
migrate(empty)-replace-migrate-replace is a really painful process, as it will
take a long time. If circumstances allow, I suggest adding all the new OST
arrays to the OSSes with new OST indices, migrating the OST objects, then
deactivating and removing the old OSTs.
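
As a sanity check before removing each old OST, something like the following
(again with placeholder names) should return nothing once the OST is fully
drained:

# lfs find /mnt/fsname --ost fsname-OST0000_UUID

after which the OST can be permanently deactivated from the MGS:

# lctl conf_param fsname-OST0000.osc.active=0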


> If that all goes well, and we did upgrade the OSSs to a newer 2.10.x, we'd
> follow it up with a migration of the MGT and MDT to one of the management
> servers, upgrade the other, fail them back, upgrade the second, and
> rebalance the MDT and MGT services back across the two.  We'd expect the
> usual pause in services as those migrate but other than that, fingers
> crossed, should all be good.  Are we missing anything?
>
>
If this plan is forced, the rolling migrate and upgrade should be planned
carefully. It would be better to set up a correct procedure checklist by
practicing in a virtual environment with identical versions.


> Cheers
> Scott
> ___
> lustre-discuss mailing list
> lustre-discuss@lists.lustre.org
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>


-- 
Jongwoo Han
+82-505-227-6108
___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org