Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not?

2017-04-17 Thread Buterbaugh, Kevin L
Hi Marc, Alex, all,

Thank you for the responses.  To answer Alex’s questions first … the full 
command line I used (except for some stuff I’m redacting but you don’t need the 
exact details anyway) was:

/usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g  -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes

And yes, it printed out the very normal, “Hey, I migrated all 1.8 million files 
I said I would successfully, so I’m done here” message:

[I] A total of 1869469 files have been migrated, deleted or processed by an EXTERNAL EXEC/script;
   0 'skipped' files and/or errors.

Marc - I ran what you suggested in your response below (section 3a).  The output 
of a “test” mmapplypolicy and mmdf was very consistent.  Therefore, I’m moving 
on to 3b and running against the full filesystem again … the only difference 
between the command line above and what I’m doing now is that I’m running with 
“-L 2” this time around.  I’m not fond of doing this during the week, but I need 
to figure out what’s going on and I *really* need to get some stuff moved from 
my “data” pool to my “capacity” pool.

I will respond on the list again when there’s something to report.  
Thanks again, all…

Kevin

On Apr 17, 2017, at 3:11 PM, Marc A Kaplan wrote:

Kevin,

1. Running with both fairly simple rules so that you migrate "in both 
directions" is fine.  It was designed to do that!

2. Glad you understand the logic of "rules hit" vs "files chosen".

3. To begin to understand "what the hxxx is going on" (as our fearless leader 
liked to say before he was in charge ;-) ) I suggest:

(a) Run mmapplypolicy on a directory containing just a few files,
`mmapplypolicy /gpfs23/test-directory -I test ...`, and check that the
[I] ... Current data pool utilization
message is consistent with the output of `mmdf gpfs23`.

They should be, but if they're not, that's a weird problem right there since 
they're supposed to be looking at the same metadata!

You can do this anytime; it should complete almost instantly...

(b) When time and resources permit, re-run mmapplypolicy on the full FS with 
your desired migration policy.
Again, do the "Current", "Chosen" and "Predicted" messages make sense, and "add 
up"?
Do the file counts seem reasonable, considering that you recently did 
migrations/deletions that should have changed the counts compared to previous 
runs of mmapplypolicy?  If you just want to look and not actually change anything, 
use `-I test`, which will skip the migration steps.  If you want to see the list 
of files chosen

(c) If you continue to see significant discrepancies between mmapplypolicy and 
mmdf, let us know.

(d) Also at some point you may consider running mmrestripefs with options to 
make sure every file has its data blocks where they are supposed to be and is 
replicated as you have specified.
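Steps (a) to (c) amount to putting two views of the same pool on one scale. The Python below is a minimal sketch of that comparison, fed with the gpfs23capacity figures quoted in this thread (mmapplypolicy's KB columns, and Kevin's mmdf pool total of 116.4T with 59.33T free). It assumes mmdf's suffixes are binary multiples of 1 KB (1T = 1024**3 KB), which is consistent with the 124983549952 KB pool total showing up in mmdf as 116.4T; the helper names are hypothetical.

```python
# Minimal sketch: express mmapplypolicy's and mmdf's figures for pool
# gpfs23capacity as "percent occupied" so they can be compared directly.
# Assumption: mmdf suffixes are binary multiples of 1 KB (1T = 1024**3 KB).
# Helper names are hypothetical, not part of Spectrum Scale.

KB_PER_UNIT = {"K": 1, "M": 1024, "G": 1024**2, "T": 1024**3}


def size_to_kb(size: str) -> float:
    """Convert an mmdf-style size such as '59.33T' to KB."""
    return float(size[:-1]) * KB_PER_UNIT[size[-1]]


def pct_occupied_from_policy(kb_occupied: int, kb_total: int) -> float:
    """Percent occupied, from mmapplypolicy's KB_Occupied / KB_Total columns."""
    return 100.0 * kb_occupied / kb_total


def pct_occupied_from_mmdf(total: str, free: str) -> float:
    """Percent occupied, from mmdf's pool-total size and free-space columns."""
    total_kb = size_to_kb(total)
    return 100.0 * (total_kb - size_to_kb(free)) / total_kb


# Figures for gpfs23capacity as quoted in this thread.
before = pct_occupied_from_policy(55365193728, 124983549952)      # "Current"
predicted = pct_occupied_from_policy(122483878944, 124983549952)  # "Predicted"
after = pct_occupied_from_mmdf("116.4T", "59.33T")                # mmdf, a day later

print(f"before run: {before:.1f}% occupied (mmapplypolicy)")
print(f"predicted:  {predicted:.1f}% occupied (mmapplypolicy)")
print(f"after run:  {after:.1f}% occupied (mmdf)")
```

With these inputs it prints about 44.3% (before), 98.0% (predicted) and 49.0% (mmdf afterwards), which puts a number on the gap being discussed: roughly 5 percentage points gained instead of about 54.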

Let's see where those steps take us...

-- marc of Spectrum Scale (né GPFS)



2017-04-17 Thread Marc A Kaplan
Oops...  If you want to see the list of what would be migrated, use '-I test -L 2'.
If you want to migrate and see each file as it is migrated, use '-I yes -L 2'.

I don't recommend -L 4 or higher, unless you want to see the files that do 
not match your rules.
-L 3 will show you all the files that match the rules, including those 
that are NOT chosen for migration.  See the command gu
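Since the -I / -L combinations are easy to mix up, here is a small, hypothetical Python helper that only assembles the command line and runs nothing. It uses only flags mentioned in this thread; the device name gpfs23 comes from the thread, while the policy-file name and the level descriptions (paraphrased from the message above) are illustrative.

```python
# Hypothetical helper: assemble (but do not run) mmapplypolicy command lines
# for the -I / -L combinations described above. Only flags that appear in this
# thread are used; the policy-file name is a placeholder.
import shlex
from typing import List

MMAPPLYPOLICY = "/usr/lpp/mmfs/bin/mmapplypolicy"

# What each -L level adds, paraphrasing the advice above.
LEVEL_NOTES = {
    2: "list each file chosen (and migrated, with -I yes)",
    3: "also list files that match a rule but are NOT chosen",
    4: "also report files that do not match your rules (not recommended)",
}


def policy_argv(device: str, policy_file: str, execute: bool, level: int) -> List[str]:
    """argv for a dry run (-I test) or a real run (-I yes) at log level -L <level>."""
    return [MMAPPLYPOLICY, device, "-P", policy_file,
            "-I", "yes" if execute else "test", "-L", str(level)]


# Preview what would be migrated, then the same run for real.
for execute in (False, True):
    argv = policy_argv("gpfs23", "gpfs23_migration.policy", execute, level=2)
    print(shlex.join(argv), "#", LEVEL_NOTES[2])
```

Building argv as a list and rendering it with shlex.join keeps quoting correct if a path contains spaces; note that a path starting with ~ would be quoted as well, and therefore not expanded by the shell, so an absolute path is safer here.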




___
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


2017-04-17 Thread Marc A Kaplan
Kevin,

1. Running with both fairly simple rules so that you migrate "in both 
directions" is fine.  It was designed to do that!

2. Glad you understand the logic of "rules hit" vs "files chosen". 

3. To begin to understand "what the hxxx is going on" (as our fearless 
leader liked to say before he was in charge ;-) ) I suggest:

(a) Run mmapplypolicy on a directory containing just a few files,
`mmapplypolicy /gpfs23/test-directory -I test ...`, and check that the 
[I] ... Current data pool utilization 
message is consistent with the output of `mmdf gpfs23`. 

They should be, but if they're not, that's a weird problem right there 
since they're supposed to be looking at the same metadata!

You can do this anytime; it should complete almost instantly...

(b) When time and resources permit, re-run mmapplypolicy on the full FS 
with your desired migration policy.
Again, do the "Current", "Chosen" and "Predicted" messages make sense, and 
"add up"?
Do the file counts seem reasonable, considering that you recently did 
migrations/deletions that should have changed the counts compared to 
previous runs of mmapplypolicy?  If you just want to look and not actually 
change anything, use `-I test`, which will skip the migration steps.  If you 
want to see the list of files chosen

(c) If you continue to see significant discrepancies between mmapplypolicy 
and mmdf, let us know.

(d) Also at some point you may consider running mmrestripefs with options 
to make sure every file has its data blocks where they are supposed to be 
and is replicated as you have specified.
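Point 2 above, "rules hit" vs "files chosen", is what separates Hit_Cnt (5255960) from Chosen (1868858) in Marc's Apr 16 reckoning of this same run. Below is a toy Python model of how a LIMIT can cap the chosen set. It assumes the candidates already arrive in the order the tool would consider them (real ordering, for example by a WEIGHT clause, is not modeled), that selection stops at the first file that would push the target pool past the limit, and that other rules' migrations are ignored; it is an illustration, not mmapplypolicy's actual algorithm, and the file paths are made up.

```python
# Toy model of "rules hit" vs "files chosen" under LIMIT(n).
# Assumptions: candidates are pre-ordered, selection stops at the first file
# that would overshoot the limit, and other rules' migrations are ignored.
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    path: str  # hypothetical file path
    kb: int    # KB the file would add to the target pool


def choose_up_to_limit(hits: List[Candidate], target_occupied_kb: int,
                       target_total_kb: int, limit_pct: float) -> List[Candidate]:
    """Take matching files in order until the target pool would exceed limit_pct."""
    ceiling_kb = target_total_kb * limit_pct / 100.0
    occupied_kb = target_occupied_kb
    chosen: List[Candidate] = []
    for c in hits:
        if occupied_kb + c.kb > ceiling_kb:
            break  # this file would push the pool past LIMIT
        chosen.append(c)
        occupied_kb += c.kb
    return chosen


# Five matching files of 200 KB each; target pool 1000 KB total, 440 KB used.
hits = [Candidate(f"/gpfs23/f{i}", 200) for i in range(5)]
chosen = choose_up_to_limit(hits, target_occupied_kb=440, target_total_kb=1000, limit_pct=98)
print(f"{len(hits)} hit, {len(chosen)} chosen")  # 5 hit, 2 chosen
```

All five files match the rule, but only two fit under the 980 KB ceiling, which is the same shape as 5.25 million hits yielding 1.868 million chosen.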

Let's see where those steps take us...

-- marc of Spectrum Scale (né GPFS)



From:    "Buterbaugh, Kevin L"
To:      gpfsug main discussion list
Date:    04/17/2017 11:25 AM
Subject: Re: [gpfsug-discuss] mmapplypolicy didn't migrate everything it should have - why not?
Sent by: gpfsug-discuss-boun...@spectrumscale.org



Hi Marc, 

I do understand what you’re saying about mmapplypolicy deciding it only 
needed to move ~1.8 million files to fill the capacity pool to ~98% full. 
However, it is now more than 24 hours since the mmapplypolicy finished 
“successfully” and:

Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB)
eon35Ansd        58.2T   35 No   Yes     29.66T ( 51%)    64.16G ( 0%)
eon35Dnsd        58.2T   35 No   Yes     29.66T ( 51%)    64.61G ( 0%)
-------------  -------                -----------------  ------------
(pool total)    116.4T                  59.33T ( 51%)    128.8G ( 0%)

And yes, I did run the mmapplypolicy with “-I yes” … here’s the partially 
redacted command line:

/usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g  -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes

And here’s that policy file:

define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)))
define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0))

RULE 'OldStuff'
  MIGRATE FROM POOL 'gpfs23data'
  TO POOL 'gpfs23capacity'
  LIMIT(98)
  WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584))

RULE 'INeedThatAfterAll'
  MIGRATE FROM POOL 'gpfs23capacity'
  TO POOL 'gpfs23data'
  LIMIT(75)
  WHERE (access_age < 14)

The one thing that has changed is that formerly I only ran the migration 
in one direction at a time … i.e. I used to have those two rules in two 
separate files and would run an mmapplypolicy using the OldStuff rule the 
1st weekend of the month and run the other rule the other weekends of the 
month.  This is the 1st weekend that I attempted to run an mmapplypolicy 
that did both at the same time.  Did I mess something up with that?

I have not run it again yet because we also run migrations on the other 
filesystem that we are still in the process of migrating off of.  So 
gpfs23 goes 1st and as soon as it’s done the other filesystem migration 
kicks off.  I don’t like to run two migrations simultaneously if at all 
possible.  The 2nd migration ran until this morning, when it was 
unfortunately terminated by a network switch crash that has also had me 
tied up all morning until now.  :-(

And yes, there is something else going on … well, was going on - the 
network switch crash killed this too … I have been running an rsync on one 
particular ~80TB directory tree from the old filesystem to gpfs23.  I 
understand that the migration wouldn’t know about those files and that’s 
fine … I just don’t understand why mmapplypolicy said it was going to fill 
the capacity pool to 98% but didn’t do it … wait, mmapplypolicy hasn’t 
gone into politics, has it?!?  ;-)

Thanks - and again, if I should open a PMR for this please let me know...

Kevin

2017-04-17 Thread Alex Chekholko

Hi Kevin,

IMHO, it's safe to just run it again.

You can also run it with '-I test -L 6' again and look through the 
output.  But I don't think you can "break" anything by having it scan 
and/or move data.


Can you post the full command line that you use to run it?

The behavior you describe is odd; you say it prints out the "files 
migrated successfully" message, but the files didn't actually get 
migrated?  Turn up the debug param and have it print every file as it is 
moving it or something.


Regards,
Alex

On 4/17/17 8:24 AM, Buterbaugh, Kevin L wrote:

Hi Marc,

I do understand what you’re saying about mmapplypolicy deciding it only
needed to move ~1.8 million files to fill the capacity pool to ~98%
full.  However, it is now more than 24 hours since the mmapplypolicy
finished “successfully” and:

Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB)
eon35Ansd        58.2T   35 No   Yes     29.66T ( 51%)    64.16G ( 0%)
eon35Dnsd        58.2T   35 No   Yes     29.66T ( 51%)    64.61G ( 0%)
-------------  -------                -----------------  ------------
(pool total)    116.4T                  59.33T ( 51%)    128.8G ( 0%)

And yes, I did run the mmapplypolicy with “-I yes” … here’s the
partially redacted command line:

/usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g  -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes

And here’s that policy file:

define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)))
define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0))

RULE 'OldStuff'
  MIGRATE FROM POOL 'gpfs23data'
  TO POOL 'gpfs23capacity'
  LIMIT(98)
  WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584))

RULE 'INeedThatAfterAll'
  MIGRATE FROM POOL 'gpfs23capacity'
  TO POOL 'gpfs23data'
  LIMIT(75)
  WHERE (access_age < 14)

The one thing that has changed is that formerly I only ran the migration
in one direction at a time … i.e. I used to have those two rules in two
separate files and would run an mmapplypolicy using the OldStuff rule
the 1st weekend of the month and run the other rule the other weekends
of the month.  This is the 1st weekend that I attempted to run an
mmapplypolicy that did both at the same time.  Did I mess something up
with that?

I have not run it again yet because we also run migrations on the other
filesystem that we are still in the process of migrating off of.  So
gpfs23 goes 1st and as soon as it’s done the other filesystem migration
kicks off.  I don’t like to run two migrations simultaneously if at all
possible.  The 2nd migration ran until this morning, when it was
unfortunately terminated by a network switch crash that has also had me
tied up all morning until now.  :-(

And yes, there is something else going on … well, was going on - the
network switch crash killed this too … I have been running an rsync on
one particular ~80TB directory tree from the old filesystem to gpfs23.
 I understand that the migration wouldn’t know about those files and
that’s fine … I just don’t understand why mmapplypolicy said it was
going to fill the capacity pool to 98% but didn’t do it … wait,
mmapplypolicy hasn’t gone into politics, has it?!?  ;-)

Thanks - and again, if I should open a PMR for this please let me know...

Kevin


2017-04-17 Thread Buterbaugh, Kevin L
Hi Marc,

I do understand what you’re saying about mmapplypolicy deciding it only needed 
to move ~1.8 million files to fill the capacity pool to ~98% full.  However, it 
is now more than 24 hours since the mmapplypolicy finished “successfully” and:

Disks in storage pool: gpfs23capacity (Maximum disk size allowed is 519 TB)
eon35Ansd        58.2T   35 No   Yes     29.66T ( 51%)    64.16G ( 0%)
eon35Dnsd        58.2T   35 No   Yes     29.66T ( 51%)    64.61G ( 0%)
-------------  -------                -----------------  ------------
(pool total)    116.4T                  59.33T ( 51%)    128.8G ( 0%)

And yes, I did run the mmapplypolicy with “-I yes” … here’s the partially 
redacted command line:

/usr/lpp/mmfs/bin/mmapplypolicy gpfs23 -A 75 -a 4 -g  -I yes -L 1 -P ~/gpfs/gpfs23_migration.policy -N some,list,of,NSD,server,nodes

And here’s that policy file:

define(access_age,(DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)))
define(GB_ALLOCATED,(KB_ALLOCATED/1048576.0))

RULE 'OldStuff'
  MIGRATE FROM POOL 'gpfs23data'
  TO POOL 'gpfs23capacity'
  LIMIT(98)
  WHERE ((access_age > 14) AND (KB_ALLOCATED > 3584))

RULE 'INeedThatAfterAll'
  MIGRATE FROM POOL 'gpfs23capacity'
  TO POOL 'gpfs23data'
  LIMIT(75)
  WHERE (access_age < 14)
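To reason about which files each rule can pick up, the two WHERE clauses can be mirrored as plain predicates. This is a minimal Python sketch, assuming access_age counts whole calendar days the way DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME) does and that KB_ALLOCATED is in KB; the function names are hypothetical, and this does not reproduce how mmapplypolicy evaluates rules internally.

```python
# Illustrative model of the two WHERE clauses in the policy above, for
# reasoning about which rule (if any) can pick up a given file.
from datetime import date
from typing import Optional


def access_age(today: date, last_access: date) -> int:
    """Whole days, mirroring DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)."""
    return (today - last_access).days


def matching_rule(pool: str, age_days: int, kb_allocated: int) -> Optional[str]:
    """Name of the rule whose FROM POOL and WHERE clause match, else None."""
    if pool == "gpfs23data" and age_days > 14 and kb_allocated > 3584:
        return "OldStuff"           # candidate to move to gpfs23capacity
    if pool == "gpfs23capacity" and age_days < 14:
        return "INeedThatAfterAll"  # candidate to move back to gpfs23data
    return None                     # no rule applies; the file stays put


today = date(2017, 4, 17)
print(matching_rule("gpfs23data", access_age(today, date(2017, 3, 1)), 10_000))      # OldStuff
print(matching_rule("gpfs23capacity", access_age(today, date(2017, 4, 3)), 10_000))  # None
print(matching_rule("gpfs23data", 30, 3584))                                         # None
```

Because the two rules name different FROM POOLs, a file can match at most one of them. Two consequences of the clauses as written: a file last accessed exactly 14 days ago matches neither rule, and files of 3584 KB (3.5 MiB) or less are never moved out of gpfs23data by OldStuff.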

The one thing that has changed is that formerly I only ran the migration in one 
direction at a time … i.e. I used to have those two rules in two separate files 
and would run an mmapplypolicy using the OldStuff rule the 1st weekend of the 
month and run the other rule the other weekends of the month.  This is the 1st 
weekend that I attempted to run an mmapplypolicy that did both at the same 
time.  Did I mess something up with that?

I have not run it again yet because we also run migrations on the other 
filesystem that we are still in the process of migrating off of.  So gpfs23 
goes 1st and as soon as it’s done the other filesystem migration kicks off.  I 
don’t like to run two migrations simultaneously if at all possible.  The 2nd 
migration ran until this morning, when it was unfortunately terminated by a 
network switch crash that has also had me tied up all morning until now.  :-(

And yes, there is something else going on … well, was going on - the network 
switch crash killed this too … I have been running an rsync on one particular 
~80TB directory tree from the old filesystem to gpfs23.  I understand that the 
migration wouldn’t know about those files and that’s fine … I just don’t 
understand why mmapplypolicy said it was going to fill the capacity pool to 98% 
but didn’t do it … wait, mmapplypolicy hasn’t gone into politics, has it?!?  ;-)

Thanks - and again, if I should open a PMR for this please let me know...

Kevin

On Apr 16, 2017, at 2:15 PM, Marc A Kaplan wrote:

Let's look at how mmapplypolicy does the reckoning.
Before it starts, it sees your pools as:

[I] GPFS Current Data Pool Utilization in KB and %
Pool_Name        KB_Occupied      KB_Total   Percent_Occupied
gpfs23capacity   55365193728  124983549952   44.297984614%
gpfs23data      166747037696  343753326592   48.507759721%
system                     0             0   0.0% (no user data)
[I] 75142046 of 209715200 inodes used: 35.830520%.

Your rule says you want to migrate data to gpfs23capacity, up to 98% full:

RULE 'OldStuff'
  MIGRATE FROM POOL 'gpfs23data'
  TO POOL 'gpfs23capacity'
  LIMIT(98) WHERE ...

We scan your files and find and reckon...
[I] Summary of Rule Applicability and File Choices:
 Rule#  Hit_Cnt        KB_Hit   Chosen    KB_Chosen  KB_Ill  Rule
     0  5255960  237675081344  1868858  67355430720       0  RULE 'OldStuff' MIGRATE FROM POOL 'gpfs23data' TO POOL 'gpfs23capacity' LIMIT(98.00) WHERE(.)

So yes, 5.25 million files match the rule, but the utility chooses 1.868 million 
files that add up to 67,355GB, and figures that if it migrates those to 
gpfs23capacity (also figuring in the other migrations by your second rule), 
then gpfs23capacity will end up just under 98% full.
We show you that with our "predictions" message.

Predicted Data Pool Utilization in KB and %:
Pool_Name        KB_Occupied      KB_Total   Percent_Occupied
gpfs23capacity  122483878944  124983549952   97.999999993%
gpfs23data      104742360032  343753326592   30.470209865%

So that's why it chooses to migrate "only" ~67TB.

See? Makes sense to me.
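The reckoning above can be re-derived from its own KB figures. Here is a short Python check that uses only numbers copied from the quoted mmapplypolicy output; the variable names are illustrative.

```python
# Re-derive the percentages in the reckoning above from its KB figures, and
# check how close the predicted capacity-pool occupancy comes to the LIMIT(98)
# ceiling. All numbers are copied from the mmapplypolicy output quoted above.
LIMIT_PCT = 98.0

current = {  # pool: (KB_Occupied, KB_Total)
    "gpfs23capacity": (55365193728, 124983549952),
    "gpfs23data": (166747037696, 343753326592),
}
predicted_kb = {"gpfs23capacity": 122483878944, "gpfs23data": 104742360032}


def pct(occupied_kb: int, total_kb: int) -> float:
    """Percent of the pool occupied."""
    return 100.0 * occupied_kb / total_kb


for pool, (occupied_kb, total_kb) in current.items():
    print(f"{pool}: {pct(occupied_kb, total_kb):.2f}% now, "
          f"{pct(predicted_kb[pool], total_kb):.2f}% predicted")

# How far below the LIMIT ceiling the predicted capacity-pool figure lands.
capacity_total_kb = current["gpfs23capacity"][1]
ceiling_kb = capacity_total_kb * LIMIT_PCT / 100.0
headroom_kb = ceiling_kb - predicted_kb["gpfs23capacity"]
print(f"room left under LIMIT({LIMIT_PCT:g}): {headroom_kb:.0f} KB")
```

It reports 44.30% now and 98.00% predicted for gpfs23capacity, 48.51% now and 30.47% predicted for gpfs23data, and the predicted capacity-pool occupancy lands about 9 KB below the 98% ceiling, i.e. LIMIT(98) is what capped the chosen set.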

Questions:
Did you run with -I yes or -I defer?

Were some of the files ill-replicated or ill-placed?

Did you give the cluster-wide space reckoning protocols time to see the 
changes?  mmdf is usually "behind" by some non-negligible amount of time.

What else is going on?
If you're moving or deleting or creating data by other means while 
mmapplypolicy