Re: [Gluster-devel] tiering: emergency demotions

2016-10-13 Thread Dan Lambright

- Original Message -
> From: "Milind Changire" <mchan...@redhat.com>
> To: gluster-devel@gluster.org
> Sent: Thursday, October 13, 2016 7:53:48 AM
> Subject: Re: [Gluster-devel] tiering: emergency demotions
> 
> Dilemma:
> *without* my patch, the demotions in degraded (hi-watermark breached)
> mode happen every 10 seconds by listing *all* files colder than the
> last 10 seconds and sorting them in ascending order w.r.t. the
> (write,read) access time ... so the existing query could take more than
> a minute to list files if there are millions of them
> 
> *with* my patch we currently select a random set of 20 files and demote
> them ... even if they are actively used ... so we either wait for more
> than a minute for the exact listing of cold files in the worst case or
> trade off by demoting hot files without imposing a file selection
> criterion for a quicker turnaround time
> 
> The exponential time window scheme for selecting files, discussed over
> Google Hangout, has an issue with deciding the start time of the time
> window, although we know that the end time is the current time
> 
> So, I think it will be one of the strategies discussed above, with a
> trade-off one way or the other.
> 
> Comments are requested regarding the approach to take for the
> implementation.

Reaching a full hot tier is a catastrophic event; the operator can no longer
use the volume. If we find ourselves getting close to this situation, we
should use every means to get out of it as soon as possible. Performance is a
secondary concern in this case.

Right now, the database query is O(n), so it will always take a long time
(a minute or more) when there are large numbers of files (e.g. >10^6). This
is only our current scheme and is subject to change someday, but for now we
must live with O(n).

On the other hand, the sample of files we choose to demote may or may not
include a file that is actively being accessed. We could potentially avoid
demoting "hot" files by skipping them in the approximate "sample" we take;
the criterion for skipping could be an elastic window of time that grows to
ensure we eventually demote enough data.
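
As a rough Python sketch of one way to read that "elastic window" (nothing
below is tier-daemon code; the file tuples, the starting idle period and the
halving step are all assumptions for illustration):

import time

def pick_demotion_candidates(files, bytes_needed, start_idle=3600, min_idle=10):
    """Elastic-window selection: demand a long idle period first, then
    halve it until enough cold data has been collected.

    `files` is a list of (path, size_bytes, last_access_ts) tuples, e.g.
    pulled from the tier database; anything accessed more recently than
    `idle` seconds ago counts as hot and is skipped."""
    now = time.time()
    idle = start_idle
    while True:
        cold = [f for f in files if now - f[2] >= idle]
        if sum(size for _, size, _ in cold) >= bytes_needed or idle <= min_idle:
            return sorted(cold, key=lambda f: f[2])   # coldest (longest idle) first
        idle //= 2   # relax the skip criterion: the candidate window grows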

So I think the "approximate" solution is better, because the long query time
(on the order of minutes) is something we cannot incur and must avoid, whereas
the active-file issue is something we can manage.

Avoiding filling up storage units is a classic problem. As we know, DHT only
partially solves it at the moment (appending writes can fill up a subvolume).
I am looking into how Ceph tackles this to see if they have any insights.

> 
> Rafi has also suggested avoiding file creation on the hot tier if the
> hot tier has breached the hi-watermark, to avoid further stress on
> storage capacity and the eventual file migration to the cold tier.
> 
> Do we introduce demotion policies like "strict" and "approximate" to
> let the user choose the demotion strategy?
> 1. strict
> Choosing this strategy could mean we wait for the full and ordered
> query to complete and only then start demoting the coldest file first
> 
> 2. approximate
> Choosing this strategy could mean we choose the first available
> file from the database query and demote it even if it is hot and
> actively written to
> 
> 
> Milind
> 
> On 08/12/2016 08:25 PM, Milind Changire wrote:
> > Patch for review: http://review.gluster.org/15158
> >
> > Milind
> >
> > On 08/12/2016 07:27 PM, Milind Changire wrote:
> >> On 08/10/2016 12:06 PM, Milind Changire wrote:
> >>> Emergency demotions will be required whenever writes breach the
> >>> hi-watermark. Emergency demotions are required to avoid ENOSPC in case
> >>> of continuous writes that originate on the hot tier.
> >>>
> >>> There are two concerns in this area:
> >>>
> >>> 1. enforcing max-cycle-time during emergency demotions
> >>>max-cycle-time is the time the tiering daemon spends in promotions or
> >>>demotions
> >>>I tend to think that the tiering daemon should skip this check in the
> >>>emergency situation and continue demotions until the watermark drops
> >>>below the hi-watermark
> >>
> >> Update:
> >> To keep matters simple and manageable, it has been decided to *enforce*
> >> max-cycle-time so that the worker threads yield and can attend to
> >> impending tier management tasks if the need arises.
> >>
> >>>
> >>> 2. file demotion policy
> >>>I tend to think that the largest file with the most recent *write*
> >>>should be chosen for eviction when write-freq-threshold is NON-ZERO.

Re: [Gluster-devel] tiering: emergency demotions

2016-10-13 Thread Milind Changire

Dilemma:
*without* my patch, the demotions in degraded (hi-watermark breached)
mode happen every 10 seconds by listing *all* files colder than the
last 10 seconds and sorting them in ascending order w.r.t. the
(write,read) access time ... so the existing query could take more than
a minute to list files if there are millions of them

*with* my patch we currently select a random set of 20 files and demote
them ... even if they are actively used ... so we either wait for more
than a minute for the exact listing of cold files in the worst case or
trade off by demoting hot files without imposing a file selection
criterion for a quicker turnaround time
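
To make the trade-off concrete, here is a minimal sketch of the two selection
strategies against a hypothetical SQLite table (the table and column names
tier_files, path, last_write, last_read are invented for illustration and are
not the real gfdb schema):

import sqlite3
import time

def strict_candidates(db_path, window=10):
    """Full ordered listing: every file colder than the last `window` seconds,
    coldest first.  Scans and sorts all matching rows, which is why it can
    take minutes when the hot tier holds millions of files."""
    cutoff = time.time() - window
    con = sqlite3.connect(db_path)
    cur = con.execute(
        "SELECT path FROM tier_files"
        " WHERE last_write < ? AND last_read < ?"
        " ORDER BY last_write ASC, last_read ASC",
        (cutoff, cutoff))
    return [row[0] for row in cur]

def approximate_candidates(db_path, limit=20):
    """Quick path: a small random set, demoted even if actively in use."""
    con = sqlite3.connect(db_path)
    cur = con.execute(
        "SELECT path FROM tier_files ORDER BY RANDOM() LIMIT ?", (limit,))
    return [row[0] for row in cur]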

The exponential time window scheme for selecting files, discussed over Google
Hangout, has an issue with deciding the start time of the time window,
although we know that the end time is the current time


So, I think it will be one of the strategies discussed above, with a
trade-off one way or the other.


Comments are requested regarding the approach to take for the
implementation.

Rafi has also suggested avoiding file creation on the hot tier if the
hot tier has breached the hi-watermark, to avoid further stress on storage
capacity and the eventual file migration to the cold tier.
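
Rafi's suggestion boils down to a check on the create path. A toy sketch of
the decision (Python, purely illustrative; the real tier translator is C and
the helper names here are made up):

import os

def hot_tier_usage(hot_brick_path):
    """Fraction of the hot tier already consumed, derived from statvfs."""
    st = os.statvfs(hot_brick_path)
    return 1.0 - (st.f_bavail / st.f_blocks)

def pick_create_target(hot_brick_path, hi_watermark=0.90):
    """Send new files straight to the cold tier once the hi-watermark is
    breached, so creates stop adding pressure on the hot tier."""
    return "cold" if hot_tier_usage(hot_brick_path) >= hi_watermark else "hot"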

Do we introduce demotion policies like "strict" and "approximate" to
let the user choose the demotion strategy?
1. strict
   Choosing this strategy could mean we wait for the full and ordered
   query to complete and only then start demoting the coldest file first

2. approximate
   Choosing this strategy could mean we choose the first available
   file from the database query and demote it even if it is hot and
   actively written to


Milind

On 08/12/2016 08:25 PM, Milind Changire wrote:

Patch for review: http://review.gluster.org/15158

Milind

On 08/12/2016 07:27 PM, Milind Changire wrote:

On 08/10/2016 12:06 PM, Milind Changire wrote:

Emergency demotions will be required whenever writes breach the
hi-watermark. Emergency demotions are required to avoid ENOSPC in case
of continuous writes that originate on the hot tier.

There are two concerns in this area:

1. enforcing max-cycle-time during emergency demotions
   max-cycle-time is the time the tiering daemon spends in promotions or
   demotions
   I tend to think that the tiering daemon should skip this check in the
   emergency situation and continue demotions until the watermark drops
   below the hi-watermark


Update:
To keep matters simple and manageable, it has been decided to *enforce*
max-cycle-time so that the worker threads yield and can attend to impending
tier management tasks if the need arises.
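
In loop form, the decided behaviour looks roughly like the sketch below
(Python pseudocode; demote_one, watermark_breached and the candidate list are
placeholders, not the daemon's actual functions):

import time

def emergency_demote_cycle(candidates, demote_one, watermark_breached,
                           max_cycle_time=60):
    """Demote files while the hi-watermark is breached, but stop after
    max_cycle_time seconds so the worker thread yields to other tier
    management tasks; the next cycle picks up where this one stopped."""
    deadline = time.monotonic() + max_cycle_time
    for f in candidates:
        if not watermark_breached():
            return True                    # dropped below the hi-watermark
        if time.monotonic() >= deadline:
            return False                   # max-cycle-time hit: yield
        demote_one(f)
    return not watermark_breached()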



2. file demotion policy
   I tend to think that the largest file with the most recent *write*
   should be chosen for eviction when write-freq-threshold is NON-ZERO.
   Choosing the least recently written file would just delay migration
   of an active file, which might consume hot tier disk space and, in
   the worst case, result in an ENOSPC.
   In cases where write-freq-threshold is ZERO, the most recently
   *written* file can be chosen for eviction.
   In the case of choosing the largest file within the
   write-freq-threshold, a stat() on the files would be required to
   calculate the number of files that need to be demoted to take the
   watermark below the hi-watermark. Finding the number of most recently
   written files to demote could also help perform demotions in parallel
   rather than in the sequential manner currently in place.


Update:
The idea of choosing files by file size has been dropped.
Iteratively, the most recently written file will be chosen for eviction
from the hot tier when the hi-watermark is breached, until the watermark
drops below the hi-watermark.
The idea of parallelizing multiple promotions/demotions has been
deferred.
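
Put into a few lines, the decided policy is roughly (again only a sketch; the
file list and the demote/watermark callables stand in for the tier daemon's
machinery):

def emergency_evict(files, demote, watermark_breached):
    """While the hi-watermark is breached, repeatedly evict the most recently
    *written* file, i.e. the one most likely to keep growing and push the
    hot tier towards ENOSPC.

    `files` is a list of (path, last_write_ts) pairs, `demote` moves one file
    to the cold tier, and `watermark_breached` re-reads the current usage."""
    remaining = sorted(files, key=lambda f: f[1], reverse=True)  # newest write first
    while remaining and watermark_breached():
        demote(remaining.pop(0))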

-

Sustained writes creating large files in the hot tier, which
cumulatively breach the hi-watermark, do NOT seem to be a good
workload for making use of tiering. The assumption is that, to make the
most of the hot tier, the hi-watermark would be set close to 100.
In this case a sustained large-file copy might easily breach the
hi-watermark and may even consume the entire hot tier space, resulting
in an ENOSPC.

e.g. a sustained write:

# cp file1 /mnt/glustervol/dir
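
For intuition about how little headroom a high hi-watermark leaves for such a
copy, a back-of-the-envelope helper (all numbers illustrative):

def headroom_bytes(hot_tier_size, used, hi_watermark=0.90):
    """Bytes a sustained write can still add before the hi-watermark is
    breached; past this point every further write races the demotion
    daemon and risks ENOSPC."""
    return max(0, int(hot_tier_size * hi_watermark) - used)

TiB, GiB = 1024 ** 4, 1024 ** 3
# A 1 TiB hot tier that is already 85% full leaves only ~51 GiB of headroom,
# so copying a 100 GiB file breaches the hi-watermark about half way through.
print(headroom_bytes(1 * TiB, int(0.85 * TiB)) / GiB)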

Workloads that would seem to make the most of tiering are:
1. Many smaller files, which are created in small bursts of write
   activity and then closed
2. A few large files where updates are in-place and the file size
   does not grow beyond the hi-watermark, e.g. a database with a frequent
   in-line compaction/de-fragmentation policy enabled
3. Frequent reads of a few large files, mostly static in size, which
   cumulatively don't breach the hi-watermark. Frequently reading
   a large number of smaller, mostly static, files would be a good
   tiering workload as well.




Comments are requested.



Re: [Gluster-devel] tiering: emergency demotions

2016-08-12 Thread Milind Changire

Patch for review: http://review.gluster.org/15158

Milind

On 08/12/2016 07:27 PM, Milind Changire wrote:

On 08/10/2016 12:06 PM, Milind Changire wrote:

Emergency demotions will be required whenever writes breach the
hi-watermark. Emergency demotions are required to avoid ENOSPC in case
of continuous writes that originate on the hot tier.

There are two concerns in this area:

1. enforcing max-cycle-time during emergency demotions
   max-cycle-time is the time the tiering daemon spends in promotions or
   demotions
   I tend to think that the tiering daemon should skip this check in the
   emergency situation and continue demotions until the watermark drops
   below the hi-watermark


Update:
To keep matters simple and manageable, it has been decided to *enforce*
max-cycle-time so that the worker threads yield and can attend to impending
tier management tasks if the need arises.



2. file demotion policy
   I tend to think that the largest file with the most recent *write*
   should be chosen for eviction when write-freq-threshold is NON-ZERO.
   Choosing the least recently written file would just delay migration
   of an active file, which might consume hot tier disk space and, in
   the worst case, result in an ENOSPC.
   In cases where write-freq-threshold is ZERO, the most recently
   *written* file can be chosen for eviction.
   In the case of choosing the largest file within the
   write-freq-threshold, a stat() on the files would be required to
   calculate the number of files that need to be demoted to take the
   watermark below the hi-watermark. Finding the number of most recently
   written files to demote could also help perform demotions in parallel
   rather than in the sequential manner currently in place.


Update:
The idea of choosing files by file size has been dropped.
Iteratively, the most recently written file will be chosen for eviction
from the hot tier when the hi-watermark is breached, until the watermark
drops below the hi-watermark.
The idea of parallelizing multiple promotions/demotions has been
deferred.

-

Sustained writes creating large files in the hot tier, which
cumulatively breach the hi-watermark, do NOT seem to be a good
workload for making use of tiering. The assumption is that, to make the
most of the hot tier, the hi-watermark would be set close to 100.
In this case a sustained large-file copy might easily breach the
hi-watermark and may even consume the entire hot tier space, resulting
in an ENOSPC.

e.g. a sustained write:

# cp file1 /mnt/glustervol/dir

Workloads that would seem to make the most of tiering are:
1. Many smaller files, which are created in small bursts of write
   activity and then closed
2. A few large files where updates are in-place and the file size
   does not grow beyond the hi-watermark, e.g. a database with a frequent
   in-line compaction/de-fragmentation policy enabled
3. Frequent reads of a few large files, mostly static in size, which
   cumulatively don't breach the hi-watermark. Frequently reading
   a large number of smaller, mostly static, files would be a good
   tiering workload as well.




Comments are requested.


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel



Re: [Gluster-devel] tiering: emergency demotions

2016-08-12 Thread Milind Changire

On 08/10/2016 12:06 PM, Milind Changire wrote:

Emergency demotions will be required whenever writes breach the
hi-watermark. Emergency demotions are required to avoid ENOSPC in case
of continuous writes that originate on the hot tier.

There are two concerns in this area:

1. enforcing max-cycle-time during emergency demotions
   max-cycle-time is the time the tiering daemon spends in promotions or
   demotions
   I tend to think that the tiering daemon should skip this check in the
   emergency situation and continue demotions until the watermark drops
   below the hi-watermark


Update:
To keep matters simple and manageable, it has been decided to *enforce*
max-cycle-time so that the worker threads yield and can attend to impending
tier management tasks if the need arises.



2. file demotion policy
   I tend to think that the largest file with the most recent *write*
   should be chosen for eviction when write-freq-threshold is NON-ZERO.
   Choosing the least recently written file would just delay migration
   of an active file, which might consume hot tier disk space and, in
   the worst case, result in an ENOSPC.
   In cases where write-freq-threshold is ZERO, the most recently
   *written* file can be chosen for eviction.
   In the case of choosing the largest file within the
   write-freq-threshold, a stat() on the files would be required to
   calculate the number of files that need to be demoted to take the
   watermark below the hi-watermark. Finding the number of most recently
   written files to demote could also help perform demotions in parallel
   rather than in the sequential manner currently in place.


Update:
The idea of choosing files by file size has been dropped.
Iteratively, the most recently written file will be chosen for eviction
from the hot tier when the hi-watermark is breached, until the watermark
drops below the hi-watermark.
The idea of parallelizing multiple promotions/demotions has been
deferred.

-

Sustained writes creating large files in the hot tier, which
cumulatively breach the hi-watermark, do NOT seem to be a good
workload for making use of tiering. The assumption is that, to make the
most of the hot tier, the hi-watermark would be set close to 100.
In this case a sustained large-file copy might easily breach the
hi-watermark and may even consume the entire hot tier space, resulting
in an ENOSPC.

e.g. a sustained write:

# cp file1 /mnt/glustervol/dir

Workloads that would seem to make the most of tiering are:
1. Many smaller files, which are created in small bursts of write
   activity and then closed
2. A few large files where updates are in-place and the file size
   does not grow beyond the hi-watermark, e.g. a database with a frequent
   in-line compaction/de-fragmentation policy enabled
3. Frequent reads of a few large files, mostly static in size, which
   cumulatively don't breach the hi-watermark. Frequently reading
   a large number of smaller, mostly static, files would be a good
   tiering workload as well.




Comments are requested.


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel