Re: libata FUA revisited

2007-02-22 Thread Robert Hancock

Ric Wheeler wrote:

I think that FUA was designed for a different use case than what Linux
is using barriers for currently. The advantage with FUA is when you have
"before barrier", "after barrier" and "don't care" sets, where only the
specific things you care about ordering are in the before/after barrier
sets. Then you can do this:

Issue all before barrier requests with FUA bit set
Wait for all those to complete
Issue all after barrier requests with FUA bit set
Wait for all those to complete
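
[A minimal pseudo-C sketch of this two-phase scheme; submit_set(),
wait_for_set() and struct req_set are hypothetical helpers, not real
block-layer API:]

static void fua_barrier(struct req_set *pre, struct req_set *post,
                        struct req_set *dont_care)
{
        submit_set(dont_care, 0);       /* unordered, may run in background */
        submit_set(pre, REQ_FUA);       /* before-barrier set, forced to media */
        wait_for_set(pre);              /* barrier point: pre set on platter */
        submit_set(post, REQ_FUA);      /* after-barrier set, forced to media */
        wait_for_set(post);
}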


A couple of issues with this lie in how to support our current
semantics of fsync().  Today, the flush behavior of the barrier/fsync
combination means that applications get a hard promise of data on the
platter for any file after a successful fsync call.


If I understand correctly, getting a similar semantic from a pure FUA
implementation would require us to tag all file IO as FUA.


I suspect that this would actually be less efficient, since it would
not allow the drives to reorder IOs up to the point we actually care
about (fsync time).


I think for the fsync case a cache flush would likely still be needed, 
unless the app was only writing small amounts of data in between the 
syncs (it may be complicated to figure out when to do that, though).
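
[As a sketch, assuming the 2007-era libata opcode names, the
flush-based fsync tail looks roughly like this; issue_cmd() and
lba48() are hypothetical wrappers:]

/* Make everything already written durable with one cache flush,
 * instead of tagging every file write FUA. */
static int fsync_make_durable(struct ata_device *dev)
{
        return issue_cmd(dev, lba48(dev) ? ATA_CMD_FLUSH_EXT  /* 0xEA */
                                         : ATA_CMD_FLUSH);    /* 0xE7 */
}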


The other big user of barriers is the internal transactions of
journaled file systems.  It would seem that we would need to tag each
write from the journal as a FUA IO as well.  Again, we might actually
go more slowly in some cases, as you mention below.


The limited queue depth of NCQ would seem to make it much harder to have 
a win in this case...


I think the journal write case is less problematic, as there are
likely to be far fewer/smaller requests involved, which would be more
likely to fit inside the queue.
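
[A sketch of the FUA-only journal commit being discussed, workable
only while the transaction fits within the NCQ queue depth; every
type and helper here is hypothetical:]

/* Commit a small transaction with FUA writes and no cache flush.
 * Only works while the write count stays within the drive's NCQ
 * queue depth (31/32 tags). */
static void commit_via_fua(struct journal_trans *trans)
{
        int i;

        for (i = 0; i < trans->nr_blocks; i++)
                submit_journal_write(trans, i, REQ_FUA);
        wait_for_transaction(trans);            /* descriptor + data on platter */
        submit_commit_record(trans, REQ_FUA);   /* commit block strictly last */
        wait_for_transaction(trans);
}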


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/




Re: libata FUA revisited

2007-02-22 Thread Ric Wheeler

Tejun Heo wrote:

Jens Axboe wrote:

On Wed, Feb 21 2007, Tejun Heo wrote:

[cc'ing Ric, Hannes and Dongjun, Hello.  Feel free to drag other people in.]

Robert Hancock wrote:

Jens Axboe wrote:

But we can't really change that, since you need the cache flushed before
issuing the FUA write. I've been advocating for an ordered bit for
years, so that we could just do:

3. w/FUA+ORDERED

normal operation -> barrier issued -> write barrier FUA+ORDERED
 -> normal operation resumes

So we don't have to serialize everything both at the block and device
level. I would have made FUA imply this already, but apparently it's not
what MS wanted FUA for, so... The current implementations take the FUA
bit (or WRITE FUA) as a hint to boost it to head of queue, so you are
almost certainly going to jump ahead of already queued writes. Which we
of course really do not want.

Yeah, I think if we have a tagged write command and a tagged flush (or
tagged barrier), things can be pretty efficient.  Again, I'm much more
comfortable with separate opcodes for those rather than bits changing
the behavior.

ORDERED+FUA NCQ would still be preferable to an NCQ enabled flush
command, though.


I think we're talking about two different things here.

1. The barrier write (FUA write) combined with flush.  I think it would
help improve performance, but issuing two commands shouldn't be much
slower than issuing one combined command unless it causes extra
physical activity (moving head, etc...).

2. FLUSH currently flushes all writes.  If we can mark certain commands
as requiring ordering, we can selectively flush or order the necessary
writes.  (No need to flush a 16M buffer scattered all over the disk
when only the journal needs barriering.)
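
[For point 1, the two command sequences being compared look roughly
like this; issue()/issue_write() are hypothetical, the opcodes are the
real ATA commands:]

static void barrier_flush_only(struct ata_device *dev, struct request *rq)
{
        issue(dev, ATA_CMD_FLUSH_EXT);  /* flush pre-barrier writes */
        issue_write(dev, rq, 0);        /* barrier write, lands in cache */
        issue(dev, ATA_CMD_FLUSH_EXT);  /* flush the barrier write out */
}

static void barrier_flush_fua(struct ata_device *dev, struct request *rq)
{
        issue(dev, ATA_CMD_FLUSH_EXT);  /* flush pre-barrier writes */
        issue_write(dev, rq, REQ_FUA);  /* WRITE DMA FUA EXT, no post-flush */
}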


We can certainly (given time to play in the lab!) try to measure this
with a micro-benchmark (with an analyzer or with blktrace?).


A normal flush command in my old tests seemed to be in the 20 ms range
(mixed in with an occasional "freebie" cache flush which returns in 50
usecs or so - the cache must be empty).




Another idea Dongjun talked about while drinking in LSF was ranged
flush.  Not as flexible/efficient as the previous option but much less
intrusive and should help quite a bit, I think.

But that requires extensive tracking, I'm not so sure the implementation
of that for barriers would be very clean. It'd probably be good for
fsync, though.


I was mostly thinking about the journal area.  Using it for other
purposes would incur a lot of complexity.  :-(





Re: libata FUA revisited

2007-02-22 Thread Ric Wheeler

Tejun Heo wrote:

[cc'ing Ric, Hannes and Dongjun, Hello.  Feel free to drag other people in.]

Robert Hancock wrote:

Jens Axboe wrote:

But we can't really change that, since you need the cache flushed before
issuing the FUA write. I've been advocating for an ordered bit for
years, so that we could just do:

3. w/FUA+ORDERED

normal operation -> barrier issued -> write barrier FUA+ORDERED
 -> normal operation resumes

So we don't have to serialize everything both at the block and device
level. I would have made FUA imply this already, but apparently it's not
what MS wanted FUA for, so... The current implementations take the FUA
bit (or WRITE FUA) as a hint to boost it to head of queue, so you are
almost certainly going to jump ahead of already queued writes. Which we
of course really do not want.


Yeah, I think if we have a tagged write command and a tagged flush (or
tagged barrier), things can be pretty efficient.  Again, I'm much more
comfortable with separate opcodes for those rather than bits changing
the behavior.

Another idea Dongjun talked about while drinking in LSF was ranged
flush.  Not as flexible/efficient as the previous option but much less
intrusive and should help quite a bit, I think.


I think that FUA was designed for a different use case than what Linux
is using barriers for currently. The advantage with FUA is when you have
"before barrier", "after barrier" and "don't care" sets, where only the
specific things you care about ordering are in the before/after barrier
sets. Then you can do this:

Issue all before barrier requests with FUA bit set
Wait for all those to complete
Issue all after barrier requests with FUA bit set
Wait for all those to complete


A couple of issues with this lie in how to support our current
semantics of fsync().  Today, the flush behavior of the barrier/fsync
combination means that applications get a hard promise of data on the
platter for any file after a successful fsync call.


If I understand correctly, getting a similar semantic from a pure FUA
implementation would require us to tag all file IO as FUA.


I suspect that this would actually be less efficient, since it would
not allow the drives to reorder IOs up to the point we actually care
about (fsync time).


The other big user of barriers is the internal transactions of
journaled file systems.  It would seem that we would need to tag each
write from the journal as a FUA IO as well.  Again, we might actually
go more slowly in some cases, as you mention below.


The limited queue depth of NCQ would seem to make it much harder to have 
a win in this case...




Meanwhile a bunch of "don't care" requests could be going through on the
device in the background. If we could do this, then I think there would
be an advantage. Right now, it just saves a command to the drive when
we're flushing on the post-barrier writes.

This would only be efficient with NCQ FUA, because regular FUA forces
the requests to complete serially, whereas in this case we don't really
care what order the individual requests finish, we just care about the
ordering of the pre vs. post barrier requests.


Yeap, that makes sense too, but that possibly requires intrusive
changes in the fs layer, and the limited NCQ queue depth might become
a bottleneck too.


I'm not too nervous about the FUA write commands; I hope we can safely
assume that if the drive sets the FUA supported bit in the ID data AND
the WRITE FUA command doesn't get aborted, then FUA must work. Anything
else would just be an immensely stupid implementation. NCQ+FUA is more
tricky; I agree that it being just a command bit does make it more
likely that it could be ignored. And that is indeed a danger. Given the
state of NCQ in early drive firmware, I would not at all be surprised
if the drive vendors screwed that up too.


Yeap, I bet someone did.  :-)


But, since we don't have the ordered bit for NCQ/FUA anyway, we do need
to drain the drive queue before issuing the WRITE/FUA. And at that point
we may as well not use the NCQ command, just go for the regular non-NCQ
FUA write. I think that should be safe.


Yeap.


Aside from the issue above, as I mentioned elsewhere, lots of NCQ drives
don't support non-NCQ FUA writes..


To me, using the NCQ FUA bit on such drives doesn't seem to be a good
idea.  Maybe I'm just too chicken but it's not like we can gain a lot
from doing FUA at this point.  Are there a lot of drives which support
NCQ but not FUA opcodes?

Thanks.



Anything new (firmware included) is likely to be shaky on initial 
deployment.  Caution is certainly the way to go on this ;-)


ric




Re: libata FUA revisited

2007-02-22 Thread Ric Wheeler

Jens Axboe wrote:

On Wed, Feb 21 2007, Tejun Heo wrote:

[cc'ing Ric, Hannes and Dongjun, Hello.  Feel free to drag other people in.]

Robert Hancock wrote:

Jens Axboe wrote:

But we can't really change that, since you need the cache flushed before
issuing the FUA write. I've been advocating for an ordered bit for
years, so that we could just do:

3. w/FUA+ORDERED

normal operation -> barrier issued -> write barrier FUA+ORDERED
 -> normal operation resumes

So we don't have to serialize everything both at the block and device
level. I would have made FUA imply this already, but apparently it's not
what MS wanted FUA for, so... The current implementations take the FUA
bit (or WRITE FUA) as a hint to boost it to head of queue, so you are
almost certainly going to jump ahead of already queued writes. Which we
of course really do not want.

Yeah, I think if we have a tagged write command and a tagged flush (or
tagged barrier), things can be pretty efficient.  Again, I'm much more
comfortable with separate opcodes for those rather than bits changing
the behavior.


ORDERED+FUA NCQ would still be preferable to an NCQ enabled flush
command, though.


Another idea Dongjun talked about while drinking in LSF was ranged
flush.  Not as flexible/efficient as the previous option but much less
intrusive and should help quite a bit, I think.


But that requires extensive tracking, I'm not so sure the implementation
of that for barriers would be very clean. It'd probably be good for
fsync, though.



If we could invent any mechanism, it would seem nicest to have
independent sequences of IO requests (say, with a distinct tag per
sequence) and the ability to issue a per-sequence flush request.  That
might tie into the QOS support, but it would still have issues when
you try to map it back up the stack through the journal and into
application-level promises of data integrity.


For example, in data journal mode, we would probably need to flush not
only the transaction-level data, but also all data sequences that had
IOs in that transaction first.
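
[A sketch of the per-sequence interface imagined above; this API is
entirely hypothetical, nothing like it existed:]

struct io_sequence {
        u32 tag;                /* distinct tag per sequence */
};

int seq_write(struct io_sequence *seq, struct bio *bio);
int seq_flush(struct block_device *bdev, u32 tag);  /* flush one sequence only */

/* A data=journal commit would then seq_flush() every data sequence
 * with IOs in the transaction, and finally flush the journal sequence
 * itself before writing the commit record. */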


Pretty rapidly, we start to get into the database notions of nested 
transactions and so on ;-)


ric


Re: libata FUA revisited

2007-02-21 Thread Robert Hancock

Tejun Heo wrote:

Aside from the issue above, as I mentioned elsewhere, lots of NCQ drives
don't support non-NCQ FUA writes..


To me, using the NCQ FUA bit on such drives doesn't seem to be a good
idea.  Maybe I'm just too chicken but it's not like we can gain a lot
from doing FUA at this point.  Are there a lot of drives which support
NCQ but not FUA opcodes?


Well, it's hard to say whether "lots" have this issue, but the ones I 
have in my machine, Seagate 7200.7 NCQ 160GB (ST3160827AS) and 7200.10 
320GB (ST3320620AS), both support NCQ and don't support non-NCQ FUA, and 
those (especially the latter) seem to be very popular models.


Likely Seagate didn't implement that command since they figured nobody 
would use that if they had NCQ..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: libata FUA revisited

2007-02-21 Thread Jens Axboe
On Wed, Feb 21 2007, Tejun Heo wrote:
> Jens Axboe wrote:
> > On Wed, Feb 21 2007, Tejun Heo wrote:
> >> [cc'ing Ric, Hannes and Dongjun, Hello.  Feel free to drag other people 
> >> in.]
> >>
> >> Robert Hancock wrote:
> >>> Jens Axboe wrote:
>  But we can't really change that, since you need the cache flushed before
>  issuing the FUA write. I've been advocating for an ordered bit for
>  years, so that we could just do:
> 
>  3. w/FUA+ORDERED
> 
>  normal operation -> barrier issued -> write barrier FUA+ORDERED
>   -> normal operation resumes
> 
>  So we don't have to serialize everything both at the block and device
>  level. I would have made FUA imply this already, but apparently it's not
>  what MS wanted FUA for, so... The current implementations take the FUA
>  bit (or WRITE FUA) as a hint to boost it to head of queue, so you are
>  almost certainly going to jump ahead of already queued writes. Which we
>  of course really do not want.
> >> Yeah, I think if we have a tagged write command and a tagged flush (or
> >> tagged barrier), things can be pretty efficient.  Again, I'm much more
> >> comfortable with separate opcodes for those rather than bits changing
> >> the behavior.
> > 
> > ORDERED+FUA NCQ would still be preferable to an NCQ enabled flush
> > command, though.
> 
> I think we're talking about two different things here.
> 
> 1. The barrier write (FUA write) combined with flush.  I think it would
> help improve performance, but issuing two commands shouldn't be much
> slower than issuing one combined command unless it causes extra
> physical activity (moving head, etc...).

The command overhead is dwarfed by other factors, agreed.

> 2. FLUSH currently flushes all writes.  If we can mark certain commands
> as requiring ordering, we can selectively flush or order the necessary
> writes.  (No need to flush a 16M buffer scattered all over the disk
> when only the journal needs barriering.)

Sure, anything is better than the sledge hammer flush. But my claim is
that an ORDERED+FUA enabled write for critical data would be a good
approach, and simple in software.

> >> Another idea Dongjun talked about while drinking in LSF was ranged
> >> flush.  Not as flexible/efficient as the previous option but much less
> >> intrusive and should help quite a bit, I think.
> > 
> > But that requires extensive tracking, I'm not so sure the implementation
> > of that for barriers would be very clean. It'd probably be good for
> > fsync, though.
> 
> I was mostly thinking about the journal area.  Using it for other
> purposes would incur a lot of complexity.  :-(

Yep, if it's just for the journal, the range is known and fixed, so the
flush range would work nicely there.
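
[Since the journal occupies a known, fixed LBA range, a ranged flush
could be as small as this sketch; the command is imagined, no such ATA
opcode existed:]

struct flush_range {
        u64 lba;
        u32 nblocks;
};

static int flush_journal_range(struct ata_device *dev,
                               u64 journal_lba, u32 journal_len)
{
        struct flush_range r = { .lba = journal_lba, .nblocks = journal_len };

        return issue_ranged_flush(dev, &r);     /* imagined opcode */
}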

-- 
Jens Axboe



Re: libata FUA revisited

2007-02-21 Thread Tejun Heo
Jens Axboe wrote:
> On Wed, Feb 21 2007, Tejun Heo wrote:
>> [cc'ing Ric, Hannes and Dongjun, Hello.  Feel free to drag other people in.]
>>
>> Robert Hancock wrote:
>>> Jens Axboe wrote:
 But we can't really change that, since you need the cache flushed before
 issuing the FUA write. I've been advocating for an ordered bit for
 years, so that we could just do:

 3. w/FUA+ORDERED

 normal operation -> barrier issued -> write barrier FUA+ORDERED
  -> normal operation resumes

 So we don't have to serialize everything both at the block and device
 level. I would have made FUA imply this already, but apparently it's not
 what MS wanted FUA for, so... The current implementations take the FUA
 bit (or WRITE FUA) as a hint to boost it to head of queue, so you are
 almost certainly going to jump ahead of already queued writes. Which we
 of course really do not want.
>> Yeah, I think if we have a tagged write command and a tagged flush (or
>> tagged barrier), things can be pretty efficient.  Again, I'm much more
>> comfortable with separate opcodes for those rather than bits changing
>> the behavior.
> 
> ORDERED+FUA NCQ would still be preferable to an NCQ enabled flush
> command, though.

I think we're talking about two different things here.

1. The barrier write (FUA write) combined with flush.  I think it would
help improve performance, but issuing two commands shouldn't be much
slower than issuing one combined command unless it causes extra
physical activity (moving head, etc...).

2. FLUSH currently flushes all writes.  If we can mark certain commands
as requiring ordering, we can selectively flush or order the necessary
writes.  (No need to flush a 16M buffer scattered all over the disk
when only the journal needs barriering.)

>> Another idea Dongjun talked about while drinking in LSF was ranged
>> flush.  Not as flexible/efficient as the previous option but much less
>> intrusive and should help quite a bit, I think.
> 
> But that requires extensive tracking, I'm not so sure the implementation
> of that for barriers would be very clean. It'd probably be good for
> fsync, though.

I was mostly thinking about the journal area.  Using it for other
purposes would incur a lot of complexity.  :-(

-- 
tejun


Re: libata FUA revisited

2007-02-21 Thread Jens Axboe
On Mon, Feb 19 2007, Robert Hancock wrote:
> Jens Axboe wrote:
> >But we can't really change that, since you need the cache flushed before
> >issuing the FUA write. I've been advocating for an ordered bit for
> >years, so that we could just do:
> >
> >3. w/FUA+ORDERED
> >
> >normal operation -> barrier issued -> write barrier FUA+ORDERED
> > -> normal operation resumes
> >
> >So we don't have to serialize everything both at the block and device
> >level. I would have made FUA imply this already, but apparently it's not
> >what MS wanted FUA for, so... The current implementations take the FUA
> >bit (or WRITE FUA) as a hint to boost it to head of queue, so you are
> >almost certainly going to jump ahead of already queued writes. Which we
> >of course really do not want.
> 
> I think that FUA was designed for a different use case than what Linux 
> is using barriers for currently. The advantage with FUA is when you have 

[snip]

Yes that's pretty obvious, my point is just that FUA+ORDERED would be a
nice thing to have for us.

> >I'm not too nervous about the FUA write commands; I hope we can safely
> >assume that if the drive sets the FUA supported bit in the ID data AND
> >the WRITE FUA command doesn't get aborted, then FUA must work. Anything
> >else would just be an immensely stupid implementation. NCQ+FUA is more
> >tricky; I agree that it being just a command bit does make it more
> >likely that it could be ignored. And that is indeed a danger. Given the
> >state of NCQ in early drive firmware, I would not at all be surprised
> >if the drive vendors screwed that up too.
> >
> >But, since we don't have the ordered bit for NCQ/FUA anyway, we do need
> >to drain the drive queue before issuing the WRITE/FUA. And at that point
> >we may as well not use the NCQ command, just go for the regular non-NCQ
> >FUA write. I think that should be safe.
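
[Jens's probe-and-fallback approach as a C sketch; the IDENTIFY
word/bit follows my reading of ata_id_has_fua() in that era's
include/linux/ata.h, so treat the exact position as an assumption, and
the surrounding helpers are hypothetical:]

/* Does the drive claim WRITE DMA FUA EXT support? */
static inline int id_claims_fua(const u16 *id)
{
        return id[84] & (1 << 6);       /* IDENTIFY word 84, bit 6 */
}

/* Trust FUA only if the claim bit is set AND a trial
 * ATA_CMD_WRITE_FUA_EXT (0x3D) is not aborted by the drive.  Without
 * an ordered bit the queue is drained first anyway, so the plain
 * non-NCQ FUA write costs nothing extra; on abort, fall back to
 * FLUSH-based barriers. */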
> 
> Aside from the issue above, as I mentioned elsewhere, lots of NCQ drives 
> don't support non-NCQ FUA writes..

"Lots" meaning how many? All the ones I have here support FUA.

-- 
Jens Axboe



Re: libata FUA revisited

2007-02-21 Thread Jens Axboe
On Wed, Feb 21 2007, Tejun Heo wrote:
> [cc'ing Ric, Hannes and Dongjun, Hello.  Feel free to drag other people in.]
> 
> Robert Hancock wrote:
> > Jens Axboe wrote:
> >> But we can't really change that, since you need the cache flushed before
> >> issuing the FUA write. I've been advocating for an ordered bit for
> >> years, so that we could just do:
> >>
> >> 3. w/FUA+ORDERED
> >>
> >> normal operation -> barrier issued -> write barrier FUA+ORDERED
> >>  -> normal operation resumes
> >>
> >> So we don't have to serialize everything both at the block and device
> >> level. I would have made FUA imply this already, but apparently it's not
> >> what MS wanted FUA for, so... The current implementations take the FUA
> >> bit (or WRITE FUA) as a hint to boost it to head of queue, so you are
> >> almost certainly going to jump ahead of already queued writes. Which we
> >> of course really do not want.
> 
> Yeah, I think if we have a tagged write command and a tagged flush (or
> tagged barrier), things can be pretty efficient.  Again, I'm much more
> comfortable with separate opcodes for those rather than bits changing
> the behavior.

ORDERED+FUA NCQ would still be preferable to an NCQ enabled flush
command, though.

> Another idea Dongjun talked about while drinking in LSF was ranged
> flush.  Not as flexible/efficient as the previous option but much less
> intrusive and should help quite a bit, I think.

But that requires extensive tracking, I'm not so sure the implementation
of that for barriers would be very clean. It'd probably be good for
fsync, though.

-- 
Jens Axboe



Re: libata FUA revisited

2007-02-21 Thread Tejun Heo
[cc'ing Ric, Hannes and Dongjun, Hello.  Feel free to drag other people in.]

Robert Hancock wrote:
> Jens Axboe wrote:
>> But we can't really change that, since you need the cache flushed before
>> issuing the FUA write. I've been advocating for an ordered bit for
>> years, so that we could just do:
>>
>> 3. w/FUA+ORDERED
>>
>> normal operation -> barrier issued -> write barrier FUA+ORDERED
>>  -> normal operation resumes
>>
>> So we don't have to serialize everything both at the block and device
>> level. I would have made FUA imply this already, but apparently it's not
>> what MS wanted FUA for, so... The current implementations take the FUA
>> bit (or WRITE FUA) as a hint to boost it to head of queue, so you are
>> almost certainly going to jump ahead of already queued writes. Which we
>> of course really do not want.

Yeah, I think if we have a tagged write command and a tagged flush (or
tagged barrier), things can be pretty efficient.  Again, I'm much more
comfortable with separate opcodes for those rather than bits changing
the behavior.

Another idea Dongjun talked about while drinking in LSF was ranged
flush.  Not as flexible/efficient as the previous option but much less
intrusive and should help quite a bit, I think.

> I think that FUA was designed for a different use case than what Linux
> is using barriers for currently. The advantage with FUA is when you have
> "before barrier", "after barrier" and "don't care" sets, where only the
> specific things you care about ordering are in the before/after barrier
> sets. Then you can do this:
> 
> Issue all before barrier requests with FUA bit set
> Wait for all those to complete
> Issue all after barrier requests with FUA bit set
> Wait for all those to complete
> 
> Meanwhile a bunch of "don't care" requests could be going through on the
> device in the background. If we could do this, then I think there would
> be an advantage. Right now, it just saves a command to the drive when
> we're flushing on the post-barrier writes.
> 
> This would only be efficient with NCQ FUA, because regular FUA forces
> the requests to complete serially, whereas in this case we don't really
> care what order the individual requests finish, we just care about the
> ordering of the pre vs. post barrier requests.

Yeap, that makes sense too, but that possibly requires intrusive
changes in the fs layer, and the limited NCQ queue depth might become
a bottleneck too.

>> I'm not too nervous about the FUA write commands; I hope we can safely
>> assume that if the drive sets the FUA supported bit in the ID data AND
>> the WRITE FUA command doesn't get aborted, then FUA must work. Anything
>> else would just be an immensely stupid implementation. NCQ+FUA is more
>> tricky; I agree that it being just a command bit does make it more
>> likely that it could be ignored. And that is indeed a danger. Given the
>> state of NCQ in early drive firmware, I would not at all be surprised
>> if the drive vendors screwed that up too.

Yeap, I bet someone did.  :-)

>> But, since we don't have the ordered bit for NCQ/FUA anyway, we do need
>> to drain the drive queue before issuing the WRITE/FUA. And at that point
>> we may as well not use the NCQ command, just go for the regular non-NCQ
>> FUA write. I think that should be safe.

Yeap.

> Aside from the issue above, as I mentioned elsewhere, lots of NCQ drives
> don't support non-NCQ FUA writes..

To me, using the NCQ FUA bit on such drives doesn't seem to be a good
idea.  Maybe I'm just too chicken but it's not like we can gain a lot
from doing FUA at this point.  Are there a lot of drives which support
NCQ but not FUA opcodes?

Thanks.

-- 
tejun


Re: libata FUA revisited

2007-02-19 Thread Robert Hancock

Jens Axboe wrote:

But we can't really change that, since you need the cache flushed before
issuing the FUA write. I've been advocating for an ordered bit for
years, so that we could just do:

3. w/FUA+ORDERED

normal operation -> barrier issued -> write barrier FUA+ORDERED
 -> normal operation resumes

So we don't have to serialize everything both at the block and device
level. I would have made FUA imply this already, but apparently it's not
what MS wanted FUA for, so... The current implementations take the FUA
bit (or WRITE FUA) as a hint to boost it to head of queue, so you are
almost certainly going to jump ahead of already queued writes. Which we
of course really do not want.


I think that FUA was designed for a different use case than what Linux 
is using barriers for currently. The advantage with FUA is when you have 
"before barrier", "after barrier" and "don't care" sets, where only the 
specific things you care about ordering are in the before/after barrier 
sets. Then you can do this:


Issue all before barrier requests with FUA bit set
Wait for all those to complete
Issue all after barrier requests with FUA bit set
Wait for all those to complete

Meanwhile a bunch of "don't care" requests could be going through on the 
device in the background. If we could do this, then I think there would 
be an advantage. Right now, it just saves a command to the drive when 
we're flushing on the post-barrier writes.


This would only be efficient with NCQ FUA, because regular FUA forces 
the requests to complete serially, whereas in this case we don't really 
care what order the individual requests finish, we just care about the 
ordering of the pre vs. post barrier requests.
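
[For reference, NCQ FUA is just a bit in the queued write command; a
taskfile-setup sketch, with the bit position per my reading of the
SATA spec, so treat it as an assumption:]

/* WRITE FPDMA QUEUED (0x61) with FUA: each such write is individually
 * forced to media, but queued writes may still complete in any order
 * relative to one another. */
tf->command = ATA_CMD_FPDMA_WRITE;      /* 0x61, NCQ write */
if (fua)
        tf->device |= (1 << 7);         /* FUA bit for NCQ writes */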



I'm not too nervous about the FUA write commands; I hope we can safely
assume that if the drive sets the FUA supported bit in the ID data AND
the WRITE FUA command doesn't get aborted, then FUA must work. Anything
else would just be an immensely stupid implementation. NCQ+FUA is more
tricky; I agree that it being just a command bit does make it more
likely that it could be ignored. And that is indeed a danger. Given the
state of NCQ in early drive firmware, I would not at all be surprised
if the drive vendors screwed that up too.

But, since we don't have the ordered bit for NCQ/FUA anyway, we do need
to drain the drive queue before issuing the WRITE/FUA. And at that point
we may as well not use the NCQ command, just go for the regular non-NCQ
FUA write. I think that should be safe.


Aside from the issue above, as I mentioned elsewhere, lots of NCQ drives 
don't support non-NCQ FUA writes..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: libata FUA revisited

2007-02-16 Thread Jeff Garzik

Tejun Heo wrote:

Hello,

Robert Hancock wrote:
[--correct summary snipped--]

Given the above, what I'm proposing to do is:

-Remove the blacklisting of Maxtor BANC1G10 firmware for FUA. If we 
need to FUA-blacklist any drives this should likely be added to the 
existing "horkage" mechanism we now have. However, at this point I 
don't think that's needed, considering that I've seen no conclusive 
evidence that any drive has ever been established to have broken FUA.


Agreed.

-Add a new port flag ATA_FLAG_NO_FUA to indicate that a controller 
can't handle FUA commands, and add that flag to sata_sil. Force FUA 
off on any drive connected to a controller with this bit set.


There was some talk that sata_mv might have this problem, but I 
believe the conclusion was that it didn't. The only controllers that 
would are ones that actually try to interpret the ATA command codes 
and don't know about WRITE DMA FUA.


I think it would be better to add ATA_FLAG_FUA instead of ATA_FLAG_NO_FUA.


This is an interesting (if small) problem.  I would propose a third 
option:  add ATA_FLAG_NO_FUA to applicable /SATA/ drivers, but leave 
those without ATA_FLAG_SATA alone.


Jeff


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: libata FUA revisited

2007-02-15 Thread Jens Axboe
On Tue, Feb 13 2007, Tejun Heo wrote:
> >>So, actually, I was thinking about *always* using the non-NCQ FUA 
> >>opcode.  As currently implemented, FUA request is always issued by 
> >>itself, so NCQ doesn't make any difference there.  So, I think it 
> >>would be better to turn on FUA on driver-by-driver basis whether the 
> >>controller supports NCQ or not.
> >
> >Unfortunately not all drives that support NCQ support the non-NCQ FUA 
> >commands (my Seagates are like this).
> 
> And I'm a bit scared to set FUA bit on such drives and trust that it 
> will actually do FUA, so our opinions aren't too far away from each 
> other.  :-)
> 
> >There's definitely a potential advantage to FUA with NCQ - if you have 
> >non-synchronous accesses going on concurrently with synchronous ones, if 
> >you have to use non-NCQ FUA or flush cache commands, you have to wait 
> >for all the IOs of both types to drain out before you can issue the 
> >flush (since those can't be overlapped with the NCQ read/writes). And if 
> >you can only use flush cache, then you're forcing all the writes to be 
> >flushed including the non-synchronous ones you didn't care about. 
> >Whether or not the block layer currently exploits this I don't know, but 
> >it definitely could.
> 
> The current barrier implementation uses the following sequences for 
> no-FUA and FUA cases.
> 
> 1. w/o FUA
> 
> normal operation -> barrier issued -> drain IO -> flush -> barrier 
> written -> flush -> normal operation resumes
> 
> 2. w/ FUA
> 
> normal operation -> barrier issued -> drain IO -> flush -> barrier 
> written / FUA -> normal operation resumes
> 
> So, the FUA write is issued by itself.  This isn't really efficient and 
> frequent barriers impact the performance badly.  If we can change that 
> NCQ FUA will be certainly beneficial.

But we can't really change that, since you need the cache flushed before
issuing the FUA write. I've been advocating for an ordered bit for
years, so that we could just do:

3. w/FUA+ORDERED

normal operation -> barrier issued -> write barrier FUA+ORDERED
 -> normal operation resumes

So we don't have to serialize everything both at the block and device
level. I would have made FUA imply this already, but apparently it's not
what MS wanted FUA for, so... The current implementations take the FUA
bit (or WRITE FUA) as a hint to boost it to head of queue, so you are
almost certainly going to jump ahead of already queued writes, which we
of course really do not want.
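
To put the three sequences side by side, a rough sketch (hypothetical
helper names, not the actual block layer code):

	/* 1. w/o FUA: flushes bracket the barrier write */
	drain_io(); flush_cache(); write_barrier(); flush_cache();

	/* 2. w/ FUA: the trailing flush is folded into a FUA barrier write */
	drain_io(); flush_cache(); write_barrier_fua();

	/* 3. w/ FUA+ORDERED (hypothetical bit): the device commits all
	 * previously queued writes before the ordered FUA write, so no
	 * drain and no separate flush is needed at all */
	write_barrier_fua_ordered();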

> >>Well, I might be being too paranoid but silent FUA failure would be 
> >>really hard to diagnose if that ever happens (and I'm fairly certain 
> >>that it will on some firmwares).
> >
> >Well, there are also probably drives that ignore flush cache commands or 
> > fail to do other things that they should. There's only so far we can go 
> >in coping if the firmware authors are being retarded. If any drive is 
> >broken like that we should likely just blacklist NCQ on it entirely as 
> >obviously little thought or testing went into the implementation..
> 
> FLUSH has been around quite long time now and most drives don't have 
> problem with that.  FUA on ATA is still quite new and libata will be the 
> first major user of it if we enable it by default.  It just seems too 
> easy to ignore that bit and successfully complete the write - there 
> isn't any safety net as opposed to using a separate opcode.  So, I'm a 
> bit nervous.

I'm not too nervous about the FUA write commands, I hope we can safely
assume that if you set the FUA supported bit in the id AND the write fua
command doesn't get aborted, that FUA must work. Anything else would
just be an immensely stupid implementation. NCQ+FUA is more tricky, I
agree that it being just a command bit does make it more likely that it
could be ignored. And that is indeed a danger. Given state of NCQ in
early firmware drives, I would not at all be surprised if the drive
vendors screwed that up too.

But, since we don't have the ordered bit for NCQ/FUA anyway, we do need
to drain the drive queue before issuing the WRITE/FUA. And at that point
we may as well not use the NCQ command, just go for the regular non-NCQ
FUA write. I think that should be safe.

-- 
Jens Axboe

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: libata FUA revisited

2007-02-13 Thread Tejun Heo

[cc'ing Jeff, Alan, Mark and Jens.  Hi!]

Hello, Robert.

Robert Hancock wrote:
Well, we should be able to determine that experimentally (at least on 
specific controllers) with a little test program that just writes little 
bits of data and fsyncs repeatedly (assuming that does in fact trigger 
FUAs currently..) If it runs faster than the drive could possibly be 
rewriting the physical disk then obviously the FUA bit is not getting 
through and/or not respected and we can blacklist FUA on that controller.


That's right.

Also, the FUA bit in the NCQ commands is in the device register, so it's 
not like the PMP fields where it's not used for anything else and so the 
controller messing with it wouldn't be otherwise noticed..


Yeap, I just wanted to point out (hence the FWIW) that even the 
seemingly innocent ahci mangles part of the FIS given to it in memory.  
I agree that this is much less likely with the FUA bit.


So, actually, I was thinking about *always* using the non-NCQ FUA 
opcode.  As currently implemented, FUA request is always issued by 
itself, so NCQ doesn't make any difference there.  So, I think it 
would be better to turn on FUA on driver-by-driver basis whether the 
controller supports NCQ or not.


Unfortunately not all drives that support NCQ support the non-NCQ FUA 
commands (my Seagates are like this).


And I'm a bit scared to set FUA bit on such drives and trust that it 
will actually do FUA, so our opinions aren't too far away from each 
other.  :-)


There's definitely a potential advantage to FUA with NCQ - if you have 
non-synchronous accesses going on concurrently with synchronous ones, if 
you have to use non-NCQ FUA or flush cache commands, you have to wait 
for all the IOs of both types to drain out before you can issue the 
flush (since those can't be overlapped with the NCQ read/writes). And if 
you can only use flush cache, then you're forcing all the writes to be 
flushed including the non-synchronous ones you didn't care about. 
Whether or not the block layer currently exploits this I don't know, but 
it definitely could.


The current barrier implementation uses the following sequences for 
no-FUA and FUA cases.


1. w/o FUA

normal operation -> barrier issued -> drain IO -> flush -> barrier 
written -> flush -> normal operation resumes


2. w/ FUA

normal operation -> barrier issued -> drain IO -> flush -> barrier 
written / FUA -> normal operation resumes


So, the FUA write is issued by itself.  This isn't really efficient and 
frequent barriers impact the performance badly.  If we can change that 
NCQ FUA will be certainly beneficial.


Well, I might be being too paranoid but silent FUA failure would be 
really hard to diagnose if that ever happens (and I'm fairly certain 
that it will on some firmwares).


Well, there are also probably drives that ignore flush cache commands or 
 fail to do other things that they should. There's only so far we can go 
in coping if the firmware authors are being retarded. If any drive is 
broken like that we should likely just blacklist NCQ on it entirely as 
obviously little thought or testing went into the implementation..


FLUSH has been around for quite a long time now and most drives don't 
have problems with it.  FUA on ATA is still quite new, and libata will 
be the first major user of it if we enable it by default.  It just seems 
too easy for a drive to ignore that bit and successfully complete the 
write - there isn't any safety net, as opposed to using a separate 
opcode.  So, I'm a bit nervous.


Any comments, people?

Thanks.

--
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: libata FUA revisited

2007-02-13 Thread Robert Hancock

Tejun Heo wrote:
On the NCQ side, I think it's pretty safe to assume that all 
controllers will handle it. Obviously I've verified it with sata_nv 
(at least that it doesn't blow up obviously), and the other two NCQ 
drivers we have, ahci and sata_sil24 just feed raw FIS data into the 
controller so there should be no issue with not supporting it.


FWIW, ICH6/7/8 ahcis clear the PMP field when transmitting a FIS.  The 
reason I'm hesitant is that there is no way to tell whether the FUA bit 
got honored or ignored.  With a separate opcode, it's okay because the 
barrier explicitly fails; but if NCQ FUA is not supported, it will 
succeed silently as a normal write.  Everything will generally be okay, 
but the barrier is done incorrectly, and on a really bad day that will 
lead to journal corruption.


Well, we should be able to determine that experimentally (at least on 
specific controllers) with a little test program that just writes little 
bits of data and fsyncs repeatedly (assuming that does in fact trigger 
FUAs currently..) If it runs faster than the drive could possibly be 
rewriting the physical disk then obviously the FUA bit is not getting 
through and/or not respected and we can blacklist FUA on that controller.
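
A minimal sketch of such a test (userspace C; assumes an ordinary file
on the filesystem under test - a 7200 RPM drive can commit an in-place
rewrite at most once per revolution, roughly 120 times per second, so a
rate far above that means the synced writes are not reaching media):

#define _XOPEN_SOURCE 600	/* for pwrite() and clock_gettime() */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <time.h>

/* build: gcc -O2 -o fuatest fuatest.c -lrt */
int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "fuatest.dat";
	char buf[512];
	struct timespec t0, t1;
	double secs;
	int i, iters = 1000;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	memset(buf, 0xAB, sizeof(buf));

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < iters; i++) {
		/* rewrite the same block and force it to media each time */
		if (pwrite(fd, buf, sizeof(buf), 0) != (ssize_t)sizeof(buf) ||
		    fsync(fd) < 0) {
			perror("write/fsync");
			return 1;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
	printf("%d synced writes in %.2f s (%.0f/sec)\n",
	       iters, secs, iters / secs);
	close(fd);
	return 0;
}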


Also, the FUA bit in the NCQ commands is in the device register, so it's 
not like the PMP fields where it's not used for anything else and so the 
controller messing with it wouldn't be otherwise noticed..
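
For reference, this is roughly how the two command forms differ when
libata builds the taskfile (opcode names from linux/ata.h; the exact
lines are a sketch, not a quote of libata-core.c):

	/* non-NCQ: FUA is the opcode itself; an unsupporting drive
	 * aborts the command outright */
	tf->command = ATA_CMD_WRITE_FUA_EXT;		/* 0x3D */

	/* NCQ: FUA is just a bit in the device register of WRITE FPDMA
	 * QUEUED, which a drive could silently drop */
	tf->command = ATA_CMD_FPDMA_WRITE;		/* 0x61 */
	tf->device |= 1 << 7;				/* FUA */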




So, actually, I was thinking about *always* using the non-NCQ FUA 
opcode.  As currently implemented, FUA request is always issued by 
itself, so NCQ doesn't make any difference there.  So, I think it would 
be better to turn on FUA on driver-by-driver basis whether the 
controller supports NCQ or not.


Unfortunately not all drives that support NCQ support the non-NCQ FUA 
commands (my Seagates are like this).


There's definitely a potential advantage to FUA with NCQ - if you have 
non-synchronous accesses going on concurrently with synchronous ones, if 
you have to use non-NCQ FUA or flush cache commands, you have to wait 
for all the IOs of both types to drain out before you can issue the 
flush (since those can't be overlapped with the NCQ read/writes). And if 
you can only use flush cache, then you're forcing all the writes to be 
flushed including the non-synchronous ones you didn't care about. 
Whether or not the block layer currently exploits this I don't know, but 
it definitely could.


Well, I might be being too paranoid but silent FUA failure would be 
really hard to diagnose if that ever happens (and I'm fairly certain 
that it will on some firmwares).


Well, there are also probably drives that ignore flush cache commands or 
 fail to do other things that they should. There's only so far we can 
go in coping if the firmware authors are being retarded. If any drive is 
broken like that we should likely just blacklist NCQ on it entirely as 
obviously little thought or testing went into the implementation..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: libata FUA revisited

2007-02-13 Thread Tejun Heo

Hello, Robert.

Robert Hancock wrote:
[--snip--]
On the NCQ side, I think it's pretty safe to assume that all controllers 
will handle it. Obviously I've verified it with sata_nv (at least that 
it doesn't blow up obviously), and the other two NCQ drivers we have, 
ahci and sata_sil24 just feed raw FIS data into the controller so there 
should be no issue with not supporting it.


FWIW, ICH6/7/8 ahcis clear the PMP field when transmitting a FIS.  The 
reason I'm hesitant is that there is no way to tell whether the FUA bit 
got honored or ignored.  With a separate opcode, it's okay because the 
barrier explicitly fails; but if NCQ FUA is not supported, it will 
succeed silently as a normal write.  Everything will generally be okay, 
but the barrier is done incorrectly, and on a really bad day that will 
lead to journal corruption.


So, actually, I was thinking about *always* using the non-NCQ FUA 
opcode.  As currently implemented, FUA request is always issued by 
itself, so NCQ doesn't make any difference there.  So, I think it would 
be better to turn on FUA on driver-by-driver basis whether the 
controller supports NCQ or not.


Well, I might be being too paranoid but silent FUA failure would be 
really hard to diagnose if that ever happens (and I'm fairly certain 
that it will on some firmwares).


--
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: libata FUA revisited

2007-02-12 Thread Robert Hancock

Tejun Heo wrote:
-Add a new port flag ATA_FLAG_NO_FUA to indicate that a controller 
can't handle FUA commands, and add that flag to sata_sil. Force FUA 
off on any drive connected to a controller with this bit set.


There was some talk that sata_mv might have this problem, but I 
believe the conclusion was that it didn't. The only controllers that 
would are ones that actually try to interpret the ATA command codes 
and don't know about WRITE DMA FUA.


I think it would be better to add ATA_FLAG_FUA instead of ATA_FLAG_NO_FUA.


I'm not sure about that; it appears that the number of controllers that 
have problems is much lower than the number that don't, so this flag 
would just need to be added to almost every driver, especially since the 
non-NCQ FUA that was problematic on SiI in the past isn't being 
switched on by default.


-Change the fua module option to control FUA enable/disable to have a 
third value, "enable for NCQ-supporting drives only", which would 
become the new default. That case seems less likely to cause problems 
since FUA on NCQ is just another bit in the command whereas FUA on 
non-NCQ is an entirely different, potentially unsupported command.


Not enabling FUA doesn't result in a huge performance hit.  I'm not sure 
whether we should go that far.  Just supporting FUA on known-good 
controllers sounds good enough to me.


On the NCQ side, I think it's pretty safe to assume that all controllers 
will handle it. Obviously I've verified it with sata_nv (at least that 
it doesn't blow up obviously), and the other two NCQ drivers we have, 
ahci and sata_sil24 just feed raw FIS data into the controller so there 
should be no issue with not supporting it.


Assuming that we leave FUA off by default for non-NCQ for the 
foreseeable future, is there really much concern here?





Any comments?

As an aside, I came across a comment that the Silicon Image Windows 
drivers for NCQ-supporting controllers have some blacklist entries for 
drives with broken NCQ in their .inf files. We only seem to have one 
in the libata NCQ blacklist, we may want to add some more of these. 
The ones in the current SiI3124 and 3132 drivers' .inf files for 
"DisableSataQueueing" appear to be:


Model   Firmware
Maxtor 7B250S0  BANC1B70
HTS541060G9SA00 MB3OC60D
HTS541080G9SA00 MB4OC60D
HTS541010G9SA00 MBZOC60D

(the latter 3 being Hitachi Travelstar drives)


Yeah, I don't think we need to be too careful about blacklisting NCQ, 
considering the sorry state of many early NCQ firmwares.  Please submit 
a patch.  It would be nice if you added a comment documenting why the 
drives were added.


Will do shortly. Eric, do you have any info on the blacklisting of that 
Maxtor 7B250S0 drive? I would hope that Silicon Image would have a good 
reason for blacklisting that one..


--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: libata FUA revisited

2007-02-12 Thread Tejun Heo

Hello,

Robert Hancock wrote:
[--correct summary snipped--]

Given the above, what I'm proposing to do is:

-Remove the blacklisting of Maxtor BANC1G10 firmware for FUA. If we need 
to FUA-blacklist any drives this should likely be added to the existing 
"horkage" mechanism we now have. However, at this point I don't think 
that's needed, considering that I've seen no conclusive evidence that 
any drive has ever been established to have broken FUA.


Agreed.

-Add a new port flag ATA_FLAG_NO_FUA to indicate that a controller can't 
handle FUA commands, and add that flag to sata_sil. Force FUA off on any 
drive connected to a controller with this bit set.


There was some talk that sata_mv might have this problem, but I believe 
the conclusion was that it didn't. The only controllers that would are 
ones that actually try to interpret the ATA command codes and don't know 
about WRITE DMA FUA.


I think it would be better to add ATA_FLAG_FUA instead of ATA_FLAG_NO_FUA.

-Change the fua module option to control FUA enable/disable to have a 
third value, "enable for NCQ-supporting drives only", which would become 
the new default. That case seems less likely to cause problems since FUA 
on NCQ is just another bit in the command whereas FUA on non-NCQ is an 
entirely different, potentially unsupported command.


Not enabling FUA doesn't result in a huge performance hit.  I'm not sure 
whether we should go that far.  Just supporting FUA on known-good 
controllers sounds good enough to me.



Any comments?

As an aside, I came across a comment that the Silicon Image Windows 
drivers for NCQ-supporting controllers have some blacklist entries for 
drives with broken NCQ in their .inf files. We only seem to have one in 
the libata NCQ blacklist, we may want to add some more of these. The 
ones in the current SiI3124 and 3132 drivers' .inf files for 
"DisableSataQueueing" appear to be:


Model   Firmware
Maxtor 7B250S0  BANC1B70
HTS541060G9SA00 MB3OC60D
HTS541080G9SA00 MB4OC60D
HTS541010G9SA00 MBZOC60D

(the latter 3 being Hitachi Travelstar drives)


Yeah, I don't think we need to be too careful about blacklisting NCQ, 
considering the sorry state of many early NCQ firmwares.  Please submit 
a patch.  It would be nice if you added a comment documenting why the 
drives were added.


Thanks.

--
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: libata FUA revisited

2007-02-11 Thread Robert Hancock

Robert Hancock wrote:

Given the above, what I'm proposing to do is:

-Remove the blacklisting of Maxtor BANC1G10 firmware for FUA. If we need 
to FUA-blacklist any drives this should likely be added to the existing 
"horkage" mechanism we now have. However, at this point I don't think 
that's needed, considering that I've seen no conclusive evidence that 
any drive has ever been established to have broken FUA.


-Add a new port flag ATA_FLAG_NO_FUA to indicate that a controller can't 
handle FUA commands, and add that flag to sata_sil. Force FUA off on any 
drive connected to a controller with this bit set.


There was some talk that sata_mv might have this problem, but I believe 
the conclusion was that it didn't. The only controllers that would are 
ones that actually try to interpret the ATA command codes and don't know 
about WRITE DMA FUA.


-Change the fua module option to control FUA enable/disable to have a 
third value, "enable for NCQ-supporting drives only", which would become 
the new default. That case seems less likely to cause problems since FUA 
on NCQ is just another bit in the command whereas FUA on non-NCQ is an 
entirely different, potentially unsupported command.


OK, here's what I've got to implement the above, and a few other things -
not submitted for inclusion yet as I'd like to get a few comments.

This centralizes the logic in one place for deciding whether to use FUA
or not. It also modifies the logic to account for the fact that when
NCQ is enabled we should always be able to use FUA, since it's inherent
in the definition of the NCQ commands. Since enabling and disabling NCQ
can thus also enable/disable FUA (if the drive doesn't support non-NCQ
FUA) we need to revalidate the device when doing this on change_queue_depth
so that the SCSI layer sees the change.

(I tried to test this, but wasn't able to actually change the queue depth
using the /sys/block/sda/device/queue_depth file. The queue_depth attribute
started out as r--r--r--, I tried chmod u+w and writing it but just got an
"Input/output error". Did somebody break or disable this functionality?)

Also, as well as setting ATA_FLAG_NO_FUA in sata_sil it appears that
pata_it821x also needs FUA disabled when in smart mode as the firmware can't
handle that command.

diff -rup linux-2.6.20-git6/drivers/ata/libata-core.c linux-2.6.20-git6edit/drivers/ata/libata-core.c
--- linux-2.6.20-git6/drivers/ata/libata-core.c 2007-02-11 17:31:19.0 -0600
+++ linux-2.6.20-git6edit/drivers/ata/libata-core.c 2007-02-11 21:43:11.0 -0600
@@ -85,9 +85,9 @@ int atapi_dmadir = 0;
module_param(atapi_dmadir, int, 0444);
MODULE_PARM_DESC(atapi_dmadir, "Enable ATAPI DMADIR bridge support (0=off, 1=on)");

-int libata_fua = 0;
+int libata_fua = 1;
module_param_named(fua, libata_fua, int, 0444);
-MODULE_PARM_DESC(fua, "FUA support (0=off, 1=on)");
+MODULE_PARM_DESC(fua, "FUA support (0=off, 1=on for NCQ drives only, 2=on)");

static int ata_probe_timeout = ATA_TMOUT_INTERNAL / HZ;
module_param(ata_probe_timeout, int, 0444);
diff -rup linux-2.6.20-git6/drivers/ata/libata-scsi.c linux-2.6.20-git6edit/drivers/ata/libata-scsi.c
--- linux-2.6.20-git6/drivers/ata/libata-scsi.c 2007-02-11 17:31:19.0 -0600
+++ linux-2.6.20-git6edit/drivers/ata/libata-scsi.c 2007-02-11 23:07:35.0 -0600
@@ -1002,6 +1002,16 @@ int ata_scsi_change_queue_depth(struct s

scsi_adjust_queue_depth(sdev, MSG_SIMPLE_TAG, queue_depth);

+   /* Note: NCQ is switched off if queue depth is set to 1.
+  Thus changing the depth may also enable/disable FUA,
+  which the SCSI layer needs to know about, so we trigger
+  a revalidate. */
+   if((queue_depth == 1 && !(dev->flags & ATA_DFLAG_NCQ_OFF)) ||
+  (queue_depth > 1 && (dev->flags & ATA_DFLAG_NCQ_OFF))) {
+   ap->eh_info.action |= ATA_EH_REVALIDATE;
+   ata_port_schedule_eh(ap);
+   }
+
spin_lock_irqsave(ap->lock, flags);
if (queue_depth > 1)
dev->flags &= ~ATA_DFLAG_NCQ_OFF;
@@ -1990,27 +2000,46 @@ static unsigned int ata_msense_rw_recove
}

/*
- * We can turn this into a real blacklist if it's needed, for now just
- * blacklist any Maxtor BANC1G10 revision firmware
+ * ata_dev_supports_fua - Determine if this device supports FUA.
+ * @dev: Device to check
+ *
+ * Determine if this device supports FUA based on drive and
+ * controller capabilities.
+ *
+ * LOCKING:
+ * None.
 */
-static int ata_dev_supports_fua(u16 *id)
+static int ata_dev_supports_fua(struct ata_device* dev)
{
-   unsigned char model[ATA_ID_PROD_LEN + 1], fw[ATA_ID_FW_REV_LEN + 1];
-
+   /* Is FUA completely disabled? */
if (!libata_fua)
return 0;
-   if (!ata_id_has_fua(id))
+   
+   /* Does the drive support FUA?
+  NCQ-enabled drives always support FUA, otherwise
+  check if the drive indicates support for FUA commands. */
+  
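
With a change along these lines, the policy becomes selectable at load
time, e.g. (a sketch assuming libata is built as a module; built in, the
same option would be given as the libata.fua= boot parameter):

	modprobe libata fua=0	# FUA off entirely
	modprobe libata fua=1	# proposed default: FUA for NCQ drives only
	modprobe libata fua=2	# FUA on for any drive that advertises it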

libata FUA revisited

2007-02-11 Thread Robert Hancock
I've been looking at some list archives from about a year ago when there 
was a big hoohah about FUA in libata. To summarize what I've gotten from 
that discussion:


Nicolas Mailhot ran into a problem with the first kernels that supported 
libata FUA, using a Silicon Image 3114 controller and a Maxtor 6L300S0 
drive with BANC1G10 firmware. Essentially it would quickly corrupt the 
filesystem on bootup. After that:


-A blacklist entry was added into libata disabling FUA on Maxtor drives 
with BANC1G10 firmware


-Eric Mudama from Maxtor complained that there was nothing wrong with 
FUA on those drives and the blacklist should be taken out (though it 
never was)


-It was also confirmed by Eric and others that Silicon Image 311x 
controllers go nuts if they're issued WRITE DMA FUA commands, at least 
without some driver improvements which I assume haven't happened.


-Eventually FUA was disabled by default globally in libata.

Given the above, what I'm proposing to do is:

-Remove the blacklisting of Maxtor BANC1G10 firmware for FUA. If we need 
to FUA-blacklist any drives this should likely be added to the existing 
"horkage" mechanism we now have. However, at this point I don't think 
that's needed, considering that I've seen no conclusive evidence that 
any drive has ever been established to have broken FUA.


-Add a new port flag ATA_FLAG_NO_FUA to indicate that a controller can't 
handle FUA commands, and add that flag to sata_sil. Force FUA off on any 
drive connected to a controller with this bit set.


There was some talk that sata_mv might have this problem, but I believe 
the conclusion was that it didn't. The only controllers that would are 
ones that actually try to interpret the ATA command codes and don't know 
about WRITE DMA FUA.


-Change the fua module option to control FUA enable/disable to have a 
third value, "enable for NCQ-supporting drives only", which would become 
the new default. That case seems less likely to cause problems since FUA 
on NCQ is just another bit in the command whereas FUA on non-NCQ is an 
entirely different, potentially unsupported command.
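
Returning to the ATA_FLAG_NO_FUA proposal above, a sketch of what the
flag and its check could look like (the bit value and the helper name
are assumptions, not the eventual patch):

	#define ATA_FLAG_NO_FUA	(1 << 24) /* controller can't pass FUA through */

	/* hypothetical helper in the FUA decision path */
	static int ata_fua_usable(struct ata_device *dev)
	{
		if (dev->ap->flags & ATA_FLAG_NO_FUA)
			return 0;	/* controller would choke on it */
		return ata_id_has_fua(dev->id);
	}

sata_sil would then simply OR ATA_FLAG_NO_FUA into its port flags.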


Any comments?

As an aside, I came across a comment that the Silicon Image Windows 
drivers for NCQ-supporting controllers have some blacklist entries for 
drives with broken NCQ in their .inf files. We only seem to have one in 
the libata NCQ blacklist, we may want to add some more of these. The 
ones in the current SiI3124 and 3132 drivers' .inf files for 
"DisableSataQueueing" appear to be:


Model   Firmware
Maxtor 7B250S0  BANC1B70
HTS541060G9SA00 MB3OC60D
HTS541080G9SA00 MB4OC60D
HTS541010G9SA00 MBZOC60D

(the latter 3 being Hitachi Travelstar drives)
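
For reference, a sketch of how these could go into libata's existing
blacklist table (entry layout per the current horkage mechanism; treat
the field and flag details as an assumption):

	static const struct ata_blacklist_entry ata_device_blacklist[] = {
		/* ... existing entries ... */
		/* drives with broken NCQ, per SiI's Windows .inf blacklist */
		{ "Maxtor 7B250S0",	"BANC1B70",	ATA_HORKAGE_NONCQ },
		{ "HTS541060G9SA00",	"MB3OC60D",	ATA_HORKAGE_NONCQ },
		{ "HTS541080G9SA00",	"MB4OC60D",	ATA_HORKAGE_NONCQ },
		{ "HTS541010G9SA00",	"MBZOC60D",	ATA_HORKAGE_NONCQ },
	};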

--
Robert Hancock  Saskatoon, SK, Canada
To email, remove "nospam" from [EMAIL PROTECTED]
Home Page: http://www.roberthancock.com/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

