Re: Multi-Actuator SAS HDD First Look

2018-04-18 Thread Tim Walker
On Wed, Apr 18, 2018 at 10:20 AM, Bart Van Assche wrote:
> On Wed, 2018-04-18 at 05:16 -0600, Tim Walker wrote:
>> It would be good if we could set up an informal meeting time and
>> location at LSFMM to discuss these dual actuator topics. So far you,
>> Doug, Hannes, and Christoph have expressed the most interest, plus
>> Damien. Can we set an hour aside one afternoon?
>
> Hello Tim,
>
> Had you noticed that the official LSF/MM agenda mentions the following:
> Tuesday, 11 - 11:30 AM: James Borden, Multi-actuator disk drives.
>
> Thanks,
>
> Bart.
>
>
>

No, I hadn't checked that yet. James probably won't be there. I guess
that solves my problem. Thanks.

-- 
Tim Walker
Product Design Systems Engineering, Seagate Technology
(303) 775-3770


Re: Multi-Actuator SAS HDD First Look

2018-04-18 Thread Bart Van Assche
On Wed, 2018-04-18 at 05:16 -0600, Tim Walker wrote:
> It would be good if we could set up an informal meeting time and
> location at LSFMM to discuss these dual actuator topics. So far you,
> Doug, Hannes, and Christoph have expressed the most interest, plus
> Damien. Can we set an hour aside one afternoon?

Hello Tim,

Had you noticed that the official LSF/MM agenda mentions the following:
Tuesday, 11 - 11:30 AM: James Borden, Multi-actuator disk drives.

Thanks,

Bart.





Re: Multi-Actuator SAS HDD First Look

2018-04-18 Thread Tim Walker
On Sun, Apr 15, 2018 at 10:31 PM, Bart Van Assche wrote:
> On Sun, 2018-04-15 at 19:35 -0600, Tim Walker wrote:
>> I also believe the dual-actuator, or any significant HDD parallelism,
>> would map well onto an NVMe target, or NVMe-oF behind nvmet. Maybe a
>> lightweight virtual NVMe controller that would efficiently present the
>> HDD logs/mode pages/etc via the admin queue and the LUNs as fixed
>> namespaces...?
>>
>> Doug, I will flesh your three LUN idea out some more and send it up
>> the flagpole over here. Thanks for the input.
>>
>> I'd like to have a conversation at LSFMM and maybe pull together a
>> fairly well defined consensus recommendation. Is that possible? Can we
>> schedule it?
>
> Hello Tim,
>
> I think that you have to submit a request to the LSF/MM program committee
> (lsf...@lists.linux-foundation.org) to add an item to the official agenda.
> In case this topic wouldn't be added to the official agenda we can still
> discuss this topic in a meeting room at the LSF/MM location.
>
> Bart.
>
>
>

Hello Bart-

It would be good if we could set up an informal meeting time and
location at LSFMM to discuss these dual actuator topics. So far you,
Doug, Hannes, and Christoph have expressed the most interest, plus
Damien. Can we set an hour aside one afternoon?

-Tim

-- 
Tim Walker
Product Design Systems Engineering, Seagate Technology
(303) 775-3770


Re: Multi-Actuator SAS HDD First Look

2018-04-15 Thread Bart Van Assche
On Sun, 2018-04-15 at 19:35 -0600, Tim Walker wrote:
> I also believe the dual-actuator, or any significant HDD parallelism,
> would map well onto an NVMe target, or NVMe-oF behind nvmet. Maybe a
> lightweight virtual NVMe controller that would efficiently present the
> HDD logs/mode pages/etc via the admin queue and the LUNs as fixed
> namespaces...?
> 
> Doug, I will flesh your three LUN idea out some more and send it up
> the flagpole over here. Thanks for the input.
> 
> I'd like to have a conversation at LSFMM and maybe pull together a
> fairly well defined consensus recommendation. Is that possible? Can we
> schedule it?

Hello Tim,

I think that you have to submit a request to the LSF/MM program committee
(lsf...@lists.linux-foundation.org) to add an item to the official agenda.
In case this topic wouldn't be added to the official agenda we can still
discuss this topic in a meeting room at the LSF/MM location.

Bart.





Re: Multi-Actuator SAS HDD First Look

2018-04-15 Thread Tim Walker
On Mon, Apr 9, 2018 at 10:02 AM, Douglas Gilbert  wrote:
>
> On 2018-04-09 02:17 AM, Hannes Reinecke wrote:
>>
>> On 04/09/2018 04:08 AM, Tim Walker wrote:
>>>
>>> On Fri, Apr 6, 2018 at 11:09 AM, Douglas Gilbert wrote:
>>>>
>>>> On 2018-04-06 02:42 AM, Christoph Hellwig wrote:
>>>>>
>>>>> On Fri, Apr 06, 2018 at 08:24:18AM +0200, Hannes Reinecke wrote:
>>>>>>
>>>>>> Ah. Far better.
>>>>>> What about delegating FORMAT UNIT to the control LUN, and not
>>>>>> implementing it for the individual disk LUNs?
>>>>>> That would make an even stronger case for having a control LUN;
>>>>>> with that there wouldn't be any problem with having to synchronize
>>>>>> across LUNs etc.
>>>>>
>>>>> It sounds to me like NVMe might be a much better model for this drive
>>>>> than SCSI, btw :)
>>>>
>>>> So you found a document that outlines NVMe's architecture! Could you
>>>> share the url (no marketing BS, please)?
>>>>
>>>> And a serious question ... How would you map NVMe's (in Linux)
>>>> subsystem number, controller device minor number, CNTLID field
>>>> (Identify ctl response) and namespace id onto the SCSI subsystem's
>>>> h:c:t:l ?
>>>>
>>>> Doug Gilbert
>>>>
>>>
>>> Hannes- yes, a drive system altering operation like FORMAT UNIT is
>>> asking for a dedicated management port, as the NVMe folks apparently
>>> felt. But what is the least painful endpoint type for LUN0?
>>>
>>>
>> I would probably use 'processor device' (ie type 3) as it's the least
>> defined, so you can do basically everything you like with it.
>> Possibly 'Enclosure Services' (type 0x0d) works, too, but then you have
>> to check with the SES spec if it allows the things you'd need.
>
>
> Processor device type (0x3) please. Then you are only required to support
> the mandatory commands in SPC and that is not many. And then nothing
> precludes you from implementing Start Stop Unit, Sanitize and/or Format
> Unit on it. And as I pointed out earlier, you could even throw in a
> copy manager (see SPC). Also as far as I know Linux, FreeBSD and Windows
> will leave a Processor device type LU alone and just have a SCSI
> pass-through device attached to it, and that is exactly what you want.
> By default only root/administrator can open those pass-through devices.
>
> If you chose SES type (0xd) then the Linux kernel ses driver (and the
> FreeBSD equivalent) would attempt to attach to it before the user space
> could countermand it (as things stand). And SES additionally makes the
> SEND DIAGNOSTIC and RECEIVE DIAGNOSTIC RESULTS commands mandatory and
> at least one diagnostic page (0x0) mandatory. If it doesn't supply
> any other SES dpages then those two drivers are going to get pretty
> confused (which would be a good test for them). Also it could get
> confusing from an administration point of view. I'm guessing many of
> these Multi-Actuator SAS HDDs will end up in big enclosures. And
> those enclosures most likely would present a SES device. Multiple dummy
> enclosures within a real enclosure will look strange (especially as
> SES already has a concept of a sub-enclosure).
>
> Doug Gilbert
>
>

I also believe the dual-actuator, or any significant HDD parallelism,
would map well onto an NVMe target, or NVMe-oF behind nvmet. Maybe a
lightweight virtual NVMe controller that would efficiently present the
HDD logs/mode pages/etc via the admin queue and the LUNs as fixed
namespaces...?

Doug, I will flesh your three LUN idea out some more and send it up
the flagpole over here. Thanks for the input.

I'd like to have a conversation at LSFMM and maybe pull together a
fairly well defined consensus recommendation. Is that possible? Can we
schedule it?

Best regards,
-Tim


Re: Multi-Actuator SAS HDD First Look

2018-04-09 Thread Douglas Gilbert

On 2018-04-09 02:17 AM, Hannes Reinecke wrote:
> On 04/09/2018 04:08 AM, Tim Walker wrote:
>> On Fri, Apr 6, 2018 at 11:09 AM, Douglas Gilbert wrote:
>>> On 2018-04-06 02:42 AM, Christoph Hellwig wrote:
>>>> On Fri, Apr 06, 2018 at 08:24:18AM +0200, Hannes Reinecke wrote:
>>>>> Ah. Far better.
>>>>> What about delegating FORMAT UNIT to the control LUN, and not
>>>>> implementing it for the individual disk LUNs?
>>>>> That would make an even stronger case for having a control LUN;
>>>>> with that there wouldn't be any problem with having to synchronize
>>>>> across LUNs etc.
>>>>
>>>> It sounds to me like NVMe might be a much better model for this drive
>>>> than SCSI, btw :)
>>>
>>> So you found a document that outlines NVMe's architecture! Could you
>>> share the url (no marketing BS, please)?
>>>
>>> And a serious question ... How would you map NVMe's (in Linux)
>>> subsystem number, controller device minor number, CNTLID field
>>> (Identify ctl response) and namespace id onto the SCSI subsystem's
>>> h:c:t:l ?
>>>
>>> Doug Gilbert
>>
>> Hannes- yes, a drive system altering operation like FORMAT UNIT is
>> asking for a dedicated management port, as the NVMe folks apparently
>> felt. But what is the least painful endpoint type for LUN0?
>
> I would probably use 'processor device' (ie type 3) as it's the least
> defined, so you can do basically everything you like with it.
> Possibly 'Enclosure Services' (type 0x0d) works, too, but then you have
> to check with the SES spec if it allows the things you'd need.


Processor device type (0x3) please. Then you are only required to support
the mandatory commands in SPC and that is not many. And then nothing
precludes you from implementing Start Stop Unit, Sanitize and/or Format
Unit on it. And as I pointed out earlier, you could even throw in a
copy manager (see SPC). Also as far as I know Linux, FreeBSD and Windows
will leave a Processor device type LU alone and just have a SCSI
pass-through device attached to it, and that is exactly what you want.
By default only root/administrator can open those pass-through devices.

If you chose SES type (0xd) then the Linux kernel ses driver (and the
FreeBSD equivalent) would attempt to attach to it before the user space
could countermand it (as things stand). And SES additionally makes the
SEND DIAGNOSTIC and RECEIVE DIAGNOSTIC RESULTS commands mandatory and
at least one diagnostic page (0x0) mandatory. If it doesn't supply
any other SES dpages then those two drivers are going to get pretty
confused (which would be a good test for them). Also it could get
confusing from an administration point of view. I'm guessing many of
these Multi-Actuator SAS HDDs will end up in big enclosures. And
those enclosures most likely would present a SES device. Multiple dummy
enclosures within a real enclosure will look strange (especially as
SES already has a concept of a sub-enclosure).

Doug Gilbert
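
As a rough illustration of the device-type argument above, the following Python sketch lists the peripheral device type (PDT) codes mentioned in this thread together with the Linux upper-level driver that would normally claim each one. The names PDT_INFO and describe_pdt are hypothetical, and the driver column only restates the behaviour described in this message (sd for disks, ses for enclosures, pass-through only for processor devices); it has not been checked against any particular kernel version.

```python
# Hypothetical lookup table: PDT code -> (name, Linux upper-level driver).
# The driver column restates the thread's description and is an assumption.
PDT_INFO = {
    0x00: ("direct access block device", "sd"),
    0x03: ("processor", None),        # no upper-level driver, pass-through only
    0x0D: ("enclosure services", "ses"),
}


def describe_pdt(pdt: int) -> str:
    """Summarise how a LU of the given peripheral device type would be handled."""
    if pdt not in PDT_INFO:
        return f"PDT 0x{pdt:02x}: not covered by this sketch"
    name, driver = PDT_INFO[pdt]
    if driver is None:
        handling = "pass-through (sg) only"
    else:
        handling = f"claimed by the {driver} driver"
    return f"PDT 0x{pdt:02x}: {name}, {handling}"


if __name__ == "__main__":
    for code in (0x00, 0x03, 0x0D):
        print(describe_pdt(code))
```

The point of the table is the asymmetry: a type 0x3 LUN 0 is left alone by the kernel, while a type 0xd LUN 0 would be claimed by ses before user space gets a say.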




Re: Multi-Actuator SAS HDD First Look

2018-04-09 Thread Christoph Hellwig
On Fri, Apr 06, 2018 at 01:09:08PM -0400, Douglas Gilbert wrote:
> So you found a document that outlines NVMe's architecture! Could you
> share the url (no marketing BS, please)?

You can always take a look at the actual spec:

http://nvmexpress.org/wp-content/uploads/NVM-Express-1_3a-20171024_ratified.pdf

But in summary: while in SCSI your Nexus for any command is with the
logical unit, in NVMe it is with the controller. Many admin commands
operate on the whole controller.

> And a serious question ... How would you map NVMe's (in Linux)
> subsystem number, controller device minor number, CNTLID field
> (Identify ctl response) and namespace id onto the SCSI subsystem's
> h:c:t:l ?

I wouldn't, because the scheme already doesn't make any sense for SCSI,
never mind trying to map NVMe into a SCSI-specific worldview.

> 
> Doug Gilbert
> 
---end quoted text---


Re: Multi-Actuator SAS HDD First Look

2018-04-09 Thread Hannes Reinecke
On 04/09/2018 04:08 AM, Tim Walker wrote:
> On Fri, Apr 6, 2018 at 11:09 AM, Douglas Gilbert wrote:
>>
>> On 2018-04-06 02:42 AM, Christoph Hellwig wrote:
>>>
>>> On Fri, Apr 06, 2018 at 08:24:18AM +0200, Hannes Reinecke wrote:
>>>>
>>>> Ah. Far better.
>>>> What about delegating FORMAT UNIT to the control LUN, and not
>>>> implementing it for the individual disk LUNs?
>>>> That would make an even stronger case for having a control LUN;
>>>> with that there wouldn't be any problem with having to synchronize
>>>> across LUNs etc.
>>>
>>>
>>> It sounds to me like NVMe might be a much better model for this drive
>>> than SCSI, btw :)
>>
>>
>> So you found a document that outlines NVMe's architecture! Could you
>> share the url (no marketing BS, please)?
>>
>>
>> And a serious question ... How would you map NVMe's (in Linux)
>> subsystem number, controller device minor number, CNTLID field
>> (Identify ctl response) and namespace id onto the SCSI subsystem's
>> h:c:t:l ?
>>
>> Doug Gilbert
>>
> 
> Hannes- yes, a drive system altering operation like FORMAT UNIT is
> asking for a dedicated management port, as the NVMe folks apparently
> felt. But what is the least painful endpoint type for LUN0?
> 
> 
I would probably use 'processor device' (ie type 3) as it's the least
defined, so you can do basically everything you like with it.
Possibly 'Enclosure Services' (type 0x0d) works, too, but then you have
to check with the SES spec if it allows the things you'd need.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: Multi-Actuator SAS HDD First Look

2018-04-08 Thread Tim Walker
On Fri, Apr 6, 2018 at 11:09 AM, Douglas Gilbert  wrote:
>
> On 2018-04-06 02:42 AM, Christoph Hellwig wrote:
>>
>> On Fri, Apr 06, 2018 at 08:24:18AM +0200, Hannes Reinecke wrote:
>>>
>>> Ah. Far better.
>>> What about delegating FORMAT UNIT to the control LUN, and not
>>> implementing it for the individual disk LUNs?
>>> That would make an even stronger case for having a control LUN;
>>> with that there wouldn't be any problem with having to synchronize
>>> across LUNs etc.
>>
>>
>> It sounds to me like NVMe might be a much better model for this drive
>> than SCSI, btw :)
>
>
> So you found a document that outlines NVMe's architecture! Could you
> share the url (no marketing BS, please)?
>
>
> And a serious question ... How would you map NVMe's (in Linux)
> subsystem number, controller device minor number, CNTLID field
> (Identify ctl response) and namespace id onto the SCSI subsystem's
> h:c:t:l ?
>
> Doug Gilbert
>

Hannes- yes, a drive system altering operation like FORMAT UNIT is
asking for a dedicated management port, as the NVMe folks apparently
felt. But what is the least painful endpoint type for LUN0?


-- 
Tim Walker
Product Design Systems Engineering, Seagate Technology
(303) 775-3770


Re: Multi-Actuator SAS HDD First Look

2018-04-06 Thread Douglas Gilbert

On 2018-04-06 02:42 AM, Christoph Hellwig wrote:
> On Fri, Apr 06, 2018 at 08:24:18AM +0200, Hannes Reinecke wrote:
>> Ah. Far better.
>> What about delegating FORMAT UNIT to the control LUN, and not
>> implementing it for the individual disk LUNs?
>> That would make an even stronger case for having a control LUN;
>> with that there wouldn't be any problem with having to synchronize
>> across LUNs etc.
>
> It sounds to me like NVMe might be a much better model for this drive
> than SCSI, btw :)


So you found a document that outlines NVMe's architecture! Could you
share the url (no marketing BS, please)?


And a serious question ... How would you map NVMe's (in Linux)
subsystem number, controller device minor number, CNTLID field
(Identify ctl response) and namespace id onto the SCSI subsystem's
h:c:t:l ?

Doug Gilbert



Re: Multi-Actuator SAS HDD First Look

2018-04-06 Thread Christoph Hellwig
On Fri, Apr 06, 2018 at 08:24:18AM +0200, Hannes Reinecke wrote:
> Ah. Far better.
> What about delegating FORMAT UNIT to the control LUN, and not
> implementing it for the individual disk LUNs?
> That would make an even stronger case for having a control LUN;
> with that there wouldn't be any problem with having to synchronize
> across LUNs etc.

It sounds to me like NVMe might be a much better model for this drive
than SCSI, btw :)


Re: Multi-Actuator SAS HDD First Look

2018-04-06 Thread Hannes Reinecke
On Thu, 5 Apr 2018 17:43:46 -0600
Tim Walker  wrote:

> On Tue, Apr 3, 2018 at 1:46 AM, Christoph Hellwig wrote:
> > On Sat, Mar 31, 2018 at 01:03:46PM +0200, Hannes Reinecke wrote:
> >> Actually I would propose to have a 'management' LUN at LUN0, who
> >> could handle all the device-wide commands (eg things like START
> >> STOP UNIT, firmware update, or even SMART commands), and ignoring
> >> them for the remaining LUNs.
> >
> > That is in fact the only workable option at all.  Everything else
> > completely breaks the scsi architecture.
> 
> Here's an update: Seagate will eliminate the inter-LU actions from
> FORMAT UNIT and SANITIZE. Probably SANITIZE will be per-LUN, but
> FORMAT UNIT is trickier due to internal drive architecture, and how
> FORMAT UNIT initializes on-disk metadata. Likely it will require some
> sort of synchronization across LUNs, such as the command being sent to
> both LUNs sequentially or something similar. We are also considering
> not supporting FORMAT UNIT at all - would anybody object? Any other
> suggestions?
> 

Ah. Far better.
What about delegating FORMAT UNIT to the control LUN, and not
implementing it for the individual disk LUNs?
That would make an even stronger case for having a control LUN;
with that there wouldn't be any problem with having to synchronize
across LUNs etc.

Cheers,

Hannes
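
The control-LUN idea above can be sketched as a small command router. This is only an illustration under stated assumptions: the opcode values are the standard ones for FORMAT UNIT (0x04), START STOP UNIT (0x1b) and WRITE BUFFER (0x3b, the usual firmware download path), but the names DEVICE_WIDE, CONTROL_LUN and route are hypothetical, and rejecting device-wide commands on the disk LUNs with ILLEGAL REQUEST / INVALID COMMAND OPERATION CODE is one possible choice, not something the thread settled on (the earlier proposal spoke of "ignoring" them). SANITIZE is left out of the device-wide set because the update above says it will probably be per-LUN.

```python
# Standard SCSI opcodes for the commands discussed in the thread.
FORMAT_UNIT = 0x04
START_STOP_UNIT = 0x1B
WRITE_BUFFER = 0x3B  # commonly used for firmware download

# Assumption: these are handled only by the control LUN.
DEVICE_WIDE = {FORMAT_UNIT, START_STOP_UNIT, WRITE_BUFFER}
CONTROL_LUN = 0

# Sense data for "ILLEGAL REQUEST, INVALID COMMAND OPERATION CODE".
ILLEGAL_REQUEST = 0x05
INVALID_OPCODE_ASC = 0x20
INVALID_OPCODE_ASCQ = 0x00


def route(lun: int, opcode: int):
    """Return ("accept", None) or ("check_condition", (key, asc, ascq))."""
    if opcode in DEVICE_WIDE and lun != CONTROL_LUN:
        sense = (ILLEGAL_REQUEST, INVALID_OPCODE_ASC, INVALID_OPCODE_ASCQ)
        return ("check_condition", sense)
    return ("accept", None)


if __name__ == "__main__":
    print(route(0, FORMAT_UNIT))  # control LUN takes the format
    print(route(1, FORMAT_UNIT))  # disk LUN refuses it
    print(route(1, 0x28))         # READ(10) on a disk LUN is unaffected
```

With this split no disk LUN ever has to coordinate with its sibling for a device-wide operation, which is the synchronization problem the message is trying to avoid.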


Re: Multi-Actuator SAS HDD First Look

2018-04-05 Thread Douglas Gilbert

On 2018-04-05 07:43 PM, Tim Walker wrote:
> On Tue, Apr 3, 2018 at 1:46 AM, Christoph Hellwig wrote:
>> On Sat, Mar 31, 2018 at 01:03:46PM +0200, Hannes Reinecke wrote:
>>> Actually I would propose to have a 'management' LUN at LUN0, who could
>>> handle all the device-wide commands (eg things like START STOP UNIT,
>>> firmware update, or even SMART commands), and ignoring them for the
>>> remaining LUNs.
>>
>> That is in fact the only workable option at all.  Everything else
>> completely breaks the scsi architecture.
>
> Here's an update: Seagate will eliminate the inter-LU actions from
> FORMAT UNIT and SANITIZE. Probably SANITIZE will be per-LUN, but
> FORMAT UNIT is trickier due to internal drive architecture, and how
> FORMAT UNIT initializes on-disk metadata. Likely it will require some
> sort of synchronization across LUNs, such as the command being sent to
> both LUNs sequentially or something similar. We are also considering
> not supporting FORMAT UNIT at all - would anybody object? Any other
> suggestions?


Good, that is progress. [But you still only have one spindle.]

If Protection Information (PI) or changing the logical block size between
512 and 4096 bytes per block are options, then you need FU for that.
But does it need to take 900 minutes like one I got recently from S..?
Couldn't the actual reformatting of a track be deferred until the first
block written to that track?

Doug Gilbert



Re: Multi-Actuator SAS HDD First Look

2018-04-05 Thread Tim Walker
On Tue, Apr 3, 2018 at 1:46 AM, Christoph Hellwig  wrote:
> On Sat, Mar 31, 2018 at 01:03:46PM +0200, Hannes Reinecke wrote:
>> Actually I would propose to have a 'management' LUN at LUN0, who could
>> handle all the device-wide commands (eg things like START STOP UNIT,
>> firmware update, or even SMART commands), and ignoring them for the
>> remaining LUNs.
>
> That is in fact the only workable option at all.  Everything else
> completely breaks the scsi architecture.

Here's an update: Seagate will eliminate the inter-LU actions from
FORMAT UNIT and SANITIZE. Probably SANITIZE will be per-LUN, but
FORMAT UNIT is trickier due to internal drive architecture, and how
FORMAT UNIT initializes on-disk metadata. Likely it will require some
sort of synchronization across LUNs, such as the command being sent to
both LUNs sequentially or something similar. We are also considering
not supporting FORMAT UNIT at all - would anybody object? Any other
suggestions?

-- 
Tim Walker
Product Design Systems Engineering, Seagate Technology
(303) 775-3770


Re: Multi-Actuator SAS HDD First Look

2018-04-03 Thread Christoph Hellwig
On Sat, Mar 31, 2018 at 01:03:46PM +0200, Hannes Reinecke wrote:
> Actually I would propose to have a 'management' LUN at LUN0, who could
> handle all the device-wide commands (eg things like START STOP UNIT,
> firmware update, or even SMART commands), and ignoring them for the
> remaining LUNs.

That is in fact the only workable option at all.  Everything else
completely breaks the scsi architecture.


Re: Multi-Actuator SAS HDD First Look

2018-04-02 Thread Tim Walker
On Mon, Apr 2, 2018 at 10:29 AM, Douglas Gilbert  wrote:
> On 2018-04-02 11:34 AM, Tim Walker wrote:
>>
>> On Sat, Mar 31, 2018 at 10:52 AM, Douglas Gilbert wrote:
>>>
>>> On 2018-03-30 04:01 PM, Bart Van Assche wrote:
>>>>
>>>> On Fri, 2018-03-30 at 12:36 -0600, Tim Walker wrote:
>>>>>
>>>>> Yes I will be there to discuss the multi-LUN approach. I wanted to get
>>>>> these interface details out so we could have some background and
>>>>> perhaps folks would come with ideas. I don't have much more to put
>>>>> out, but I will certainly answer questions - either on this list or
>>>>> directly.
>>>>
>>>> Hello Tim,
>>>>
>>>> As far as I know the Linux SCSI stack does not yet support any SCSI
>>>> devices for which a single SCSI command affects multiple (two in this
>>>> case) LUNs. Adding such support may take significant work. There will
>>>> also be the disadvantage that most SCSI core contributors do not have
>>>> access to a multi-actuator device and hence that their changes may
>>>> break support for multi-actuator devices.
>>>
>>> Hmmm, INQUIRY (3PC bit) and REPORT LUNS seem to be counter examples to
>>> Bart's assertion. Plus there are a few more that tell you about things
>>> outside the addressed LU, for example the SCSI Ports VPD page tells you
>>> about other SCSI ports (hence other LUs) on the current device.
>>>
>>> From Tim's command list:
>>>
>>> Device level
>>> ------------
>>> 0x0, 0x1: okay
>>> 0x4 (Format unit): yikes, that could be a nasty surprise, accessing a
>>>   file system on the other LU and getting an error "Not ready, format
>>>   in progress"!!
>>> 0x12: standard INQUIRY okay, VPD pages not so much: LU id different;
>>>   relative port id different; target port id different (at the least)
>>> 0x1b (SSU): storage LUs need to know this model, otherwise the logic on
>>>   each LU could get into a duel: "I want it spun up; no, I want it spun
>>>   down ..."
>>> 0x35, 0x37, 0x3b, 0x3c: okay
>>> 0x48 (sanitize): similar to Format unit above
>>> 0x91, 0x4c, 0x4d: okay
>>> MODE SENSE/SELECT(6,10): depends on page, block descriptor needs to be
>>>   partially device level (since LB size may be changed by FU which is
>>>   device level)
>>> rest of device level: okay or (I) don't know
>>> 0xf7: READ UDS DATA, that's interesting, but proprietary I guess
>>>
>>> Perhaps you could add a rider on FU and SAN: they get rejected unless the
>>> other storage LU is in (logical) spun down state.
>>>
>>> LU specific
>>> -----------
>>> all okay, I'm hoping READ(6,10,12,16,32) and their WRITE cousins will be
>>>   there also :-) Plus the TMF: LU reset
>>>
>>> Device or LU
>>> ------------
>>> all okay
>>>
>>>
>>> I'm intrigued by your 3 LU model. My wish list for that:
>>>
>>> LUN 0 would be processor device type (0x3) so it wouldn't confuse the
>>> OS (Linux) that it held storage (READ CAPACITY is mandatory for PDT 0x0
>>> and cannot represent a 0 block LU) and you could pick and choose which
>>> SCSI commands to implement on it. LUN 0 TUR could report what the spindle
>>> was really doing, SSU could do precisely what it is told (and SSU on LUNs
>>> 1 and 2 would be an "and" for spin down and an "or" for spin up). I've
>>> got several boxes full of SAS cables and only one cable that I can think
>>> of that lets me get to the secondary SAS port. So on LUN 0 you could have
>>> a proprietary (unless T10 takes it on board) mode page to enable the user
>>> to say which ports LUNs 1 and 2 are accessible on. Obviously LUN 0
>>> would need to be accessible on both ports. [A non-accessible LUN would
>>> still respond to INQUIRY but with a first byte of 0x7f: PDT=0x1f (unknown)
>>> and PQ=3 which implies it is there but inaccessible via the current
>>> port.]
>>>
>>> A processor PDT opens up the possibility of putting a copy manager on
>>> LUN 0. Think offloaded (from main machine's perspective) backups and
>>> restores where LUN 1 or 2 is the source or destination.
>>>
>>> Enough dreaming.
>>>
>>> Doug Gilbert
>>
>>
>>
>>
>> Thanks for all the input from everybody. I'll collate it for a meeting
>> with our interface architect.
>
>
> Just to amplify the case against a FU or SAN being allowed at almost any
> time from either storage LU irrespective of what the other was doing.
> Imagine one initiator has some important data on one LU and has an
> EXCLUSIVE ACCESS persistent reservation on it. Then another initiator
> (e.g. on a different machine) sends a FU to the other LU which is
> honoured, wiping the whole device. One unhappy customer.
>
> Doug Gilbert

Thanks, Doug. That is very clear, and I get it. We'll have a solution.

Best regards,
-Tim

-- 
Tim Walker
Product Design Systems Engineering, Seagate Technology
(303) 775-3770
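
For readers unfamiliar with the INQUIRY encoding quoted above: byte 0 of standard INQUIRY data carries the peripheral qualifier (PQ) in the top three bits and the peripheral device type (PDT) in the low five bits. The short Python sketch below shows how a first byte of 0x7f decodes to PQ=3, PDT=0x1f. The function name decode_inquiry_byte0 is hypothetical.

```python
def decode_inquiry_byte0(byte0: int):
    """Split byte 0 of standard INQUIRY data into (PQ, PDT)."""
    pq = (byte0 >> 5) & 0x07   # peripheral qualifier: bits 7..5
    pdt = byte0 & 0x1F         # peripheral device type: bits 4..0
    return pq, pdt


if __name__ == "__main__":
    pq, pdt = decode_inquiry_byte0(0x7F)
    print(f"PQ={pq} PDT=0x{pdt:02x}")   # inaccessible LUN case from the thread
    pq, pdt = decode_inquiry_byte0(0x03)
    print(f"PQ={pq} PDT=0x{pdt:02x}")   # processor device, connected
```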


Re: Multi-Actuator SAS HDD First Look

2018-04-02 Thread Douglas Gilbert

On 2018-04-02 11:34 AM, Tim Walker wrote:
> On Sat, Mar 31, 2018 at 10:52 AM, Douglas Gilbert wrote:
>> On 2018-03-30 04:01 PM, Bart Van Assche wrote:
>>> On Fri, 2018-03-30 at 12:36 -0600, Tim Walker wrote:
>>>> Yes I will be there to discuss the multi-LUN approach. I wanted to get
>>>> these interface details out so we could have some background and
>>>> perhaps folks would come with ideas. I don't have much more to put
>>>> out, but I will certainly answer questions - either on this list or
>>>> directly.
>>>
>>> Hello Tim,
>>>
>>> As far as I know the Linux SCSI stack does not yet support any SCSI
>>> devices for which a single SCSI command affects multiple (two in this
>>> case) LUNs. Adding such support may take significant work. There will
>>> also be the disadvantage that most SCSI core contributors do not have
>>> access to a multi-actuator device and hence that their changes may
>>> break support for multi-actuator devices.
>>
>> Hmmm, INQUIRY (3PC bit) and REPORT LUNS seem to be counter examples to
>> Bart's assertion. Plus there are a few more that tell you about things
>> outside the addressed LU, for example the SCSI Ports VPD page tells you
>> about other SCSI ports (hence other LUs) on the current device.
>>
>> From Tim's command list:
>>
>> Device level
>> ------------
>> 0x0, 0x1: okay
>> 0x4 (Format unit): yikes, that could be a nasty surprise, accessing a
>>   file system on the other LU and getting an error "Not ready, format
>>   in progress"!!
>> 0x12: standard INQUIRY okay, VPD pages not so much: LU id different;
>>   relative port id different; target port id different (at the least)
>> 0x1b (SSU): storage LUs need to know this model, otherwise the logic on
>>   each LU could get into a duel: "I want it spun up; no, I want it spun
>>   down ..."
>> 0x35, 0x37, 0x3b, 0x3c: okay
>> 0x48 (sanitize): similar to Format unit above
>> 0x91, 0x4c, 0x4d: okay
>> MODE SENSE/SELECT(6,10): depends on page, block descriptor needs to be
>>   partially device level (since LB size may be changed by FU which is
>>   device level)
>> rest of device level: okay or (I) don't know
>> 0xf7: READ UDS DATA, that's interesting, but proprietary I guess
>>
>> Perhaps you could add a rider on FU and SAN: they get rejected unless the
>> other storage LU is in (logical) spun down state.
>>
>> LU specific
>> -----------
>> all okay, I'm hoping READ(6,10,12,16,32) and their WRITE cousins will be
>>   there also :-) Plus the TMF: LU reset
>>
>> Device or LU
>> ------------
>> all okay
>>
>> I'm intrigued by your 3 LU model. My wish list for that:
>>
>> LUN 0 would be processor device type (0x3) so it wouldn't confuse the
>> OS (Linux) that it held storage (READ CAPACITY is mandatory for PDT 0x0
>> and cannot represent a 0 block LU) and you could pick and choose which
>> SCSI commands to implement on it. LUN 0 TUR could report what the spindle
>> was really doing, SSU could do precisely what it is told (and SSU on LUNs
>> 1 and 2 would be an "and" for spin down and an "or" for spin up). I've
>> got several boxes full of SAS cables and only one cable that I can think
>> of that lets me get to the secondary SAS port. So on LUN 0 you could have
>> a proprietary (unless T10 takes it on board) mode page to enable the user
>> to say which ports LUNs 1 and 2 are accessible on. Obviously LUN 0
>> would need to be accessible on both ports. [A non-accessible LUN would
>> still respond to INQUIRY but with a first byte of 0x7f: PDT=0x1f (unknown)
>> and PQ=3 which implies it is there but inaccessible via the current port.]
>>
>> A processor PDT opens up the possibility of putting a copy manager on
>> LUN 0. Think offloaded (from main machine's perspective) backups and
>> restores where LUN 1 or 2 is the source or destination.
>>
>> Enough dreaming.
>>
>> Doug Gilbert
>
> Thanks for all the input from everybody. I'll collate it for a meeting
> with our interface architect.


Just to amplify the case against a FU or SAN being allowed at almost any
time from either storage LU irrespective of what the other was doing.
Imagine one initiator has some important data on one LU and has an
EXCLUSIVE ACCESS persistent reservation on it. Then another initiator
(e.g. on a different machine) sends a FU to the other LU which is
honoured, wiping the whole device. One unhappy customer.

Doug Gilbert
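
The scenario above can be turned into a small guard, sketched here in Python. It combines the earlier "rider" (FU and SAN are rejected unless the other storage LU is logically spun down) with an extra check that is purely an assumption of this sketch: a persistent reservation held by a different initiator on either LU also blocks the command. The StorageLU class, its fields and may_format are hypothetical names, not anything Seagate or T10 has specified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageLU:
    lun: int
    spun_up: bool = True               # logical spin state of this LU
    reserved_by: Optional[str] = None  # initiator holding a reservation, if any


def may_format(target: StorageLU, sibling: StorageLU, initiator: str) -> bool:
    """Decide whether FORMAT UNIT or SANITIZE sent to `target` may proceed."""
    # Rider from the thread: the other storage LU must be logically spun down.
    if sibling.spun_up:
        return False
    # Assumption of this sketch: someone else's reservation on either LU blocks it.
    for lu in (target, sibling):
        if lu.reserved_by is not None and lu.reserved_by != initiator:
            return False
    return True


if __name__ == "__main__":
    lu1 = StorageLU(lun=1)
    lu2 = StorageLU(lun=2, reserved_by="host-b")
    print(may_format(lu1, lu2, "host-a"))  # sibling still spun up
    lu2.spun_up = False
    print(may_format(lu1, lu2, "host-a"))  # sibling reserved by another host
```

Either check alone would stop the data loss described above, since the reserving initiator would not have spun its LU down.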


Re: Multi-Actuator SAS HDD First Look

2018-04-02 Thread Tim Walker
On Sat, Mar 31, 2018 at 10:52 AM, Douglas Gilbert  wrote:
> On 2018-03-30 04:01 PM, Bart Van Assche wrote:
>>
>> On Fri, 2018-03-30 at 12:36 -0600, Tim Walker wrote:
>>>
>>> Yes I will be there to discuss the multi-LUN approach. I wanted to get
>>> these interface details out so we could have some background and
>>> perhaps folks would come with ideas. I don't have much more to put
>>> out, but I will certainly answer questions - either on this list or
>>> directly.
>>
>>
>> Hello Tim,
>>
>> As far as I know the Linux SCSI stack does not yet support any SCSI
>> devices for which a single SCSI command affects multiple (two in this
>> case) LUNs. Adding such support may take significant work. There will
>> also be the disadvantage that most SCSI core contributors do not have
>> access to a multi-actuator device and hence that their changes may
>> break support for multi-actuator devices.
>
>
> Hmmm, INQUIRY (3PC bit) and REPORT LUNS seem to be counter examples to
> Bart's assertion. Plus there are a few more that tell you about things
> outside the addressed LU, for example the SCSI Ports VPD page tells you
> about other SCSI ports (hence other LUs) on the current device.
>
> From Tim's command list:
>
> Device level
> ------------
> 0x0, 0x1: okay
> 0x4 (Format unit): yikes, that could be a nasty surprise, accessing a
>   file system on the other LU and getting an error "Not ready, format
>   in progress"!!
> 0x12: standard INQUIRY okay, VPD pages not so much: LU id different;
>   relative port id different; target port id different (at the least)
> 0x1b (SSU): storage LUs need to know this model, otherwise the logic on
>   each LU could get into a duel: "I want it spun up; no, I want it spun
>   down ..."
> 0x35, 0x37, 0x3b, 0x3c: okay
> 0x48 (sanitize): similar to Format unit above
> 0x91, 0x4c, 0x4d: okay
> MODE SENSE/SELECT(6,10): depends on page, block descriptor needs to be
>   partially device level (since LB size may be changed by FU which is
>   device level)
> rest of device level: okay or (I) don't know
> 0xf7: READ UDS DATA, that's interesting, but proprietary I guess
>
> Perhaps you could add a rider on FU and SAN: they get rejected unless the
> other storage LU is in (logical) spun down state.
>
> LU specific
> -----------
> all okay, I'm hoping READ(6,10,12,16,32) and their WRITE cousins will be
>   there also :-) Plus the TMF: LU reset
>
> Device or LU
> ------------
> all okay
>
>
> I'm intrigued by your 3 LU model. My wish list for that:
>
> LUN 0 would be processor device type (0x3) so it wouldn't confuse the
> OS (Linux) that it held storage (READ CAPACITY is mandatory for PDT 0x0
> and cannot represent a 0 block LU) and you could pick and choose which
> SCSI commands to implement on it. LUN 0 TUR could report what the spindle
> was really doing, SSU could do precisely what it is told (and SSU on LUNs
> 1 and 2 would be an "and" for spin down and an "or" for spin up). I've
> got several boxes full of SAS cables and only one cable that I can think
> of that lets me get to the secondary SAS port. So on LUN 0 you could have
> a proprietary (unless T10 takes it on board) mode page to enable the user
> to say which ports LUNs 1 and 2 were accessible on. Obviously LUN 0
> would need to be accessible on both ports. [A non-accessible LUN would
> still respond to INQUIRY but with a first byte of 0x7f: PDT=0x1f (unknown)
> and PQ=3 which implies it is there but inaccessible via the current port.]
>
> A processor PDT opens up the possibility of putting a copy manager on
> LUN 0. Think offloaded (from main machine's perspective) backups and
> restores where LUN 1 or 2 is the source or destination.
>
> Enough dreaming.
>
> Doug Gilbert



Thanks for all the input from everybody. I'll collate it for a meeting
with our interface architect.

Best regards,
-Tim

-- 
Tim Walker
Product Design Systems Engineering, Seagate Technology
(303) 775-3770


Re: Multi-Actuator SAS HDD First Look

2018-03-31 Thread Douglas Gilbert

On 2018-03-30 04:01 PM, Bart Van Assche wrote:
> On Fri, 2018-03-30 at 12:36 -0600, Tim Walker wrote:
>> Yes I will be there to discuss the multi-LUN approach. I wanted to get
>> these interface details out so we could have some background and
>> perhaps folks would come with ideas. I don't have much more to put
>> out, but I will certainly answer questions - either on this list or
>> directly.
>
> Hello Tim,
>
> As far as I know the Linux SCSI stack does not yet support any SCSI devices
> for which a single SCSI command affects multiple (two in this case) LUNs.
> Adding such support may take significant work. There will also be the
> disadvantage that most SCSI core contributors do not have access to a
> multi-actuator device and hence that their changes may break support for
> multi-actuator devices.


Hmmm, INQUIRY (3PC bit) and REPORT LUNS seem to be counterexamples to
Bart's assertion. Plus there are a few more that tell you about things
outside the addressed LU, for example the SCSI Ports VPD page tells you
about other SCSI ports (hence other LUs) on the current device.


From Tim's command list:

Device level

0x0, 0x1: okay
0x4 (Format unit): yikes, that could be a nasty surprise, accessing a file
  system on the other LU and getting an error "Not ready, format in progress"!!
0x12: standard INQUIRY okay, VPD pages not so much: LU id different; relative
  port id different; target port id different (at the least)
0x1b (SSU): storage LUs need to know this model, otherwise the logic on
  each LU could get into a duel: "I want it spun up; no, I want it spun
  down ..."
0x35, 0x37, 0x3b, 0x3c: okay
0x48 (sanitize): similar to Format unit above
0x91,0x4c,0x4d: okay
MODE SENSE/SELECT(6,10): depends on page, block descriptor needs to be
  partially device level (since LB size may be changed by FU which is
  device level)
rest of device level: okay or (I) don't know
0xf7: READ UDS DATA, that's interesting, but proprietary I guess

Perhaps you could add a rider on FU and SAN: they get rejected unless the
other storage LU is in (logical) spun down state.
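
The rider on FU and SAN can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name accept_command and the dict of per-LU "logically stopped" flags are hypothetical, and nothing here reflects actual drive firmware.

```python
# Hypothetical sketch of the "rider": FORMAT UNIT and SANITIZE are only
# accepted when every *other* storage LU is in the logical stopped state.
FORMAT_UNIT = 0x04
SANITIZE = 0x48
DESTRUCTIVE_DEVICE_OPCODES = {FORMAT_UNIT, SANITIZE}


def accept_command(opcode, addressed_lun, logically_stopped):
    """Return True if the command may proceed.

    logically_stopped maps each storage LUN number to True when that LU
    has been (logically) spun down. The mapping is an assumed model.
    """
    if opcode not in DESTRUCTIVE_DEVICE_OPCODES:
        return True
    # Reject unless all LUs other than the addressed one are stopped.
    return all(
        stopped
        for lun, stopped in logically_stopped.items()
        if lun != addressed_lun
    )
```

With two storage LUs, a FORMAT UNIT sent to one LU would be rejected while the other is still started, so a file system on the other LU would not unexpectedly see "format in progress".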


LU specific
---
all okay, I'm hoping READ(6,10,12,16,32) and their WRITE cousins will be
  there also :-) Plus the TMF: LU reset

Device or LU

all okay


I'm intrigued by your 3 LU model. My wish list for that:

LUN 0 would be processor device type (0x3) so it wouldn't confuse the
OS (Linux) that it held storage (READ CAPACITY is mandatory for PDT 0x0
and cannot represent a 0 block LU) and you could pick and choose which
SCSI commands to implement on it. LUN 0 TUR could report what the spindle
was really doing, SSU could do precisely what it is told (and SSU on LUNs
1 and 2 would be an "and" for spin down and an "or" for spin up). I've
got several boxes full of SAS cables and only one cable that I can think
of that lets me get to the secondary SAS port. So on LUN 0 you could have
a proprietary (unless T10 takes it on board) mode page to enable the user
to say which ports LUNs 1 and 2 were accessible on. Obviously LUN 0
would need to be accessible on both ports. [A non-accessible LUN would
still respond to INQUIRY but with a first byte of 0x7f: PDT=0x1f (unknown)
and PQ=3 which implies it is there but inaccessible via the current port.]
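
For readers less familiar with the layout, the first byte of standard INQUIRY data carries the peripheral qualifier (PQ) in its top three bits and the peripheral device type (PDT) in its low five bits, so PQ=3 with PDT=0x1f is (3 << 5) | 0x1f. A minimal Python sketch of that packing follows; the function and constant names are illustrative, not taken from any library.

```python
from typing import Tuple

PDT_DISK = 0x00       # direct access block device
PDT_PROCESSOR = 0x03  # processor device type
PDT_UNKNOWN = 0x1F    # unknown or no device type


def decode_inquiry_byte0(byte0: int) -> Tuple[int, int]:
    """Split byte 0 of standard INQUIRY data into (PQ, PDT)."""
    if not 0 <= byte0 <= 0xFF:
        raise ValueError("byte0 must fit in one byte")
    peripheral_qualifier = byte0 >> 5      # bits 7..5
    peripheral_device_type = byte0 & 0x1F  # bits 4..0
    return peripheral_qualifier, peripheral_device_type


def encode_inquiry_byte0(pq: int, pdt: int) -> int:
    """Pack a PQ (0..7) and a PDT (0..0x1f) into byte 0."""
    if not 0 <= pq <= 7 or not 0 <= pdt <= 0x1F:
        raise ValueError("pq or pdt out of range")
    return (pq << 5) | pdt
```

Under this layout a processor-type LUN 0 that is accessible would report a first byte of 0x03, while a LUN that exists but is not reachable through the current port would report the PQ=3, PDT=0x1f combination.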

A processor PDT opens up the possibility of putting a copy manager on
LUN 0. Think offloaded (from main machine's perspective) backups and
restores where LUN 1 or 2 is the source or destination.

Enough dreaming.

Doug Gilbert


Re: Multi-Actuator SAS HDD First Look

2018-03-31 Thread Hannes Reinecke
On 03/30/2018 08:07 PM, Tim Walker wrote:
> Hello-
> 
> Concerning how we are currently allocating commands to LUNs or the
> device as a whole, here is a list based on the current two LUN model.
> This model has LUN0 & LUN1, both reporting 1/2 the total storage. Our
> definition of "device based" is that it ignores the LUN field and
> executes the command on the entire device. In other words, sending a
> device based command to LUN1 will also act on LUN0. "LUN-based"
> commands affect only the LUN they're addressed to. I'm soliciting
> feedback and suggestions, as well as subject matter experts to point
> out pain points and incompatibilities. Thank you for your input.
> 
Uh. You will have fun pushing that past T-10 ...

> These commands ignore the LUN field and affect all LUNs on the device:
> 0x00: TEST UNIT READY. Applies to entire device. The drive will return
> a GOOD status only if both LUNs can service medium access commands.
Please, don't.
TEST UNIT READY is the _only_ command at SPC level allowing us to check
the state of the LUN. Moving that up to the device (i.e. target) level
leaves us with no idea about the status of the actual LUN.

> 0x01: REZERO. Applies to entire device. The command will force the
> seek to LBA 0 on both LUNs. The thermal compensation and other actions
> are also taken at both LUNs (actuators).
Why? Is there any necessity that a REZERO of one LUN has a dependency on
the other?

> 0x04: FORMAT UNIT. Applies to entire device. The format parameters are
> applied to both LUNs. The format operation is done in parallel on the
> two LUNs. Format with defect list is not supported for the Dual LUN
> drive.
Again, why? Just for performance reasons?
That surely can be done by issuing a FORMAT UNIT command asynchronously
to both devices ...

> 0x12: INQUIRY. Applies to entire device. The same information is
> returned for the Inquiry command regardless of LUN setting. Each LUN
> has different identifier.
Strongly against it.
INQUIRY is _THE_ prime identifier of the LUN. Both LUNs might return the
same INQUIRY data, but at the very least page 0x83 of the VPD data _NEEDS_
to be different for both LUNs.
Please keep it at LUN level.

> 0x1B: START STOP UNIT. Applies to entire device. The command will
> apply to both actuators - it will cause both actuators to be either
> spun down or spun up depending on the command options. If the command
> fails on either actuator, CHECK CONDITION is returned.
Already discussed. Probably no other way around it.

> 0x35: SYNCHRONIZE CACHE. Applies to entire device. This will be a
> device command and only support the option to flush the entire cache.
> The drive does not support the flush of a particular LBA range only.
Fine with that; most HBAs have the same limitation.

> 0x37: READ DEFECT DATA (10). Applies to entire device. Device based
> defect list is returned - this will include the defects from both the
> LUNs. The heads are sequentially numbered across both LUNs.
Hmm. Again, why?

> 0x3B: WRITE BUFFER (10) Download. Applies to entire device. This is a
> device based command - as part of the download the code on both the
> LUNs will be updated.
That is to be expected. Okay.

> 0x3B: WRITE BUFFER (10) other than download. Applies to entire device.
> Other than download Device based command - there is only one common
> buffer for the two LUNs.
See above. Okay.

> 0x3C: READ BUFFER (10). Applies to entire device. Device based command
> - there is only one common buffer for the two LUNs.
Might be worthwhile adding a new option to READ/WRITE BUFFER (eg using
one of the bits before the 'MODE' field), specifying that this command
applies to all LUNs on this device.

> 0x48 0x01: SANITIZE overwrite. Applies to entire device. Treated as a
> device level command - sanitize operation performed on both LUNs when
> command received.
> 0x48 0x03: SANITIZE security erase. Applies to entire device. Treated
> as a device level command - sanitize operation performed on both LUNs
> when command received.
> 0x48 0x1F: SANITIZE exit failure mode. Applies to entire device.
Emphatically no.
That would allow one application on one LUN to erase the contents of the
other LUN, which might be in use by a completely different application.

> 0x91: SYNCHRONIZE CACHE (16). Applies to entire device. Same as Sync Cache.
> 0x4C: LOG SELECT (10). Applies to entire device. One global set of log
> pages for both LUNs. Any LBA information is stored as an internal LBA
> value, i.e. LUN1 LBAs start at LUN0 last_LBA + 1.
> 0x4D: LOG SENSE (10). Applies to entire device. One global set of log
> pages for both LUNs. Any LBA information is stored as an internal LBA
> value, i.e. LUN1 LBAs start at LUN0 last_LBA + 1.
> 0x55: MODE SELECT (10). Applies to entire device. Same as Mode select.
> 0x5A: MODE SENSE (10). Applies to entire device. Same as Mode sense.
> 0x9E 0x17: GET PHYSICAL ELEMENT STATUS. Applies to entire device.
> 0x9E 0x18: REMOVE ELEMENT AND TRUNCATE. Applies to entire device.
> 

Re: Multi-Actuator SAS HDD First Look

2018-03-31 Thread Hannes Reinecke
On 03/30/2018 03:07 PM, Tim Walker wrote:
> Hi Doug-
> 
> Currently, the dual actuator firmware safely spins the drive down if
> either LUN receives the START STOP UNIT command.  In other words, if
> LUN1 receives the command, it will flush any dirty data from LUN1 and
> LUN0, then spin down, taking both LUN1 & LUN0 offline. Alternatively,
> we've had input that either:
> a) Both LUNs must receive the START STOP UNIT command before the drive
> will spin down, OR
> b) Move the storage to LUN1 & LUN2, keeping LUN0 (with no storage) for
> device specific commands such as START STOP UNIT that do not directly
> access the media.
> 
That will get interesting when suspending the device; physical
interactions between independent LUNs are always tricky. And I don't
even want to imagine the implications when assigning each LUN to
individual VM guests; shutting down one VM will also interfere with the
other VM ... shudder.

Actually I would propose to have a 'management' LUN at LUN0, which could
handle all the device-wide commands (e.g. things like START STOP UNIT,
firmware update, or even SMART commands), and to ignore them on the
remaining LUNs.

I guess that would make the whole setup easier to handle.
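
A minimal Python sketch of that routing idea, purely as an illustration: the names route, MANAGEMENT_LUN and DEVICE_WIDE_OPCODES are hypothetical, and the choice of opcodes (START STOP UNIT, WRITE BUFFER standing in for firmware update, LOG SENSE standing in for SMART-style health data) is an assumption rather than a proposal.

```python
# Hypothetical routing for a drive with a management LUN 0 and storage
# LUNs 1 and 2. Device-wide commands are honoured only on LUN 0.
MANAGEMENT_LUN = 0

# Assumed set of device-wide opcodes for this sketch only.
DEVICE_WIDE_OPCODES = {
    0x1B: "START STOP UNIT",
    0x3B: "WRITE BUFFER",   # stands in for firmware update
    0x4D: "LOG SENSE",      # stands in for SMART-style health data
}


def route(opcode, lun):
    """Classify a command as 'device', 'ignored' or 'lun'."""
    if opcode in DEVICE_WIDE_OPCODES:
        # Device-wide commands act on the whole drive, but only when
        # they arrive on the management LUN.
        return "device" if lun == MANAGEMENT_LUN else "ignored"
    # Everything else is left to the addressed LUN, which decides for
    # itself whether it supports the command.
    return "lun"
```

In this model a START STOP UNIT sent to a storage LUN by one VM guest would not spin down the spindle under another guest, which is the interference described above.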

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
h...@suse.de   +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)


Re: Multi-Actuator SAS HDD First Look

2018-03-30 Thread Bart Van Assche
On Fri, 2018-03-30 at 12:36 -0600, Tim Walker wrote:
> Yes I will be there to discuss the multi-LUN approach. I wanted to get
> these interface details out so we could have some background and
> perhaps folks would come with ideas. I don't have much more to put
> out, but I will certainly answer questions - either on this list or
> directly.

Hello Tim,

As far as I know the Linux SCSI stack does not yet support any SCSI devices
for which a single SCSI command affects multiple (two in this case) LUNs.
Adding such support may take significant work. There will also be the
disadvantage that most SCSI core contributors do not have access to a
multi-actuator device and hence that their changes may break support for
multi-actuator devices.

PS: the preferred style for all Linux kernel mailing lists I know of is to reply
below a previous message. See also https://en.wikipedia.org/wiki/Posting_style.

Thanks,

Bart.





Re: Multi-Actuator SAS HDD First Look

2018-03-30 Thread Tim Walker
Yes I will be there to discuss the multi-LUN approach. I wanted to get
these interface details out so we could have some background and
perhaps folks would come with ideas. I don't have much more to put
out, but I will certainly answer questions - either on this list or
directly.

Best regards,
-Tim

On Fri, Mar 30, 2018 at 12:31 PM, Bart Van Assche
 wrote:
> On Fri, 2018-03-30 at 12:21 -0600, Tim Walker wrote:
>> Yes, the header LUN field. Sorry!
>>
>> We hadn't intended to broadcast - we expect to see a LUN specified.
>> For a device specific command both LUNs will be affected regardless of
>> which LUN is specified in the transport field. e.g. if we command LUN1
>> to stop (START STOP UNIT) then LUN0 will also be stopped.
>>
>> Sorry if my terminology wasn't entirely clear - some jargon and
>> informal usage slipping through.
>
> Hello Tim,
>
> Do you perhaps plan to attend LSF/MM next month? It probably will be easier
> to discuss this further in person instead of continuing this e-mail thread.
> See also https://events.linuxfoundation.org/events/lsfmm-2018/.
>
> Thanks,
>
> Bart.



-- 
Tim Walker
Product Design Systems Engineering, Seagate Technology
(303) 775-3770


Re: Multi-Actuator SAS HDD First Look

2018-03-30 Thread Bart Van Assche
On Fri, 2018-03-30 at 12:21 -0600, Tim Walker wrote:
> Yes, the header LUN field. Sorry!
> 
> We hadn't intended to broadcast - we expect to see a LUN specified.
> For a device specific command both LUNs will be affected regardless of
> which LUN is specified in the transport field. e.g. if we command LUN1
> to stop (START STOP UNIT) then LUN0 will also be stopped.
> 
> Sorry if my terminology wasn't entirely clear - some jargon and
> informal usage slipping through.

Hello Tim,

Do you perhaps plan to attend LSF/MM next month? It probably will be easier
to discuss this further in person instead of continuing this e-mail thread.
See also https://events.linuxfoundation.org/events/lsfmm-2018/.

Thanks,

Bart.






Re: Multi-Actuator SAS HDD First Look

2018-03-30 Thread Tim Walker
Hi Bart-

Yes, the header LUN field. Sorry!

We hadn't intended to broadcast - we expect to see a LUN specified.
For a device specific command both LUNs will be affected regardless of
which LUN is specified in the transport field. e.g. if we command LUN1
to stop (START STOP UNIT) then LUN0 will also be stopped.

Sorry if my terminology wasn't entirely clear - some jargon and
informal usage slipping through.

Best regards,
-Tim

On Fri, Mar 30, 2018 at 12:17 PM, Bart Van Assche
 wrote:
> On Fri, 2018-03-30 at 12:07 -0600, Tim Walker wrote:
>> Concerning how we are currently allocating commands to LUNs or the
>> device as a whole, here is a list based on the current two LUN model.
>> This model has LUN0 & LUN1, both reporting 1/2 the total storage. Our
>> definition of "device based" is that it ignores the LUN field and
>> executes the command on the entire device. [ ... ]
>
> Hello Tim,
>
> Can you clarify what "LUN field" means in this context? Are you referring
> to a field in the SAS transport header or perhaps to the obsolete LUN
> field in the SCSI CDB? Additionally, as far as I know every SCSI command
> should be executed by exactly one SCSI LUN. I'm not aware of any support
> in the SCSI specs for broadcasting a single SCSI command to multiple LUNs.
>
> Thanks,
>
> Bart.
>
>
>
>



-- 
Tim Walker
Product Design Systems Engineering, Seagate Technology
(303) 775-3770


Re: Multi-Actuator SAS HDD First Look

2018-03-30 Thread Bart Van Assche
On Fri, 2018-03-30 at 12:07 -0600, Tim Walker wrote:
> Concerning how we are currently allocating commands to LUNs or the
> device as a whole, here is a list based on the current two LUN model.
> This model has LUN0 & LUN1, both reporting 1/2 the total storage. Our
> definition of "device based" is that it ignores the LUN field and
> executes the command on the entire device. [ ... ]

Hello Tim,

Can you clarify what "LUN field" means in this context? Are you referring
to a field in the SAS transport header or perhaps to the obsolete LUN
field in the SCSI CDB? Additionally, as far as I know every SCSI command
should be executed by exactly one SCSI LUN. I'm not aware of any support
in the SCSI specs for broadcasting a single SCSI command to multiple LUNs.

Thanks,

Bart.






Re: Multi-Actuator SAS HDD First Look

2018-03-30 Thread Tim Walker
Hello-

Concerning how we are currently allocating commands to LUNs or the
device as a whole, here is a list based on the current two LUN model.
This model has LUN0 & LUN1, both reporting 1/2 the total storage. Our
definition of "device based" is that it ignores the LUN field and
executes the command on the entire device. In other words, sending a
device based command to LUN1 will also act on LUN0. "LUN-based"
commands affect only the LUN they're addressed to. I'm soliciting
feedback and suggestions, as well as subject matter experts to point
out pain points and incompatibilities. Thank you for your input.

These commands ignore the LUN field and affect all LUNs on the device:
0x00: TEST UNIT READY. Applies to entire device. The drive will return
a GOOD status only if both LUNs can service medium access commands.
0x01: REZERO. Applies to entire device. The command will force the
seek to LBA 0 on both LUNs. The thermal compensation and other actions
are also taken at both LUNs (actuators).
0x04: FORMAT UNIT. Applies to entire device. The format parameters are
applied to both LUNs. The format operation is done in parallel on the
two LUNs. Format with defect list is not supported for the Dual LUN
drive.
0x12: INQUIRY. Applies to entire device. The same information is
returned for the Inquiry command regardless of LUN setting. Each LUN
has a different identifier.
0x1B: START STOP UNIT. Applies to entire device. The command will
apply to both actuators - it will cause both actuators to be either
spun down or spun up depending on the command options. If the command
fails on either actuator, CHECK CONDITION is returned.
0x35: SYNCHRONIZE CACHE. Applies to entire device. This will be a
device command and only support the option to flush the entire cache.
The drive does not support the flush of a particular LBA range only.
0x37: READ DEFECT DATA (10). Applies to entire device. Device based
defect list is returned - this will include the defects from both the
LUNs. The heads are sequentially numbered across both LUNs.
0x3B: WRITE BUFFER (10) Download. Applies to entire device. This is a
device based command - as part of the download the code on both the
LUNs will be updated.
0x3B: WRITE BUFFER (10) other than download. Applies to entire device.
For modes other than download this is also a device based command -
there is only one common buffer for the two LUNs.
0x3C: READ BUFFER (10). Applies to entire device. Device based command
- there is only one common buffer for the two LUNs.
0x48 0x01: SANITIZE overwrite. Applies to entire device. Treated as a
device level command - sanitize operation performed on both LUNs when
command received.
0x48 0x03: SANITIZE security erase. Applies to entire device. Treated
as a device level command - sanitize operation performed on both LUNs
when command received.
0x48 0x1F: SANITIZE exit failure mode. Applies to entire device.
0x91: SYNCHRONIZE CACHE (16). Applies to entire device. Same as Sync Cache.
0x4C: LOG SELECT (10). Applies to entire device. One global set of log
pages for both LUNs. Any LBA information is stored as an internal LBA
value, i.e. LUN1 LBAs start at LUN0 last_LBA + 1.
0x4D: LOG SENSE (10). Applies to entire device. One global set of log
pages for both LUNs. Any LBA information is stored as an internal LBA
value, i.e. LUN1 LBAs start at LUN0 last_LBA + 1.
0x55: MODE SELECT (10). Applies to entire device. Same as Mode select.
0x5A: MODE SENSE (10). Applies to entire device. Same as Mode sense.
0x9E 0x17: GET PHYSICAL ELEMENT STATUS. Applies to entire device.
0x9E 0x18: REMOVE ELEMENT AND TRUNCATE. Applies to entire device.
0xA0: REPORT LUNS. Applies to entire device. Returns information on
the two/multiple LUNs supported by the drive.
0xA2: SECURITY PROTOCOL IN. Applies to entire device.
0xA3 0x0C: REPORT SUPPORTED OP CODES. Applies to entire device.
0xA3 0x0D: REPORT SUPPORTED TMFS. Applies to entire device.
0xA3 0x0F: REPORT TIMESTAMP. Applies to entire device.
0xA4 0x0C: REMOVE I_T NEXUS. Applies to entire device.
0xB7: READ DEFECT DATA (12). Applies to entire device.
0xF7: READ UDS DATA. Applies to entire device.
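
As an illustration of the internal LBA convention used by the log pages above (LUN1 LBAs start at LUN0 last_LBA + 1), here is a small Python sketch of the translation in both directions. The function names and the lun0_last_lba parameter are hypothetical, and the sketch assumes exactly two LUNs; it does not check the upper bound of LUN1.

```python
def internal_to_lun_lba(internal_lba, lun0_last_lba):
    """Map a device-internal LBA to a (lun, lba) pair.

    Assumes two LUNs, with LUN1 starting at lun0_last_lba + 1.
    """
    if internal_lba < 0:
        raise ValueError("LBA must be non-negative")
    if internal_lba <= lun0_last_lba:
        return 0, internal_lba
    return 1, internal_lba - (lun0_last_lba + 1)


def lun_lba_to_internal(lun, lba, lun0_last_lba):
    """Inverse mapping: (lun, lba) back to the device-internal LBA."""
    if lba < 0:
        raise ValueError("LBA must be non-negative")
    if lun == 0:
        return lba
    if lun == 1:
        return lun0_last_lba + 1 + lba
    raise ValueError("this sketch only models LUN 0 and LUN 1")
```

A host tool that reads an LBA out of a shared log page would need something like the first function, plus LUN0's last LBA from READ CAPACITY, before it could attribute the entry to a LUN.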

These commands honor the LUN field and affect the addressed LUN only:
0x5E: PERSISTENT RESERVE IN. LUN Specific.
0x5F: PERSISTENT RESERVE OUT. LUN Specific.
0x9F 0x11: WRITE LONG (16). LUN Specific. Only support WR_UNCOR option
to make the sectors un-correctable.
0xA3 0x05: REPORT DEVICE ID. LUN Specific.
0xA4 0x06: SET DEVICE ID. LUN Specific.
0xB5: SECURITY PROTOCOL OUT. LUN Specific.
0x03: REQUEST SENSE.  LUN Specific. The command returns the sense data
for the respective LUN.
0x07: REASSIGN BLOCKS. LUN Specific. The reassign command will be LUN
specific. It reassigns the defective blocks in the defect list to the
reassign area on the respective LUN.
0x25: READ CAPACITY (10). LUN Specific. The capacity for the LUN
specified in the CDB is returned - the capacity can be different for
the two LUNs in the drive.
0x3E: READ LONG (10). LUN Specific
0x3F 0x11: WRITE LONG (10). 

Re: Multi-Actuator SAS HDD First Look

2018-03-30 Thread Tim Walker
Hi Doug-

Currently, the dual actuator firmware safely spins the drive down if
either LUN receives the START STOP UNIT command.  In other words, if
LUN1 receives the command, it will flush any dirty data from LUN1 and
LUN0, then spin down, taking both LUN1 & LUN0 offline. Alternatively,
we've had input that either:
a) Both LUNs must receive the START STOP UNIT command before the drive
will spin down, OR
b) Move the storage to LUN1 & LUN2, keeping LUN0 (with no storage) for
device specific commands such as START STOP UNIT that do not directly
access the media.

Thanks for the question.

Best regards,
-Tim

On Thu, Mar 29, 2018 at 12:03 PM, Douglas Gilbert  wrote:
> On 2018-03-26 11:08 AM, Hannes Reinecke wrote:
>>
>> On Fri, 23 Mar 2018 08:57:12 -0600
>> Tim Walker  wrote:
>>
>>> Seagate announced their split actuator SAS drive, which will probably
>>> require some kernel changes for full support. It's targeted at cloud
>>> provider JBODs and RAID.
>>>
>>> Here are some of the drive's architectural points. Since the two LUNs
>>> share many common components (e.g. spindle) Seagate allocated some
>>> SCSI operations to be LUN specific and some to affect the entire
>>> device, that is, both LUNs.
>>>
>>> 1. Two LUNs, 0 & 1, each with independent lba space, and each
>>> connected to an independent read channel, actuator, and set of heads.
>>> 2. Each actuator addresses 1/2 of the media - no media is shared
>>> across the actuators. They seek independently.
>>> 3. One World Wide Name (WWN) is assigned to the port for device
>>> address. Each Logical Unit has a separate World Wide Name for
>>> identification in VPD page.
>>> 4. 128 deep command queue, shared across both LUNs
>>> 5. Each LUN can pull commands from the queue independently, so they
>>> can implement their own sorting and optimization.
>>> 6. Ordered tag attribute causes the command to be ordered across both
>>> Logical Units
>>> 7. Head of Queue attribute causes the command to be ordered with
>>> respect to a single Logical Unit
>>> 8. Mode pages are device-based (shared across both Logical Units)
>>> 9. Log pages are device-based.
>>> 10. Inquiry VPD pages (with minor exceptions) are device based.
>>> 11. Device health features (SMART, etc) are device based
>>>
>>> Seagate wants the multi-actuator design to integrate into the stack as
>>> painlessly as possible. The interface design is still in the early
>>> stages, so I am gathering requirements and recommendations, and also
>>> providing any information necessary to help scope integrating a
>>> multi-LUN device into the MQ stack. So, I am soliciting any pertinent
>>> feedback including:
>>>
>>> 1. Painful incompatibilities between the Seagate proposal and current
>>> MQ architecture
>>> 2. Linux changes needed
>>> 3. Drive interface changes needed
>>> 4. Anything else I may have overlooked
>>>
>> So far it looks okay; just make sure to have VPD page 0x83
>> entries properly associated.
>> To all intents and purposes these devices seem to look like 'normal'
>> devices with two LUNs; nothing special with that.
>> The real question is in which areas those devices differ from
>> the two independent LUN scenario.
>>
>> There might be issues with per-device information like SMART etc;
>> ideally they are available from _both_ LUNs.
>> Otherwise they'll show up as blank from one LUN, causing consternation
>> with any management software.
>
>
> Further to this point, some types of damage, such as to a head
> or (one side of) a platter would degrade one LU, possibly making
> it unusable for storage, while the other side (and the other LU)
> would be fine.
>
> I'm curious how you plan to implement the START STOP UNIT command.
> If one side of the platter is in "start" state and the other side
> in "stop" state, will the heads on the stopped side be parked (if
> they can be parked)? And if both sides (LUs) are stopped I would
> hope you really would spin down the disk, then if either is started
> the disk would be spun up.
>
> Getting T10 to add a bit to the Block Device Characteristics VPD page
> might be helpful. It could be a "shares a spindle" bit with the other
> LUs identified in the SCSI Ports VPD page. Such an indication would
> help an enclosure find out if a Multi-Actuator disk was really spun down
> and ready to be removed or replaced. I think SES and smartmontools may
> need tweaks to handle this new device model sensibly.
>
> Doug Gilbert
>
>



-- 
Tim Walker
Product Design Systems Engineering, Seagate Technology
(303) 775-3770


Re: Multi-Actuator SAS HDD First Look

2018-03-29 Thread Douglas Gilbert

On 2018-03-26 11:08 AM, Hannes Reinecke wrote:
> On Fri, 23 Mar 2018 08:57:12 -0600
> Tim Walker  wrote:
>
>> Seagate announced their split actuator SAS drive, which will probably
>> require some kernel changes for full support. It's targeted at cloud
>> provider JBODs and RAID.
>>
>> Here are some of the drive's architectural points. Since the two LUNs
>> share many common components (e.g. spindle) Seagate allocated some
>> SCSI operations to be LUN specific and some to affect the entire
>> device, that is, both LUNs.
>>
>> 1. Two LUNs, 0 & 1, each with independent lba space, and each
>> connected to an independent read channel, actuator, and set of heads.
>> 2. Each actuator addresses 1/2 of the media - no media is shared
>> across the actuators. They seek independently.
>> 3. One World Wide Name (WWN) is assigned to the port for device
>> address. Each Logical Unit has a separate World Wide Name for
>> identification in VPD page.
>> 4. 128 deep command queue, shared across both LUNs
>> 5. Each LUN can pull commands from the queue independently, so they
>> can implement their own sorting and optimization.
>> 6. Ordered tag attribute causes the command to be ordered across both
>> Logical Units
>> 7. Head of Queue attribute causes the command to be ordered with
>> respect to a single Logical Unit
>> 8. Mode pages are device-based (shared across both Logical Units)
>> 9. Log pages are device-based.
>> 10. Inquiry VPD pages (with minor exceptions) are device based.
>> 11. Device health features (SMART, etc) are device based
>>
>> Seagate wants the multi-actuator design to integrate into the stack as
>> painlessly as possible. The interface design is still in the early
>> stages, so I am gathering requirements and recommendations, and also
>> providing any information necessary to help scope integrating a
>> multi-LUN device into the MQ stack. So, I am soliciting any pertinent
>> feedback including:
>>
>> 1. Painful incompatibilities between the Seagate proposal and current
>> MQ architecture
>> 2. Linux changes needed
>> 3. Drive interface changes needed
>> 4. Anything else I may have overlooked
>>
> So far it looks okay; just make sure to have VPD page 0x83
> entries properly associated.
> To all intents and purposes these devices seem to look like 'normal'
> devices with two LUNs; nothing special with that.
> The real question is in which areas those devices differ from
> the two independent LUN scenario.
>
> There might be issues with per-device information like SMART etc;
> ideally they are available from _both_ LUNs.
> Otherwise they'll show up as blank from one LUN, causing consternation
> with any management software.


Further to this point, some types of damage, such as to a head
or (one side of) a platter would degrade one LU, possibly making
it unusable for storage, while the other side (and the other LU)
would be fine.

I'm curious how you plan to implement the START STOP UNIT command.
If one side of the platter is in "start" state and the other side
in "stop" state, will the heads on the stopped side be parked (if
they can be parked)? And if both sides (LUs) are stopped I would
hope you really would spin down the disk, then if either is started
the disk would be spun up.
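
The behaviour hoped for here (spindle stops only when both LUs are stopped, spins up when either is started) can be written down as a tiny Python model. This is a sketch under assumptions: the class DualActuatorModel and its fields are hypothetical and say nothing about how the actual firmware behaves or whether heads are parked.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DualActuatorModel:
    """Toy model: per-LU logical start/stop state, one shared spindle."""

    # Assumed initial state: both LUs started.
    started: Dict[int, bool] = field(
        default_factory=lambda: {0: True, 1: True}
    )

    def start_stop_unit(self, lun: int, start: bool) -> None:
        """Record a START STOP UNIT request for one LU."""
        if lun not in self.started:
            raise ValueError("unknown LUN")
        self.started[lun] = start

    @property
    def spindle_spinning(self) -> bool:
        # "or" for spin up: any started LU keeps the platters turning.
        # Equivalently, "and" for spin down: all LUs must be stopped.
        return any(self.started.values())
```

Stopping LUN 0 alone leaves the spindle running in this model; only after LUN 1 is also stopped does spindle_spinning become False, and starting either LU makes it True again.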

Getting T10 to add a bit to the Block Device Characteristics VPD page
might be helpful. It could be a "shares a spindle" bit with the other
LUs identified in the SCSI Ports VPD page. Such an indication would
help an enclosure find out if a Multi-Actuator disk was really spun down
and ready to be removed or replaced. I think SES and smartmontools may
need tweaks to handle this new device model sensibly.

Doug Gilbert




Re: Multi-Actuator SAS HDD First Look

2018-03-26 Thread Hannes Reinecke
On Fri, 23 Mar 2018 08:57:12 -0600
Tim Walker  wrote:

> Seagate announced their split actuator SAS drive, which will probably
> require some kernel changes for full support. It's targeted at cloud
> provider JBODs and RAID.
> 
> Here are some of the drive's architectural points. Since the two LUNs
> share many common components (e.g. spindle) Seagate allocated some
> SCSI operations to be LUN specific and some to affect the entire
> device, that is, both LUNs.
> 
> 1. Two LUNs, 0 & 1, each with independent lba space, and each
> connected to an independent read channel, actuator, and set of heads.
> 2. Each actuator addresses 1/2 of the media - no media is shared
> across the actuators. They seek independently.
> 3. One World Wide Name (WWN) is assigned to the port for device
> address. Each Logical Unit has a separate World Wide Name for
> identification in VPD page.
> 4. 128 deep command queue, shared across both LUNs
> 5. Each LUN can pull commands from the queue independently, so they
> can implement their own sorting and optimization.
> 6. Ordered tag attribute causes the command to be ordered across both
> Logical Units
> 7. Head of Queue attribute causes the command to be ordered with
> respect to a single Logical Unit
> 8. Mode pages are device-based (shared across both Logical Units)
> 9. Log pages are device-based.
> 10. Inquiry VPD pages (with minor exceptions) are device based.
> 11. Device health features (SMART, etc) are device based
> 
> Seagate wants the multi-actuator design to integrate into the stack as
> painlessly as possible. The interface design is still in the early
> stages, so I am gathering requirements and recommendations, and also
> providing any information necessary to help scope integrating a
> multi-LUN device into the MQ stack. So, I am soliciting any pertinent
> feedback including:
> 
> 1. Painful incompatibilities between the Seagate proposal and current
> MQ architecture
> 2. Linux changes needed
> 3. Drive interface changes needed
> 4. Anything else I may have overlooked
> 
So far it looks okay; just make sure to have VPD page 0x83
entries properly associated.
To all intents and purposes these devices seem to look like 'normal'
devices with two LUNs; nothing special with that.
The real question is in which areas those devices differ from the
scenario of two independent LUNs.
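
As a rough illustration of what "properly associated" could mean on the
host side, here is a minimal Python sketch that walks the designation
descriptors of a Device Identification VPD page (0x83) and sorts NAA
designators by their association field (logical unit vs. target port).
The helper build_vpd83() and the WWN values are made up for the example;
a real page would come from an INQUIRY to each LUN. It assumes both LUNs
are queried through the same port.

```python
import struct
from typing import Dict, List

# Association field values of a designation descriptor (byte 1, bits 5-4).
ASSOCIATIONS = {0: "logical_unit", 1: "target_port", 2: "target_device"}
NAA_DESIGNATOR = 0x3  # designator type (byte 1, bits 3-0)


def parse_vpd83(page: bytes) -> Dict[str, List[str]]:
    """Return NAA designators (hex strings) keyed by association."""
    if len(page) < 4 or page[1] != 0x83:
        raise ValueError("not a VPD page 0x83 response")
    (page_len,) = struct.unpack_from(">H", page, 2)
    end = min(len(page), 4 + page_len)
    found: Dict[str, List[str]] = {}
    off = 4
    while off + 4 <= end:
        association = (page[off + 1] >> 4) & 0x3
        designator_type = page[off + 1] & 0xF
        length = page[off + 3]
        designator = page[off + 4:off + 4 + length]
        if designator_type == NAA_DESIGNATOR and len(designator) == length:
            key = ASSOCIATIONS.get(association, "reserved")
            found.setdefault(key, []).append(designator.hex())
        off += 4 + length
    return found


def build_vpd83(lu_wwn_hex: str, port_wwn_hex: str) -> bytes:
    """Hypothetical helper: build a synthetic page with two NAA designators."""
    lu = bytes.fromhex(lu_wwn_hex)
    port = bytes.fromhex(port_wwn_hex)
    descriptors = (
        # code set binary, association = logical unit, type = NAA
        bytes([0x01, 0x03, 0x00, len(lu)]) + lu
        # protocol SAS, PIV set, association = target port, type = NAA
        + bytes([0x61, 0x93, 0x00, len(port)]) + port
    )
    return struct.pack(">BBH", 0x00, 0x83, len(descriptors)) + descriptors


def same_physical_drive(page_a: bytes, page_b: bytes) -> bool:
    """Treat two LUNs as one drive if they report a common target port name."""
    ports_a = set(parse_vpd83(page_a).get("target_port", []))
    ports_b = set(parse_vpd83(page_b).get("target_port", []))
    return bool(ports_a & ports_b)


# Made-up WWNs: one port name shared by both LUNs, one name per LUN.
PORT = "5000c500a1b2c3d0"
lun0_page = build_vpd83("5000c500a1b2c3d1", PORT)
lun1_page = build_vpd83("5000c500a1b2c3d2", PORT)
print(parse_vpd83(lun0_page))
print(same_physical_drive(lun0_page, lun1_page))
```

Management software could use a grouping like same_physical_drive() to
avoid counting the shared spindle twice, while still using the
logical-unit designator to tell the two actuators apart.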

There might be issues with per-device information like SMART etc.;
ideally it is available from _both_ LUNs.
Otherwise it will show up as blank from one LUN, causing consternation
with any management software.

Cheers,

Hannes


Multi-Actuator SAS HDD First Look

2018-03-23 Thread Tim Walker
Seagate announced their split actuator SAS drive, which will probably
require some kernel changes for full support. It's targeted at cloud
provider JBODs and RAID.

Here are some of the drive's architectural points. Since the two LUNs
share many common components (e.g. spindle) Seagate allocated some
SCSI operations to be LUN specific and some to affect the entire
device, that is, both LUNs.

1. Two LUNs, 0 & 1, each with independent LBA space, and each
connected to an independent read channel, actuator, and set of heads.
2. Each actuator addresses 1/2 of the media - no media is shared
across the actuators. They seek independently.
3. One World Wide Name (WWN) is assigned to the port for device
address. Each Logical Unit has a separate World Wide Name for
identification in VPD page.
4. 128 deep command queue, shared across both LUNs
5. Each LUN can pull commands from the queue independently, so they
can implement their own sorting and optimization.
6. Ordered tag attribute causes the command to be ordered across both
Logical Units
7. Head of Queue attribute causes the command to be ordered with
respect to a single Logical Unit
8. Mode pages are device-based (shared across both Logical Units)
9. Log pages are device-based.
10. Inquiry VPD pages (with minor exceptions) are device based.
11. Device health features (SMART, etc) are device based
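
To make point 4 concrete, here is a toy Python model of a single
128-slot command budget shared by both LUNs. It is only a sketch of the
accounting problem, not how the drive firmware or the Linux block layer
actually works; the class and method names are invented for the example.

```python
from typing import Dict


class SharedTagBudget:
    """Toy model: one pool of command slots shared by every LUN on a device."""

    def __init__(self, depth: int = 128) -> None:
        self.depth = depth
        self.in_flight: Dict[int, int] = {}  # LUN number -> outstanding commands

    def outstanding(self) -> int:
        return sum(self.in_flight.values())

    def try_submit(self, lun: int) -> bool:
        """Claim a slot for this LUN; fail if the shared queue is full."""
        if self.outstanding() >= self.depth:
            return False
        self.in_flight[lun] = self.in_flight.get(lun, 0) + 1
        return True

    def complete(self, lun: int) -> None:
        """Release a slot previously claimed by this LUN."""
        if self.in_flight.get(lun, 0) == 0:
            raise ValueError(f"no outstanding command on LUN {lun}")
        self.in_flight[lun] -= 1


budget = SharedTagBudget(depth=128)

# LUN 0 alone fills the whole device queue ...
accepted = sum(budget.try_submit(0) for _ in range(200))
print("LUN 0 accepted:", accepted)

# ... so LUN 1 cannot queue anything until a LUN 0 command completes.
print("LUN 1 submit while full:", budget.try_submit(1))
budget.complete(0)
print("LUN 1 submit after one completion:", budget.try_submit(1))
```

In this model a host that sizes each LUN's queue depth independently at
128 can let one actuator starve the other, which is one reason the
shared queue may need to be visible to the host-side queueing logic.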

Seagate wants the multi-actuator design to integrate into the stack as
painlessly as possible. The interface design is still in the early
stages, so I am gathering requirements and recommendations, and also
providing any information necessary to help scope integrating a
multi-LUN device into the MQ stack. So, I am soliciting any pertinent
feedback including:

1. Painful incompatibilities between the Seagate proposal and current
MQ architecture
2. Linux changes needed
3. Drive interface changes needed
4. Anything else I may have overlooked

Please feel free to send any questions or comments.

Tim Walker
Product Design Systems Engineering, Seagate Technology
(303) 775-3770