Re: [MINI SUMMIT] SCSI core performance

2012-07-18 Thread James Bottomley
On Tue, 2012-07-17 at 19:39 -0700, Nicholas A. Bellinger wrote:
 Hi KS-PCs,
 
 I'd like to propose a SCSI performance mini-summit to see how interested
 folks are in helping address the long-term issues that SCSI core is
 currently facing with respect to multi-LUN-per-host configurations and
 heavy small-block random I/O workloads.
 
 I know this would probably be better suited for LSF (for the record, it
 was proposed this year), but now that we've acknowledged there is a
 problem with SCSI LLDs vs. raw block drivers vs. other SCSI subsystems,
 it would be useful to get the storage folks into a single room at some
 point during KS/LPC to figure out what is actually going on with SCSI
 core.

You seem to have a short memory:  The last time it was discussed

http://marc.info/?t=13415537393

It rapidly became apparent there isn't a problem.  Enabling high IOPS in
the SCSI stack is what I think you mean.

 As mentioned in the recent tcm_vhost thread, there are a number of cases
 where drivers/target/ code can demonstrate this limitation pretty
 vividly now.
 
 This includes the following scenarios, all using raw block flash exported
 via target_core_mod + target_core_iblock and the same small-block (4k)
 mixed random I/O workload with fio:
 
 *) tcm_loop local SCSI LLD performance is an order of magnitude slower
    than the same local raw block flash backend.
 *) tcm_qla2xxx performs better with MSFT Server hosts than with Linux
    v3.x based hosts on 2x socket Nehalem hardware w/ PCI-e Gen2 HBAs.
 *) ib_srpt performs better with MSFT Server hosts than with RHEL 6.x
    (2.6.32-based) hosts on 2x socket Romley hardware w/ PCI-e Gen3 HCAs.
 *) Raw block IBLOCK export into a v3.5-rc KVM guest w/ virtio-scsi lags
    behind raw local block flash.  (cmwq on the host is helping here, but
    we still need to compare against the MSFT SCSI mini-port.)
 
 Also, with 1M IOPS into a single VM guest now being achieved by other,
 non-Linux-based hypervisors, the virtualization side of high-performance
 KVM SCSI-based storage is quickly coming to a head.
 
 So, all of that said, I'd like to at least have a discussion with the key
 SCSI + block folks who will be present in San Diego on a path forward to
 address these issues, without having to wait until LSF 2013 and hoping
 for a topic slot to materialize then.
 
 Thank you for your consideration,

Well, your proposal is devoid of an actual proposal.

Enabling high IOPS involves reducing locking overhead and path length
through the code.  I think most of the low hanging fruit in this area is
already picked, but if you have an idea, please say.  There might be
something we can extract from the lockless queue work Jens is doing, but
we need that to materialise first.

Without a concrete thing to discuss, shooting the breeze on high IOPS in
the SCSI stack is about as useful as discussing what happened in last
night's episode of Coronation Street which, when it happens in my house,
always helps me see how incredibly urgent fixing the leaky tap I've been
putting off for months actually is.

If someone can come up with a proposal ... or even perhaps another path
trace showing where the reducible overhead and lock problems are, we can
discuss it on the list, and we might have a real topic by the time LSF
rolls around.

James



Re: [MINI SUMMIT] SCSI core performance

2012-07-18 Thread Nicholas A. Bellinger
On Wed, 2012-07-18 at 09:00 +0100, James Bottomley wrote:
 On Tue, 2012-07-17 at 19:39 -0700, Nicholas A. Bellinger wrote:
  Hi KS-PCs,
  
  I'd like to propose a SCSI performance mini-summit to see how interested
  folks are in helping address the long-term issues that SCSI core is
  currently facing with respect to multi-LUN-per-host configurations and
  heavy small-block random I/O workloads.
  
  I know this would probably be better suited for LSF (for the record, it
  was proposed this year), but now that we've acknowledged there is a
  problem with SCSI LLDs vs. raw block drivers vs. other SCSI subsystems,
  it would be useful to get the storage folks into a single room at some
  point during KS/LPC to figure out what is actually going on with SCSI
  core.
 
 You seem to have a short memory:  The last time it was discussed
 
 http://marc.info/?t=13415537393
 
 It rapidly became apparent there isn't a problem.  Enabling high IOPS in
 the SCSI stack is what I think you mean.
 

Small-block random I/O == performance; that is correct.

The host-lock-less stuff is doing better these days for small-ish
multi-LUN setups with large-block sequential I/O workloads.

Doing ~1 GB/sec per LUN is achievable with multi-LUN per host (say up to
6-8 LUNs, depending on your setup) using PCI-e Gen3 hardware.
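
For a rough sense of scale (assuming 256 KB sequential requests; the
numbers are illustrative only):

    1 GB/s of 256 KB sequential I/O  ~=      4,000 commands/sec
    1M IOPS of 4 KB random I/O       ~=  1,000,000 commands/sec (~4 GB/s)

i.e. the small-block random case pushes roughly 250x more commands per
second through the midlayer and its per-host/per-device locks for
comparable bandwidth, which is where the per-command overhead starts to
dominate.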

  As mentioned in the recent tcm_vhost thread, there are a number of cases
  where drivers/target/ code can demonstrate this limitation pretty
  vividly now.
  
  This includes the following scenarios, all using raw block flash exported
  via target_core_mod + target_core_iblock and the same small-block (4k)
  mixed random I/O workload with fio:
  
  *) tcm_loop local SCSI LLD performance is an order of magnitude slower
     than the same local raw block flash backend.
  *) tcm_qla2xxx performs better with MSFT Server hosts than with Linux
     v3.x based hosts on 2x socket Nehalem hardware w/ PCI-e Gen2 HBAs.
  *) ib_srpt performs better with MSFT Server hosts than with RHEL 6.x
     (2.6.32-based) hosts on 2x socket Romley hardware w/ PCI-e Gen3 HCAs.
  *) Raw block IBLOCK export into a v3.5-rc KVM guest w/ virtio-scsi lags
     behind raw local block flash.  (cmwq on the host is helping here, but
     we still need to compare against the MSFT SCSI mini-port.)
  
  Also, with 1M IOPS into a single VM guest now being achieved by other,
  non-Linux-based hypervisors, the virtualization side of high-performance
  KVM SCSI-based storage is quickly coming to a head.
  
  So, all of that said, I'd like to at least have a discussion with the key
  SCSI + block folks who will be present in San Diego on a path forward to
  address these issues, without having to wait until LSF 2013 and hoping
  for a topic slot to materialize then.
  
  Thank you for your consideration,
 
 Well, your proposal is devoid of an actual proposal.
 

Huh..?  It's a proposal for a discussion to (hopefully) identify the
main culprit(s) and figure out an incremental way forward.

Since 1M IOPS machines aren't quite the norm (yet), the idea is to get
the storage folks who do have access to 1M IOPS systems, and who have an
interest in making SCSI core go faster for small-block random I/O
workloads, into the same room.

This can be vendors / LLD maintainers who've run into similar
limitations with SCSI core, or folks who have an interest in KVM guest
SCSI performance.

 Enabling high IOPS involves reducing locking overhead and path length
 through the code.  I think most of the low hanging fruit in this area is
 already picked, but if you have an idea, please say.  There might be
 something we can extract from the lockless queue work Jens is doing, but
 we need that to materialise first.
 

I'd really like to hear from Jens here, but I don't know how much time
he's spending on the SCSI layer these days.

I've been more interested recently in working on a fabric that can
demonstrate this bottleneck with raw block flash into a KVM guest via
virtio-scsi, as I think it's an important vehicle for short-term
diagnosis.
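
For reference, a minimal QEMU invocation for that kind of setup might
look like the sketch below (illustrative only; /dev/sdX and the
memory/vCPU sizing are placeholders, and the exact options depend on the
QEMU build):

    qemu-system-x86_64 -enable-kvm -m 4096 -smp 4 \
        -drive file=guest.img,if=virtio \
        -device virtio-scsi-pci,id=scsi0 \
        -drive file=/dev/sdX,if=none,id=flash0,format=raw,cache=none,aio=native \
        -device scsi-hd,drive=flash0,bus=scsi0.0

This is the plain userspace virtio-scsi path; cache=none plus native AIO
keeps host page-cache effects out of the guest-visible numbers.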

 Without a concrete thing to discuss, shooting the breeze on high IOPS in
 the SCSI stack is about as useful as discussing what happened in last
 night's episode of Coronation Street which, when it happens in my house,
 always helps me see how incredibly urgent fixing the leaky tap I've been
 putting off for months actually is.
 

Sorry, I've never heard of that show.  

 If someone can come up with a proposal ... or even perhaps another path
 trace showing where the reducible overhead and lock problems are, we can
 discuss it on the list, and we might have a real topic by the time LSF
 rolls around.
 

So identifying the root culprit(s) is still a work in progress at this point.

In the next few weeks I'll be back spending time on 1M IOPS machines with
raw block flash + qla2xxx/srpt/vhost + Linux/MSFT SCSI clients, and should
have some more data points by then.
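
For what it's worth, the kind of path trace James is asking for can be
gathered with something along these lines (a rough sketch, assuming perf
is installed and the test kernel is built with CONFIG_LOCK_STAT=y):

    # whole-system profile with call graphs while fio is running
    perf record -a -g -- sleep 30
    perf report --sort symbol

    # per-lock contention statistics
    echo 1 > /proc/sys/kernel/lock_stat
    # ... run the workload ...
    head -50 /proc/lock_stat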

Anyway, if it ends up taking until LSF, it ends up at LSF.  I figured
that since things are heating up for virtio-scsi, KS might be a good
venue for

[MINI SUMMIT] SCSI core performance

2012-07-17 Thread Nicholas A. Bellinger
Hi KS-PCs,

I'd like to propose a SCSI performance mini-summit to see how interested
folks are in helping address the long-term issues that SCSI core is
currently facing with respect to multi-LUN-per-host configurations and
heavy small-block random I/O workloads.

I know this would probably be better suited for LSF (for the record, it
was proposed this year), but now that we've acknowledged there is a
problem with SCSI LLDs vs. raw block drivers vs. other SCSI subsystems,
it would be useful to get the storage folks into a single room at some
point during KS/LPC to figure out what is actually going on with SCSI
core.

As mentioned in the recent tcm_vhost thread, there are a number of cases
where drivers/target/ code can demonstrate this limitation pretty
vividly now.

This includes the following scenarios, all using raw block flash exported
via target_core_mod + target_core_iblock and the same small-block (4k)
mixed random I/O workload with fio (a representative fio job is sketched
after the list below):

*) tcm_loop local SCSI LLD performance is an order of magnitude slower
   than the same local raw block flash backend.
*) tcm_qla2xxx performs better with MSFT Server hosts than with Linux
   v3.x based hosts on 2x socket Nehalem hardware w/ PCI-e Gen2 HBAs.
*) ib_srpt performs better with MSFT Server hosts than with RHEL 6.x
   (2.6.32-based) hosts on 2x socket Romley hardware w/ PCI-e Gen3 HCAs.
*) Raw block IBLOCK export into a v3.5-rc KVM guest w/ virtio-scsi lags
   behind raw local block flash.  (cmwq on the host is helping here, but
   we still need to compare against the MSFT SCSI mini-port.)
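
For reference, the workload referred to above is along the lines of the
following fio job sketch (illustrative only; /dev/sdX is a placeholder,
and the read/write mix, queue depth and job count get tuned per setup):

    ; 4k mixed random I/O against a raw block flash device
    [global]
    ioengine=libaio
    direct=1
    rw=randrw
    ; assumed 70/30 read/write split -- adjust to match the actual test
    rwmixread=70
    bs=4k
    iodepth=32
    numjobs=4
    runtime=60
    time_based
    group_reporting

    [raw-flash]
    filename=/dev/sdX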

Also, with 1M IOPS into a single VM guest now being achieved by other,
non-Linux-based hypervisors, the virtualization side of high-performance
KVM SCSI-based storage is quickly coming to a head.

So, all of that said, I'd like to at least have a discussion with the key
SCSI + block folks who will be present in San Diego on a path forward to
address these issues, without having to wait until LSF 2013 and hoping
for a topic slot to materialize then.

Thank you for your consideration,

--nab
