Re: [RFC] Re: broken userland ABI in configfs binary attributes

2019-08-27 Thread Boaz Harrosh
On 27/08/2019 20:27, Al Viro wrote:
<>
> If you want to express something like "data packet formed; now you can commit
> it and tell me if there'd been any errors", use something explicit.  close()
> simply isn't suitable for that.  writev() for datagram-like semantics might
> be; fsync() or fdatasync() could serve for "commit now".
> 

Yes! I change my mind you are right. close() should stay with void semantics.
I always thought the IO error reporting on close was a bad POSIX decision and
fsync should be the final resting bed, and if you do not call fsync then you
don't care about the error.

Sigh, looks like the error was for ever ignored from day one. Maybe the Kernel
guys felt the errors were important. But application users of configfs, did any
actually care and check? Is there really a regression here? maybe the current 
imp
needs to just be documented.
(Or the more blasphemous, change the ABI and force people to call fsync or 
something)

I feel the frustration too
Boaz


Re: [RFC] Re: broken userland ABI in configfs binary attributes

2019-08-27 Thread Boaz Harrosh
On 26/08/2019 22:32, Al Viro wrote:
<>
> D'oh...  OK, that settles it; exclusion with st_write() would've been
> painful, but playing with the next st_write() on the same struct file
> rewinding the damn thing to overwrite what st_flush() had spewed is
> an obvious no-go.
> 

So what are the kind of errors current ->release implementation is trying to 
return?
Is it actual access the HW errors or its more of a resource allocations errors?
If the later then maybe the allocations can be done before hand, say at ->flush 
but
are not redone on redundant flushes?

If the former then yes looks like troubles. That said I believe the 
->last_close_with_error() is a very common needed pattern which few use exactly
because it does not work. But which I wanted/needed many times before.

So I would break some eggs which ever is the most elegant way, and perhaps add a
new parameter to ->flush(bool last) or some other easy API.
[Which is BTW the worst name ever, while at it lets rename it to ->close() which
 is what it is. "flush" is used elsewhere to mean sync.
]

So yes please lets fix VFS API with drivers so to have an easy and safe way
to execute very last close, all the while still being able to report errors to
close(2).

My $0.017 Thanks
Boaz


Re: remove exofs, the T10 OSD code and block/scsi bidi support V4

2019-02-04 Thread Boaz Harrosh
On 01/02/19 09:55, Christoph Hellwig wrote:
> The only real user of the T10 OSD protocol, the pNFS object layout
> driver never went to the point of having shipping products, and we
> removed it 1.5 years ago.  Exofs is just a simple example without
> real life users.
> 
> The code has been mostly unmaintained for years and is getting in the
> way of block / SCSI changes, 

For the 4th time: Which ?

> and does not even work properly currently,
> so I think it's finally time to drop it.
> 
> Quote from Boaz:
> 

Hello? anyone home?

Please resend without the below Quote!!!
Because yes I said that but I have also said the opposite. And I have clearly
stated in reaction to the last 3 resends. That I now oppose to the below 
statement.
And that you should please remove it.

Do your own text please. Don't drag me in the mud

> "As I said then. It is used in Universities for studies and experiments.
> Every once in a while. I get an email with questions and reports.
> 
> But yes feel free to remove the all thing!!
> 
> I guess I can put it up on github. In a public tree.
> 
> Just that I will need to forward port it myself, til now you guys
> been doing this for me ;-)"
> 
> Now the last time this caused a bit of a stir, but still no actual users,
> not even for SG_IO passthrough commands.  So here we go again, this time
> including removing everything in the scsi and block layer supporting it,
> and thus shrinking struct request.
> 

NAK-by: Boaz harrosh 
For what ever that means ...

Cheers
Boaz



Re: remove exofs, the T10 OSD code and block/scsi bidi support V3

2018-12-23 Thread Boaz Harrosh
On 20/12/18 09:26, Christoph Hellwig wrote:
> On Wed, Dec 19, 2018 at 09:01:53PM -0500, Douglas Gilbert wrote:
>>>   1) reduce the size of every kernel with block layer support, and
>>>  even more for every kernel with scsi support
>>
>> By proposing the removal of bidi support from the block layer, it isn't
>> just the SCSI subsystem that will be impacted. Those NVMe documents
>> that you referred me to earlier in the year, in the command tables
>> in 1.3c and earlier you have noticed the 2 bit direction field and
>> what 11b means? Even if there aren't any bidi NVMe commands *** yet,
>> the fact that NVMe's 64 byte command format has provision for 4
>> (not 2) independent data transfers (data + meta, for each direction).
>> Surely NVMe will sooner or later take advantage of those ... a
>> command like READ GATHERED comes to mind.
> 
> NVMe on the other hand does have support for separate read and write
> buffers as in the current SCSI bidi support, as it encodes the data
> transfers in that SQE.  So IFF NVMe does bidi commands it would have
> to use a single buffer for data in/out, 

There is no such thing as "buffer" there is at first a bio, and after
virtual-to-iommu mapping a scatter-gather-list. All these are currently
governed by a struct request.
request, bio, and sgl, have a single direction, All API's expect a single
direction.

All BIDI did was to say. Lets not change any API or structure but just
use two of them at the same time.

All the wiser is the very high level user, and the very low HW driver like
iscsi. All the middlewere was never touched.

In the view of a bidi target like say an osd. It all stream looks like a single
"Buffer" on the wire, were some of it is read and some of it is written
to.

> which can be easily done

?? Did you try. It will take much more than an additional pointer sir

> in the block layer without the current bidi support that chains
> two struct request instances for data in and data out.
> 

That was the all trick of not changing a single API or structure
Just have two of the same thing, we already know how to handle

>>>   2) reduce the size of the critical struct request structure by
>>>  128 bits, thus reducing the memory used by every blk-mq driver
>>>  significantly, never mind the cache effects
>>
>> Hmm, one pointer (that is null in the non-bidi case) should be enough,
>> that's 64 or 32 bits.
> 
> Due to the way we use request chaining we need two fields at the
> moment.  ->special and ->next_rq.  

No! ->special is nothing to do with bidi. ->special is a field to be
used by LLD's only and are not to be touched by block layer or transports
or high level users.
 
Request has the single ->next_rq for bidi. And could be eliminated by
sharing space with the elevator info. Do you want a patch?

(So in effect it can be taking 0 bytes, and yes a little bit of code)

> If we'd refactor the whole thing
> for the basically non-existent user we could indeed probably get it
> down to a single pointer. 
> 
>> While on the subject of bidi, the order of transfers: is the data-out
>> (to the target) always before the data-in or is it the target device
>> that decides (depending on the semantics of the command) who is first?
> 
> The way I read SAM data needs to be transferred to the device for
> processing first, then the processing occurs and then it is transferred
> out, so the order seems fixed.
> 

Not sure what is the "SAM" above. But most of the BIDI commands I know,
osd and otherwise, the order is command specific, and many times it is
done in parallel.
Read some bits than write some bits, rinse and repeat ...

(You see in scsi the all OUT buffer is part of the actual CDB, so in effect
 any READ is a BIDI. The novelty here is the variable sizes and the SW stack
 memory targets for the different operations)

>>
>> Doug Gilbert
>>
>>  *** there could already be vendor specific bidi NVMe commands out
>> there (ditto for SCSI)
> 
> For NVMe they'd need to transfer data in and out in the same buffer
> to sort work, and even then only if we don't happen to be bounce
> buffering using swiotlb, or using a network transport.  Similarly for
> SCSI only iSCSI at the moment supports bidi CDBs, so we could have
> applications using vendor specific bidi commands on iSCSI, which
> is exactly what we're trying to find out, but it is a bit of a very
> niche use case.
> 

Again bidi works NOW. Did not yet see the big gain, of throwing it
out.

Jai Maa
Boaz



Re: remove exofs, the T10 OSD code and block/scsi bidi support V3

2018-12-23 Thread Boaz Harrosh
On 19/12/18 16:43, Christoph Hellwig wrote:
> On Mon, Nov 26, 2018 at 07:11:10PM +0200, Boaz Harrosh wrote:
>> On 11/11/18 15:32, Christoph Hellwig wrote:
>>> The only real user of the T10 OSD protocol, the pNFS object layout
>>> driver never went to the point of having shipping products, and we
>>> removed it 1.5 years ago.  Exofs is just a simple example without
>>> real life users.
>>>
>>
>> You have failed to say what is your motivation for this patchset? What
>> is it you are trying to fix/improve.
> 
> Drop basically unused support, which allows us to
> 
>  1) reduce the size of every kernel with block layer support, and
> even more for every kernel with scsi support

Do you have numbers? its mainly code-segment so I don't think you will see
any real life measurable difference.

>  2) reduce the size of the critical struct request structure by
> 128 bits, thus reducing the memory used by every blk-mq driver
> significantly, never mind the cache effects

128 bits? I see the "struct request *next_rq;"
is there another one?

It could share space with elv; && flush;
Do you want a patch?

>  3) stop having the maintainance overhead for this code in the
> block layer, which has been rather painful at times
> 

I hear you man. Life is pain. But is it really such an overhead?
I mean it is already implemented. What else is there to do?
Please please show me? (Sorry for being slow)

Jai Maa
Boaz



Re: [PATCH 33/41] scsi: osd: osd_initiator: mark expected switch fall-throughs

2018-12-18 Thread Boaz Harrosh
On 28/11/18 06:32, Gustavo A. R. Silva wrote:
> In preparation to enabling -Wimplicit-fallthrough, mark switch cases
> where we are expecting to fall through.
> 
> Signed-off-by: Gustavo A. R. Silva 

ACK-by: Boaz Harrosh 

> ---
>  drivers/scsi/osd/osd_initiator.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/scsi/osd/osd_initiator.c 
> b/drivers/scsi/osd/osd_initiator.c
> index 60cf7c5eb880..cb26f26d5ec1 100644
> --- a/drivers/scsi/osd/osd_initiator.c
> +++ b/drivers/scsi/osd/osd_initiator.c
> @@ -1849,6 +1849,7 @@ int osd_req_decode_sense_full(struct osd_request *or,
>   32, 1, dump, sizeof(dump), true);
>   OSD_SENSE_PRINT2("response_integrity [%s]\n", dump);
>   }
> + /* fall through */
>   case osd_sense_attribute_identification:
>   {
>   struct osd_sense_attributes_data_descriptor
> @@ -1879,7 +1880,7 @@ int osd_req_decode_sense_full(struct osd_request *or,
>   attr_page, attr_id);
>   }
>   }
> - /*These are not legal for OSD*/
> + /* fall through - These are not legal for OSD */
>   case scsi_sense_field_replaceable_unit:
>   OSD_SENSE_PRINT2("scsi_sense_field_replaceable_unit\n");
>   break;
> 



Re: remove exofs, the T10 OSD code and block/scsi bidi support V3

2018-11-26 Thread Boaz Harrosh
On 11/11/18 15:32, Christoph Hellwig wrote:
> The only real user of the T10 OSD protocol, the pNFS object layout
> driver never went to the point of having shipping products, and we
> removed it 1.5 years ago.  Exofs is just a simple example without
> real life users.
> 

You have failed to say what is your motivation for this patchset? What
is it you are trying to fix/improve.

For the sake of "not used much" I fail to see the risk taking of
this removal.

> The code has been mostly unmaintained for years and is getting in the
> way of block / SCSI changes, so I think it's finally time to drop it.
> 
> Quote from Boaz:
> 
> "As I said then. It is used in Universities for studies and experiments.
> Every once in a while. I get an email with questions and reports.
> 
> But yes feel free to remove the all thing!!
> 
> I guess I can put it up on github. In a public tree.
> 
> Just that I will need to forward port it myself, til now you guys
> been doing this for me ;-)"
> 

Yes I wrote that for V1. But I wrote the *opposite* thing in a later mail.
Which nullifies this statement above. So please remove this quote in future
submits.

Here is what I wrote later as response of V2 of this set:



I think I'm changing my mind about this.

Because of two reasons:

One: I see 3 thousands bit-rots in the Kernel and particularly SCSI drivers
that are much older and fat-and-ugliness consuming then the clean osd
stack. For example the all ISA bus and ZONE_DMA stuff.

Two: I have offered many times, every time this came up. That if
anyone has a major (or even minor) change to the block and/or scsi layers
that he/she has done. And that now breaks the compilation/run time of
OSD or exofs.
  I'm personally willing to spend my weekends and fix it myself. Send me
  a URL of the tree with the work done, and I will send the patches needed
  to revitalize OSD/exofs as part of that change set.

I have never received any such requests to date.

So I would please like to protest on two of Christoph's statements above.

"The code has been mostly unmaintained for years and is getting in the
 way of block / SCSI changes"

1. What does "maintained" means? I have for all these years been immediately
   responsive to any inquiries and patches to the code in question.
   And am listed as MAINTAINER of this code.
2. I have regularly, for ever, every kernel release around the RC3-RC4
   time frame, compiled and ran my almost automatic setup and made sure
   the things still run as expected (No regressions).

So Yes the code has not seen any new fixtures for years. But it is regularly
tested and proven to work, on latest kernel. So it fails the definition 
of a "bit rot"

Christoph you've been saying for so long "getting in the way of block/SCSI
changes". And every time and again this time please tell me, you never answered
before. What are these changes you want to make? can I please help?

Send me any tree where exofs/osd compilation is broken and I will personally
fix it in "ONE WEEK" time.

(If compilation is fine but you know runtime will break, its nice to have an
 heads up, but if not my automatic system will detect it anyway)

Lets say that if in the FUTURE a change-set is submitted that breaks OSD/EXOFS
compilation, and I failed to respond with a fix within "ONE WEEK". Then
this goes in as part of that change-set. And not with the argument of
"Not used, not maintained" - But as "Breaks compilation of the following 
changes"
I promise I will gladly ACK it then.

So for now. A personal NACK from me on the grounds that. You never told me
why / what this is breaking.

Thanks
Boaz



> Now the last time this caused a bit of a stir, but still no actual users,
> not even for SG_IO passthrough commands.  So here we go again, this time
> including removing everything in the scsi and block layer supporting it,
> and thus shrinking struct request.
> 

Again. T10-OSD or not. Bidi is currently actively used. By Linus rules
You are not allowed to remove it.

Two use paths:
1. Management CDBS of private vendors yes via iscsi. virt_io and usb-scsi
2. Target mode support of WRITE-RETURN-XOR, and COMPARE_AND_WRITE

---
You guys should do what you feel best. Even not answering my questions and
of course not agreeing with my advise, .i.e about breaking people's setups.

But please remove the wrong quote from me. Please quote my objection
of the matter. (pretty please because you may surly ignore that request as
well)

[I am not fighting about this at all. Please do what you need to do.
 Just want to set the record strait that's all]

Cheers :-)
Boaz



Re: [PATCH] libosd: Remove ignored __weak attribute

2018-10-31 Thread Boaz Harrosh
On 26/10/18 00:31, Nathan Chancellor wrote:
> On Tue, Oct 02, 2018 at 04:06:31PM -0700, Bart Van Assche wrote:
<>
> 
> Hi Bart,
> 
> I'm sorry if I didn't follow the conclusion of this conversation properly
> but this is the below diff you were initially looking for, correct?
> 
> If so, Boaz and Nick, do you have any objections if this is v2? I'd like
> to get this patch accepted so the warning can be fixed for everyone.
> 

ACK on both the original and below code they will all work just fine.

I do like the original "just remove the _weak thingy". But this one
will work as well.

The "with extern" version suggested before is more cumbersome because it will
need an EXPORT_SYMBOL() but will also work.

The all use of osd_root_object is just. A properly C-types pointer to couple
of ZEROS. but since the Interface needs a pointer, those zeros need storage
somewhere. It does not matter where. (The value of the pointer is not used
only its content)

Do you need that I send a proper patch? Actually the original first patch
is the version I like. (And again all 3 approaches will work)

Thanks
Boaz

> Thanks,
> Nathan
> 
> 
> 
> diff --git a/drivers/scsi/osd/osd_initiator.c 
> b/drivers/scsi/osd/osd_initiator.c
> index e19fa883376f..4250f739beb3 100644
> --- a/drivers/scsi/osd/osd_initiator.c
> +++ b/drivers/scsi/osd/osd_initiator.c
> @@ -58,6 +58,8 @@
> 
>  enum { OSD_REQ_RETRIES = 1 };
> 
> +static const struct osd_obj_id osd_root_object;
> +
>  MODULE_AUTHOR("Boaz Harrosh ");
>  MODULE_DESCRIPTION("open-osd initiator library libosd.ko");
>  MODULE_LICENSE("GPL");
> diff --git a/drivers/scsi/osd/osd_uld.c b/drivers/scsi/osd/osd_uld.c
> index eaf36ccf58db..770c758baaa9 100644
> --- a/drivers/scsi/osd/osd_uld.c
> +++ b/drivers/scsi/osd/osd_uld.c
> @@ -73,6 +73,7 @@
> 
>  static const char osd_name[] = "osd";
>  static const char *osd_version_string = "open-osd 0.2.1";
> +static const struct osd_obj_id osd_root_object;
> 
>  MODULE_AUTHOR("Boaz Harrosh ");
>  MODULE_DESCRIPTION("open-osd Upper-Layer-Driver osd.ko");
> diff --git a/include/scsi/osd_types.h b/include/scsi/osd_types.h
> index 48e8a165e136..eb31357ec8b3 100644
> --- a/include/scsi/osd_types.h
> +++ b/include/scsi/osd_types.h
> @@ -28,8 +28,6 @@ struct osd_obj_id {
> osd_id id;
>  };
> 
> -static const struct __weak osd_obj_id osd_root_object = {0, 0};
> -
>  struct osd_attr {
> u32 attr_page;
> u32 attr_id;
> 



Re: [PATCH] libosd: Remove ignored __weak attribute

2018-10-31 Thread Boaz Harrosh
On 26/10/18 20:54, Nick Desaulniers wrote:
<>
> 
> It's hard to say without knowing the original intent of the code.
>>From the variable's identifier and fact that it's global, I *assume*
> that we want only 1 struct osd_obj_id which is the root, hence the
> identifier `osd_root_object`.  It has 4 references in the entire
> kernel; it doesn't make sense to my why those references would want to
> be referring to two different instances of `osd_root_object`.  Maybe
> the maintainers can clarify if 2 instances is the intent?
> 
> Further complicated is the use of the __weak attribute AND the
> compiler flag -fno-common (which the kernel sets in the top level
> Makefile).  Also, it seems that ODR is a C++ concept; it's not clear
> to me if there's semantic differences with C or not (I don't think so
> in this case, but I've learned not to bet on subtle semantic
> differences between the languages not existing).
> 
> __attribute__((weak)) AND static on a global variable declared in a
> header raises many red flags to me.  Was weak added to work around an
> ODR link error?
> 
> If creating one instance of this variable is a functional change, I
> can't help but suspect the original code was wrong.  But maybe Bart,
> Boaz, or Christoph can clarify or have more thoughts on this?  Looks
> like Boaz added this header in commit de258bf5e638 ("[SCSI] libosd:
> OSDv1 Headers").
> 

Sorry for the late response. Was off line for a bit.

The original patch and all its permutations are all correct
the definition of osd_root_object is just a fancy type cast of
couple of zeros. IE a couple of zeroes with the right type. So they
can be simply compared to retuned and sent line content. So it does
not matter at all what change is accepted.

I'm so sorry for your trouble and test development. It could all
be saved if I was noticing this thread.

I will ACK again the original simple V2 patch.

Thanks
Boaz

<>


Re: [PATCH] libosd: Remove ignored __weak attribute

2018-10-31 Thread Boaz Harrosh
On 27/10/18 16:28, Martin K. Petersen wrote:
> 
> Bart,
> 
>> Removing kernel drivers that are not used helps to reduce the workload
>> of a maintainer and hence is a rational action. Additionally, if
>> anyone would ever complain about removal of a kernel driver, it can be
>> brought back by reverting the commit through which it has been
>> removed.  Martin, please reply if you see this differently.
> 
> We remove crusty old SCSI drivers all the time. The heuristic is based
> on lack of user bug reports and absence of commits that are not due to
> kernel interface changes or trivial cleanups. So removing stuff is
> perfectly normal.
> 
> The OSD protocol failed to get traction in the industry, adoption was
> very limited. 

Very true.

> If the code just plugged straight into existing kernel
> interfaces it would be easier to justify keeping it around. 

It was using the very much straight existing kernel interfaces
at the time of its insertion, basically the regular PC_command
(As opposed to FS_commnnd) Where the CDB/scsi_command content is
provided by the caller. And the commands complete as an whole.
Plus the added support for bidi PC_commands.

Please tell me what are those entangled interfaces you are unhappy with?
I would like to help disentangle them.

> However, the
> OSD support requires bidirectional command support so we carry a bunch
> of additional plumbing in both block and SCSI to accommodate it. There
> are no other users of these interfaces, so dropping OSD would mean we
> could simplify some (hot) code paths. That would be a win in my book.

But OSD is not the only one or scsi command-set that is using bidi. There
are a few more uses of scsi and BTW block-layer bidi commands in the field.
Target drivers been supporting bidi commands. Vendor drivers use bidi interface
for management CDBS. I'm afraid that by the Linus rules you cannot remove bidi.
Orthogonal to t10-osd or not to be.

> Consequently, if a patch were to materialize that disentangled and
> removed OSD, I'd be inclined to merge it.
> 

Is there a way I can help with this?

> But I do think that this is an orthogonal discussion to the innocuous
> __weak attribute cleanup.
> 

Thanks
Boaz



Re: remove exofs and the T10 OSD code V2

2018-10-31 Thread Boaz Harrosh
On 31/10/18 23:10, Douglas Gilbert wrote:
> On 2018-10-31 4:57 p.m., Boaz Harrosh wrote:
>> On 30/10/18 09:45, Christoph Hellwig wrote:
>>> On Mon, Oct 29, 2018 at 02:42:12PM -0600, Jens Axboe wrote:
>>>> LGTM, for both:
>>>
>>> I also have this one on top as requested by Martin.  The core block
>>> bidi support is unfortunately also used by bsg-lib, although it is
>>> not anywhere near as invasive.  But that is another argument for
>>> looking into moving bsg-lib away from using block queues..
>>>
>>
>> BUT this patch is very very wrong.
>>
>> Totally apart from T10-OSD and its use in the field. Support for scsi BIDI
>> commands is not exclusive to T10-OSD at all. Even the simple scsi-array
>> command-set has BIDI operations defined. for example the write-return-xor
>> and so on.
>>
>> Also some private administrative CDBs of some vendor devices uses SCSI-BIDI.
>> So this patch just broke some drivers. (User-mode apps use bsg pass through)
>>
>> Also you might (try hard and) remove all usage of scsi-bidi as an initiator
>> from the Linux Kernel. But what about target mode. As a target we have 
>> supported
>> on the wire bidi protocols like write-return-xor and others for a long time.
>> Are you willing to silently break all these setups in the field on the next 
>> update?
>> Are you so sure these are never used?
>>
>> PLEASE, I beg of you guys. Do not remove SCSI-BIDI. It is a cry of 
>> generations.
>>
>> And I think by the rules of Linus, as far as target mode. You are not allowed
>> to break users in this way.
> 
> Hi,
> I'm in the process of rebuilding the sg driver with the following goals:
> 
>- undo 20 years of road wear, some of which is caused by literally
>  hundreds of "laparoscopic" patches that gradually ruin a driver,
>  at least from the maintainer's viewpoint. Comments that once made
>  sense become cryptic or just nonsense; naming schemes that
>  obliterate variable names to the extent that a random name
>  generator would be easier to follow; and just plain broken code.
>  For example, why sort out the existing locking at a particular
>  level when you can add a new lock in a completely non-orthogonal
>  way? [Yes, I looking at you, google.] Anyway, my first cut at this
>  is out there (on the linux-scsi list, see: "[PATCH v3 0/8] sg:
>  major cleanup, remove max_queue limit"). Not much new there,
>  unless you look very closely
> 
>- the next step is to add to the sg driver async SCSI command
>  capability based on the sg_io_v4 structure previously only used
>  by the bsg driver and now removed from bsg. The main advantage
>  of the sg_io_v4 structure over previous pass-through interface
>  attempts is the support of SCSI bidi commands
> 
>- as part of this effort introduce two new ioctls: SG_IOSUBMIT and
>  SG_IORECEIVE to replace the write()/read() technique currently
>  in use (since Linux 1.0 in 1992). The write()/read() technique
>  seems to be responsible for various security folks losing clumps
>  of their hair. One advantage of ioctls, as Alan Cox pointed out,
>  is the ability to write to and read back from the kernel in a way
>  that is naturally synchronized. [Actually, those security folks
>  shouldn't look too closely at sg_read() in that respect.]
> 
> In discussions with several folks who are on the T10 committee, I
> wondered why there was no READ GATHERED command (there has been a
> WRITE SCATTERED for 2 years). The answer was lack of interest ***,
> plus the difficultly of implementation. You see, READ GATHERED needs
> to send a scatter gather list to the device and get the corresponding
> data back (as a linear array). And that requires either:
>a) bidi commands (scatter gather list in the data-out, corresponding
>   "read" data in the data-in), or
>b) lng SCSI commands, up to around 256 bytes long in which the
>   sgat list is the latter part of that command
> 
> And the T10 folks say neither of those options is well supported or
> is expensive. 

It is supported in Linux scsi/osd driver is a proof of that. And expensive
it is not. I have demonstrated the ability to saturate a 10G link over
a raid of devices from a single writer. In OSD everything is bidi.

> I'm guessing they are referring to Linux and Windows.
> I haven't argued much beyond that point, but it looks like a bit of
> a chicken and egg situation.
> 
> 
> Don't know too much about the T10 OSD stuff. But I can see that it
> uses both long SCSI com

Re: remove exofs and the T10 OSD code V2

2018-10-31 Thread Boaz Harrosh
On 31/10/18 19:29, Bart Van Assche wrote:
> On Wed, 2018-10-31 at 18:34 +0200, Boaz Harrosh wrote:
>> On 27/10/18 11:20, Christoph Hellwig wrote:
>>> The only real user of the T10 OSD protocol, the pNFS object layout
>>> driver never went to the point of having shipping products, and we
>>> removed it 1.5 years ago.  Exofs is just a simple example without
>>> real life users.
>>>
>>> The code has been mostly unmaintained for years and is getting in the
>>> way of block / SCSI changes, so I think it's finally time to drop it.
>>>
>>> Quote from Boaz:
>>>
>>> "As I said then. It is used in Universities for studies and experiments.
>>> Every once in a while. I get an email with questions and reports.
>>>
>>> But yes feel free to remove the all thing!!
>>
>> I think I'm changing my mind about this.
>> [ ... ]
> 
> The osd driver was accepted in the upstream kernel in 2009. I have checked all
> commits in Linus' tree for the osd driver that went in since 2009. All changes
> made by other kernel developers than you are the result of tree-wide 
> refactoring,
> compiler warning fixes, fixes for issues detected by static source code
> analyzers or spelling fixes. Hence my question: how big is the user base of 
> the
> exofs and osd kernel drivers? 
> 

Not big at all. And none of them production setups. As I said mainly used
by academia.

Thanks
Boaz

> Thanks,
> Bart.



Re: remove exofs and the T10 OSD code V2

2018-10-31 Thread Boaz Harrosh
On 27/10/18 11:20, Christoph Hellwig wrote:
> The only real user of the T10 OSD protocol, the pNFS object layout
> driver never went to the point of having shipping products, and we
> removed it 1.5 years ago.  Exofs is just a simple example without
> real life users.
> 
> The code has been mostly unmaintained for years and is getting in the
> way of block / SCSI changes, so I think it's finally time to drop it.
> 
> Quote from Boaz:
> 
> "As I said then. It is used in Universities for studies and experiments.
> Every once in a while. I get an email with questions and reports.
> 
> But yes feel free to remove the all thing!!
> 

I think I'm changing my mind about this.

Because of two reasons:

One: I see 3 thousands bit-rots in the Kernel and particularly SCSI drivers
that are much older and fat-and-ugliness consuming then the clean osd
stack. For example the all ISA bus and ZONE_DMA stuff.

Two: I have offered many many times, every time this came up. That if
anyone has a major (or even minor) change to the block and/or scsi layers
that he/she has done. And that now breaks the compilation/run time of
OSD or exofs.
  I'm personally willing to spend my weekends and fix it myself. Send me
  a URL of the tree with the work done, and I will send the patches needed
  to revitalize OSD/exofs as part of that change set.

I have never received any such requests to date.

So I would please like to protest on two of Christoph's statements above.

"The code has been mostly unmaintained for years and is getting in the
 way of block / SCSI changes"

1. What does "maintained" means? I have for all these years been immediately
   responsive to any inquiries and patches to the code in question.
   And am listed as MAINTAINER of this code.
2. I have regularly, for ever, every kernel release around the RC3-RC4
   time frame, compiled and ran my almost automatic setup and made sure
   the things still run as expected (No regressions).

So Yes the code has not seen any new fixtures for years. But it is regularly
tested and proven to work, on latest kernel. So it fails the definition 
of a "bit rot"

Christoph you've been saying for so long "getting in the way of block/SCSI
changes". And every time and again this time please tell me, you never answered
before. What are these changes you want to make? can I please help?

Send me any tree where exofs/osd compilation is broken and I will personally
fix it in "ONE WEEK" time.

(If compilation is fine but you know runtime will break, its nice to have an
 heads up, but if not my automatic system will detect it anyway)

Lets say that if in the FUTURE a change-set is submitted that breaks OSD/EXOFS
compilation, and I failed to respond with a fix within "ONE WEEK". Then
this goes in as part of that change-set. And not with the argument of
"Not used, not maintained" - But as "Breaks compilation of the following 
changes"
I promise I will gladly ACK it then.

So for now. A personal NACK from me on the grounds that. You never told me
why / what this is braking.

Thanks
Boaz

> I guess I can put it up on github. In a public tree.
> 
> Just that I will need to forward port it myself, til now you guys
> been doing this for me ;-)"
> 



Re: remove exofs and the T10 OSD code V2

2018-10-31 Thread Boaz Harrosh
On 30/10/18 09:45, Christoph Hellwig wrote:
> On Mon, Oct 29, 2018 at 02:42:12PM -0600, Jens Axboe wrote:
>> LGTM, for both:
> 
> I also have this one on top as requested by Martin.  The core block
> bidi support is unfortunately also used by bsg-lib, although it is
> not anywhere near as invasive.  But that is another argument for
> looking into moving bsg-lib away from using block queues..
> 

BUT this patch is very very wrong.

Totally apart from T10-OSD and its use in the field. Support for scsi BIDI
commands is not exclusive to T10-OSD at all. Even the simple scsi-array
command-set has BIDI operations defined. for example the write-return-xor
and so on.

Also some private administrative CDBs of some vendor devices uses SCSI-BIDI.
So this patch just broke some drivers. (User-mode apps use bsg pass through)

Also you might (try hard and) remove all usage of scsi-bidi as an initiator
from the Linux Kernel. But what about target mode. As a target we have supported
on the wire bidi protocols like write-return-xor and others for a long time.
Are you willing to silently break all these setups in the field on the next 
update?
Are you so sure these are never used?

PLEASE, I beg of you guys. Do not remove SCSI-BIDI. It is a cry of generations.

And I think by the rules of Linus, as far as target mode. You are not allowed
to break users in this way.

Thanks
Boaz

> ---
>>From d6dd4f32798edd425a4df72f6125dba03e19d8c7 Mon Sep 17 00:00:00 2001
> From: Christoph Hellwig 
> Date: Sat, 27 Oct 2018 15:49:13 +0200
> Subject: scsi: remove bidirectional command support
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  drivers/scsi/cxgbi/libcxgbi.c  | 13 ++---
>  drivers/scsi/iscsi_tcp.c   |  9 +---
>  drivers/scsi/libiscsi.c| 64 +++-
>  drivers/scsi/libiscsi_tcp.c|  8 +--
>  drivers/scsi/scsi_debug.c  | 51 ---
>  drivers/scsi/scsi_error.c  |  3 --
>  drivers/scsi/scsi_lib.c| 80 ++
>  drivers/scsi/virtio_scsi.c | 14 ++
>  drivers/target/loopback/tcm_loop.c | 15 --
>  drivers/usb/storage/uas.c  | 11 +---
>  include/scsi/scsi_cmnd.h   | 19 +--
>  include/scsi/scsi_eh.h |  1 -
>  12 files changed, 35 insertions(+), 253 deletions(-)
> 
> diff --git a/drivers/scsi/cxgbi/libcxgbi.c b/drivers/scsi/cxgbi/libcxgbi.c
> index 75f876409fb9..4466ae5c9a74 100644
> --- a/drivers/scsi/cxgbi/libcxgbi.c
> +++ b/drivers/scsi/cxgbi/libcxgbi.c
> @@ -1211,7 +1211,7 @@ scmd_get_params(struct scsi_cmnd *sc, struct 
> scatterlist **sgl,
>   unsigned int *sgcnt, unsigned int *dlen,
>   unsigned int prot)
>  {
> - struct scsi_data_buffer *sdb = prot ? scsi_prot(sc) : scsi_out(sc);
> + struct scsi_data_buffer *sdb = prot ? scsi_prot(sc) : &sc->sdb;
>  
>   *sgl = sdb->table.sgl;
>   *sgcnt = sdb->table.nents;
> @@ -1427,8 +1427,7 @@ static void task_release_itt(struct iscsi_task *task, 
> itt_t hdr_itt)
>   log_debug(1 << CXGBI_DBG_DDP,
> "cdev 0x%p, task 0x%p, release tag 0x%x.\n",
> cdev, task, tag);
> - if (sc &&
> - (scsi_bidi_cmnd(sc) || sc->sc_data_direction == DMA_FROM_DEVICE) &&
> + if (sc && sc->sc_data_direction == DMA_FROM_DEVICE &&
>   cxgbi_ppm_is_ddp_tag(ppm, tag)) {
>   struct cxgbi_task_data *tdata = iscsi_task_cxgbi_data(task);
>   struct cxgbi_task_tag_info *ttinfo = &tdata->ttinfo;
> @@ -1460,9 +1459,7 @@ static int task_reserve_itt(struct iscsi_task *task, 
> itt_t *hdr_itt)
>   u32 tag = 0;
>   int err = -EINVAL;
>  
> - if (sc &&
> - (scsi_bidi_cmnd(sc) || sc->sc_data_direction == DMA_FROM_DEVICE)
> - ) {
> + if (sc && sc->sc_data_direction == DMA_FROM_DEVICE) {
>   struct cxgbi_task_data *tdata = iscsi_task_cxgbi_data(task);
>   struct cxgbi_task_tag_info *ttinfo = &tdata->ttinfo;
>  
> @@ -1896,7 +1893,7 @@ int cxgbi_conn_alloc_pdu(struct iscsi_task *task, u8 
> opcode)
>   if (SKB_MAX_HEAD(cdev->skb_tx_rsvd) > (512 * MAX_SKB_FRAGS) &&
>   (opcode == ISCSI_OP_SCSI_DATA_OUT ||
>(opcode == ISCSI_OP_SCSI_CMD &&
> -   (scsi_bidi_cmnd(sc) || sc->sc_data_direction == DMA_TO_DEVICE
> +   sc->sc_data_direction == DMA_TO_DEVICE)))
>   /* data could goes into skb head */
>   headroom += min_t(unsigned int,
>   SKB_MAX_HEAD(cdev->skb_tx_rsvd),
> @@ -1971,7 +1968,7 @@ int cxgbi_conn_init_pdu(struct iscsi_task *task, 
> unsigned int offset,
>   return 0;
>  
>   if (task->sc) {
> - struct scsi_data_buffer *sdb = scsi_out(task->sc);
> + struct scsi_data_buffer *sdb = &task->sc->sdb;
>   struct scatterlist *sg = NULL;
>   int err;
>  
> diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
> index 23354f206533..a78b46bb2b71 100644
>

Re: [PATCH] libosd: Remove ignored __weak attribute

2018-10-02 Thread Boaz Harrosh
On 02/10/18 17:56, Christoph Hellwig wrote:
> On Mon, Oct 01, 2018 at 06:16:32PM -0700, Bart Van Assche wrote:
>> Boaz, the most recent osd patch that is neither trivial nor treewide
>> refactoring is six years old (51976a8c85ce ("[SCSI] osd_uld: Add osdname &
>> systemid sysfs at scsi_osd class"). That suggests that nobody is using this
>> driver anymore. Can this driver be removed from the kernel tree?
> 
> Last time we tried to exofs and the osd code Boaz complained loudly,
> but as far as I can tell it really is not used and bitrotting, so we
> should finally go ahead and drop it.
> 

As I said then. It is used in Universities for studies and experiments.
Every once in a while. I get an email with questions and reports.

But yes feel free to remove the all thing!!

I guess I can put it up on github. In a public tree.

Just that I will need to forward port it myself, til now you guys
been doing this for me ;-)

Thanks, sorry for the hassle
Boaz



Re: [PATCH] scsi: libosd: Remove VLA usage

2018-05-13 Thread Boaz Harrosh
On 03/05/18 01:55, Kees Cook wrote:
> On the quest to remove all VLAs from the kernel[1] this rearranges the
> code to avoid a VLA warning under -Wvla (gcc doesn't recognize "const"
> variables as not triggering VLA creation). Additionally cleans up variable
> naming to avoid 80 character column limit.
> 
> [1] 
> https://lkml.kernel.org/r/CA+55aFzCG-zNmZwX4A2FQpadafLfEzK6CC=qpxydaacu1rq...@mail.gmail.com
> 

ACK-BY: Boaz Harrosh 

> Signed-off-by: Kees Cook 
> ---
>  drivers/scsi/osd/osd_initiator.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/scsi/osd/osd_initiator.c 
> b/drivers/scsi/osd/osd_initiator.c
> index e18877177f1b..917a86a2ae8c 100644
> --- a/drivers/scsi/osd/osd_initiator.c
> +++ b/drivers/scsi/osd/osd_initiator.c
> @@ -1842,14 +1842,14 @@ int osd_req_decode_sense_full(struct osd_request *or,
>   case osd_sense_response_integrity_check:
>   {
>   struct osd_sense_response_integrity_check_descriptor
> - *osricd = cur_descriptor;
> - const unsigned len =
> -   sizeof(osricd->integrity_check_value);
> - char key_dump[len*4 + 2]; /* 2nibbles+space+ASCII */
> -
> - hex_dump_to_buffer(osricd->integrity_check_value, len,
> -32, 1, key_dump, sizeof(key_dump), true);
> - OSD_SENSE_PRINT2("response_integrity [%s]\n", key_dump);
> + *d = cur_descriptor;
> + /* 2nibbles+space+ASCII */
> + char dump[sizeof(d->integrity_check_value) * 4 + 2];
> +
> + hex_dump_to_buffer(d->integrity_check_value,
> + sizeof(d->integrity_check_value),
> + 32, 1, dump, sizeof(dump), true);
> + OSD_SENSE_PRINT2("response_integrity [%s]\n", dump);
>   }
>   case osd_sense_attribute_identification:
>   {
> 



Re: [PATCH, RFC] MAINTAINERS: update OSD entries

2017-05-03 Thread Boaz Harrosh
On 05/02/2017 02:33 PM, Jeff Layton wrote:
> On Tue, 2017-05-02 at 09:57 +0200, Christoph Hellwig wrote:
>> The open-osd domain doesn't exist anymore, and mails to the list lead
>> to really annoying bounced that repeat every day.
>>
>> Also the primarydata address for Benny bounces, and while I have a new
>> one for him he doesn't seem to be maintaining the OSD code any more.
>>
>> Which beggs the question:  should we really leave the Supported status
>> in MAINTAINERS given that the code is barely maintained?
>>
>> Signed-off-by: Christoph Hellwig 
>> ---
>>  MAINTAINERS | 4 
>>  1 file changed, 4 deletions(-)
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index 1bb06c5f7716..28dd83a1d9e2 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -9418,10 +9418,6 @@ F:drivers/net/wireless/intersil/orinoco/
>>  
>>  OSD LIBRARY and FILESYSTEM
>>  M:  Boaz Harrosh 
>> -M:  Benny Halevy 
>> -L:  osd-...@open-osd.org
>> -W:  http://open-osd.org
>> -T:  git git://git.open-osd.org/open-osd.git
>>  S:  Maintained
>>  F:  drivers/scsi/osd/
>>  F:  include/scsi/osd_*
> 
> Hah, you beat me to it! I was going to spin up a patch for this today.
> 
> Acked-by: Jeff Layton 

Acked-by: Boaz Harrosh 

> 



Re: [PATCH 3/4] nfs: remove the objlayout driver

2017-04-23 Thread Boaz Harrosh
On 04/21/2017 05:00 PM, Trond Myklebust wrote:
> Maintenance is not development. It’s about doing all the followup
> _after_ the feature is declared to be developed. That’s been missing
> for quite some time in the case of the OSD pNFS code, which is why
> I’m not even bothering to consider staging. If you are saying you are
> still maintaining exofs, and want to continue doing so, then great,
> but note that there is a file called BUGS in that directory that has
> been untouched since 2008, and that’s why I thing staging is a good
> idea.
> 

No, the BUGS file is just stale. As you said was not ever updated. All
the bugs (1) in there do no longer exist for a long long time.
I will send a patch to remove the file.

Yes I maintain this fs. It has the complete fixture list, of what was
first intended. I keep running it every major kernel release to make
sure it actually runs, and there are no regressions.

If this FS stands in the way of any new development please anyone
contact me and I will help in the conversion. Of exofs and the
osd-uld.

Thanks
Boaz



Re: [PATCH 3/4] nfs: remove the objlayout driver

2017-04-21 Thread Boaz Harrosh
On 04/20/2017 11:18 PM, Trond Myklebust wrote:
<>
> 
> OK. I'm applying this patch for the 4.12 merge window. 

That is understandable this code was not tested for a long while

> If, as Boaz
> suggests, there is still an interest in exofs, then I suggest we put
> that to the test by moving it into the STAGING area, to see if someone
> will step up to maintain it.
> 

I do not understand what you mean by "someone will maintain it"
What is it that is missing in exofs that you see needs development?

As I said I regularly run and test this FS and update as things advance
there where no new fixtures for a while, but no regressions either.

Please explain what it means "will maintain it"

> Cheers
>   Trond
> 

Thanks
Boaz



Re: linux-next: manual merge of the scsi-mkp tree with the char-misc tree

2017-04-20 Thread Boaz Harrosh
On 04/07/2017 10:50 PM, Bart Van Assche wrote:
> On Fri, 2017-04-07 at 13:29 -0600, Logan Gunthorpe wrote:
>> On 07/04/17 09:49 AM, Bart Van Assche wrote:
>>> Sorry that I had not yet noticed Logan's patch series. Should my two
>>> patches that conflict with Logan's patch series be dropped and reworked
>>> after Logan's patches are upstream?
>>
>> Yeah, Greg took my patchset around a few maintainers relatively quickly.
>> This is the second conflict, so sorry about that. Looks like the easiest
>> thing would be to just base your change off of mine. It doesn't look too
>> difficult. If you can do it before my patch hits upstream, I'd
>> appreciate some testing and/or review as no one from the scsi side
>> responded and that particular patch was a bit more involved than I would
>> have liked.
> 
> Boaz, had you noticed Logan's osd patch? If not, can you have a look?
> 

I did look, I even sent an ACK on one of the early versions.
The merge breakage is more of a build issue because I never
had get_device fail for me in my testing so it is more
academic.

Yes they both look fine BTW
Boaz

> Thanks,
> Bart.
> 



Re: [PATCH 1/4] block: remove the osdblk driver

2017-04-19 Thread Boaz Harrosh
On 04/12/2017 07:01 PM, Christoph Hellwig wrote:
> This was just a proof of concept user for the SCSI OSD library, and
> never had any real users.
> 
> Signed-off-by: Christoph Hellwig 

Yes please remove this driver

ACK-by Boaz Harrosh 

> ---
>  drivers/block/Kconfig  |  16 --
>  drivers/block/Makefile |   1 -
>  drivers/block/osdblk.c | 693 
> -
>  3 files changed, 710 deletions(-)
>  delete mode 100644 drivers/block/osdblk.c
> 
> diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
> index a1c2e816128f..58e24c354933 100644
> --- a/drivers/block/Kconfig
> +++ b/drivers/block/Kconfig
> @@ -312,22 +312,6 @@ config BLK_DEV_SKD
>  
>   Use device /dev/skd$N amd /dev/skd$Np$M.
>  
> -config BLK_DEV_OSD
> - tristate "OSD object-as-blkdev support"
> - depends on SCSI_OSD_ULD
> - ---help---
> -   Saying Y or M here will allow the exporting of a single SCSI
> -   OSD (object-based storage) object as a Linux block device.
> -
> -   For example, if you create a 2G object on an OSD device,
> -   you can then use this module to present that 2G object as
> -   a Linux block device.
> -
> -   To compile this driver as a module, choose M here: the
> -   module will be called osdblk.
> -
> -   If unsure, say N.
> -
>  config BLK_DEV_SX8
>   tristate "Promise SATA SX8 support"
>   depends on PCI
> diff --git a/drivers/block/Makefile b/drivers/block/Makefile
> index b12c772bbeb3..42e8cee1cbc7 100644
> --- a/drivers/block/Makefile
> +++ b/drivers/block/Makefile
> @@ -22,7 +22,6 @@ obj-$(CONFIG_CDROM_PKTCDVD) += pktcdvd.o
>  obj-$(CONFIG_MG_DISK)+= mg_disk.o
>  obj-$(CONFIG_SUNVDC) += sunvdc.o
>  obj-$(CONFIG_BLK_DEV_SKD)+= skd.o
> -obj-$(CONFIG_BLK_DEV_OSD)+= osdblk.o
>  
>  obj-$(CONFIG_BLK_DEV_UMEM)   += umem.o
>  obj-$(CONFIG_BLK_DEV_NBD)+= nbd.o
> diff --git a/drivers/block/osdblk.c b/drivers/block/osdblk.c
> deleted file mode 100644
> index 8127b8201a01..
> --- a/drivers/block/osdblk.c
> +++ /dev/null
> @@ -1,693 +0,0 @@
> -
> -/*
> -   osdblk.c -- Export a single SCSI OSD object as a Linux block device
> -
> -
> -   Copyright 2009 Red Hat, Inc.
> -
> -   This program is free software; you can redistribute it and/or modify
> -   it under the terms of the GNU General Public License as published by
> -   the Free Software Foundation.
> -
> -   This program is distributed in the hope that it will be useful,
> -   but WITHOUT ANY WARRANTY; without even the implied warranty of
> -   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> -   GNU General Public License for more details.
> -
> -   You should have received a copy of the GNU General Public License
> -   along with this program; see the file COPYING.  If not, write to
> -   the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
> -
> -
> -   Instructions for use
> -   
> -
> -   1) Map a Linux block device to an existing OSD object.
> -
> -  In this example, we will use partition id 1234, object id 5678,
> -  OSD device /dev/osd1.
> -
> -  $ echo "1234 5678 /dev/osd1" > /sys/class/osdblk/add
> -
> -
> -   2) List all active blkdev<->object mappings.
> -
> -  In this example, we have performed step #1 twice, creating two blkdevs,
> -  mapped to two separate OSD objects.
> -
> -  $ cat /sys/class/osdblk/list
> -  0 174 1234 5678 /dev/osd1
> -  1 179 1994 897123 /dev/osd0
> -
> -  The columns, in order, are:
> -  - blkdev unique id
> -  - blkdev assigned major
> -  - OSD object partition id
> -  - OSD object id
> -  - OSD device
> -
> -
> -   3) Remove an active blkdev<->object mapping.
> -
> -  In this example, we remove the mapping with blkdev unique id 1.
> -
> -  $ echo 1 > /sys/class/osdblk/remove
> -
> -
> -   NOTE:  The actual creation and deletion of OSD objects is outside the 
> scope
> -   of this driver.
> -
> - */
> -
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -
> -#define DRV_NAME "osdblk"
> -#define PFX DRV_NAME ": "
> -
> -/* #define _OSDBLK_DEBUG */
> -#ifdef _OSDBLK_DEBUG
> -#define OSDBLK_DEBUG(fmt, a...) \
> - printk(KERN_NOTICE "osdblk @%s:%d: " fmt, __func__, __LINE__, ##a)
> -#else
> -#define OSDBLK_DEBUG(fmt, a...) \
> - do { if (0) printk(fmt, ##a); } while (0)
> -#endif
> -
> -MODULE_AUTHOR("

Re: RFC: drop the T10 OSD code and its users

2017-04-18 Thread Boaz Harrosh
On 04/12/2017 07:01 PM, Christoph Hellwig wrote:

Hi Sir Christoph

> The only real user of the T10 OSD protocol, the pNFS object layout
> driver never went to the point of having shipping products, and the
> other two users (osdblk and exofs) were simple example of it's usage.
> 

I understand why osdblk might be a pain, and was broken from day one, and
should by all means go away ASAP.

But exofs should not be bothering anyone, and as far as I know does
not use any special API's except the osd_uld code of course.

> The code has been mostly unmaintained for years and is getting in the
> way of block / SCSI changes, so I think it's finally time to drop it.
> 

Please tell me what are those changes you are talking about? I might be
able to help in conversion. I guess you mean osd_uld and the Upper SCSI API.
Just point me at a tree where osd_uld is broken, and perhaps with a little
guidance from you I can do a satisfactory conversion.

Is true that no new code went in for a long while, but I still from time
to time run a setup and test that the all stack, like iscsi-bidi and so on still
works.

That said, yes only a stand alone exofs was tested for a long time, a full
pnfs setup is missing any supporting server. So Yes I admit that pnfs-obj is
getting very rotten. And is most probably broken, on the pnfs side of things.
[Which I admit makes my little plea kind of mute ;-) ]

Every once in a while I get emails from Students basing all kind of interesting
experiments on top of the exofs and object base storage. So for the sake of 
academics
and for the sake of a true bidi-stack testing, might we want to evaluate what 
is the
up coming cost, and what is a minimum set we are willing to keep?

Please advise?

thanks
Boaz

> These patches are against Jens' block for-next tree as that already
> has various modifications of the SCSI code.
> 



Re: [PATCH] osd_uld: Check scsi_device_get() return value

2017-04-05 Thread Boaz Harrosh
On 03/30/2017 08:17 PM, Bart Van Assche wrote:
> scsi_device_get() can fail. Hence check its return value.
> 
> Signed-off-by: Bart Van Assche 
> Cc: Boaz Harrosh 

Cool thanks
ACK-by: Boaz Harrosh 

> ---
>  drivers/scsi/osd/osd_uld.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/osd/osd_uld.c b/drivers/scsi/osd/osd_uld.c
> index e0ce5d2fd14d..1dea4244dd0c 100644
> --- a/drivers/scsi/osd/osd_uld.c
> +++ b/drivers/scsi/osd/osd_uld.c
> @@ -464,14 +464,15 @@ static int osd_probe(struct device *dev)
>   /* hold one more reference to the scsi_device that will get released
>* in __release, in case a logout is happening while fs is mounted
>*/
> - scsi_device_get(scsi_device);
> + if (scsi_device_get(scsi_device))
> + goto err_put_disk;
>   osd_dev_init(&oud->od, scsi_device);
>  
>   /* Detect the OSD Version */
>   error = __detect_osd(oud);
>   if (error) {
>   OSD_ERR("osd detection failed, non-compatible OSD device\n");
> - goto err_put_disk;
> + goto err_put_sdev;
>   }
>  
>   /* init the char-device for communication with user-mode */
> @@ -508,8 +509,9 @@ static int osd_probe(struct device *dev)
>  
>  err_put_cdev:
>   cdev_del(&oud->cdev);
> -err_put_disk:
> +err_put_sdev:
>   scsi_device_put(scsi_device);
> +err_put_disk:
>   put_disk(disk);
>  err_free_osd:
>   dev_set_drvdata(dev, NULL);
> 



Re: [PATCH] scsi: osd_uld: remove an unneeded NULL check

2017-03-23 Thread Boaz Harrosh
On 03/23/2017 12:41 PM, Dan Carpenter wrote:
> We don't call the remove() function unless probe() succeeds so "oud"
> can't be NULL here.  Plus, if it were NULL, we dereference it on the
> next line so it would crash anyway.
> 
> Signed-off-by: Dan Carpenter 
> 

Thanks sure!
ACK-by Boaz Harrosh 

> diff --git a/drivers/scsi/osd/osd_uld.c b/drivers/scsi/osd/osd_uld.c
> index 4101c3178411..8b9941a5687a 100644
> --- a/drivers/scsi/osd/osd_uld.c
> +++ b/drivers/scsi/osd/osd_uld.c
> @@ -507,10 +507,9 @@ static int osd_remove(struct device *dev)
>   struct scsi_device *scsi_device = to_scsi_device(dev);
>   struct osd_uld_device *oud = dev_get_drvdata(dev);
>  
> - if (!oud || (oud->od.scsi_device != scsi_device)) {
> - OSD_ERR("Half cooked osd-device %p,%p || %p!=%p",
> - dev, oud, oud ? oud->od.scsi_device : NULL,
> - scsi_device);
> + if (oud->od.scsi_device != scsi_device) {
> + OSD_ERR("Half cooked osd-device %p, || %p!=%p",
> + dev, oud->od.scsi_device, scsi_device);
>   }
>  
>   cdev_device_del(&oud->cdev, &oud->class_dev);
> 



Re: [Lsf-pc] [LSF/MM TOPIC] do we really need PG_error at all?

2017-02-28 Thread Boaz Harrosh
On 02/28/2017 03:11 AM, Jeff Layton wrote:
<>
> 
> I'll probably have questions about the read side as well, but for now it
> looks like it's mostly used in an ad-hoc way to communicate errors
> across subsystems (block to fs layer, for instance).

If memory does not fail me it used to be checked long time ago in the
read-ahead case. On the buffered read case, the first page is read synchronous
and any error is returned to the caller, but then a read-ahead chunk is
read async all the while the original thread returned to the application.
So any errors are only recorded on the page-bit, since otherwise the uptodate
is off and the IO will be retransmitted. Then the move to read_iter changed
all that I think.
But again this is like 5-6 years ago, and maybe I didn't even understand
very well.

> --
> Jeff Layton 
> 

I would like a Documentation of all this as well please. Where are the
tests for this?

Thanks
Boaz



Re: [PATCH v2] osd: remove deadcode

2016-02-24 Thread Boaz Harrosh
On 02/24/2016 01:21 PM, Sudip Mukherjee wrote:
> The variable is_ver1 is always true and so OSD_CAP_LEN can never be
> used.
> Reported by Coverity.
> 
> Signed-off-by: Sudip Mukherjee 

ACK-by: Boaz harrosh 

Thanks
> ---
> 
> v2: Joe Perches asked to mention the tool used in the commit log.
> 
>  drivers/scsi/osd/osd_initiator.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/osd/osd_initiator.c 
> b/drivers/scsi/osd/osd_initiator.c
> index d8a2b51..3b11aad 100644
> --- a/drivers/scsi/osd/osd_initiator.c
> +++ b/drivers/scsi/osd/osd_initiator.c
> @@ -2006,9 +2006,8 @@ EXPORT_SYMBOL(osd_sec_init_nosec_doall_caps);
>   */
>  void osd_set_caps(struct osd_cdb *cdb, const void *caps)
>  {
> - bool is_ver1 = true;
>   /* NOTE: They start at same address */
> - memcpy(&cdb->v1.caps, caps, is_ver1 ? OSDv1_CAP_LEN : OSD_CAP_LEN);
> + memcpy(&cdb->v1.caps, caps, OSDv1_CAP_LEN);
>  }
>  
>  bool osd_is_sec_alldata(struct osd_security_parameters *sec_parms __unused)
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] [SCSI] osd: fix signed char versus %02x issue

2015-12-08 Thread Boaz Harrosh
On 12/08/2015 04:25 PM, Rasmus Villemoes wrote:
> If char is signed and one of these bytes happen to have a value
> outside the ascii range, the corresponding output will consist of
> "ff" followed by the two hex chars that were actually
> intended. One way to fix it would be to change the casts to (u8*) aka
> (unsigned char*), but it is much simpler (and generates smaller code)
> to use the %ph extension which was created for such short hexdumps.
> 

Ha real cool, thanks I hated that crap

ACK-by: Boaz Harrosh 

> Signed-off-by: Rasmus Villemoes 
> ---
>  drivers/scsi/osd/osd_initiator.c | 5 +
>  1 file changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/drivers/scsi/osd/osd_initiator.c 
> b/drivers/scsi/osd/osd_initiator.c
> index 0cccd6033feb..d8a2b5185f56 100644
> --- a/drivers/scsi/osd/osd_initiator.c
> +++ b/drivers/scsi/osd/osd_initiator.c
> @@ -170,10 +170,7 @@ static int _osd_get_print_system_info(struct osd_dev *od,
>  
>   /* FIXME: Where are the time utilities */
>   pFirst = get_attrs[a++].val_ptr;
> - OSD_INFO("CLOCK  [0x%02x%02x%02x%02x%02x%02x]\n",
> - ((char *)pFirst)[0], ((char *)pFirst)[1],
> - ((char *)pFirst)[2], ((char *)pFirst)[3],
> - ((char *)pFirst)[4], ((char *)pFirst)[5]);
> + OSD_INFO("CLOCK  [0x%6phN]\n", pFirst);
>  
>   if (a < nelem) { /* IBM-OSD-SIM bug, Might not have it */
>   unsigned len = get_attrs[a].len;
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [REGRESSION v4.3] scsi_dh: use-after-free when removing scsi device

2015-09-30 Thread Boaz Harrosh
On 09/30/2015 12:28 PM, Hannes Reinecke wrote:
<>
> Pushing things into the background is typically not the best of
> ideas; actually I've been running into issues with udev not being
> complete by the time the next round is started. So more often than
> not I would be greeted with messages:
> 
> 'write: no such file or directory'
> 
> when executing this line. Removing the '&' at the end made this
> warning go away.
> 
> And actually I'm not sure if the above script is a valid testcase;

So are you saying it is allowed to crash the Kernel with a crappy
script?

> from what I've seen there is no locking / reference counting when
> accessing sysfs attributes. So as soon as you _can_ access the sysfs
> attribute it is implicitly assumed to be valid.
> In your case you will be _removing_ the sysfs attribute even though
> it is still accessed, which of course will crash.
> 

Is that allowed? for usermode script to race and crash the Kernel?

>From the original email it sounds like this used to be fine and it
now crashes (with the &)

Thanks
Boaz

> Can you still reproduce this problem after removing the '&' in that
> line?
> 
>>  echo "-- delete $dev --" > /dev/kmsg
>>  echo 1 > /sys/class/scsi_device/${dev}/device/delete
>>
>>  n=$((n + 1))
>> done
>> --- cut here --
> 
> Having said that I've retried your test script with my ALUA handler
> update, and didn't find any issues there.
> It happily completed about 500 rounds at which point I got bored.
> Of course, this is after removing the '&' in the said line.
> 
> Cheers,
> 
> Hannes
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] SCSI-OSD: Delete an unnecessary check before the function call "put_disk"

2015-06-28 Thread Boaz Harrosh
On 06/24/2015 05:16 PM, SF Markus Elfring wrote:
> From: Markus Elfring 
> Date: Wed, 24 Jun 2015 16:06:21 +0200
> 
> The put_disk() function tests whether its argument is NULL and then
> returns immediately. Thus the test around the call is not needed.
> 
> This issue was detected by using the Coccinelle software.
> 
> Signed-off-by: Markus Elfring 

ACK-by: Boaz Harrosh 

> ---
>  drivers/scsi/osd/osd_uld.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/drivers/scsi/osd/osd_uld.c b/drivers/scsi/osd/osd_uld.c
> index 243eab3..e2075522 100644
> --- a/drivers/scsi/osd/osd_uld.c
> +++ b/drivers/scsi/osd/osd_uld.c
> @@ -407,9 +407,7 @@ static void __remove(struct device *dev)
>  
>   OSD_INFO("osd_remove %s\n",
>oud->disk ? oud->disk->disk_name : NULL);
> -
> - if (oud->disk)
> - put_disk(oud->disk);
> + put_disk(oud->disk);
>   ida_remove(&osd_minor_ida, oud->minor);
>  
>   kfree(oud);
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 9/9] snic:Add Makefile, patch Kconfig, MAINTAINERS

2015-05-27 Thread Boaz Harrosh
On 05/27/2015 01:02 PM, Narsimhulu Musini (nmusini) wrote:
<>
>>> +ifeq ($(CONFIG_SCSI_SNIC_DEBUG_FS), y)
>>> +ccflags-y += -DSNIC_DEBUG_FS
>>
>> Why do you need an extra define here just use
>> CONFIG_SCSI_SNIC_DEBUG_FS in source code directly
> Agree, I just want to use a shorter macro in the source.

Don't do this. It is a convention that if a programmer
sees a CONFIG_XXX he knows that this is settable by a
Kconfig. By making it shorter the programmer will think that
this is dead code because it is not set anywhere.

>>
>>> +snic-y += snic_debugfs.o \
>>> +   snic_trc.o
>>> +endif
>>>
>>
>> snic-$(CONFIG_SCSI_SNIC_DEBUG_FS) += snic_debugfs.o
> If CONFIG_SCSI_SNIC_DEBUGFS is not set, then it leaves a build variable
> ³snic-" in build system. ifeq() avoids such thing.

That is fine. This is done all over the Kernel Makefile.
There are two tons of these "do nothing make variables"

It the way we do it in the Kernel. (I actually like it)

>>
>> You do not the  ifeq () thing at all
>>
>> Cheers
>> Boaz
>>
> Thanks
> simha
>>
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v6 9/9] snic:Add Makefile, patch Kconfig, MAINTAINERS

2015-05-27 Thread Boaz Harrosh
On 05/27/2015 10:19 AM, Narsimhulu Musini wrote:
> Kconfig for kbuild
> Makefile to build snic module
> 
> Updated MAINTAINERS file
> 
> Signed-off-by: Narsimhulu Musini 
> Signed-off-by: Sesidhar Baddela 
> ---
> * v3
> - Added additional config section (CONFIG_SNIC_DEBUG_FS) for enabling 
> debugging
>   functionality.
> 
> * v2
> - Added compile time flags for debugfs dependent functionality.
> 
>  MAINTAINERS|  7 +++
>  drivers/scsi/Kconfig   | 17 +
>  drivers/scsi/Makefile  |  1 +
>  drivers/scsi/snic/Makefile | 21 +
>  4 files changed, 46 insertions(+)
>  create mode 100644 drivers/scsi/snic/Makefile
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 2a97e05..368fb76 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2536,6 +2536,13 @@ L: linux-scsi@vger.kernel.org
>  S:   Supported
>  F:   drivers/scsi/fnic/
>  
> +CISCO SCSI HBA DRIVER
> +M:   Narsimhulu Musini 
> +M:   Sesidhar Baddela 
> +L:   linux-scsi@vger.kernel.org
> +S:   Supported
> +F:   drivers/scsi/snic/
> +
>  CMPC ACPI DRIVER
>  M:   Thadeu Lima de Souza Cascardo 
>  M:   Daniel Oliveira Nascimento 
> diff --git a/drivers/scsi/Kconfig b/drivers/scsi/Kconfig
> index 9c92f41..8baab3f 100644
> --- a/drivers/scsi/Kconfig
> +++ b/drivers/scsi/Kconfig
> @@ -634,6 +634,23 @@ config FCOE_FNIC
> .
> The module will be called fnic.
>  
> +config SCSI_SNIC
> + tristate "Cisco SNIC Driver"
> + depends on PCI && SCSI && X86_64
> + help
> +   This is support for the Cisco PCI-Express SCSI HBA.
> +
> +   To compile this driver as a module, choose M here and read
> +   .
> +   The module will be called snic.
> +
> +config SCSI_SNIC_DEBUG_FS
> + bool "Cisco SNIC Driver Debugfs Support"
> + depends on SCSI_SNIC && DEBUG_FS
> + help
> +   This enables to list debugging information from SNIC Driver
> +   available via debugfs file system
> +
>  config SCSI_DMX3191D
>   tristate "DMX3191D SCSI support"
>   depends on PCI && SCSI
> diff --git a/drivers/scsi/Makefile b/drivers/scsi/Makefile
> index 58158f1..f643942 100644
> --- a/drivers/scsi/Makefile
> +++ b/drivers/scsi/Makefile
> @@ -39,6 +39,7 @@ obj-$(CONFIG_LIBFC) += libfc/
>  obj-$(CONFIG_LIBFCOE)+= fcoe/
>  obj-$(CONFIG_FCOE)   += fcoe/
>  obj-$(CONFIG_FCOE_FNIC)  += fnic/
> +obj-$(CONFIG_SCSI_SNIC)  += snic/
>  obj-$(CONFIG_SCSI_BNX2X_FCOE)+= libfc/ fcoe/ bnx2fc/
>  obj-$(CONFIG_ISCSI_TCP)  += libiscsi.o   libiscsi_tcp.o iscsi_tcp.o
>  obj-$(CONFIG_INFINIBAND_ISER)+= libiscsi.o
> diff --git a/drivers/scsi/snic/Makefile b/drivers/scsi/snic/Makefile
> new file mode 100644
> index 000..572102a
> --- /dev/null
> +++ b/drivers/scsi/snic/Makefile
> @@ -0,0 +1,21 @@
> +obj-$(CONFIG_SCSI_SNIC) += snic.o
> +
> +snic-y := \
> + snic_attrs.o \
> + snic_main.o \
> + snic_res.o \
> + snic_isr.o \
> + snic_ctl.o \
> + snic_io.o \
> + snic_scsi.o \
> + snic_disc.o \
> + vnic_cq.o \
> + vnic_intr.o \
> + vnic_dev.o \
> + vnic_wq.o
> +
> +ifeq ($(CONFIG_SCSI_SNIC_DEBUG_FS), y)
> +ccflags-y += -DSNIC_DEBUG_FS

Why do you need an extra define here just use
CONFIG_SCSI_SNIC_DEBUG_FS in source code directly

> +snic-y += snic_debugfs.o \
> + snic_trc.o
> +endif
> 

snic-$(CONFIG_SCSI_SNIC_DEBUG_FS) += snic_debugfs.o

You do not the  ifeq () thing at all

Cheers
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scatterlist: enable sg chaining for all architectures

2015-04-29 Thread Boaz Harrosh
On 04/29/2015 05:15 AM, James Bottomley wrote:
> 
> Perhaps the best thing to do is just fix target and call it quits?
> 

Right! drivers write code for sg_chaining and on ARCHs that do not
support it the code just works.
Only the max_sg is smaller and the chaining code never kicks in
and is dead code for these ARCHs.

> James

Cheers
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/17] Clear up bidi command confusion

2015-01-29 Thread Boaz Harrosh
On 01/29/2015 04:41 PM, James Bottomley wrote:
> On Thu, 2015-01-29 at 15:00 +0200, Boaz Harrosh wrote:
>> On 01/23/2015 03:12 PM, Christoph Hellwig wrote:
>>> On Fri, Jan 23, 2015 at 01:05:30PM +0100, Bart Van Assche wrote:
>>>> There is some confusion in the SCSI core and in SCSI LLDs around the
>>>> meaning of sc_data_direction and whether or not this member can have the
>>>> value DMA_BIDIRECTIONAL. Clear up this confusion. The patches in this
>>>> series are:
>>>
>>> I wonder if we should change the code to set DMA_BIDIRECTIONAL for
>>> bidi commands.  That seems a lot more logical than the current
>>> version.
>>>
>>
>> You cannot do this. Because a Bidi Command is actually two commands
>> one for the WRITE side (DMA_TO_DEVICE) which is this one, and another
>> command for the READ side (DMA_FROM_DEVICE).
> 
> You're not thinking about this the correct way.  DMA_BIDIRECTIONAL is a
> DMA engine flag.  We use it in SCSI for historical reasons (mainly to
> prevent a translation from the SCSI version).  Since it was never used,
> most SCSI drivers have code to check and reject commands with it set.
> For actual bidirectional commands, you have to check scsi_bidi_command()
> and programme the DMA engine separately for both phases, so the flag is
> useless in this case.
> 

This is what I meant how is above different from what I said?

> The proposal is to make it the signal for bidirectional commands

This I disagree, and would be a very bad idea and will take us back, to
hacking time. As you noted, please let us leave DMA_BIDIRECTIONAL to the
DMA engines. Let us just stir away from the DMA_XXX flags altogether and
move back to block layer flags. 

 (in which case most HBAs would make it do the right thing; i.e. reject) 

This is not necessary, (the check and rejection), commands will not be issued
to LLDs that did not mark support for BiDi. So they will never receive
such commands and it is added dead code.

> but
> it still wouldn't be how you'd program the DMA enigne; that still would
> have to be done in two phases as it is today.
> 

Exactly my point. Only one small correction, these are not two phases, as in
phases-1 then phases-2. These are "two" memory buffers independently programmed
for DMA transfer. The actual transfer occurs simultaneously, on both buffers.

> James
> 
> 

Thanks
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/17] Clear up bidi command confusion

2015-01-29 Thread Boaz Harrosh
On 01/29/2015 03:20 PM, Bart Van Assche wrote:
> On 01/29/15 14:07, Boaz Harrosh wrote:
>> On 01/26/2015 11:58 AM, Bart Van Assche wrote:
>>
>>> Hello Christoph,
>>>
>>> This makes sense to me. I will rework this patch series as you proposed.
>>
>> Do you have a bidi setup to test against?
>>
>> Sending xor command to scsi_dbg is only half the test for me because, yes
>> there are two buffers, but they are of same size so bugs might be masked.
>> (From experience)
> 
> Hello Boaz,
> 
> If anyone would like to submit a scsi_debug patch that adds support to
> that kernel driver for a bidi command for which the input and output
> buffers have different sizes I think that would be helpful.
> 

Hi Bart.

scsi_dbg is a scsi-blk-device type device (forgot the exact name).
I do not think there is such a command defined by the STD for this command
set.

its only for other device types like osd, printer and so on.

But sending XOR commands to scsi_dbg is a good start.

> Bart.
> 

Please Note: I do not agree for the use of the constant DMA_BIDIRECTIONAL
to denote scsi_bidi_cmnd(cmd). This is wrong and takes us back not
forward.

If anything at all is done, cmd->sc_data_direction should be just removed
all together. It is not needed and denotes nothing more than what is already
available at the request level. Please do not add any more new code on top
of cmd->sc_data_direction. (At minimum just remove the all DMA_BIDIRECTIONAL
checks and you are done)

Cheers
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/17] Clear up bidi command confusion

2015-01-29 Thread Boaz Harrosh
On 01/26/2015 11:58 AM, Bart Van Assche wrote:

> Hello Christoph,
> 
> This makes sense to me. I will rework this patch series as you proposed.
> 

Do you have a bidi setup to test against?

Sending xor command to scsi_dbg is only half the test for me because, yes
there are two buffers, but they are of same size so bugs might be masked.
(From experience)

Thanks
Boaz

> Bart.

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/17] Clear up bidi command confusion

2015-01-29 Thread Boaz Harrosh
On 01/23/2015 03:12 PM, Christoph Hellwig wrote:
> On Fri, Jan 23, 2015 at 01:05:30PM +0100, Bart Van Assche wrote:
>> There is some confusion in the SCSI core and in SCSI LLDs around the
>> meaning of sc_data_direction and whether or not this member can have the
>> value DMA_BIDIRECTIONAL. Clear up this confusion. The patches in this
>> series are:
> 
> I wonder if we should change the code to set DMA_BIDIRECTIONAL for
> bidi commands.  That seems a lot more logical than the current
> version.
> 

You cannot do this. Because a Bidi Command is actually two commands
one for the WRITE side (DMA_TO_DEVICE) which is this one, and another
command for the READ side (DMA_FROM_DEVICE).

The DMA_XXX_ enum must not to be confused with the t10-scsi Bidi commands.
In DMA world the enum means: What are the allowed access on the memory buffer,
is Device allowed to read-from-memory-only or write-to-memory-only or both
simultaneously "on the same buffer".
It is a flag to the IO-mmu on the direction of its mappings.

A t10-scsi bidi command is "two buffers". The command caries two distinct
buffers one is only-written-to 2nd only-read-from. But these are two distinct
buffers each his proper direction.

If you ask me you need to remove sc_data_direction all together and just
go directly to the READ/WRITE flag at request level. SCSI has no business
duplicating the same information in another member another way.

In any way t10-scsi has no use in the entire of its STD of the DMA_BIDIRECTIONAL
sense. ie. same buffer, two directional access. So DMA_BIDIRECTIONAL is just
dead code for sc_data_direction. It could be imagined but t10-scsi does not
have a use for it.

Again Not to be confused with one-command two buffers, which is exactly the 
opposite.

> Also I don't think all the debug checks for bidi commands that you
> change should stay at all - driver need to set the QUEUE_FLAG_BIDI to
> ever see a bidi command.
> 

Exactly just remove the all checks.

> It would also nice to add a host template flag for bidi support instead
> of having to poke into the block layer request_queue while we're at it.

Cheers
Boaz


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Large disk drives

2014-11-06 Thread Boaz Harrosh
On 11/06/2014 05:53 PM, Alan Stern wrote:
>> But just the simple case of read-capacity failure should we then?
> 
> That's a separate question.  As far as I know, the case you are 
> describing has not come up.
> 

BTW: what we should do is when the partition parser at the block layer
see that the partition capacity as written in the partition-table is
bigger then the capacity reported for the device we can put a fat
message at dmesg with both sizes and user can decide.
And/or add an extra field at block attributes for part_capacity. It
is info that is known and parsed for today.

> Alan Stern

Thanks
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Large disk drives

2014-11-06 Thread Boaz Harrosh
On 11/06/2014 05:54 PM, James Bottomley wrote:
> 
> We don't have a failure.  This is the problem.  Determining that a
> problem exists 
> 

OK Sorry. I assumed the bridge is smart enough to do nothing,

ie READ_CAPACITY_10 is passed as is via sata to the device that
actually supports READ_CAPACITY_16, as I understand then the 
actual good drive is not suppose to send size-modulue-2G in
response to READ_CAPACITY_10. Should it ?

Then the bridge just sends that back to me, now if I send
READ_CAPACITY_16 the bridge will return NOT-SUPPORTED because
it is unexpected.

But what are you saying that the bridge was smart enough to
do READ_CAPACITY_16 get a 64bit value from the drive and then
return the lower 32bit to me ? Really ? I would not imagine
in the life of me someone so dumb. And surly it is against
any spec.

Are you sure ? I think you are wrong I think the guy reported
that he can only see 2T out of his 3T drive which means the
bridge returned 0x, exactly 2T

> James
> 
> 

But hey, I guess stupidity is limitless, this one here pushing
real far.

Cheers
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Large disk drives

2014-11-06 Thread Boaz Harrosh
On 11/06/2014 12:30 PM, James Bottomley wrote:
> On Wed, 2014-11-05 at 11:30 -0800, Christoph Hellwig wrote:
>> On Wed, Nov 05, 2014 at 11:34:11AM -0500, Alan Stern wrote:
 Sorry, meant to.  In principle I'm OK with this as the lever for the
 hack (largely because it means we don't need to do anything) but will
 the distributions support it?
>>>
>>> While I can't speak for the distributions, it's reasonable to assume 
>>> that if something becomes part of the upstream kernel then they will 
>>> include it.
>>
>> Btw, we do have precedence for looking at partition tables from SCSI
>> code with scsi_partsize(), so the layering violation of looking at
>> EFI labels for disks sizes wouldn't be something entirely new even
>> if we did it in kernel space.
> 
> We really don't want to make the decision within the kernel of whether
> we believe the partition size or the disk capacity.   For these disk
> problems we need it to be the former, but if we choose that always,
> we'll get weird results on mispartitioned devices.
> 
> The usual rule is no policy in the kernel and which to choose is policy,
> so just export the knob (as Alan's patch does) and then let userspace
> decide.
> 

I do not think it is a matter of policy.

>From what I understand the situation is that read_capacity_10() which is
32bit, Max 2T byte disks. fails with 0x and read_capacity_16() (64bit)
is not supported because of wrong scsi-version or because it is blacklisted.
(or device returns not-supported)

Now If sd_read_capacity() succeed then off-course we choose it. What I'm saying
is if read-capacity fails, then should we attempt a new read_capacity_part()?

And sure a user-mode knob if he wants to make policy, like you said, is fine
by me.

But just the simple case of read-capacity failure should we then?

> James

Thanks
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Large disk drives

2014-11-05 Thread Boaz Harrosh
On 11/05/2014 06:34 PM, Alan Stern wrote:
<>
> 
> It's simpler than that: The drive is attached directly to the computer
> (i.e., via SATA rather than USB) when the partition table is created.  
> With no USB-SATA bridge chip to mess things up, there's no problem
> determining the correct capacity.
> 

Right!

I think it should be very simple to, just as we send a
READ_CAPACITY16 / READ_CAPACITY32 / READ_CAPACITY64 or is
it a GET_CODE_PAGE ? whatever we send today we can also
have READ_CAPACITY_PART() which will send a READ of
sector 0 the raw PC_COMMAND way and run it through some
partition analyzer. We only need to support gpt or msdos
partitions I think in this case.

Then if READ_CAPACITY16 returns all (s) and
READ_CAPACITY32 is not supported and/or blacklisted,
then yes attempt a READ_CAPACITY_PART() which should
be sam-2 compatible.

It should not take longer than a weekend afternoon over
cup of Machha. If any one sends a bad device my way, I'll
do it. But anyone should be able to code something so simple.

> Alan Stern
> 

Cheers
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] osd: Boaz Harrosh - change of email

2014-10-22 Thread Boaz Harrosh
On 10/22/2014 12:04 AM, James Bottomley wrote:
> On Tue, 2014-10-21 at 17:05 +0300, Boaz Harrosh wrote:
>> Hi Sir Linus
>>
>> A small administrative stuff, was on vacation and the old email bounced on 
>> me.
>> I was hoping to still make the 3 weeks merge window, but it was 2 weeks at 
>> the
>> end.
>> Your call if to make this wait for next window. Needless to say that it is
>> ZERO risk, just change of email.
>>
>> Based on commit [bfe01a5b] Linux 3.17
>>
>> 3 patches available in the git repository at:
>>
>>   git://git.open-osd.org/linux-open-osd.git for-linus
>>
>> for you to fetch changes up to 1fa3a002b2546c42c343c77c144871285896ced5:
>>
>>   Boaz Harrosh - fix email in Documentation (2014-10-19 20:36:36 +0300)
>>
>> ----
>> Boaz Harrosh (3):
>>   MAINTAINERS: Change Boaz Harrosh's email
>>   Boaz Harrosh - Fix broken email address
>>   Boaz Harrosh - fix email in Documentation
>>
>>  Documentation/scsi/osd.txt  | 3 +--
>>  MAINTAINERS | 2 +-
>>  drivers/scsi/osd/Kbuild | 2 +-
>>  drivers/scsi/osd/Kconfig| 2 +-
>>  drivers/scsi/osd/osd_debug.h| 2 +-
>>  drivers/scsi/osd/osd_initiator.c| 4 ++--
>>  drivers/scsi/osd/osd_uld.c  | 4 ++--
>>  fs/exofs/Kbuild | 2 +-
>>  fs/exofs/common.h   | 2 +-
>>  fs/exofs/dir.c  | 2 +-
>>  fs/exofs/exofs.h| 2 +-
>>  fs/exofs/file.c | 2 +-
>>  fs/exofs/inode.c| 2 +-
>>  fs/exofs/namei.c| 2 +-
>>  fs/exofs/ore.c  | 4 ++--
>>  fs/exofs/ore_raid.c | 2 +-
>>  fs/exofs/ore_raid.h | 2 +-
>>  fs/exofs/super.c| 2 +-
>>  fs/exofs/symlink.c  | 2 +-
>>  fs/exofs/sys.c  | 2 +-
>>  fs/nfs/objlayout/objio_osd.c| 2 +-
>>  fs/nfs/objlayout/objlayout.c| 2 +-
>>  fs/nfs/objlayout/objlayout.h| 2 +-
>>  fs/nfs/objlayout/pnfs_osd_xdr_cli.c | 2 +-
>>  include/linux/pnfs_osd_xdr.h| 2 +-
>>  include/scsi/osd_initiator.h| 2 +-
>>  include/scsi/osd_ore.h  | 2 +-
>>  include/scsi/osd_protocol.h | 4 ++--
>>  include/scsi/osd_sec.h  | 2 +-
>>  include/scsi/osd_sense.h| 2 +-
>>  include/scsi/osd_types.h| 2 +-
>>  31 files changed, 35 insertions(+), 36 deletions(-)
> 
> This is a bit of an unnecessary massive churn.  No one expects the
> author named in the file to stay up to date, especially because the
> @domain.com usually credits the company who paid for the code, so it's
> left in as a kind of mark of respect for them.  I'm not saying it
> applies in your case, just that it creates the common expectation of
> in-file authors needing to be traced through the MAINTAINERS file.
> 
> Could you not just update the MAINTAINERS file only, like everyone else?

OK, I did not know this. The Panasas credit is still there so no
harm done. Anyway Linus pulled it, so lets leave it at that for now.

Next time I'll know.

> James

Thanks
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] osd: Boaz Harrosh - change of email

2014-10-21 Thread Boaz Harrosh
Hi Sir Linus

A small administrative stuff, was on vacation and the old email bounced on me.
I was hoping to still make the 3 weeks merge window, but it was 2 weeks at the
end.
Your call if to make this wait for next window. Needless to say that it is
ZERO risk, just change of email.

Based on commit [bfe01a5b] Linux 3.17

3 patches available in the git repository at:

  git://git.open-osd.org/linux-open-osd.git for-linus

for you to fetch changes up to 1fa3a002b2546c42c343c77c144871285896ced5:

  Boaz Harrosh - fix email in Documentation (2014-10-19 20:36:36 +0300)


Boaz Harrosh (3):
  MAINTAINERS: Change Boaz Harrosh's email
  Boaz Harrosh - Fix broken email address
  Boaz Harrosh - fix email in Documentation

 Documentation/scsi/osd.txt  | 3 +--
 MAINTAINERS | 2 +-
 drivers/scsi/osd/Kbuild | 2 +-
 drivers/scsi/osd/Kconfig| 2 +-
 drivers/scsi/osd/osd_debug.h| 2 +-
 drivers/scsi/osd/osd_initiator.c| 4 ++--
 drivers/scsi/osd/osd_uld.c  | 4 ++--
 fs/exofs/Kbuild | 2 +-
 fs/exofs/common.h   | 2 +-
 fs/exofs/dir.c  | 2 +-
 fs/exofs/exofs.h| 2 +-
 fs/exofs/file.c | 2 +-
 fs/exofs/inode.c| 2 +-
 fs/exofs/namei.c| 2 +-
 fs/exofs/ore.c  | 4 ++--
 fs/exofs/ore_raid.c | 2 +-
 fs/exofs/ore_raid.h | 2 +-
 fs/exofs/super.c| 2 +-
 fs/exofs/symlink.c  | 2 +-
 fs/exofs/sys.c  | 2 +-
 fs/nfs/objlayout/objio_osd.c| 2 +-
 fs/nfs/objlayout/objlayout.c| 2 +-
 fs/nfs/objlayout/objlayout.h| 2 +-
 fs/nfs/objlayout/pnfs_osd_xdr_cli.c | 2 +-
 include/linux/pnfs_osd_xdr.h| 2 +-
 include/scsi/osd_initiator.h| 2 +-
 include/scsi/osd_ore.h  | 2 +-
 include/scsi/osd_protocol.h | 4 ++--
 include/scsi/osd_sec.h  | 2 +-
 include/scsi/osd_sense.h| 2 +-
 include/scsi/osd_types.h| 2 +-
 31 files changed, 35 insertions(+), 36 deletions(-)

Thanks, have a good Kernel Cycle
Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] Boaz Harrosh - fix email in Documentation

2014-10-19 Thread Boaz Harrosh
From: Boaz Harrosh 

I forgot to update Documentation/*.txt

Signed-off-by: Boaz Harrosh 
---
 Documentation/scsi/osd.txt | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/Documentation/scsi/osd.txt b/Documentation/scsi/osd.txt
index da162f7..5a9879b 100644
--- a/Documentation/scsi/osd.txt
+++ b/Documentation/scsi/osd.txt
@@ -184,8 +184,7 @@ Any problems, questions, bug reports, lonely OSD nights, 
please email:
 More up-to-date information can be found on:
 http://open-osd.org
 
-Boaz Harrosh 
-Benny Halevy 
+Boaz Harrosh 
 
 References
 ==
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] Boaz Harrosh - Fix broken email address

2014-10-19 Thread Boaz Harrosh
From: Boaz Harrosh 

I no longer have access to the Panasas email.
So change to an email that can always reach me.

Signed-off-by: Boaz Harrosh 
---
 drivers/scsi/osd/Kbuild | 2 +-
 drivers/scsi/osd/Kconfig| 2 +-
 drivers/scsi/osd/osd_debug.h| 2 +-
 drivers/scsi/osd/osd_initiator.c| 4 ++--
 drivers/scsi/osd/osd_uld.c  | 4 ++--
 fs/exofs/Kbuild | 2 +-
 fs/exofs/common.h   | 2 +-
 fs/exofs/dir.c  | 2 +-
 fs/exofs/exofs.h| 2 +-
 fs/exofs/file.c | 2 +-
 fs/exofs/inode.c| 2 +-
 fs/exofs/namei.c| 2 +-
 fs/exofs/ore.c  | 4 ++--
 fs/exofs/ore_raid.c | 2 +-
 fs/exofs/ore_raid.h | 2 +-
 fs/exofs/super.c| 2 +-
 fs/exofs/symlink.c  | 2 +-
 fs/exofs/sys.c  | 2 +-
 fs/nfs/objlayout/objio_osd.c| 2 +-
 fs/nfs/objlayout/objlayout.c| 2 +-
 fs/nfs/objlayout/objlayout.h| 2 +-
 fs/nfs/objlayout/pnfs_osd_xdr_cli.c | 2 +-
 include/linux/pnfs_osd_xdr.h| 2 +-
 include/scsi/osd_initiator.h| 2 +-
 include/scsi/osd_ore.h  | 2 +-
 include/scsi/osd_protocol.h | 4 ++--
 include/scsi/osd_sec.h  | 2 +-
 include/scsi/osd_sense.h| 2 +-
 include/scsi/osd_types.h| 2 +-
 29 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/drivers/scsi/osd/Kbuild b/drivers/scsi/osd/Kbuild
index 5fd73d77..58cecd4 100644
--- a/drivers/scsi/osd/Kbuild
+++ b/drivers/scsi/osd/Kbuild
@@ -4,7 +4,7 @@
 # Copyright (C) 2008 Panasas Inc.  All rights reserved.
 #
 # Authors:
-#   Boaz Harrosh 
+#   Boaz Harrosh 
 #   Benny Halevy 
 #
 # This program is free software; you can redistribute it and/or modify
diff --git a/drivers/scsi/osd/Kconfig b/drivers/scsi/osd/Kconfig
index a070351..347cc5e 100644
--- a/drivers/scsi/osd/Kconfig
+++ b/drivers/scsi/osd/Kconfig
@@ -4,7 +4,7 @@
 # Copyright (C) 2008 Panasas Inc.  All rights reserved.
 #
 # Authors:
-#   Boaz Harrosh 
+#   Boaz Harrosh 
 #   Benny Halevy 
 #
 # This program is free software; you can redistribute it and/or modify
diff --git a/drivers/scsi/osd/osd_debug.h b/drivers/scsi/osd/osd_debug.h
index 579e491..2634126 100644
--- a/drivers/scsi/osd/osd_debug.h
+++ b/drivers/scsi/osd/osd_debug.h
@@ -4,7 +4,7 @@
  * Copyright (C) 2008 Panasas Inc.  All rights reserved.
  *
  * Authors:
- *   Boaz Harrosh 
+ *   Boaz Harrosh 
  *   Benny Halevy 
  *
  * This program is free software; you can redistribute it and/or modify
diff --git a/drivers/scsi/osd/osd_initiator.c b/drivers/scsi/osd/osd_initiator.c
index 5f4cbf0..52e243b 100644
--- a/drivers/scsi/osd/osd_initiator.c
+++ b/drivers/scsi/osd/osd_initiator.c
@@ -7,7 +7,7 @@
  * Copyright (C) 2008 Panasas Inc.  All rights reserved.
  *
  * Authors:
- *   Boaz Harrosh 
+ *   Boaz Harrosh 
  *   Benny Halevy 
  *
  * This program is free software; you can redistribute it and/or modify
@@ -57,7 +57,7 @@
 
 enum { OSD_REQ_RETRIES = 1 };
 
-MODULE_AUTHOR("Boaz Harrosh ");
+MODULE_AUTHOR("Boaz Harrosh ");
 MODULE_DESCRIPTION("open-osd initiator library libosd.ko");
 MODULE_LICENSE("GPL");
 
diff --git a/drivers/scsi/osd/osd_uld.c b/drivers/scsi/osd/osd_uld.c
index e1d9a4c..92cdd4b 100644
--- a/drivers/scsi/osd/osd_uld.c
+++ b/drivers/scsi/osd/osd_uld.c
@@ -10,7 +10,7 @@
  * Copyright (C) 2008 Panasas Inc.  All rights reserved.
  *
  * Authors:
- *   Boaz Harrosh 
+ *   Boaz Harrosh 
  *   Benny Halevy 
  *
  * This program is free software; you can redistribute it and/or modify
@@ -74,7 +74,7 @@
 static const char osd_name[] = "osd";
 static const char *osd_version_string = "open-osd 0.2.1";
 
-MODULE_AUTHOR("Boaz Harrosh ");
+MODULE_AUTHOR("Boaz Harrosh ");
 MODULE_DESCRIPTION("open-osd Upper-Layer-Driver osd.ko");
 MODULE_LICENSE("GPL");
 MODULE_ALIAS_CHARDEV_MAJOR(SCSI_OSD_MAJOR);
diff --git a/fs/exofs/Kbuild b/fs/exofs/Kbuild
index 389ba83..b47c7b8 100644
--- a/fs/exofs/Kbuild
+++ b/fs/exofs/Kbuild
@@ -4,7 +4,7 @@
 # Copyright (C) 2008 Panasas Inc.  All rights reserved.
 #
 # Authors:
-#   Boaz Harrosh 
+#   Boaz Harrosh 
 #
 # This program is free software; you can redistribute it and/or modify
 # it under the terms of the GNU General Public License version 2
diff --git a/fs/exofs/common.h b/fs/exofs/common.h
index 3bbd469..7d88ef5 100644
--- a/fs/exofs/common.h
+++ b/fs/exofs/common.h
@@ -4,7 +4,7 @@
  * Copyright (C) 2005, 2006
  * Avishay Traeger (avis...@gmail.com)
  * Copyright (C) 2008, 2009
- * Boaz Harrosh 
+ * Boaz Harrosh 
  *
  * Copyrights for code taken from ext2:
  * Copyright (C) 1992, 1993, 1994, 1995
diff --git a/fs/exofs/dir.c b/fs/exofs/dir.c
index 49f51ab..d7defd5 100644
--- a/fs/exofs/dir.c
+++ b/fs/exofs/dir.c
@@ -2,7 +2,7 @@
  * Copyright (C) 2005, 2006
  *

[PATCH 1/2] MAINTAINERS: Change Boaz Harrosh's email

2014-10-19 Thread Boaz Harrosh
From: Boaz Harrosh 

I have moved on, and do no longer have Panasas email access.
Update to an email that can reach me.

So change bharr...@panasas.com => o...@electrozaur.com

Explain of email address:
* electrozaur.com is a domain owned by me.
* ooo - Stands for Open Osd . Org

Another email alias that can be used is:
open...@gmail.com

CC: Greg KH 
Signed-off-by: Boaz Harrosh 
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index f10ed39..fc2c9a8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6725,7 +6725,7 @@ S:Orphan
 F: drivers/net/wireless/orinoco/
 
 OSD LIBRARY and FILESYSTEM
-M:     Boaz Harrosh 
+M:     Boaz Harrosh 
 M: Benny Halevy 
 L: osd-...@open-osd.org
 W: http://open-osd.org
-- 
1.9.3

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/3] libiscsi, iser: Adjust data_length to include protection information

2014-08-14 Thread Boaz Harrosh
On 08/13/2014 04:09 PM, Sagi Grimberg wrote:
> On 8/6/2014 4:25 PM, Boaz Harrosh wrote:
> 
> Hey Boaz,
> 
> So in the current flow, I still don't think it is wrong/buggy, the
> transfer byte length related to scsi buffer length 
(In iscsi for sure 
> but I think that for all transports it is the same).
> 
> But,
> Since you have such a strong objection on this I'm also OK with changing
> stuff... We can pass a buffer to scsi_transfer_length (although it has 
> no meaning effectively) and we can keep in/out_len local variables and
> print them along with total transfer.
> 
> Do you want me to send out a patch here (since I have some hardware to 
> test it...) or are you still planning to send out something?
> 

I'll do it. As you said there is no bugs, just ugly code. I will send a
cleanup

Thanks
Boaz

> Cheers,
> Sagi.
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/3] libiscsi, iser: Adjust data_length to include protection information

2014-08-06 Thread Boaz Harrosh
On 08/06/2014 03:43 PM, Sagi Grimberg wrote:
> Hi Boaz,
> 
<>
>>
>> I hate that you introduced this new transfer_length variable. It does
>> not exist. In BIDI supporting driver there is out_len and in_len just
>> as original code.
> 
> Effectively, out_len and in_len are the same except for the bidi case.
> Moreover, the hdr->data_length is exactly the command scsi data buffer
> length, the bidi length is taken from scsi_in when we build the AHS for
> bidi (rlen_ahdr->read_length).
> 
> So to me it is correct and makes sense. But I'm don't feel so strong
> about it...
> 
> Mike what's your take on this?
> 

I have a patch to clean all that, will send tomorrow. 

What I mean is that this is on the out-path only the in path is different.
See the code this variable is only used in the if (== DMA_TO_DEVICE) case and
should be local to that scope. This is my clean up

<>
>> this particular driver puts them together in the same payload. There can be
>> other DIFF supporting drivers that put the DIFF payload on another stream / 
>> another
>> SG list, like the sata one, right?
> 
> I think that DIF specification says that on the wire the data and
> protection are interleaved i.e. Block1, DIF1, Block2, DIF2...
> 

No it does not. This is a per transport, and actually per device host
driver. Yes in iSCSI_tcp they are inline in HW cards they might come as
two different SGs (Like the Linux model). Even with iscsi-offload they might
want to be two SG-lists.

> So I do think that the transfer length should always include
> data_length + protection_length.
> 

This is at the iscsi level. But the scsi_transfer_length() is on the scsi
level which keeps them separate. So I think the proper API should be
scsi_proto_length()

And for LLDs that want them together they can do scsi_bufflen() + 
scsi_proto_length()
and for other drivers they can do it separately. Don't infect iscsi level 
assumptions
on the generic layer API.

Again my patch fixes this.

>> And this
>>
>> This print is correct as it covers all cases. If you want to print the 
>> adjusted
>> length then OK, you'd need to store this I guess, but store it as a different
>> variable like proto_length and print it as an additional variable.
> 
> But it is the transfer length that is sent in iSCSI header. Why do you
> want to print it as additional info? 

I want to see what was the length the app/FS sent, then as added
info how much was added for DIFF, your way there is lost information.

> for transactions that include DIF
> the length is the data + protection.
> 
> It is still one-to-one isn't it?
> 

No! the original submitted length is lost from the print

> Sagi.
> 

Shalom
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V5] Save command pool address of Scsi_Host

2014-08-04 Thread Boaz Harrosh
On 08/04/2014 02:30 PM, jgr...@suse.com wrote:
> From: Juergen Gross 
> 
> If a scsi host driver specifies .cmd_len in it's scsi_host_template, a 
> driver's
> private command pool is needed. scsi_find_host_cmd_pool() will locate it, but
> scsi_alloc_host_cmd_pool() isn't saving the pool address in the host template.
> 
> This will result in an access error when the host is removed.
> 
> Avoid the problem by saving the address of a new allocated command pool where
> it is expected.
> 
> Signed-off-by: Juergen Gross 
> ---
>  drivers/scsi/scsi.c | 12 ++--
>  1 file changed, 10 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
> index 88d46fe..b0cef5b 100644
> --- a/drivers/scsi/scsi.c
> +++ b/drivers/scsi/scsi.c
> @@ -380,6 +380,10 @@ scsi_alloc_host_cmd_pool(struct Scsi_Host *shost)
>   pool->slab_flags |= SLAB_CACHE_DMA;
>   pool->gfp_mask = __GFP_DMA;
>   }
> +
> + if (hostt->cmd_size)
> + hostt->cmd_pool = pool;
> +
>   return pool;
>  }
>  
> @@ -424,8 +428,10 @@ out:
>  out_free_slab:
>   kmem_cache_destroy(pool->cmd_slab);
>  out_free_pool:
> - if (hostt->cmd_size)
> + if (hostt->cmd_size) {
>   scsi_free_host_cmd_pool(pool);

I disagree I liked the V4 version better. It is much nicer on the review
that we NULL the same one we pass in with no need for context that's outside
of this scope

Just my $0.017
Boaz

> + hostt->cmd_pool = NULL;
> + }
>   goto out;
>  }
>  
> @@ -447,8 +453,10 @@ static void scsi_put_host_cmd_pool(struct Scsi_Host 
> *shost)
>   if (!--pool->users) {
>   kmem_cache_destroy(pool->cmd_slab);
>   kmem_cache_destroy(pool->sense_slab);
> - if (hostt->cmd_size)
> + if (hostt->cmd_size) {
>   scsi_free_host_cmd_pool(pool);
> + hostt->cmd_pool = NULL;
> + }
>   }
>   mutex_unlock(&host_cmd_pool_mutex);
>  }
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] brd: Fix the partitions BUG

2014-07-30 Thread Boaz Harrosh
On 07/30/2014 07:50 PM, Ross Zwisler wrote:
<>
>> + */
>> +printk(KERN_EROR "brd: brd_find unexpected device %d\n", i);
> 
> s/KERN_EROR/KERN_ERR/
> 

Yes thanks, sigh, code should compile

driver error. I used pr_err but last inspection I saw that printk is used
everywhere and, crapped ...

>> +return NULL;
>>  }
>>  
>>  static void brd_del_one(struct brd_device *brd)
>> @@ -554,11 +552,10 @@ static struct kobject *brd_probe(dev_t dev, int *part, 
>> void *data)
>>  struct kobject *kobj;
>>  
>>  mutex_lock(&brd_devices_mutex);
>> -brd = brd_init_one(MINOR(dev) >> part_shift);
>> +brd = brd_find(MINOR(dev) >> part_shift);
>>  kobj = brd ? get_disk(brd->brd_disk) : NULL;
>>  mutex_unlock(&brd_devices_mutex);
>>  
>> -*part = 0;
>>  return kobj;
>>  }
> 
> It is possible to create new block devices with BRD at runtime:
> 
>   # mknod /dev/new_brd b 1 4 
>   # fdisk -l /dev/new_brd
> 
> This causes a new BRD disk to be created, and hits your error case:
> 
>   Jul 30 10:40:57 alara kernel: brd: brd_find unexpected device 4
>   

Ha OK, So this is the mystery key. I never knew that trick, OK
I guess I need to leave this in

> I guess in general I'm not saying that BRD needs to have partitions - indeed
> it may not give you much in the way of added functionality.  As the code
> currently stands partitions aren't surfaced anyway
> (GENHD_FL_SUPPRESS_PARTITION_INFO is set).  For PRD, however, I *do* want to
> enable partitions correctly because eventually I'd like to enhance PRD so that
> it *does* actually handle NVDIMMs correctly, and for that partitions do make
> sense.  

So lets talk about that for a bit. Why would you want legacy partitions for
NVDIMMs. For one fdisk will waste 1M of memory on this. And with NVDIMMs you
actually need a different manager all together, more like lvm stuff.

I have patches here that changes prd a lot.
- Global memory parameters are gone each device remaps and operates on it's own 
memory range.
- the prd_params are gone instead you have one memmap= parameter of the form
  memmap=nnn$ooo,[nnn$ooo]...
  where:
nnn - is size in bytes of the form number[K/M/G]
ooo - is offset in bytes of the form number[K/M/G]

  Very much the same as the memmap= Kernel parameters. So just copy/paste what
  you did there and it will work.

  Now each memory range has one prd_device created for it. Note that if
  specified ranges are just contiguous then it is like your prd_count thing 
where
  you took a contiguous range and sliced it up, only here they can be of
  different size.
  Off course it has an added fixture of dis-contiguous ranges like we have
  in our multy-nodes NUMA system in the lab that host DDR3 NvDIMMs in two
  banks.

- A sysfs interface is added to add/remove memory range on the fly like
  osdblk

- If no parameters are specified at all, the Kernel command-line is parsed
  and all memmap= sections are attempted and used if are not already claimed.

An interface like mknod above is not supported since it is actually pointless.

My current code still supports partitions but I still think it is silly.

[
 This is all speculative for DDR3-NVDIMMs, we have seen that the memory 
controller
 actually tags these DIMMs with type 12 but it looks like this is all vendor
 specific right now and I understand that DDR4 standardize all that. So I was
 hoping you guys are working on all that with the NvDIMM stuff.

 Now lets say that I have established auto-detection for each DIMM and have
 extracted its SN (Our DDR3 DIMMs each have an SN that can be displayed by the
 vendor supplied tool)

 An NvDIMM manager will need to establish an NvDIMM table and set order.
 The motivation of a partition table is that a disk moved to another machine
 and/or booted into another OS will have persistent and STD way to not
 clobber over established DATA. But here we have like a disk cluster/raid,
 an ordered set of drives.

 Unique to NvDIMMs is the interleaving. If pairs are then inserted in wrong
 order or are re-mixed. This should be detected and refused to be mounted
 (unless a re-initialize is forced) So data is not silently lost on operator
 errors.
 
 All this is then persisted on some DIMM, first or last. If code is able to
 re-arrange order, like when physical address have changed it will, but in the
 wrong interleaving case it will refuse to mount.
]

> And if I have to implement and debug partitions for PRD, it's easy to
> stick them in BRD in case anyone wants to use them.
> 

I was finally playing with BRD, and it is missing your getgeo patch, so
it is currently completely alien to fdisk and partitions.

So why, why keep a fixture that never existed, I still hope to convince you
that this is crap. Specially with brd that can have devices added on the fly
like you showed above. Who needs them at all? Why waist 1M of memory
on each device for no extra gain?

Specially in light of my new prd that does away of any needs of 

Re: [PATCH 2/2] brd: Fix brd_direct_access with partitions

2014-07-30 Thread Boaz Harrosh
On 07/30/2014 06:34 PM, Matthew Wilcox wrote:
> On Wed, Jul 30, 2014 at 05:18:47PM +0300, Boaz Harrosh wrote:
>> When brd_direct_access() is called on a partition-bdev
>> it would access the wrong sector. And caller would then
>> corrupt the device's data.
>>
>> This is a preliminary fix, Matthew Wilcox has a patch
>> in his DAX patchset which will define a global wrapper
>> to bdev->bd_disk->fops->direct_access that will do the
>> proper checks and translations before calling a driver
>> global member. (The way it is done at the rest of the
>> block stack)
> 
> Uh, no, let's just focus on getting the DAX code in instead of putting
> in interim patches that will conflict.  Patch 4/22 is uncontroversial,
> fixes this problem, has no dependencies, and is key to the rest of the DAX
> patchset.  If this problem wants to be fixed, then put 4/22 in instead.
> 

OK, I agree, could you please push  4/22 as reply to here for
Jens to take for 3.17 ? (together with my 1/2)

Thanks
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] brd: Fix brd_direct_access with partitions

2014-07-30 Thread Boaz Harrosh

When brd_direct_access() is called on a partition-bdev
it would access the wrong sector. And caller would then
corrupt the device's data.

This is a preliminary fix, Matthew Wilcox has a patch
in his DAX patchset which will define a global wrapper
to bdev->bd_disk->fops->direct_access that will do the
proper checks and translations before calling a driver
global member. (The way it is done at the rest of the
block stack)

CC: Matthew Wilcox 
Signed-off-by: Boaz Harrosh 
---
 drivers/block/brd.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index 92334f6..7506864 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -378,9 +378,10 @@ static int brd_direct_access(struct block_device *bdev, 
sector_t sector,
 
if (!brd)
return -ENODEV;
+   sector += get_start_sect(bdev);
if (sector & (PAGE_SECTORS-1))
return -EINVAL;
-   if (sector + PAGE_SECTORS > get_capacity(bdev->bd_disk))
+   if (unlikely(sector + PAGE_SECTORS > part_nr_sects_read(bdev->bd_part)))
return -ERANGE;
page = brd_insert_page(brd, sector);
if (!page)
-- 
1.9.3


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] brd: Fix the partitions BUG

2014-07-30 Thread Boaz Harrosh

With current code after a call to:
bdev = blkdev_get_by_path(dev_name, mode, fs_type);
size = i_size_read(bdev->bd_inode);
part_size  = bdev->bd_part->nr_sects << 9;

I get the following bad results:
dev_name == /dev/ram0
  foo: [foo_mount:880] size=0x4000 bdev=88003dc24340 \
bd_inode=88003dc24430 bd_part=88003ca22848 part_size=0x4000
dev_name == /dev/ram0p1
  foo: [foo_mount:880] size=0x4000 bdev=88003d2f6d80 \
bd_inode=88003d2f6e70 bd_part=88003ca22848 part_size=0x4000
dev_name == /dev/ram0p2
  foo: [foo_mount:880] size=0x4000 bdev=88003dc24680 \
bd_inode=88003dc24770 bd_part=88003ca22848 part_size=0x4000
Note how all three bdev(s) point to the same bd_part.

This is do to a single bad clubber in brd_probe() which is
removed in this patch:
-   *part = 0;

because of this all 3 bdev(s) above get to point to the same bd_part[0]

While at it fix/rename brd_init_one() since all devices are created on
load of driver, brd_probe() will never be called with a new un-created
device.
brd_init_one() is now renamed to brd_find() which is what it does.

TODO: There is one more partitions BUG regarding
  brd_direct_access() which is fixed on the next patch.

Signed-off-by: Boaz Harrosh 
---
 drivers/block/brd.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index c7d138e..92334f6 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -523,22 +523,20 @@ static void brd_free(struct brd_device *brd)
kfree(brd);
 }
 
-static struct brd_device *brd_init_one(int i)
+static struct brd_device *brd_find(int i)
 {
struct brd_device *brd;
 
list_for_each_entry(brd, &brd_devices, brd_list) {
if (brd->brd_number == i)
-   goto out;
+   return brd;
}
 
-   brd = brd_alloc(i);
-   if (brd) {
-   add_disk(brd->brd_disk);
-   list_add_tail(&brd->brd_list, &brd_devices);
-   }
-out:
-   return brd;
+   /* brd always allocates all its devices at load time, therefor
+* brd_probe will never be called with a new brd_number
+*/
+   printk(KERN_EROR "brd: brd_find unexpected device %d\n", i);
+   return NULL;
 }
 
 static void brd_del_one(struct brd_device *brd)
@@ -554,11 +552,10 @@ static struct kobject *brd_probe(dev_t dev, int *part, 
void *data)
struct kobject *kobj;
 
mutex_lock(&brd_devices_mutex);
-   brd = brd_init_one(MINOR(dev) >> part_shift);
+   brd = brd_find(MINOR(dev) >> part_shift);
kobj = brd ? get_disk(brd->brd_disk) : NULL;
mutex_unlock(&brd_devices_mutex);
 
-   *part = 0;
return kobj;
 }
 
-- 
1.9.3


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] brd.ko Never supported partitions should we remove the dead code ?

2014-07-30 Thread Boaz Harrosh
On 07/29/2014 08:19 PM, Ross Zwisler wrote:
> On Tue, 2014-07-29 at 19:37 +0300, Boaz Harrosh wrote:
>> Hi folks
>>
>> I've been playing with yet unseen prd block device, and hit an issue with 
>> partitioning
>> but since prd.c is a copy/paste of brd.c the same exact issue exists with 
>> brd.
>>
>> It does not support partitions!
>>
>> An attempt to fdisk say a couple of partitions, then mkfs && mount an 
>> individual
>> partition will result in the primary device being accessed and a corrupted 
>> disk data.
>>
>> If I enable max_part(=2) properly on modprobe, then fdisk a couple of 
>> partitions all
>> goes well
>>
>> But then if I pass the device to Kernel (Say via mount), after a call to
>>  bdev = blkdev_get_by_path(dev_name, mode, fs_type);
>>
>> I get the following results: 
>> (size is extracted using:i_size_read(bdev->bd_inode);
>>  part_size is extracted using:   bdev->bd_part->nr_sects << 9;
>> )
>>
>> dev_name == /dev/ram0
>> [Jul29 17:40] foo: [foo_mount:880] size=0x4000 bdev=88003dc24340 
>> bd_inode=88003dc24430 bd_part=88003ca22848 part_size=0x4000
>>
>> dev_name == /dev/ram0p1
>> [ +16.816065] foo: [foo_mount:880] size=0x4000 bdev=88003d2f6d80 
>> bd_inode=88003d2f6e70 bd_part=88003ca22848 part_size=0x4000
>>
>> dev_name == /dev/ram0p2
>> [Jul29 17:41] foo: [foo_mount:880] size=0x4000 bdev=88003dc24680 
>> bd_inode=88003dc24770 bd_part=88003ca22848 part_size=0x4000
>>
>> We can see that all 3 different bdev's point to the same bd_part, which is 
>> the BUG. Funny is that bd_inode is different but contains
>> same wrong size, for exactly the same reason.
> 
> Yep, I've seen this as well.  It turns out that partitions *do* work in brd if
> you don't specify the 'rd_nr' module parameter, but if you do specify the
> module parameter you get the partition overlapping behavior that you observed.
> 
> AFAICT the difference between these two scenarios is that when you set rd_nr,
> the 'range' variable is set so something small, but when you don't set it
> 'range' is 1,048,576.  That's done by this bit of code in brd_init:
> 
>   if (rd_nr) {
>   nr = rd_nr;
>   range = rd_nr << part_shift;
>   } else {
>   nr = CONFIG_BLK_DEV_RAM_COUNT;
>   range = 1UL << MINORBITS;
>   }
> 
> The difference in behavior happens up the stack somewhere as a result of you
> passing 'range' into blk_register_region().
> 
> Anyway, the quick and dirty workaround is to always have 'range' be something
> large.  This is what I've done in the current master branch of the upstream
> PRD tree - see commit 0256739ccab03d1662221f595b03797370c2ac12.
> 
> For what it's worth, I'm planning on updating prd (and probably brd) so that
> it dynamically allocates partition numbers.  See commit
> 469071a37afc8a627b6b2ddf29db0a097d864845 for how this was done in the nvme
> driver.
> 

Hi Ross

I've looked at 469071a37afc8a627b6b2ddf29db0a097d864845, but surly as you said
it is not the fix. I have posted the fix, in the code you snipped out.

So I get it that you guys are not convinced that we need to remove support for
Partitions for brd? For me a partition for /dev/ramX is totally bogus. What's
the point at all? Since we have the same effect if we just do rd_nr and
partition memory this way, and save 1M of memory. And it is not as if you can
loose power and want the partitioning persistent for the next boot. But sigh I
will not fight you about it, I made my point of this being pointless and that is
that.

I will post a proper patch for brd on top of Jens's tree unless you want it
rebased over some other tree, I guess it can just be queued for 3.17.

> - Ross

Thanks
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFD] brd.ko Never supported partitions should we remove the dead code ?

2014-07-29 Thread Boaz Harrosh
On 07/29/2014 07:56 PM, Matthew Wilcox wrote:
> On Tue, Jul 29, 2014 at 07:37:49PM +0300, Boaz Harrosh wrote:
>> But before we are running to fix this bug. Could we please do better and 
>> just remove the support for partitions
>> all together.
>>  Since it *never* worked anyway, so probably no one needs it! Surly no 
>> one used it
> 
> I fixed this in patch 4/22 of the v8 series.  The correct place to
> handle partitioning is in the block layer, not the individual drivers
> (nor the filesystems).
> 

Sir Mathew hi

> @@ -378,10 +378,12 @@ static int brd_direct_access(struct block_device *bdev, 
> sector_t sector,
>  
>   if (!brd)
>   return -ENODEV;
> + sector += get_start_sect(bdev);
>   if (sector & (PAGE_SECTORS-1))
>   return -EINVAL;
> +/* Check is wrong here we need to check against bdev->bd_part->nr_sects */
>   if (sector + PAGE_SECTORS > get_capacity(bdev->bd_disk))
>   return -ERANGE;
>   page = brd_insert_page(brd, sector);
>   if (!page)
>   return -ENOSPC;

you mean you fixed the brd_direct_access() hunk by pointing all users to a 
wrapper
that does the proper offsetting, sure.

But that is not the main bug I was talking about, the main BUG is that 
partitions are
not supported at all because of the clubber of *part in brd_probe()

> @@ -558,7 +564,8 @@ static struct kobject *brd_probe(dev_t dev, int *part, 
> void *data)
>   kobj = brd ? get_disk(brd->brd_disk) : NULL;
>   mutex_unlock(&brd_devices_mutex);
>  
> - *part = 0;
> +//   Fix the partition BUG *part comes in correctly must not need to touch it
> +//   *part = 0;
>   return kobj;
>  }

And my point is that: No one uses partitions with brd, why should we not remove 
it
all together, why fix it and keep it?

Cheers
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFD] brd.ko Never supported partitions should we remove the dead code ?

2014-07-29 Thread Boaz Harrosh
Hi folks

I've been playing with yet unseen prd block device, and hit an issue with 
partitioning
but since prd.c is a copy/paste of brd.c the same exact issue exists with brd.

It does not support partitions!

An attempt to fdisk say a couple of partitions, then mkfs && mount an individual
partition will result in the primary device being accessed and a corrupted disk 
data.

If I enable max_part(=2) properly on modprobe, then fdisk a couple of 
partitions all
goes well

But then if I pass the device to Kernel (Say via mount), after a call to
bdev = blkdev_get_by_path(dev_name, mode, fs_type);

I get the following results: 
(size is extracted using:   i_size_read(bdev->bd_inode);
 part_size is extracted using:  bdev->bd_part->nr_sects << 9;
)

dev_name == /dev/ram0
[Jul29 17:40] foo: [foo_mount:880] size=0x4000 bdev=88003dc24340 
bd_inode=88003dc24430 bd_part=88003ca22848 part_size=0x4000

dev_name == /dev/ram0p1
[ +16.816065] foo: [foo_mount:880] size=0x4000 bdev=88003d2f6d80 
bd_inode=88003d2f6e70 bd_part=88003ca22848 part_size=0x4000

dev_name == /dev/ram0p2
[Jul29 17:41] foo: [foo_mount:880] size=0x4000 bdev=88003dc24680 
bd_inode=88003dc24770 bd_part=88003ca22848 part_size=0x4000

We can see that all 3 different bdev's point to the same bd_part, which is the 
BUG. Funny is that bd_inode is different but contains
same wrong size, for exactly the same reason.

The fix for this is easy:

diff --git drivers/block/brd.c drivers/block/brd.c
index c7d138e..0d982d7 100644
--- drivers/block/brd.c
+++ drivers/block/brd.c
@@ -378,10 +378,12 @@ static int brd_direct_access(struct block_device *bdev, 
sector_t sector,
 
if (!brd)
return -ENODEV;
+   sector += get_start_sect(bdev);
if (sector & (PAGE_SECTORS-1))
return -EINVAL;
+/* Check is wrong here we need to check against bdev->bd_part->nr_sects */
if (sector + PAGE_SECTORS > get_capacity(bdev->bd_disk))
return -ERANGE;
page = brd_insert_page(brd, sector);
if (!page)
return -ENOSPC;
@@ -532,11 +532,17 @@ static struct brd_device *brd_init_one(int i)
goto out;
}
 
+/* In the structure of this driver this will never happen and is wrong
+ * to do. We should just return NULL if not found.
+ * This is not the bug it is just DEAD CODE
+ * 
brd = brd_alloc(i);
if (brd) {
add_disk(brd->brd_disk);
list_add_tail(&brd->brd_list, &brd_devices);
}
+*/
+   prd = NULL;
 out:
return brd;
 }
@@ -558,7 +564,8 @@ static struct kobject *brd_probe(dev_t dev, int *part, void 
*data)
kobj = brd ? get_disk(brd->brd_disk) : NULL;
mutex_unlock(&brd_devices_mutex);
 
-   *part = 0;
+// Fix the partition BUG *part comes in correctly must not need to touch it
+// *part = 0;
return kobj;
 }
---

After this simple fix of commenting out *part = 0, running again I get

dev_name == /dev/ram0
[  +0.04] foo: [foo_mount:880] size=0x4000 bdev=88003d2f7740 
inode=88003d2f7830 part=88001da8c048 part_size=0x4000
dev_name == /dev/ram0p1
[  +0.02] foo: [foo_mount:880] size=0x1e748200 bdev=88003dc25040 
inode=88003dc25130 part=880036f6a000 part_size=0x1e748200
dev_name == /dev/ram0p2
[  +0.02] foo: [foo_mount:880] size=0x217b7e00 bdev=88003d2f7a80 
inode=88003d2f7b70 part=880036f69000 part_size=0x217b7e00
 

But before we are running to fix this bug. Could we please do better and just 
remove the support for partitions
all together.
Since it *never* worked anyway, so probably no one needs it! Surly no 
one used it

Thanks
Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 2/3] libiscsi, iser: Adjust data_length to include protection information

2014-07-27 Thread Boaz Harrosh
On 06/11/2014 12:09 PM, Sagi Grimberg wrote:
> In case protection information exists over the wire
> iscsi header data length is required to include it.
> Use protection information aware scsi helpers to set
> the correct transfer length.
> 
> In order to avoid breakage, remove iser transfer length
> checks for each task as they are not always true and
> somewhat redundant anyway.
> 
> Signed-off-by: Sagi Grimberg 
> Reviewed-by: Mike Christie 
> ---
>  drivers/infiniband/ulp/iser/iser_initiator.c |   34 +++--
>  drivers/scsi/libiscsi.c  |   18 +++---
>  2 files changed, 19 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c 
> b/drivers/infiniband/ulp/iser/iser_initiator.c
> index 2e2d903..8d44a40 100644
> --- a/drivers/infiniband/ulp/iser/iser_initiator.c
> +++ b/drivers/infiniband/ulp/iser/iser_initiator.c
> @@ -41,11 +41,11 @@
>  #include "iscsi_iser.h"
>  
>  /* Register user buffer memory and initialize passive rdma
> - *  dto descriptor. Total data size is stored in
> - *  iser_task->data[ISER_DIR_IN].data_len
> + *  dto descriptor. Data size is stored in
> + *  task->data[ISER_DIR_IN].data_len, Protection size
> + *  os stored in task->prot[ISER_DIR_IN].data_len
>   */
> -static int iser_prepare_read_cmd(struct iscsi_task *task,
> -  unsigned int edtl)
> +static int iser_prepare_read_cmd(struct iscsi_task *task)
>  
>  {
>   struct iscsi_iser_task *iser_task = task->dd_data;
> @@ -73,14 +73,6 @@ static int iser_prepare_read_cmd(struct iscsi_task *task,
>   return err;
>   }
>  
> - if (edtl > iser_task->data[ISER_DIR_IN].data_len) {
> - iser_err("Total data length: %ld, less than EDTL: "
> -  "%d, in READ cmd BHS itt: %d, conn: 0x%p\n",
> -  iser_task->data[ISER_DIR_IN].data_len, edtl,
> -  task->itt, iser_task->ib_conn);
> - return -EINVAL;
> - }
> -
>   err = device->iser_reg_rdma_mem(iser_task, ISER_DIR_IN);
>   if (err) {
>   iser_err("Failed to set up Data-IN RDMA\n");
> @@ -100,8 +92,9 @@ static int iser_prepare_read_cmd(struct iscsi_task *task,
>  }
>  
>  /* Register user buffer memory and initialize passive rdma
> - *  dto descriptor. Total data size is stored in
> - *  task->data[ISER_DIR_OUT].data_len
> + *  dto descriptor. Data size is stored in
> + *  task->data[ISER_DIR_OUT].data_len, Protection size
> + *  is stored at task->prot[ISER_DIR_OUT].data_len
>   */
>  static int
>  iser_prepare_write_cmd(struct iscsi_task *task,
> @@ -135,14 +128,6 @@ iser_prepare_write_cmd(struct iscsi_task *task,
>   return err;
>   }
>  
> - if (edtl > iser_task->data[ISER_DIR_OUT].data_len) {
> - iser_err("Total data length: %ld, less than EDTL: %d, "
> -  "in WRITE cmd BHS itt: %d, conn: 0x%p\n",
> -  iser_task->data[ISER_DIR_OUT].data_len,
> -  edtl, task->itt, task->conn);
> - return -EINVAL;
> - }
> -
>   err = device->iser_reg_rdma_mem(iser_task, ISER_DIR_OUT);
>   if (err != 0) {
>   iser_err("Failed to register write cmd RDMA mem\n");
> @@ -417,11 +402,12 @@ int iser_send_command(struct iscsi_conn *conn,
>   if (scsi_prot_sg_count(sc)) {
>   prot_buf->buf  = scsi_prot_sglist(sc);
>   prot_buf->size = scsi_prot_sg_count(sc);
> - prot_buf->data_len = sc->prot_sdb->length;
> + prot_buf->data_len = data_buf->data_len >>
> +  ilog2(sc->device->sector_size) * 8;
>   }
>  
>   if (hdr->flags & ISCSI_FLAG_CMD_READ) {
> - err = iser_prepare_read_cmd(task, edtl);
> + err = iser_prepare_read_cmd(task);
>   if (err)
>   goto send_command_error;
>   }
> diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
> index 26dc005..3f46234 100644
> --- a/drivers/scsi/libiscsi.c
> +++ b/drivers/scsi/libiscsi.c
> @@ -338,7 +338,7 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task 
> *task)
>   struct iscsi_session *session = conn->session;
>   struct scsi_cmnd *sc = task->sc;
>   struct iscsi_scsi_req *hdr;
> - unsigned hdrlength, cmd_len;
> + unsigned hdrlength, cmd_len, transfer_length;

I hate that you introduced this new transfer_length variable. It does
not exist. In BIDI supporting driver there is out_len and in_len just
as original code.

>   itt_t itt;
>   int rc;
>  
> @@ -391,11 +391,11 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task 
> *task)
>   if (scsi_get_prot_op(sc) != SCSI_PROT_NORMAL)
>   task->protected = true;
>  
> + transfer_length = scsi_transfer_length(sc);
> + hdr->data_length = cpu_to_be32(transfer_length);
>   if (sc->sc_data_direction == DMA_TO_DEVICE) {
> - unsigned 

Re: [PATCH v2 1/3] scsi_cmnd: Introduce scsi_transfer_length helper

2014-07-27 Thread Boaz Harrosh
On 06/25/2014 01:32 PM, Sagi Grimberg wrote:
> On 6/25/2014 11:48 AM, Sagi Grimberg wrote:
> 
> 
>>
>>> I made the patch below which should fix both bidi
>>> support in iscsi and also WRITE_SAME (and similar commands) support.
>>
>> I'm a bit confused, for all commands (bidi or not) the iscsi header 
>> data_length
>> should describe the scsi_data_buffer length, bidi information will lie 
>> in AHS header.
>> (in case bidi will ever co-exist with PI, we might need another helper 
>> that will look
>> in req->special + PI, something like scsi_bidi_transfer_length)
>>
>> So why not keep scsi_transfer_length to work on sdb length (take 
>> scsi_bufflen(scmnd) or
>> scsi_out(scmnd)->length as MKP suggested) and that's it - don't touch 
>> libiscsi.
>>
>> Let me test that.
>>
> 
> So I tested a bidirectional command using:
> $ sg_raw --infile=/root/filex --send=1024 --request=1024 
> --outfile=/root/filex "/dev/bsg/7:0:0:0" 53 00 00 00 00 00 00 00 02 00
> 
> And I see:
> kernel: session1: iscsi_prep_scsi_cmd_pdu iscsi prep [bidirectional cid 
> 0 sc 880468ca1e00 cdb 0x53 itt 0x16 len 1024 bidi_len 1024 cmdsn 223 
> win 64]
> 

This is a very bad example and tested nothing, since len && bidi_len
are the same. So even if you had a bug and took length from the
wrong side it would come out the same.

You must test with a bidi command that has two different lengths for
each side

If you have a tree that you want me to test I will be glad too.
>From this thread I'm confused as to what patches you want me to
test? please point me to a tree you need testing. You can bug me
any time, any tree. I will be happy to test.

Cheers
Boaz

> This confirms what I wrote above, so AFAICT no additional fix is 
> required for libiscsi wrt bidi commands support.
> 
> Mike?
> 
> Sagi.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/3] scsi_cmnd: Introduce scsi_transfer_length helper

2014-07-27 Thread Boaz Harrosh
On 06/25/2014 04:17 AM, Vladislav Bolkhovitin wrote:
> Martin K. Petersen, on 06/23/2014 06:58 PM wrote:
>>> "Mike" == Mike Christie  writes:
 + unsigned int xfer_len = blk_rq_bytes(scmd->request);
>>
>> Mike> Can you do bidi and dif/dix?
>>
>> Nope.
> 
> Correction: at the moment.
> 
> There is a proposal of READ GATHERED command, which is bidirectional and 
> potentially 
> DIF/DIX.
> 

That's all very good. But current infrastructure can not support BIDI+DIFF.
Because you we'll need *two* diff vectors one for each side.

Now in fact at the block layer this is actually supported. Since BIDI is
two requests, and each can have its own DIFF bio. But on the scsi layer
this is not implemented.

So I'd say: *For now DIFF and BIDI are mutually exclusive.*

(You'll need someone to love these new commands a lot in order to
 enable it)

> Vlad
> 

Cheers
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] block: support > 16 byte CDBs for SG_IO

2014-07-20 Thread Boaz Harrosh
On 07/20/2014 04:27 PM, Christoph Hellwig wrote:
> On Sun, Jul 20, 2014 at 02:47:49PM +0300, Boaz Harrosh wrote:
>>
>> So two things here:
>> - hdr->cmd_len is char so can be MAX of 255. I understand that 4 bytes 
>> alignment is a SCSI
>>   thing so you found no point of checking any max size?
> 
> I don't see any point to force the aligmnet - the devices need to reject
> garbage commands, and if for some reason we'd see future commands
> that are > 252 and < 255 we are prepared to handle them.
> 

I agree

>> - Why the zero alloc, if you are going to paste over it the exact same 
>> length. Now if like in scsi
>>   you need 4 bytes alignment and zero padding yes, but here you do not do 
>> this (and probably shouldn't).
>>   Hence why zero-alloc?
> 
> No good reason except that's what sg and bsg do.
> 

Ha sorry didn't look there. Looks redundant to me that's all

<>
>> Inside here: blk_fill_sghdr_rq() calls blk_verify_command() which does:
>> (Below cmd[] is the buffer copied from user)
>>
>>  /* Anybody who can open the device can do a read-safe command */
>>  if (test_bit(cmd[0], filter->read_ok))
>>  return 0;
>>
>>  /* Write-safe commands require a writable open */
>>  if (test_bit(cmd[0], filter->write_ok) && has_write_perm)
>>  return 0;
>>
>> Now I am not sure what type "Commands" you guys intend for these large 
>> commands
>> but if they are say SCSI-VARLEN this test will not work. Also a user might 
>> fool
>> the system with pretending to be "read" a device modifying command.
>>
>> I would pass len to this test function and only permit these above if 
>> command is
>> <= 16. Any "special" large command is root only.
> 
> Honestly that whole filter crap should never have been merged to start with,
> you'll just need proper CAP_SYS_RAWIO for variable length commands.
> 
> 

I agree. What I'm saying is - Are you sure that with current code as is we will 
not
pass on large commands without the proper CAP_SYS_RAWIO.

Should we not make sure and add:
/* root can do any command. */
if (capable(CAP_SYS_RAWIO))
return 0;
+
+   if (cmnd_len > BLK_MAX_CDB)
+   return -EPERM;

Rrrr you are right. I finally get the filter code. Anything that is not 
"white-listed"
is rejected. OK sorry for the noise.

Reviewed-by: Boaz Harrosh 

Thanks
Boaz



--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] block: support > 16 byte CDBs for SG_IO

2014-07-20 Thread Boaz Harrosh
On 07/20/2014 01:23 PM, Christoph Hellwig wrote:
> Signed-off-by: Christoph Hellwig 

Hi Christoph I've quickly reviewed your code and have a few questions

> ---
>  block/scsi_ioctl.c | 24 +---
>  1 file changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
> index c4e6633..a804f3e 100644
> --- a/block/scsi_ioctl.c
> +++ b/block/scsi_ioctl.c
> @@ -288,8 +288,6 @@ static int sg_io(struct request_queue *q, struct gendisk 
> *bd_disk,
>  
>   if (hdr->interface_id != 'S')
>   return -EINVAL;
> - if (hdr->cmd_len > BLK_MAX_CDB)
> - return -EINVAL;
>  
>   if (hdr->dxfer_len > (queue_max_hw_sectors(q) << 9))
>   return -EIO;
> @@ -306,14 +304,21 @@ static int sg_io(struct request_queue *q, struct 
> gendisk *bd_disk,
>   break;
>   }
>  
> + ret = -ENOMEM;
>   rq = blk_get_request(q, writing ? WRITE : READ, GFP_KERNEL);
>   if (!rq)
> - return -ENOMEM;
> + goto out;
>   blk_rq_set_block_pc(rq);
>  
> + if (hdr->cmd_len > BLK_MAX_CDB) {
> + rq->cmd = kzalloc(hdr->cmd_len, GFP_KERNEL);

So two things here:
- hdr->cmd_len is char so can be MAX of 255. I understand that 4 bytes 
alignment is a SCSI
  thing so you found no point of checking any max size?

- Why the zero alloc, if you are going to paste over it the exact same length. 
Now if like in scsi
  you need 4 bytes alignment and zero padding yes, but here you do not do this 
(and probably shouldn't).
  Hence why zero-alloc?

- Looking at sg.h where hdr->cmd_len is defined it stills has a comment of <= 
16 you might want to
  remove the comment by now.

> + if (!rq->cmd)
> + goto out_put_request;
> + }
> +
>   ret = -EFAULT;
>   if (blk_fill_sghdr_rq(q, rq, hdr, mode))

Inside here: blk_fill_sghdr_rq() calls blk_verify_command() which does:
(Below cmd[] is the buffer copied from user)

/* Anybody who can open the device can do a read-safe command */
if (test_bit(cmd[0], filter->read_ok))
return 0;

/* Write-safe commands require a writable open */
if (test_bit(cmd[0], filter->write_ok) && has_write_perm)
return 0;

Now I am not sure what type "Commands" you guys intend for these large commands
but if they are say SCSI-VARLEN this test will not work. Also a user might fool
the system with pretending to be "read" a device modifying command.

I would pass len to this test function and only permit these above if command is
<= 16. Any "special" large command is root only.

Thanks
Boaz

> - goto out;
> + goto out_free_cdb;
>  
>   if (hdr->iovec_count) {
>   size_t iov_data_len;
> @@ -323,7 +328,7 @@ static int sg_io(struct request_queue *q, struct gendisk 
> *bd_disk,
>   0, NULL, &iov);
>   if (ret < 0) {
>   kfree(iov);
> - goto out;
> + goto out_free_cdb;
>   }
>  
>   iov_data_len = ret;
> @@ -346,7 +351,7 @@ static int sg_io(struct request_queue *q, struct gendisk 
> *bd_disk,
> GFP_KERNEL);
>  
>   if (ret)
> - goto out;
> + goto out_free_cdb;
>  
>   bio = rq->bio;
>   memset(sense, 0, sizeof(sense));
> @@ -365,8 +370,13 @@ static int sg_io(struct request_queue *q, struct gendisk 
> *bd_disk,
>   hdr->duration = jiffies_to_msecs(jiffies - start_time);
>  
>   ret = blk_complete_sghdr_rq(rq, hdr, bio);
> -out:
> +
> +out_free_cdb:
> + if (rq->cmd != rq->__cmd)
> + kfree(rq->cmd);
> +out_put_request:
>   blk_put_request(rq);
> +out:
>   return ret;
>  }
>  
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5] sg: relax 16 byte cdb restriction

2014-06-05 Thread Boaz Harrosh
On 06/05/2014 06:40 PM, Boaz Harrosh wrote:
> On 06/03/2014 08:18 PM, Douglas Gilbert wrote:
>> v4 of this patch was sent 20131201.
>>
>> ChangeLog:
>>  - remove the 16 byte CDB (SCSI command) length limit
>>from the sg driver by handling longer CDBs the same
>>way as the bsg driver. Remove comment from sg.h
>>public interface about the cmd_len field being
>>limited to 16 bytes.
>>  - remove some dead code caused by this change
>>  - cleanup comment block at the top of sg.h, fix urls
>>
>> Signed-off-by: Douglas Gilbert 
>>
>>
>> sg_cdb_dg5.patch
>>
>>
> <>
>>  
>> +/* SG_MAX_CDB_SIZE should be 260 (spc4r37 section 3.1.30) however the type
>> + * of sg_io_hdr::cmd_len can only represent 255
>> + */
>> +#define SG_MAX_CDB_SIZE 255
>> +
<>

BTW here too. Why new code is using the old interface?
I thought that the all point for having bsg at all is
that new stuff are implemented there in a cleaner interface
for example large CDBs.
Otherwise what was the point for bsg in the first place?
(It was before my time, It would be nice to know?)

What is then left unique at bsg after this patch? only bidi?

Thanks
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5] sg: relax 16 byte cdb restriction

2014-06-05 Thread Boaz Harrosh
/include/scsi/sg.h b/include/scsi/sg.h
> index a9f3c6f..d8c0c43 100644
> --- a/include/scsi/sg.h
> +++ b/include/scsi/sg.h
> @@ -4,77 +4,34 @@
>  #include 
>  
>  /*
> -   History:
> -Started: Aug 9 by Lawrence Foard (entr...@world.std.com), to allow user
> - process control of SCSI devices.
> -Development Sponsored by Killy Corp. NY NY
> -Original driver (sg.h):
> -*   Copyright (C) 1992 Lawrence Foard
> -Version 2 and 3 extensions to driver:
> -*   Copyright (C) 1998 - 2006 Douglas Gilbert
> -
> -Version: 3.5.34 (20060920)
> -This version is for 2.6 series kernels.
> -
> -For a full changelog see http://www.torque.net/sg
> -
> -Map of SG verions to the Linux kernels in which they appear:
> -   ----
> -   original  all kernels < 2.2.6
> -   2.1.402.2.20
> -   3.0.x optional version 3 sg driver for 2.2 series
> -   3.1.17++  2.4.0++
> -   3.5.30++  2.6.0++
> -
> -Major new features in SG 3.x driver (cf SG 2.x drivers)
> - - SG_IO ioctl() combines function if write() and read()
> - - new interface (sg_io_hdr_t) but still supports old interface
> - - scatter/gather in user space, direct IO, and mmap supported
> -
> - The normal action of this driver is to use the adapter (HBA) driver to DMA
> - data into kernel buffers and then use the CPU to copy the data into the 
> - user space (vice versa for writes). That is called "indirect" IO due to 
> - the double handling of data. There are two methods offered to remove the
> - redundant copy: 1) direct IO and 2) using the mmap() system call to map
> - the reserve buffer (this driver has one reserve buffer per fd) into the
> - user space. Both have their advantages.
> - In terms of absolute speed mmap() is faster. If speed is not a concern, 
> - indirect IO should be fine. Read the documentation for more information.
> -
> - ** N.B. To use direct IO 'echo 1 > /proc/scsi/sg/allow_dio' or
> - 'echo 1 > /sys/module/sg/parameters/allow_dio' is needed.
> - That attribute is 0 by default. **
> - 
> - Historical note: this SCSI pass-through driver has been known as "sg" for 
> - a decade. In broader kernel discussions "sg" is used to refer to scatter
> - gather techniques. The context should clarify which "sg" is referred to.
> -
> - Documentation
> - =
> - A web site for the SG device driver can be found at:
> - http://www.torque.net/sg  [alternatively check the MAINTAINERS file]
> - The documentation for the sg version 3 driver can be found at:
> - http://www.torque.net/sg/p/sg_v3_ho.html
> - This is a rendering from DocBook source [change the extension to "sgml"
> - or "xml"]. There are renderings in "ps", "pdf", "rtf" and "txt" (soon).
> - The SG_IO ioctl is now found in other parts kernel (e.g. the block layer).
> - For more information see http://www.torque.net/sg/sg_io.html
> -
> - The older, version 2 documents discuss the original sg interface in detail:
> - http://www.torque.net/sg/p/scsi-generic.txt
> - http://www.torque.net/sg/p/scsi-generic_long.txt
> - Also available: /Documentation/scsi/scsi-generic.txt
> -
> - Utility and test programs are available at the sg web site. They are 
> - packaged as sg3_utils (for the lk 2.4 and 2.6 series) and sg_utils
> - (for the lk 2.2 series).
> -*/
> + * History:
> + *  Started: Aug 9 by Lawrence Foard (entr...@world.std.com), to allow user
> + *   process control of SCSI devices.
> + *  Development Sponsored by Killy Corp. NY NY
> + *
> + * Original driver (sg.h):
> + *   Copyright (C) 1992 Lawrence Foard
> + * Version 2 and 3 extensions to driver:
> + *   Copyright (C) 1998 - 2014 Douglas Gilbert
> + *
> + *  Version: 3.5.36 (20140603)
> + *  This version is for 2.6 and 3 series kernels.
> + *
> + * Documentation
> + * =
> + * A web site for the SG device driver can be found at:
> + *   http://sg.danny.cz/sg  [alternatively check the MAINTAINERS file]
> + * The documentation for the sg version 3 driver can be found at:
> + *   http://sg.danny.cz/sg/p/sg_v3_ho.html
> + * Also see: /Documentation/scsi/scsi-generic.txt
> + *
> + * For utility and test programs see: http://sg.danny.cz/sg/sg3_utils.html
> + */
>  
>  #ifdef __KERNEL__
>  extern int sg_big_buff; /* for sysctl */
>  #endif
>  
> -/* New interface introduced in the 3.x SG drivers follows */
>  
>  typedef struct sg_iovec /* same structure as used by readv() Linux system */
>  {   /* call. It defines one scatter-gather element. */
> @@ -87,7 +44,7 @@ typedef struct sg_io_hdr
>  {
>  int interface_id;   /* [i] 'S' for SCSI generic (required) */
>  int dxfer_direction;/* [i] data transfer direction  */
> -unsigned char cmd_len;  /* [i] SCSI command length ( <= 16 bytes) */
> +unsigned char cmd_len;  /* [i] SCSI command length */
>  unsigned char mx_sb_len;/* [i] max length to write to sbp */
>  unsigned short iovec_count; /* [i] 0 implies no scatter gather */
>  unsigned int dxfer_len; /* [i] byte count of data transfer */

Reviewed-by: Boaz Harrosh 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] sg: add SG_FLAG_Q_AT_TAIL flag

2014-06-05 Thread Boaz Harrosh
On 06/04/2014 05:58 PM, Douglas Gilbert wrote:
> When the SG_IO ioctl was copied into the block layer and
> later into the bsg driver, subtle differences emerged.
> 
> One difference is the way injected commands are queued through
> the block layer (i.e. this is not SCSI device queueing nor SATA
> NCQ). Summarizing:
>- SG_IO in the block layer: blk_exec*(at_head=false)
>- sg SG_IO: at_head=true
>- bsg SG_IO: at_head=true
> 
> Some time ago Boaz Harrosh introduced a sg v4 flag called
> BSG_FLAG_Q_AT_TAIL to override the bsg driver default.
> This patch does the equivalent for the sg driver.
> 

Yep

> 
> ChangeLog:
>  Introduce SG_FLAG_Q_AT_TAIL flag to cause commands
>  to be injected into the block layer with
>  at_head=false.
> 
> Changes since v1:
>  Make guard condition (only take sg v3 interface or later
>  invocations) clearer.
> 
> Signed-off-by: Douglas Gilbert 
> 

Douglas Hi

Please I'm just curious? up until now all application users
used "SG_FLAG_Q_AT_HEAD". Therefor I deduce that you guys
came across a new application that can make use of the new
SG_FLAG_Q_AT_TAIL facility.

What was the application's writer considerations for using
the old sg interface and not use the newer bsg interface
that already has this support. For me I can see 2 main areas
that I find bsg missing.

1. aio "scatter_gather" type io. 
   (ie multiple pointers multiple length buffers that are
written / read from same linear range on device)
  [The async aspect of aio can be implemented via bsg
   with the write+read system calls]
2. mmap of direct device range to user vm memory

Which of these, or other, considerations where cardinal
in using of the old interface in new code?

(For me personally both above shortcomings are very
 much missed]

Thanks
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] scsi patch queue tree

2014-06-02 Thread Boaz Harrosh
> 
> Looks more like an ack than a hack :)
> 

Oops so I do need that coffee  ;0)


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] scsi patch queue tree

2014-06-02 Thread Boaz Harrosh
On 06/02/2014 10:58 AM, Christoph Hellwig wrote:
> With Linus opening the merge window things should calm down now.
> 
> I've pushed the mvsas pciid update, and a revert of the be2scsi patch
> that wasn't quite ready to the drivers-for-3.16 branch.
> 
> I've also created a new drivers-for-3.16-2 branch that contains the
> remaining hpsa updates that might be too big for Linus taste that late
> but should probably be fine.  I also expect to add the relatively small
> lpfc update to it once I get second review in proper form.
> 

Hi Christoph

Could you please add this very trivial patch for this merge window?
http://www.spinics.net/lists/linux-scsi/msg74774.html

my hack here: http://www.spinics.net/lists/linux-scsi/msg74775.html

Forgot to CC you sorry
Thanks
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/1] include/scsi/osd_protocol.h: remove unnecessary __constant

2014-06-01 Thread Boaz Harrosh
On 06/01/2014 05:06 PM, Fabian Frederick wrote:
> __constant_cpu_to_be16 converted to cpu_to_be16
> 
> This patch fixes checkpatch warnings:
> 
> "WARNING: __constant_cpu_to_be16 should be cpu_to_be16"
> 
> Cc: Boaz Harrosh 
> Cc: Benny Halevy 
> Signed-off-by: Fabian Frederick 

Ack-by: Boaz Harrosh 

James please apply
Thanks
Boaz

> ---
>  include/scsi/osd_protocol.h | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/include/scsi/osd_protocol.h b/include/scsi/osd_protocol.h
> index 25ac628..a2594af 100644
> --- a/include/scsi/osd_protocol.h
> +++ b/include/scsi/osd_protocol.h
> @@ -263,16 +263,16 @@ static inline struct osd_cdb_head *osd_cdb_head(struct 
> osd_cdb *ocdb)
>   * Ex name = FORMAT_OSD we have OSD_ACT_FORMAT_OSD && OSDv1_ACT_FORMAT_OSD
>   */
>  #define OSD_ACT___(Name, Num) \
> - OSD_ACT_##Name = __constant_cpu_to_be16(0x8880 + Num), \
> - OSDv1_ACT_##Name = __constant_cpu_to_be16(0x8800 + Num),
> + OSD_ACT_##Name = cpu_to_be16(0x8880 + Num), \
> + OSDv1_ACT_##Name = cpu_to_be16(0x8800 + Num),
>  
>  /* V2 only actions */
>  #define OSD_ACT_V2(Name, Num) \
> - OSD_ACT_##Name = __constant_cpu_to_be16(0x8880 + Num),
> + OSD_ACT_##Name = cpu_to_be16(0x8880 + Num),
>  
>  #define OSD_ACT_V1_V2(Name, Num1, Num2) \
> - OSD_ACT_##Name = __constant_cpu_to_be16(Num2), \
> - OSDv1_ACT_##Name = __constant_cpu_to_be16(Num1),
> + OSD_ACT_##Name = cpu_to_be16(Num2), \
> + OSDv1_ACT_##Name = cpu_to_be16(Num1),
>  
>  enum osd_service_actions {
>   OSD_ACT_V2(OBJECT_STRUCTURE_CHECK,  0x00)
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: lpfc: lpfc_init: use kcalloc for allocating memory

2014-04-09 Thread Boaz Harrosh
On 04/08/2014 11:42 PM, Matei Oprea wrote:
> It's easier to use kcalloc for allocating arrays. While at it
> also remove useless casting value.
> 
> Signed-off-by: Matei Oprea 
> Cc: ROSEdu Kernel Community 
> ---
>  drivers/scsi/lpfc/lpfc_init.c | 14 --
>  1 file changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
> index 635eeb3..1528f34 100644
> --- a/drivers/scsi/lpfc/lpfc_init.c
> +++ b/drivers/scsi/lpfc/lpfc_init.c
> @@ -4742,12 +4742,14 @@ lpfc_sli_driver_resource_setup(struct lpfc_hba *phba)
>   phba->cfg_sg_seg_cnt = LPFC_DEFAULT_MENLO_SG_SEG_CNT;
>   }
>  
> - if (!phba->sli.ring)
> - phba->sli.ring = (struct lpfc_sli_ring *)
> - kzalloc(LPFC_SLI3_MAX_RING *
> - sizeof(struct lpfc_sli_ring), GFP_KERNEL);
> - if (!phba->sli.ring)
> - return -ENOMEM;
> + if (unlikely(!phba->sli.ring)) {

As Joe Perches said this one is not unlikely, it might or it might not
depending on state.

> + phba->sli.ring = kcalloc(LPFC_SLI3_MAX_RING,
> + sizeof(struct lpfc_sli_ring),
> + GFP_KERNEL);
> +
> + if (!phba->sli.ring)

but this one is unlikely. This is the error path so we want
the hot path to be decoded by the CPU.

> + return -ENOMEM;
> + }
>  
>   /*
>* Since lpfc_sg_seg_cnt is module parameter, the sg_dma_buf_size
> 

Cheers
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: lpfc: lpfc_init: use kcalloc for allocating memory

2014-04-07 Thread Boaz Harrosh
On 03/18/2014 10:51 PM, Matei Oprea wrote:
> It's easier to use kcalloc for allocating arrays. While at it
> also remove useless casting value.
> 
> Signed-off-by: Matei Oprea 
> Cc: ROSEdu Kernel Community 
> ---
>  drivers/scsi/lpfc/lpfc_init.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/scsi/lpfc/lpfc_init.c b/drivers/scsi/lpfc/lpfc_init.c
> index 68c94cc..0a51ca5 100644
> --- a/drivers/scsi/lpfc/lpfc_init.c
> +++ b/drivers/scsi/lpfc/lpfc_init.c
> @@ -4731,9 +4731,9 @@ lpfc_sli_driver_resource_setup(struct lpfc_hba *phba)
>   }
>  
>   if (!phba->sli.ring)
> - phba->sli.ring = (struct lpfc_sli_ring *)
> - kzalloc(LPFC_SLI3_MAX_RING *
> + phba->sli.ring = kcalloc(LPFC_SLI3_MAX_RING,
>   sizeof(struct lpfc_sli_ring), GFP_KERNEL);
> +

Just a nit please put this recheck inside the if(){} so in the hot path
when phba->sli.ring is already allocated it will not test twice.

>   if (!phba->sli.ring)

and unlikely() on this if

>   return -ENOMEM;
>  
> 

thanks
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: misc scsi midlayer updates

2014-03-30 Thread Boaz Harrosh
On 03/27/2014 06:14 PM, Christoph Hellwig wrote:
> Various patches from the scsi multiqueue development that make sense on their
> own.
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Hi Christoph

Many years ago I have sent these exact patches but no-one cared Including
me I guess.

(one instance is here, I sent it several times)
http://comments.gmane.org/gmane.linux.scsi/63347

Please note the 4th patch which is a real BUG, titled:
scsi_lib: Can't RETRY scsi_cmnd if some bytes were completed

I think my:
scsi_lib: Remove that __scsi_release_buffers contraption
Is more elegant, is layered better and is smaller code. (please
consider my version)

Also the first patch is some more cleanup you'd like.

The main patch of collapsing  scsi_end_request is basically the same.

Funny that I've just rebased an exactly 2 years ago version and it
rebased cleanly. So here they are for your reference.

[PATCH 1/4] scsi_lib: request_queue is only needed inside
[PATCH 2/4] scsi_lib: Remove that __scsi_release_buffers contraption
[PATCH 3/4] scsi_lib: Collapse scsi_end_request into only user
[PATCH 4/4] scsi_lib: BUG: Can't RETRY scsi_cmnd if some bytes were

[Your patches have been tested within my MQ testing right?]

Thanks for pushing on this
Boaz

>From 9809b82bbb9813370738ac00400c01d61b0c51b4 Mon Sep 17 00:00:00 2001
From: Boaz Harrosh 
Date: Mon, 4 Jan 2010 10:45:56 +0200
Subject: [PATCH 1/4] scsi_lib: request_queue is only needed inside
 scsi_requeue_command

This is a pure cleanup. Just starting the engines for things
to come.

Signed-off-by: Boaz Harrosh 
---
 drivers/scsi/scsi_lib.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index 62ec84b..dd834ca 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -491,10 +491,11 @@ void scsi_requeue_run_queue(struct work_struct *work)
  *		sector.
  * Notes:	Upon return, cmd is a stale pointer.
  */
-static void scsi_requeue_command(struct request_queue *q, struct scsi_cmnd *cmd)
+static void scsi_requeue_command(struct scsi_cmnd *cmd)
 {
 	struct scsi_device *sdev = cmd->device;
 	struct request *req = cmd->request;
+	struct request_queue *q = req->q;
 	unsigned long flags;
 
 	/*
@@ -565,7 +566,6 @@ static void __scsi_release_buffers(struct scsi_cmnd *, int);
 static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int error,
 	  int bytes, int requeue)
 {
-	struct request_queue *q = cmd->device->request_queue;
 	struct request *req = cmd->request;
 
 	/*
@@ -584,7 +584,7 @@ static struct scsi_cmnd *scsi_end_request(struct scsi_cmnd *cmd, int error,
  * queue, and goose the queue again.
  */
 scsi_release_buffers(cmd);
-scsi_requeue_command(q, cmd);
+scsi_requeue_command(cmd);
 cmd = NULL;
 			}
 			return cmd;
@@ -779,7 +779,6 @@ static int __scsi_error_from_host_byte(struct scsi_cmnd *cmd, int result)
 void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 {
 	int result = cmd->result;
-	struct request_queue *q = cmd->device->request_queue;
 	struct request *req = cmd->request;
 	int error = 0;
 	struct scsi_sense_hdr sshdr;
@@ -1003,7 +1002,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 			scsi_print_command(cmd);
 		}
 		if (blk_end_request_err(req, error))
-			scsi_requeue_command(q, cmd);
+			scsi_requeue_command(cmd);
 		else
 			scsi_next_command(cmd);
 		break;
@@ -1012,7 +1011,7 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 		 * A new command will be prepared and issued.
 		 */
 		scsi_release_buffers(cmd);
-		scsi_requeue_command(q, cmd);
+		scsi_requeue_command(cmd);
 		break;
 	case ACTION_RETRY:
 		/* Retry the same command immediately */
-- 
1.8.5.3

>From aac579b02c7d3665665a1f1f7c064a82b70b873a Mon Sep 17 00:00:00 2001
From: Boaz Harrosh 
Date: Mon, 4 Jan 2010 12:46:06 +0200
Subject: [PATCH 2/4] scsi_lib: Remove that __scsi_release_buffers contraption

Remove "do_bidi_check" by checking and nullifying cmd->request
when blk_end_* indicates the request is no longer valid.

Signed-off-by: Boaz Harrosh 
---
 drivers/scsi/scsi_lib.c | 41 +
 1 file changed, 17 insertions(+), 24 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index dd834ca..c4cdc6a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -539,8 +539,6 @@ void scsi_run_host_queues(struct Scsi_Host *shost)
 		scsi_run_queue(sdev->request_queue);
 }
 
-static void __scsi_release_buffers(struct scsi_cmnd *, int);
-
 /*
  * Function:scsi_end_request()
  *
@@ -591,11 +589,12 @@ static struct scsi_cmnd *scsi_end_reque

Re: [PATCH v6 2/2] block,scsi: convert and handle ERR_PTR from blk_get_request

2013-11-04 Thread Boaz Harrosh
On 10/31/2013 03:50 PM, Joe Lawrence wrote:
> The blk_get_request function may fail in low-memory conditions or during
> device removal (even if __GFP_WAIT is set). To distinguish between these
> errors, modify the blk_get_request call stack to return the appropriate
> ERR_PTR. Verify that all callers check the return status and consider
> IS_ERR instead of a simple NULL pointer check.
> 
> Signed-off-by: Joe Lawrence 
> ---
<>
> diff --git a/drivers/scsi/osd/osd_initiator.c 
> b/drivers/scsi/osd/osd_initiator.c
> index aa66361ed44b..0250efda647b 100644
> --- a/drivers/scsi/osd/osd_initiator.c
> +++ b/drivers/scsi/osd/osd_initiator.c
> @@ -1567,8 +1567,8 @@ static struct request *_make_request(struct 
> request_queue *q, bool has_write,
>   struct request *req;
>  
>   req = blk_get_request(q, has_write ? WRITE : READ, flags);
> - if (unlikely(!req))
> - return ERR_PTR(-ENOMEM);
> + if (unlikely(IS_ERR(req)))

Just a nit IS_ERR already has an unlikely so it can be dropped here. (No bigy)

ACK-by: Boaz Harrosh 

> + return req;
>  
>   return req;
>   }
<>

Thanks
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bypass block layer and Fill SCSI lower layer driver queue

2013-09-18 Thread Boaz Harrosh
On 09/18/2013 05:07 PM, Douglas Gilbert wrote:
> On 13-09-18 03:58 AM, Jack Wang wrote:
>> On 09/18/2013 08:41 AM, Alireza Haghdoost wrote:
>>> Hi
>>>
>>> I am working on a high throughput and low latency application which
>>> does not tolerate block layer overhead to send IO request directly to
>>> fiber channel lower layer SCSI driver. I used to work with libaio but
>>> currently I am looking for a way to by pass the block layer and send
>>> SCSI commands from the application layer directly to the SCSI driver
>>> using /dev/sgX device and ioctl() system call.
>>>
>>> I have noticed that sending IO request through sg device even with
>>> nonblocking and direct IO flags is quite slow and does not fill up
>>> lower layer SCSI driver TCQ queue. i.e IO depth or
>>> /sys/block/sdX/in_flight is always ZERO. Therefore the application
>>> throughput is even lower that sending IO request through block layer
>>> with libaio and io_submit() system call. In both cases I used only one
>>> IO context (or fd) and single threaded.
>>>
>> Hi Alireza,
>>
>> I think what you want is in_flight command scsi dispatch to low level
>> device.
>> I submit a simple patch to export device_busy
>>
>> http://www.spinics.net/lists/linux-scsi/msg68697.html
>>
>> I also notice fio sg engine will not fill queue properly, but haven't
>> look into deeper.
>>
>> Cheers
>> Jack
>>
>>> I have noticed that some well known benchmarking tools like fio does
>>> not support IO depth for sg devices as well. Therefore, I was
>>> wondering if it is feasible to bypass block layer and achieve higher
>>> throughput and lower latency (for sending IO request only).
>>>
>>>
>>> Any comment on my issue is highly appreciated.
> 
> I'm not sure if this is relevant to your problem but by
> default both the bsg and sg drivers "queue at head"
> when they inject SCSI commands into the block layer.
> 
> The bsg driver has a BSG_FLAG_Q_AT_TAIL flag to change
> that queueing to what may be preferable for your purposes.
> The sg driver could, but does not, support that flag.
> 

Yes!

The current best bet for keeping the queues full are with libaio
and direct + asynchronous IO. It should not be significantly slower than
bsg. (Believe me, with direct IO Block-Device cache is bypassed and
the only difference is in who prepares the struct requests for submission)

As Doug said sg can not do it. Also with bsg and above BSG_FLAG_Q_AT_TAIL
You will need to use the write() option and not the ioctl() because the
later is synchronous and you want an asynchronous submit of commands
and background completion of them. (Which is what libaio does with async IO)

With bsg you achieve that with using write() in combination of read() to
receive the completions.

> Doug Gilbert
> 

Cheers
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] block: handle pointer error from blk_get_request

2013-05-23 Thread Boaz Harrosh
On 23/05/13 23:09, Joe Lawrence wrote:
> Hi Jens,
> 
> Subject: [PATCH v4] block: handle pointer error from blk_get_request
> 
> The blk_get_request function may fail in low-memory conditions or during
> device removal (even if __GFP_WAIT is set). To distinguish between these
> errors, modify the blk_get_request call stack to return the appropriate
> error pointer. Verify that all callers check the return status and
> consider IS_ERR instead of a simple NULL pointer check.
> 
> Signed-off-by: Joe Lawrence 
> Cc: Jens Axboe 
> Cc: "James E.J. Bottomley" 
> Cc: Bart Van Assche 
> Cc: linux-scsi@vger.kernel.org

ACK-by: Boaz Harrosh 

> ---
<>
>  drivers/scsi/osd/osd_initiator.c|  4 ++--
<>
> diff --git a/drivers/scsi/osd/osd_initiator.c 
> b/drivers/scsi/osd/osd_initiator.c
> index d8293f2..b4cd56b 100644
> --- a/drivers/scsi/osd/osd_initiator.c
> +++ b/drivers/scsi/osd/osd_initiator.c
> @@ -1567,8 +1567,8 @@ static struct request *_make_request(struct 
> request_queue *q, bool has_write,
>   struct request *req;
>  
>   req = blk_get_request(q, has_write ? WRITE : READ, flags);
> - if (unlikely(!req))
> - return ERR_PTR(-ENOMEM);
> + if (unlikely(IS_ERR(req)))
> + return req;
>  
>   return req;
>   }


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [LSF/MM TOPIC] Thin provisioning SOFT_THRESHOLD error handling

2013-02-07 Thread Boaz Harrosh
On 01/29/2013 10:14 AM, Hannes Reinecke wrote:
> Hi all,
> 
> Thin-provisioned devices have the ability to set a 'soft threshold', 
> which is triggered if the real free space for this device is beyond 
> this mark.
> 
> The intention behind this is to allow the system to induce some 
> garbage collection with possibly freeing up unused space.
> 
> Initially it would be possible to execute garbage collection on 
> filesystems (eg for btrfs).
> 
> However, as this concept applies to other areas within the kernel
> (like dm-thinp or even btrfs itself) it might be an idea to have
> a general mechanism / error handling etc in place.
> 
> I would like to discuss at LSF the possible implementations
> and handling mechanism for this kind of failure scenarios.
> 

Hannes hi!

This is received as an "unit attentions", right?
Will it not be worth while to solve the general "unit attentions"
under udev events, once and for all. Than such a btrfs GC above
can just be a simple oneline udev rule.

(I think that the event-storm problem you had at the time can
 be solved with some Kernel side "unit attentions" queue, and
 greatly reduce the chance for missed events)

Thanks
Boaz

> Cheers,
> 
> Hannes
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [LSF/MM TOPIC][ATTEND] protection information and userspace

2013-02-07 Thread Boaz Harrosh
On 02/07/2013 02:33 PM, Hannes Reinecke wrote:
> On 02/07/2013 01:16 PM, Boaz Harrosh wrote:
>> (Again libaio should be changed in concert with Kernel's new API, and we
>>   can sacrifice old user-mode performance, with a COMPAT layer. Distro
>>   maintainers should consider replacing libaio, together with the new
>>   Kernel, so it is only those that do their own mix-and-match, who can
>>   fix that mismatch too)
>>
> And while we're at it, I still would _love_ to connect aio_cancel() 
> and blk_abort_request().
> 
> That way we could sensibly abort an I/O and get out of the darn 'D' 
> state.
> 

Yes!! Thanks. It is very interesting how the socket side of the world
had it correct for ages, and the same "fd" object on disks is second grade
citizen in UNIX land. (Anybody voting for epoll on async disk IO? )

Thanks Hannes yes that too. And wait_interuptable() too, at couple of
places, will need some serious error handling audit for that.

> Cheers,
> 
> Hannes
> 

Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [LSF/MM TOPIC][ATTEND] protection information and userspace

2013-02-07 Thread Boaz Harrosh
On 02/07/2013 02:29 PM, Bart Van Assche wrote:
> On 02/07/13 13:08, Boaz Harrosh wrote:
>> (My addition is for support of sg_lists to bsg, in a way that makes Tomo 
>> happy
>>   I know that qemu was wanting this for a while as well as the multitude of
>>   user-mode servers)
> 
> Do you think it would help / make sense if sg_alloc_table() would be 
> modified such that it allocates the entire scatterlist table via one 
> vmalloc() call instead of chaining several page-sized scatterlist tables 
> ? Note: such a change is not possible without modifying 
> scsi_alloc_sgtable().
> 

I don't think so, no. sg_alloc_table() is used not only for direct IO
also for buffered, Now vmalloc() is terribly slow and would be a bottleneck
in today's SSD performance.

I love it that the Linux Kernel never uses vmalloc internally, and only ever
chains everything to upto PAGE_SIZE sized objects. Coming from all these
other OSs that don't, believe me, it is great great performance pain.
(TLBs are a bitch)

> Bart.
> 

Thanks
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [LSF/MM TOPIC][ATTEND] protection information and userspace

2013-02-07 Thread Boaz Harrosh
On 02/07/2013 02:08 PM, Boaz Harrosh wrote:
> On 02/07/2013 01:27 PM, Hannes Reinecke wrote:
>> On 02/07/2013 11:01 AM, Darrick J. Wong wrote:
>>> On Thu, Feb 07, 2013 at 01:40:14AM -0800, Joel Becker wrote:
>>>> On Wed, Feb 06, 2013 at 03:34:49PM -0500, Chuck Lever wrote:
>>>>>
>>>>> On Feb 6, 2013, at 3:24 PM, "Darrick J. Wong"  
>>>>> wrote:
>>>>>
>>>>>> On Wed, Feb 06, 2013 at 01:51:22PM -0600, Ben Myers wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm interested in discussing how to pass protection information to and 
>>>>>>> from
>>>>>>> userspace.  Maybe Martin could be enlisted for the discussion.
>>>>>>>
>>>>>>> I read that some work has already been done in this area but have not 
>>>>>>> been able
>>>>>>> to locate it.  It looks like the bio-integrity code already makes it 
>>>>>>> possible
>>>>>>> to generate the t10-dif crc in the filesystem.  It would be good to be 
>>>>>>> able to
>>>>>>> get the guard and application tags back out to backup applications such 
>>>>>>> as
>>>>>>> xfsdump.  Enabling other applications to generate their own tags in 
>>>>>>> userspace
>>>>>>> is also interesting.
>>>>>>
>>>>>> This one's been on my list for a couple of years (and companies) too.  A 
>>>>>> few
>>>>>> years ago Joel Becker had support for it in his sys_dio proposal (that 
>>>>>> hasn't
>>>>>> gone anywhere), and more recently I've theorized that we could add a 
>>>>>> magic
>>>>>> fcntl/ioctl to make the kernel recognize, say, the first iovec of a 
>>>>>> O_DIRECT
>>>>>> *{read,write}v call as the PI buffer, which I think is similar to how 
>>>>>> DIX gets
>>>>>> PI data to a disk.  But it's not like I have any code to show for it.
>>>>>>
>>>>>> I /think/ it's fairly straightforward to change the directio submit code 
>>>>>> to
>>>>>> find the userspace PI buffer and amend the block integrity code to 
>>>>>> attach our
>>>>>> own PI buffer.  You'd still have to let the block layer set the sector # 
>>>>>> field,
>>>>>> but afaik that won't affect the crc or the app tag.
>>>>>>
>>>>>> I hear that the NFS guys want to propose some sort of protocol for 
>>>>>> transmitting
>>>>>> PI data (across NFS), but I haven't seen anything concrete yet.
>>>>>
>>>>> I'm writing a requirements document for the NFS protocol which I can 
>>>>> discuss at LSF.  The use cases for NFS for now would be virtual disk 
>>>>> devices (hypervisors) or direct NFS access to storage from user space.
>>>>>
>>>>> Like everyone else we are waiting for a magical VFS and user space API to 
>>>>> appear that can pass PI to and from storage.
>>>>
>>>> I'm happy to chat about it.  Unfortunately, like Darrick says, sys_dio()
>>>> coding hasn't happened.  I do think we're better off with some kind of
>>>> explicit API than some magic state on the file.  I mean, even something
>>>> like:
>>>>
>>>>ssize_t write_with_pi(int fd, const void *buf, size_t count,
>>>>  const void *pi, size_t pi_count);
>>>>
>>>> It's not as nice as a non-historical API (eg sys_dio), but it also
>>>> probably plays nicer with buffered I/O.
>>>
>>> I also pondered simply adding a new io_prep_* function + IO_CMD_ code to 
>>> libaio
>>> and all the other plumbing necessary to make that happen...
>>>
>>> void io_prep_preadv_pi(struct iocb *iocb, int fd, const struct iovec *iov,
>>>int iovcnt, long long offset, const void *pi,
>>>size_t pi_count);
>>>
>> This is also what I've envisioned.
>> Updating io_prep / async I/O is reasonably easy as its been using a 
>> separate structure for passing in the I/O details.
>>
>> Normal read/write calls don't really map as you simply don't have 
>>

Re: [LSF/MM TOPIC][ATTEND] protection information and userspace

2013-02-07 Thread Boaz Harrosh
On 02/07/2013 01:27 PM, Hannes Reinecke wrote:
> On 02/07/2013 11:01 AM, Darrick J. Wong wrote:
>> On Thu, Feb 07, 2013 at 01:40:14AM -0800, Joel Becker wrote:
>>> On Wed, Feb 06, 2013 at 03:34:49PM -0500, Chuck Lever wrote:

 On Feb 6, 2013, at 3:24 PM, "Darrick J. Wong"  
 wrote:

> On Wed, Feb 06, 2013 at 01:51:22PM -0600, Ben Myers wrote:
>> Hi,
>>
>> I'm interested in discussing how to pass protection information to and 
>> from
>> userspace.  Maybe Martin could be enlisted for the discussion.
>>
>> I read that some work has already been done in this area but have not 
>> been able
>> to locate it.  It looks like the bio-integrity code already makes it 
>> possible
>> to generate the t10-dif crc in the filesystem.  It would be good to be 
>> able to
>> get the guard and application tags back out to backup applications such 
>> as
>> xfsdump.  Enabling other applications to generate their own tags in 
>> userspace
>> is also interesting.
>
> This one's been on my list for a couple of years (and companies) too.  A 
> few
> years ago Joel Becker had support for it in his sys_dio proposal (that 
> hasn't
> gone anywhere), and more recently I've theorized that we could add a magic
> fcntl/ioctl to make the kernel recognize, say, the first iovec of a 
> O_DIRECT
> *{read,write}v call as the PI buffer, which I think is similar to how DIX 
> gets
> PI data to a disk.  But it's not like I have any code to show for it.
>
> I /think/ it's fairly straightforward to change the directio submit code 
> to
> find the userspace PI buffer and amend the block integrity code to attach 
> our
> own PI buffer.  You'd still have to let the block layer set the sector # 
> field,
> but afaik that won't affect the crc or the app tag.
>
> I hear that the NFS guys want to propose some sort of protocol for 
> transmitting
> PI data (across NFS), but I haven't seen anything concrete yet.

 I'm writing a requirements document for the NFS protocol which I can 
 discuss at LSF.  The use cases for NFS for now would be virtual disk 
 devices (hypervisors) or direct NFS access to storage from user space.

 Like everyone else we are waiting for a magical VFS and user space API to 
 appear that can pass PI to and from storage.
>>>
>>> I'm happy to chat about it.  Unfortunately, like Darrick says, sys_dio()
>>> coding hasn't happened.  I do think we're better off with some kind of
>>> explicit API than some magic state on the file.  I mean, even something
>>> like:
>>>
>>> ssize_t write_with_pi(int fd, const void *buf, size_t count,
>>>   const void *pi, size_t pi_count);
>>>
>>> It's not as nice as a non-historical API (eg sys_dio), but it also
>>> probably plays nicer with buffered I/O.
>>
>> I also pondered simply adding a new io_prep_* function + IO_CMD_ code to 
>> libaio
>> and all the other plumbing necessary to make that happen...
>>
>> void io_prep_preadv_pi(struct iocb *iocb, int fd, const struct iovec *iov,
>> int iovcnt, long long offset, const void *pi,
>> size_t pi_count);
>>
> This is also what I've envisioned.
> Updating io_prep / async I/O is reasonably easy as its been using a 
> separate structure for passing in the I/O details.
> 
> Normal read/write calls don't really map as you simply don't have 
> enough parameter to feed PI information into the kernel.
> So for that you'd need to invent a new interface / syscall.
> 
> For aio we just need to add additional fields to an existing structure.
> 
> So yeah, I'd be interested in that discussion as well.
> 

Me too, in multiple fronts. It's part of my general concern about
   "things we would like for user-mode servers"

I think that the current aio and libaio Interface is broken for a long
time, for multitude of reasons. For instance the nested structure definitions
are COMPAT broken, and lots of missing pieces. (For example search in archives
for why bsg does not support sg-lists.)

And there are all these additions that everyone wants on top, that call for
a new interface anyway.

So I would like to see a deep fixup of this interface, with an aio version2
that can take into considerations, all of future needs including these
above. Kernel code will be very happy to be implemented with the new, interface
and a COMPAT layer could be put in place for the old interface.

All interested parties should bring to the table what is the extension/changes
they need. And we can try and union all of them together.

(My addition is for support of sg_lists to bsg, in a way that makes Tomo happy
 I know that qemu was wanting this for a while as well as the multitude of
 user-mode servers)

Thanks
Boaz

> Cheers,
> 
> Hannes
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of 

Re: [patch] [SCSI] libosd: check for kzalloc() failure

2013-01-30 Thread Boaz Harrosh
On 01/30/2013 09:06 AM, Dan Carpenter wrote:
> There wasn't any error handling for this kzalloc().
> 

ACK-by: Boaz Harrosh 

James please queue for inclusion

> Signed-off-by: Dan Carpenter 

Thanks Dan

> 
> diff --git a/drivers/scsi/osd/osd_initiator.c 
> b/drivers/scsi/osd/osd_initiator.c
> index c06b8e5..d8293f2 100644
> --- a/drivers/scsi/osd/osd_initiator.c
> +++ b/drivers/scsi/osd/osd_initiator.c
> @@ -144,6 +144,10 @@ static int _osd_get_print_system_info(struct osd_dev *od,
>   odi->osdname_len = get_attrs[a].len;
>   /* Avoid NULL for memcmp optimization 0-length is good enough */
>   odi->osdname = kzalloc(odi->osdname_len + 1, GFP_KERNEL);
> + if (!odi->osdname) {
> + ret = -ENOMEM;
> + goto out;
> + }
>   if (odi->osdname_len)
>   memcpy(odi->osdname, get_attrs[a].val_ptr, odi->osdname_len);
>   OSD_INFO("OSD_NAME   [%s]\n", odi->osdname);
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] [SCSI] libosd: check for kzalloc() failure

2013-01-30 Thread Boaz Harrosh
On 01/30/2013 04:34 PM, walter harms wrote:
> 
<>
> I start to see the complexity of the situation. Would you mind to add
> the comment "it can be anything.  UTF-8 is more likely but not guaranteed 
> either" ?
> For now using a pascal-string seems the best solution but it should be warned
> that get_attrs[a].val_ptr is NOT a c-string and therefore can not be used with
> the printf-family (i guess the situation will become more clear in future)
> 

OK, So... The long story

The STD says that osdname can be *anything* binary or otherwise, and of *any*
length. And is used to uniquely identify the OSD-device/partition in a cluster.

We decided that if so we would stick an 40 char UUID in it, generated by a 
uuidgen
and all the external utilities and surrounding code forces it, and treat it as
a c-string. But this code here in the core cannot make that assumption and still
support the STD.

On the other hand we did want the osdname to be printable in traces and messages
because it is such an important identifier. So I have decided to sacrifice an
extra char in-memory to carry a \NUL and safely stick it into printk(s). Those
(none existent) OSD devices that will put unprintable characters in here will
still function fine, but will look real scary in printk(s).

Please note that the one that sets policy is the osd-target vendor. (And they
all currently use my code base)

So save the kzalloc check this code (tested) is safe and will show strings when
present, but will gracefully show ugly things but still work when not.

> I have no clue why you need that, using c-strings makes life more easy in the
> last minute a sprintf(buf,"%u") will save the day ;)
> 

Actually it is very funny, because just recently (last week) I have discovered
something that eliminates all those funny business.

printf("%*s", odi->osdname, odi->osdname_len);

The "*" will instruct c to expect an extra variable following %s which is the
max_length of %s. This is exactly for pascal strings and the such like above.

So I added a TODO to clean that a bit by always printing with "%*s"

>>
> It look clever to add the NULL test here.
> Noone reviewing the code will understand that.
> (Rule of least surprise)
> 

Thanks for looking, I agree it needs a fat comment, I'll do that when I'll
convert to above system. Thanks for looking, That code is really old and never
had any extra eyes on it.

> just my 2 cents,
> re,
>  wh
> 

Cheers
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] exofs: 3 changes to exofs & osd

2012-12-18 Thread Boaz Harrosh
On 12/18/2012 01:28 PM, James Bottomley wrote:
<>
>> Both these patches where in linux-next for a long time. So I believe
>> the merge will go just fine. Lets leave it like this, or I can rebase
>> and remove it?
> 
> If it merges OK, I'd just leave it as is.  It wouldn't be anywhere close
> to the first time we've had the same patch via different trees.
> 

Thanks James, it will not happen again. I just had a look on linux-next
of Dec 17 they both where merged in and I did not see any complains.

> James
> 
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [GIT PULL] exofs: 3 changes to exofs & osd

2012-12-18 Thread Boaz Harrosh
On 12/18/2012 12:58 PM, James Bottomley wrote:
> On Mon, 2012-12-17 at 18:53 +0200, Boaz Harrosh wrote:
>> Hi Linus.
>>
>> Please pull the following changes since commit [ddffeb8c] Linux 3.7-rc1
>> They are available in the git repository at:
>>
>>   git://git.open-osd.org/linux-open-osd.git for-linus
>>
>> for you to fetch changes up to
>>  [861d6660] exofs: don't leak io_state and pages on read error 
>> (2012-12-14 12:17:32 +0200)
>>
>> These are just 3 patches, the last two are bug fixes on the error paths
>> in exofs.
>>
>> The important patch is the one to osd_uld which adds sysfs info to osd
>> devices for use by user-mode clustering discovery software. I'm already
>> sitting on this patch since before February this year, It is important for
>> some of the big installation cluster systems, who's been compiling their
>> own kernel just for that patch.
> 
> I'm a bit perplexed by this.  You got notice when it was added to the
> SCSI tree and now it's already upstream:
> 
> commit 51976a8c85cec0c62e410bc38b8a11dbc690764d
> Author: Boaz Harrosh 
> Date:   Wed Oct 24 14:51:41 2012 -0700
> 
> [SCSI] osd_uld: Add osdname & systemid sysfs at scsi_osd class
> 
> But the authorship info differs ... it looks like you forgot to include
> the From: tag in your original patch send.
> 

I'm so sorry, I completely goofed on this one. It's what happens when
you are swamped with other work and are doing things without thinking.
I totally forgot that I need to remove this patch. 

Both these patches where in linux-next for a long time. So I believe
the merge will go just fine. Lets leave it like this, or I can rebase
and remove it?

> James
> 
> 

Thanks
Boaz

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GIT PULL] exofs: 3 changes to exofs & osd

2012-12-17 Thread Boaz Harrosh
Hi Linus.

Please pull the following changes since commit [ddffeb8c] Linux 3.7-rc1
They are available in the git repository at:

  git://git.open-osd.org/linux-open-osd.git for-linus

for you to fetch changes up to
[861d6660] exofs: don't leak io_state and pages on read error 
(2012-12-14 12:17:32 +0200)

These are just 3 patches, the last two are bug fixes on the error paths
in exofs.

The important patch is the one to osd_uld which adds sysfs info to osd
devices for use by user-mode clustering discovery software. I'm already
sitting on this patch since before February this year, It is important for
some of the big installation cluster systems, who's been compiling their
own kernel just for that patch.

Thanks in advance
Boaz

--------
Boaz Harrosh (1):
  exofs: don't leak io_state and pages on read error

Idan Kedar (1):
  exofs: clean up the correct page collection on write error

Sachin Bhamare (1):
  osduld: Add osdname & systemid sysfs at scsi_osd class

 drivers/scsi/osd/osd_uld.c | 28 
 fs/exofs/inode.c   | 16 +---
 2 files changed, 37 insertions(+), 7 deletions(-)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [osd-dev] [PATCH] osduld: Add osdname & systemid sysfs at scsi_osd class

2012-11-05 Thread Boaz Harrosh
On 10/24/2012 02:55 PM, Boaz Harrosh wrote:
> On 10/24/2012 02:51 PM, Boaz Harrosh wrote:
>>
>> This patch adds the support for the following two read-only sysfs attributes
>> to scsi_osd class members : osdname & systemid
>>
>> These attributes will show up as below in sysfs class hierarchy:
>> /sys/class/scsi_osd/osdX/osdname
>> /sys/class/scsi_osd/osdX/systemid
>>
>> The osdname & systemid are OSD device attributes which uniquely
>> identify a device on the network, while it's IP and certainly
>> it's /dev/osdX device path might change.
>> Userspace utilities (e.g. mkfs.exofs) can parse these attributes to
>> identify the correct OSD in safer and faster way.
>>
>> (Today osd apps open each device in the system and send a
>>  attributes query for these, in order to access the user
>>  requested device)
>>
>> Signed-off-by: Sachin Bhamare 
>> Signed-off-by: Boaz Harrosh 
> 
> James hi.
> 
> This is for the next (v3.8) merge window. Please submit to scsi-misc tree.
> 

James ping!

Do you want that I push it through my osd tree?

Thanks
Boaz

> It is actually a very old well tested, but forgotten patch.
> 
> Thanks
> Boaz
> 
>> ---
>>  drivers/scsi/osd/osd_uld.c |   28 
>>  1 files changed, 28 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/scsi/osd/osd_uld.c b/drivers/scsi/osd/osd_uld.c
>> index d4ed9eb..4375417 100644
>> --- a/drivers/scsi/osd/osd_uld.c
>> +++ b/drivers/scsi/osd/osd_uld.c
>> @@ -97,9 +97,37 @@ struct osd_dev_handle {
>>  
>>  static DEFINE_IDA(osd_minor_ida);
>>  
>> +/*
>> + * scsi sysfs attribute operations
>> + */
>> +static ssize_t osdname_show(struct device *dev, struct device_attribute 
>> *attr,
>> +char *buf)
>> +{
>> +struct osd_uld_device *ould = container_of(dev, struct osd_uld_device,
>> +   class_dev);
>> +return sprintf(buf, "%s\n", ould->odi.osdname);
>> +}
>> +
>> +static ssize_t systemid_show(struct device *dev, struct device_attribute 
>> *attr,
>> +char *buf)
>> +{
>> +struct osd_uld_device *ould = container_of(dev, struct osd_uld_device,
>> +   class_dev);
>> +
>> +memcpy(buf, ould->odi.systemid, ould->odi.systemid_len);
>> +return ould->odi.systemid_len;
>> +}
>> +
>> +static struct device_attribute osd_uld_attrs[] = {
>> +__ATTR(osdname, S_IRUGO, osdname_show, NULL),
>> +__ATTR(systemid, S_IRUGO, systemid_show, NULL),
>> +__ATTR_NULL,
>> +};
>> +
>>  static struct class osd_uld_class = {
>>  .owner  = THIS_MODULE,
>>  .name   = "scsi_osd",
>> +.dev_attrs  = osd_uld_attrs,
>>  };
>>  
>>  /*
>>
> 
> 
> ___
> osd-dev mailing list
> osd-...@open-osd.org
> http://mailman.open-osd.org/mailman/listinfo/osd-dev
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [osd-dev] [PATCH] osduld: Add osdname & systemid sysfs at scsi_osd class

2012-10-24 Thread Boaz Harrosh
On 10/24/2012 02:51 PM, Boaz Harrosh wrote:
> 
> This patch adds the support for the following two read-only sysfs attributes
> to scsi_osd class members : osdname & systemid
> 
> These attributes will show up as below in sysfs class hierarchy:
> /sys/class/scsi_osd/osdX/osdname
> /sys/class/scsi_osd/osdX/systemid
> 
> The osdname & systemid are OSD device attributes which uniquely
> identify a device on the network, while it's IP and certainly
> it's /dev/osdX device path might change.
> Userspace utilities (e.g. mkfs.exofs) can parse these attributes to
> identify the correct OSD in safer and faster way.
> 
> (Today osd apps open each device in the system and send a
>  attributes query for these, in order to access the user
>  requested device)
> 
> Signed-off-by: Sachin Bhamare 
> Signed-off-by: Boaz Harrosh 

James hi.

This is for the next (v3.8) merge window. Please submit to scsi-misc tree.

It is actually a very old well tested, but forgotten patch.

Thanks
Boaz

> ---
>  drivers/scsi/osd/osd_uld.c |   28 
>  1 files changed, 28 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/scsi/osd/osd_uld.c b/drivers/scsi/osd/osd_uld.c
> index d4ed9eb..4375417 100644
> --- a/drivers/scsi/osd/osd_uld.c
> +++ b/drivers/scsi/osd/osd_uld.c
> @@ -97,9 +97,37 @@ struct osd_dev_handle {
>  
>  static DEFINE_IDA(osd_minor_ida);
>  
> +/*
> + * scsi sysfs attribute operations
> + */
> +static ssize_t osdname_show(struct device *dev, struct device_attribute 
> *attr,
> + char *buf)
> +{
> + struct osd_uld_device *ould = container_of(dev, struct osd_uld_device,
> +class_dev);
> + return sprintf(buf, "%s\n", ould->odi.osdname);
> +}
> +
> +static ssize_t systemid_show(struct device *dev, struct device_attribute 
> *attr,
> + char *buf)
> +{
> + struct osd_uld_device *ould = container_of(dev, struct osd_uld_device,
> +class_dev);
> +
> + memcpy(buf, ould->odi.systemid, ould->odi.systemid_len);
> + return ould->odi.systemid_len;
> +}
> +
> +static struct device_attribute osd_uld_attrs[] = {
> + __ATTR(osdname, S_IRUGO, osdname_show, NULL),
> + __ATTR(systemid, S_IRUGO, systemid_show, NULL),
> + __ATTR_NULL,
> +};
> +
>  static struct class osd_uld_class = {
>   .owner  = THIS_MODULE,
>   .name   = "scsi_osd",
> + .dev_attrs  = osd_uld_attrs,
>  };
>  
>  /*
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] osduld: Add osdname & systemid sysfs at scsi_osd class

2012-10-24 Thread Boaz Harrosh

This patch adds the support for the following two read-only sysfs attributes
to scsi_osd class members : osdname & systemid

These attributes will show up as below in sysfs class hierarchy:
/sys/class/scsi_osd/osdX/osdname
/sys/class/scsi_osd/osdX/systemid

The osdname & systemid are OSD device attributes which uniquely
identify a device on the network, while it's IP and certainly
it's /dev/osdX device path might change.
Userspace utilities (e.g. mkfs.exofs) can parse these attributes to
identify the correct OSD in safer and faster way.

(Today osd apps open each device in the system and send a
 attributes query for these, in order to access the user
 requested device)

Signed-off-by: Sachin Bhamare 
Signed-off-by: Boaz Harrosh 
---
 drivers/scsi/osd/osd_uld.c |   28 
 1 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/osd/osd_uld.c b/drivers/scsi/osd/osd_uld.c
index d4ed9eb..4375417 100644
--- a/drivers/scsi/osd/osd_uld.c
+++ b/drivers/scsi/osd/osd_uld.c
@@ -97,9 +97,37 @@ struct osd_dev_handle {
 
 static DEFINE_IDA(osd_minor_ida);
 
+/*
+ * scsi sysfs attribute operations
+ */
+static ssize_t osdname_show(struct device *dev, struct device_attribute *attr,
+   char *buf)
+{
+   struct osd_uld_device *ould = container_of(dev, struct osd_uld_device,
+  class_dev);
+   return sprintf(buf, "%s\n", ould->odi.osdname);
+}
+
+static ssize_t systemid_show(struct device *dev, struct device_attribute *attr,
+   char *buf)
+{
+   struct osd_uld_device *ould = container_of(dev, struct osd_uld_device,
+  class_dev);
+
+   memcpy(buf, ould->odi.systemid, ould->odi.systemid_len);
+   return ould->odi.systemid_len;
+}
+
+static struct device_attribute osd_uld_attrs[] = {
+   __ATTR(osdname, S_IRUGO, osdname_show, NULL),
+   __ATTR(systemid, S_IRUGO, systemid_show, NULL),
+   __ATTR_NULL,
+};
+
 static struct class osd_uld_class = {
.owner  = THIS_MODULE,
.name   = "scsi_osd",
+   .dev_attrs  = osd_uld_attrs,
 };
 
 /*
-- 
1.7.6.5

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] target: Remove unused se_cmd.cmd_spdtl

2012-08-17 Thread Boaz Harrosh
We should switch this topic to the scsi mailing list

On 08/17/2012 01:49 PM, Boaz Harrosh wrote:

> On 08/17/2012 01:12 AM, Nicholas A. Bellinger wrote:
> 
>> On Thu, 2012-08-16 at 09:24 -0700, Roland Dreier wrote:
>>> Actually I'm not sure removing cmd_spdtl is the right answer.
>>>
>>> If /dev/sda is a device on an iSCSI initiator exported by an LIO
>>> target, try doing:
>>>
>>> # sg_raw -r512 /dev/sda 28 0 0 0 0 0 0 0 2 0
>>>
>>> This issues a READ (10) for 2 sectors, but only sends a length of 512
>>> at the transfer level.
>>>
>>> The target responds by setting the residual to 512 but transmits all
>>> 1024 bytes, 
> 
> 
> Is this the correct behavior from the Target? I would imagine that the
> target needs to only send 512 bytes (transfer level size) and set the
> OVERFLOW bit and residual to 512.
> 
> Not that it really matter because as you stated below the Initiator makes
> sure nothing bad happens.
> 
> BTW what target are we talking about, on the other side of the initiator
> here? (There are two targets in this setup right?)
> 
>>> and the Linux initiator at least rejects it because it
>>> hits:
>>>
>>> if (tcp_task->data_offset + tcp_conn->in.datalen > total_in_length) {
>>> ISCSI_DBG_TCP(conn, "data_offset(%d) + data_len(%d) > "
>>>   "total_length_in(%d)\n", tcp_task->data_offset,
>>>   tcp_conn->in.datalen, total_in_length);
>>> return ISCSI_ERR_DATA_OFFSET;
>>> }
>>>
>>
>> Ok, this is the 'overflow' case when the fabric ->data_length (expected
>> data transfer length of the initiator's buffer) is smaller than the SCSI
>> CDB allocation length or sectors*block_size (attempted transfer length)
>> the target has been asked to process.
>>
>> The following patch which appears to do the right thing from the
>> perspective of the target for overflow, but AFAICT open-iscsi still
>> returns GOOD status w/ no data-in payload (at least via sg_raw) when the
>> iscsi-target is signaling overflow bit in iSCSI Data IN PDU.  Not sure
>> yet why this is the case, but drivers/scsi/libiscsi.c:iscsi_data_in_rsp
>> code:
>>
>>if (rhdr->flags & (ISCSI_FLAG_DATA_UNDERFLOW |
>>ISCSI_FLAG_DATA_OVERFLOW)) {
>> int res_count = be32_to_cpu(rhdr->residual_count);
>>
>> if (res_count > 0 &&
>> (rhdr->flags & ISCSI_FLAG_CMD_OVERFLOW ||
>>  res_count <= scsi_in(sc)->length))
>> scsi_in(sc)->resid = res_count;
>> else
>> sc->result = (DID_BAD_TARGET << 16) | 
>> rhdr->cmd_status;
>> }
>>
> 
> 
> OK I admit I kind of rearranged all this code a few years ago. Guilty ;-) 
> 
> Well now that I look at it again, I think it is totally wrong!!
> The scsi and block layer do not know anything about CMD_OVERFLOW
> If a scsi_in/out(sc)->resid is set it only ever means UNDERFLOW.
> 
> Both scsi and block expects to only do transfer_length - resid.
> This is why you get empty buffer back because the transfer_length=512
> minus resid=512 means zero bytes.
> 
> So the "|| ISCSI_FLAG_CMD_OVERFLOW" case is wrong.
> 
> Now the big question is what to do. Fail the all command, or say
> nothing?
> 
> The correct thing is to teach the middle layers about overflow,
> With some kind of warning level system.
> I was also thinking that we can make resid signed and signal
> an overflow with a negative resid. Now the math of
> transfer_length - resid will become correct since it means
> more, in the case above 512 - (-512) would be our 1024 CDB len.
> 
> For now this code must be fixed. And the command must fail.
> Do you need that I prepare a patch? (Please you do it, I'm
> swamped, it'll take me time)
> There are 3 more places like this.
> 
> BTW did you notice how in the code above we have mixed up the
> use of: ISCSI_FLAG_DATA_OVERFLOW and ISCSI_FLAG_CMD_OVERFLOW?
> That's another bug, here it should be ISCSI_FLAG_DATA_OVERFLOW
> only. The other places with the other flags.
> 
>> appears to be setting resid for non bidi cases correctly, right..? (mnc
>> + boaz CC'ed)
>>
>> How about the following to ensure we restrict overflow to keep the
>> existing (smaller) cmd->data_length assignment, and only re-assign this
>

Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)

2012-07-30 Thread Boaz Harrosh
On 07/30/2012 10:12 AM, Paolo Bonzini wrote:

> Il 30/07/2012 01:50, Rusty Russell ha scritto:
>>> Also, being the first user of chained scatterlist doesn't exactly give
>>> me warm fuzzies.
>>
>> We're far from the first user: they've been in the kernel for well over
>> 7 years.  They were introduced for the block layer, but they tended to
>> ignore private uses of scatterlists like this one.
> 
> Yeah, but sg_chain has no users in drivers, only a private one in
> lib/scatterlist.c.  The internal API could be changed to something else
> and leave virtio-scsi screwed...
> 
>> Yes, we should do this.  But note that this means an iteration, so we
>> might as well combine the loops :)
> 
> I'm really bad at posting pseudo-code, but you can count the number of
> physically-contiguous entries at the beginning of the list only.  So if
> everything is contiguous, you use a single non-indirect buffer and save
> a kmalloc.  If you use indirect buffers, I suspect it's much less
> effective to collapse physically-contiguous entries.  More elaborate
> heuristics do need a loop, though.
> 


[All the below with a grain of salt, from my senile memory]

You must not forget some facts about the scatterlist received here at the LLD.
It has already been DMA mapped and locked by the generic layer.
Which means that the DMA engine has already collapsed physically-contiguous
entries. Those you get here are already unique physically.
(There were bugs in the past, where this was not true, please complain
 if you find them again)

A scatterlist is two different lists taking the same space, but with two
different length.
- One list is the PAGE pointers plus offset && length, which is bigger or
  equal to the 2nd list. The end marker corresponds to this list.

  This list is the input into the DMA engine.

- Second list is the physical DMA addresses list. With their physical-lengths.
  Offset is not needed because it is incorporated in the DMA address.

  This list is the output from the DMA engine.

  The reason 2nd list is shorter is because the DMA engine tries to minimize
  the physical scatter-list entries which is usually a limited HW resource.

  This list might follow chains but it's end is determined by the received
  sg_count from the DMA engine, not by the end marker.

At the time my opinion, and I think Rusty agreed, was that the scatterlist
should be split in two. The input page-ptr list is just the BIO, and the
output of the DMA-engine should just be the physical part of the sg_list,
as a separate parameter. But all this was berried under too much APIs and
the noise was two strong, for any single brave sole.

So I'd just trust blindly the returned sg_count from the DMA engine, it is
already optimized. I THINK

> Paolo


Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)

2012-07-26 Thread Boaz Harrosh
On 07/26/2012 10:23 AM, Paolo Bonzini wrote:

> 
> In the meanwhile, we still have a bug to fix, and we need to choose
> between Sen Wang's v1 (sg_set_page) or v2 (value assignment).  I'm still
> leaning more towards v2, if only because I already tested that one myself.
> 


It's your call, you know what I think ;-)

I Agree none of them are bugs in current code, they will both work
just the same.

> Paolo


Please CC me on the "convert to sg copy-less" patches, It looks interesting

Thanks
Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)

2012-07-25 Thread Boaz Harrosh
On 07/25/2012 11:06 PM, Paolo Bonzini wrote:

> Il 25/07/2012 21:16, Boaz Harrosh ha scritto:
>> The picture confused me. It looked like the first element is the 
>> virtio_scsi_cmd_req
>> not an sgilist-element that points to the struct's buffer.
>>
>> In that case then yes your plan of making a two-elements fragment that 
>> points to the
>> original scsi-sglist is perfect. All you have to do is that, and all you 
>> have to do
>> at virtio is use the sg_for_each macro and you are done.
>>
>> You don't need any sglist allocation or reshaping. And you can easily support
>> chaining. Looks like order of magnitude more simple then what you do now
> 
> It is.
> 
>> So what is the problem?
> 
> That not all architectures have ARCH_HAS_SG_CHAIN (though all those I
> care about do).  So I need to go through all architectures and make sure
> they use for_each_sg, or at least to change ARCH_HAS_SG_CHAIN to a
> Kconfig define so that dependencies can be expressed properly.
> 


What is actually preventing ARCH_HAS_SG_CHAIN from all these ARCHES?
is that the DMA drivers not using for_each_sg(). Sounds like easy
to fix.

But yes a deep change would convert ARCH_HAS_SG_CHAIN to a Kconfig.

If you want to be lazy, like me, You might just put a BUILD_BUG_ON
in code, requesting the user to disable the driver for this ARCH.

I bet there is more things to do at ARCH to enable virtualization
then just support ARCH_HAS_SG_CHAIN. Be it just another requirement.

If you Document it and make sure current ARCHs are fine, it should
not ever trigger.

>> And BTW you won't need that new __sg_set_page API anymore.
> 
> Kind of.
> 
>sg_init_table(sg, 2);
>sg_set_buf(sg[0], req, sizeof(req));
>sg_chain(sg[1], scsi_out(sc));
> 
> is still a little bit worse than
> 
>__sg_set_buf(sg[0], req, sizeof(req));
>__sg_chain(sg[1], scsi_out(sc));
> 


I believe they are the same, specially for the
on the stack 2 elements array. Actually I think
In both cases you need to at least call sg_init_table()
once after allocation, No?

Your old code with big array copy and re-shaping was
a better example of the need for your new API. Which I agree.

But please for my sake do not call it __sg_chain. Call it
something like sg_chain_not_end(). I hate those "__" which
for god sack means what? 
(A good name is when I don't have to read the code, an "__"
 means "fuck you go read the code")

> Paolo


Thanks
Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)

2012-07-25 Thread Boaz Harrosh
On 07/25/2012 08:43 PM, Paolo Bonzini wrote:

> Il 25/07/2012 17:28, Boaz Harrosh ha scritto:
>>> 1) what I get is a scsi_cmnd which contains an N-element scatterlist.
>>>
>>> 2) virtio-scsi has to build the "packet" that is passed to the hardware
>>> (it does not matter that the hardware is virtual).  This packet (per
>>> virtio-scsi spec) has an N+1-element scatterlist, where the first
>>> element is a request descriptor (struct virtio_scsi_cmd_req), and the
>>> others describe the written data.
>>
>> Then "virtio-scsi spec" is crap. It overloads the meaning of
>> "struct scatterlist" of the first element in an array. to be a
>> "struct virtio_scsi_cmd_req".
> 
> What the holy fuck?  The first element simply _points_ to the "struct
> virtio_scsi_cmd_req", just like subsequent elements point to the data.
> 
> And the protocol of the device is _not_ a struct scatterlist[].  The
> virtio _API_ takes that array and converts to a series of physical
> address + offset pairs.
> 
>> Since you need to change the standard to support chaining then
>> it is a good time to fix this.
> 
> Perhaps it is a good time for you to read the virtio spec.  You are
> making a huge confusion between the LLD->virtio interface and the
> virtio->hardware interface.  I'm talking only of the former.
> 
>>> 3) virtio takes care of converting the "packet" from a scatterlist
>>> (which currently must be a flat one) to the hardware representation.
>>> Here a walk is inevitable, so we don't care about this walk.
>>
>> "hardware representation" you mean aio or biovec, what ever the
>> IO submission path uses at the host end?
> 
> No, I mean the way the virtio spec encodes the physical address + offset
> pairs.
> 
> I stopped reading here.
> 


The picture confused me. It looked like the first element is the 
virtio_scsi_cmd_req
not an sgilist-element that points to the struct's buffer.

In that case then yes your plan of making a two-elements fragment that points 
to the
original scsi-sglist is perfect. All you have to do is that, and all you have 
to do
at virtio is use the sg_for_each macro and you are done.

You don't need any sglist allocation or reshaping. And you can easily support
chaining. Looks like order of magnitude more simple then what you do now
So what is the problem?

And BTW you won't need that new __sg_set_page API anymore.

> Paolo


Cheers
Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: virtio(-scsi) vs. chained sg_lists (was Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list)

2012-07-25 Thread Boaz Harrosh
On 07/25/2012 05:17 PM, Paolo Bonzini wrote:

> Il 25/07/2012 15:26, Boaz Harrosh ha scritto:
>>>> In SCSI land most LLDs should support chaining just by virtu of using the
>>>> for_each_sg macro. That all it takes. Your code above does support it.
>>>
>>> Yes, it supports it but still has to undo them before passing to virtio.
>>>
>>> What my LLD does is add a request descriptor in front of the scatterlist
>>> that the LLD receives.  I would like to do this with a 2-item
>>> scatterlist: one for the request descriptor, and one which is a chain to
>>> the original scatterlist.  
>>
>> I hate that plan. Why yet override the scatter element yet again with a third
>> union of a "request descriptor"
> 
> I'm not overriding (or did you mean overloading?) anything, and I think
> you're reading too much in my words.
> 
> What I am saying is (for a WRITE command):
> 
> 1) what I get is a scsi_cmnd which contains an N-element scatterlist.
> 
> 2) virtio-scsi has to build the "packet" that is passed to the hardware
> (it does not matter that the hardware is virtual).  This packet (per
> virtio-scsi spec) has an N+1-element scatterlist, where the first
> element is a request descriptor (struct virtio_scsi_cmd_req), and the
> others describe the written data.
> 


Then "virtio-scsi spec" is crap. It overloads the meaning of
"struct scatterlist" of the first element in an array. to be a
"struct virtio_scsi_cmd_req".

Instead of simply defining the protocol as passing a "struct 
virtio_scsi_cmd_req".

The following scatterlist can then be pointed to or followed in the stream.
But the above is just bad programming, regardless of standard or not.

Since you need to change the standard to support chaining then
it is a good time to fix this. (You need to change it because virtio
will now need to decode a guest-pointer at each chain-end to yet a new
guest-memory buffer)

In Practice you will get the same stream. a first packet of a
struct virtio_scsi_cmd_req followed or pointing to an array of
struct scatterlist. Only you don't have that contraption of virtio_scsi_cmd_req
must be the same size as "struct scatterlist" and all that union crap.

What where you guys smoking that day? ;-)

> 3) virtio takes care of converting the "packet" from a scatterlist
> (which currently must be a flat one) to the hardware representation.
> Here a walk is inevitable, so we don't care about this walk.
> 


"hardware representation" you mean aio or biovec, what ever the
IO submission path uses at the host end?

> 4) What I'm doing now: copying (and flattening) the N-element
> scatterlist onto the last elements of an N+1 array that I pass to virtio.
> 
>   _ _ _ _ _ _
>  |_|_|_|_|_|_|  scsi_cmnd scatterlist
> 
>  vvv COPY vvv
> _ _ _ _ _ _ _
>|_|_|_|_|_|_|_|  scatterlist passed to virtio
> |
> virtio_scsi_cmd_req
> 
> Then I hand off the scatterlist to virtio.  virtio walks it and converts
> it to hardware format.
> 


Crap design. All that extra full vector copy. Just for that request
header - virtio_scsi_cmd_req.

Why are they at all related. Why does virtio_scsi_cmd_req need to be
as part of scatterlist and of the same size as the first element?

> 5) What I want to do: create a 2-element scatterlist, the first being
> the request descriptor and the second chaining to scsi_cmnd's N-element
> scatterlist.
> 
>  _ _ _ _ _ _
> |_|_|_|_|_|_|  scsi_cmnd scatterlist
> _ _/
>|_|C|   scatterlist passed to virtio
> |
> virtio_scsi_cmd_req
> 
> Then I hand off the scatterlist to virtio.  virtio will still walk the
> scatterlist chain, and convert it to N+1 elements for the hardware to
> consume.  Still, removing one walk largely reduces the length of my
> critical sections.  I also save some pointer-chasing because the
> 2-element scatterlist are short-lived and can reside on the stack.
> 


Sure this is much better. 

But since you are already changing the two code sites, Here, and at
virtio to support chained scatterlist, why not fix it properly

  _ _ _ _ _ _
 |_|_|_|_|_|_|  scsi_cmnd scatterlist
 _ _/___
|virtio_scsi_cmd_req|virtio_scsi_cmd_req passed to virtio
 

Just a regularly defined header with an embedded pointer to a scatterlist.

And BTW the "scsi_cmnd scatterlist" can now be a chained one and virtio
can support that as well.

> 
> Other details (you can probably skip these):
> 
> There is also a response descriptor.  In the case of writes th

Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list

2012-07-25 Thread Boaz Harrosh
On 07/25/2012 04:36 PM, Paolo Bonzini wrote:

> Il 25/07/2012 15:26, Boaz Harrosh ha scritto:
>> On 07/25/2012 03:49 PM, Paolo Bonzini wrote:
>>
>>>
>>> Except here the destination array has to be given to virtio, which
>>> doesn't (yet) understand chaining.  I'm using for_each_sg rather than a
>>> simple memcpy exactly because I want to flatten the input scatterlist
>>> onto consecutive scatterlist entries, which is what virtio expects (and
>>> what I'll change when I get to it).
>>>
>>> for_each_sg guarantees that I get non-chain scatterlists only, so it is
>>> okay to value-assign them to sg[].
>>
>> So if the virtio does not understand chaining at all then surly it will
>> not understand the 2-bit end marker and will get a wrong page pointer
>> with the 1st bit set.
> 
> It doesn't understand chaining, but it does use sg_phys(x) so it will
> not get a wrong page pointer for the end marker.
> 
>> Fine then your code is now a crash because the terminating bit was just
>> copied over, which it was not before.
> 
> I did test the patch with value-assignment.
> 


Still you should use the sg_set_page()!!
1. It is not allowed to directly manipulate sg entries. One should always
   use the proper accessor. Even if open coding does work and is not a bug
   it should not be used anyway!
2. Future code that will support chaining will need to do as I say so why
   change it then, again?

Please don't change two things in one patch. The fix is for high-pages
please fix only that here. You can blasphemy open-code the sg manipulation
in a separate patch.

Please
Boaz

>> Lets separate the two topics from now on. Send me one mail concerning
>> the proper above patch, And a different mail for how to support chaining.
> 
> Ok, and I'll change the topic.
> 
> Paolo


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list

2012-07-25 Thread Boaz Harrosh
On 07/25/2012 03:49 PM, Paolo Bonzini wrote:

> 
> Except here the destination array has to be given to virtio, which
> doesn't (yet) understand chaining.  I'm using for_each_sg rather than a
> simple memcpy exactly because I want to flatten the input scatterlist
> onto consecutive scatterlist entries, which is what virtio expects (and
> what I'll change when I get to it).
> 
> for_each_sg guarantees that I get non-chain scatterlists only, so it is
> okay to value-assign them to sg[].
> 


So if the virtio does not understand chaining at all then surly it will
not understand the 2-bit end marker and will get a wrong page pointer
with the 1st bit set.

As I said!! Since the last code did sg_set_buff() and worked then you must
change it with sg_set_page().

If there are *any* chaining-or-no-chaining markers errors these should
be fixed as a separate patch!

Please lets concentrate at the problem at hand.



> 
> It was _not_ properly terminated, and didn't matter because virtio
> doesn't care about termination.  Changing all the virtio devices to
> properly terminate chains (and to use for_each_sg) is a prerequisite for
> properly supporting long sglists).
> 


Fine then your code is now a crash because the terminating bit was just
copied over, which it was not before.


Now Back to the how to support chaining:

Lets separate the two topics from now on. Send me one mail concerning
the proper above patch, And a different mail for how to support chaining.

>> In SCSI land most LLDs should support chaining just by virtu of using the
>> for_each_sg macro. That all it takes. Your code above does support it.
> 
> Yes, it supports it but still has to undo them before passing to virtio.
> 
> What my LLD does is add a request descriptor in front of the scatterlist
> that the LLD receives.  I would like to do this with a 2-item
> scatterlist: one for the request descriptor, and one which is a chain to
> the original scatterlist.  


I hate that plan. Why yet override the scatter element yet again with a third
union of a "request descriptor"

The reason it was overloaded as a link-pointer in the first place was because
of historical compatibility reasons and not because of a good design.

You should have a proper "request descriptor" structure defined, pointing to
(or followed by), an sglist-chain. And all of the above is mute.
 

> Except that if I call sg_chain and my
> architecture does not define ARCH_HAS_SG_CHAIN, it will BUG out.  So I
> need to keep all the scatterlist allocation and copying crap that I have
> now within #ifdef, and it will bitrot.
> 


except that with the correct design you don't call sg_chain you just do:
request_descriptor->sg_list = sg;

>>> I would need to add support for the long sglists to virtio; this is not
>>> a problem, but in the past Rusty complained that long sg-lists are not
>>> well suited to virtio (which would like to add elements not just at the
>>> beginning of a given sglist, but also at the end).  
>>
>> Well that can be done as well, (If done carefully) It should be easy to add
>> chained fragments to both the end of a given chain just as at beginning.
>> It only means that the last element of the appended-to chain moves to
>> the next fragment and it's place is replaced by a link.
> 
> But you cannot do that in constant time, can you?  And that means you do
> not enjoy any benefit in terms of cache misses etc.
> 


I did not understand "constant time" it is O(0) if that what you meant.

(And surly today's code of copy the full list "cache misses")

> Also, this assumes that I can modify the appended-to chain.  I'm not
> sure I can do this?
> 


Each case it's own. If the appended-to chain is const, yes you'll have
to reallocate it and copy. Is that your case?

Cheers
Boaz

>>> It seems to me that virtio would prefer to work with a struct
>>> scatterlist ** rather than a long sglist.
>>
>> That's just going backwards, and lazy. As you said if you want to enjoy
>> the better performance cake you better break some eggs ;-)
> 
> :)
> 
> Paolo


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list

2012-07-25 Thread Boaz Harrosh
On 07/25/2012 03:34 PM, Paolo Bonzini wrote:

> Il 25/07/2012 14:13, Wang Sen ha scritto:
>> When using the commands below to write some data to a virtio-scsi LUN of the
>> QEMU guest(32-bit) with 1G physical memory(qemu -m 1024), the qemu will 
>> crash.
>>
>> # sudo mkfs.ext4 /dev/sdb  (/dev/sdb is the virtio-scsi LUN.)
>> # sudo mount /dev/sdb /mnt
>> # dd if=/dev/zero of=/mnt/file bs=1M count=1024
>>
>> In current implementation, sg_set_buf is called to add buffers to sg list 
>> which
>> is put into the virtqueue eventually. But if there are some HighMem pages in
>> table->sgl you can not get virtual address by sg_virt. So, sg_virt(sg_elem) 
>> may
>> return NULL value. This will cause QEMU exit when virtqueue_map_sg is called
>> in QEMU because an invalid GPA is passed by virtqueue.
>>
>> I take Paolo's solution mentioned in last thread to avoid failure on 
>> handling 
>> flag bits.
> 
> Please include an URL or (better) summarize the reason why sg_set_page
> is not correct in the commit message.  For example, replace this
> paragraph with the following:
> 
> "To fix this, we can simply copy the original scatterlist entries into
> virtio-scsi's; we need to copy the entries entirely, including the flag
> bits, so using sg_set_page is not correct".
> 

> Please send v3 with this change and I'll add my Acked-by.
> 


NACK-by: Boaz Harrosh


Apart from the HighMem pages problem, where in previous sg_set_buf()
code was the marker copied? It was not because it is not needed because
the allocation of sg took care of that. For example in 64bit the is no
bugs, right?

If there was a destination sg_list termination bug, it should be fixed
as a separate patch from this "HighMem pages problem". But I bet if
you inspect the code carefully there isn't such a bug.

Cheers
Boaz

> Paolo
> 
>>
>> I have tested the patch on my workstation. QEMU would not crash any more.
>>
>> Signed-off-by: Wang Sen 
>> ---
>>  drivers/scsi/virtio_scsi.c |2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
>> index 1b38431..6661610 100644
>> --- a/drivers/scsi/virtio_scsi.c
>> +++ b/drivers/scsi/virtio_scsi.c
>> @@ -198,7 +198,7 @@ static void virtscsi_map_sgl(struct scatterlist *sg, 
>> unsigned int *p_idx,
>>  int i;
>>  
>>  for_each_sg(table->sgl, sg_elem, table->nents, i)
>> -sg_set_buf(&sg[idx++], sg_virt(sg_elem), sg_elem->length);
>> +sg[idx++] = *sg_elem;
>>  
>>  *p_idx = idx;
>>  }
>>
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list

2012-07-25 Thread Boaz Harrosh
On 07/25/2012 02:44 PM, Sen Wang wrote:

> 2012/7/25 Paolo Bonzini :
>> Il 25/07/2012 10:29, Wang Sen ha scritto:
>>> When using the commands below to write some data to a virtio-scsi LUN of the
>>> QEMU guest(32-bit) with 1G physical memory(qemu -m 1024), the qemu will 
>>> crash.
>>>
>>>   # sudo mkfs.ext4 /dev/sdb  (/dev/sdb is the virtio-scsi LUN.)
>>>   # sudo mount /dev/sdb /mnt
>>>   # dd if=/dev/zero of=/mnt/file bs=1M count=1024
>>>
>>> In current implementation, sg_set_buf is called to add buffers to sg list 
>>> which
>>> is put into the virtqueue eventually. But there are some HighMem pages in
>>> table->sgl can not get virtual address by sg_virt. So, sg_virt(sg_elem) may
>>> return NULL value. This will cause QEMU exit when virtqueue_map_sg is called
>>> in QEMU because an invalid GPA is passed by virtqueue.
>>
>> Heh, I was compiling (almost) the same patch as we speak. :)
> 
> Uh, what a coincidence! :)
> 
>>
>> I've never seen QEMU crash; the VM would more likely just fail to boot
>> with a panic.  But it's the same bug anyway.
> 
> I never met this before. How this situation happens?
> 
>>
>>> My solution is using sg_set_page instead of sg_set_buf.
>>>
>>> I have tested the patch on my workstation. QEMU would not crash any more.
>>>
>>> Signed-off-by: Wang Sen 
>>> ---
>>>  drivers/scsi/virtio_scsi.c |3 ++-
>>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
>>> index 1b38431..fc5c88a 100644
>>> --- a/drivers/scsi/virtio_scsi.c
>>> +++ b/drivers/scsi/virtio_scsi.c
>>> @@ -198,7 +198,8 @@ static void virtscsi_map_sgl(struct scatterlist *sg, 
>>> unsigned int *p_idx,
>>>   int i;
>>>
>>>   for_each_sg(table->sgl, sg_elem, table->nents, i)
>>> - sg_set_buf(&sg[idx++], sg_virt(sg_elem), sg_elem->length);
>>> + sg_set_page(&sg[idx++], sg_page(sg_elem), sg_elem->length,
>>> + sg_elem->offset);
>>
>> This can simply be
>>
>>sg[idx++] = *sg_elem;
>>
> 
> Yes, I saw your another E-mail. I think you're right. Simply calling
> sg_set_page can not handle
> the flag bits correctly. So, I'll repost the patch soon. Thank you!
> 


No this code is correct, though you will need to make sure to properly
terminate the destination sg_list.

But since old code was using sg_set_buf(), than it means it was properly
terminated before, and there for this code is good as is please don't
touch it.

Thanks
Boaz

>> Can you repost it with this change, and also add sta...@vger.kernel.org
>> to the Cc?  Thanks very much!
>>
>> Paolo
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
>> the body of a message to majord...@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list

2012-07-25 Thread Boaz Harrosh
On 07/25/2012 12:41 PM, Paolo Bonzini wrote:

> Il 25/07/2012 11:22, Boaz Harrosh ha scritto:
>>>>>>  for_each_sg(table->sgl, sg_elem, table->nents, i)
>>>>>> -sg_set_buf(&sg[idx++], sg_virt(sg_elem), 
>>>>>> sg_elem->length);
>>>>>> +sg_set_page(&sg[idx++], sg_page(sg_elem), 
>>>>>> sg_elem->length,
>>>>>> +sg_elem->offset);
>>>>
>>>> This can simply be
>>>>
>>>>sg[idx++] = *sg_elem;
>>>>
>>>> Can you repost it with this change, and also add sta...@vger.kernel.org
>>>> to the Cc?  Thanks very much!
>>>>
>>
>> No! Please use sg_set_page()! Look at sg_set_page(), which calls 
>> sg_assign_page().
>> It has all these jump over chained arrays. When you'll start using long
>> sg_lists (which you should) then jumping from chain to chain must go through
>> sg_page(sg_elem) && sg_assign_page(), As in the original patch.
> 
> Hi Boaz,
> 
> actually it seems to me that using sg_set_page is wrong, because it will
> not copy the end marker from table->sgl to sg[].  If something chained
> the sg[] scatterlist onto something else, sg_next's test for sg_is_last
> would go beyond the table->nents-th item and access invalid memory.
> 


Yes, you did not understand this structure. And Yes I am right, when
using chained lists you *must* use sg_set_page().

You see the chaining belongs to the allocation not the value of the
sg-elements. One must not copy the chaining marker to the destination
array which might have it's own. And one must not crap all over the
destination chaining markers, set at allocation time.

The sizes and mostly the pointers of source and destination are not
the same.

> Using chained sglists is on my to-do list, I expect that it would make a
> nice performance improvement.  However, I was a bit confused as to
> what's the plan there; there is hardly any user, and many arches still
> do not define ARCH_HAS_SG_CHAIN.  Do you have any pointer to discussions
> or LWN articles?
> 


Only the source code I'm afraid.

In SCSI land most LLDs should support chaining just by virtu of using the
for_each_sg macro. That all it takes. Your code above does support it.
(In Wang version).

Though more code need probably be added at sg allocation to actually
allocate and prepare a chain.

> I would need to add support for the long sglists to virtio; this is not
> a problem, but in the past Rusty complained that long sg-lists are not
> well suited to virtio (which would like to add elements not just at the
> beginning of a given sglist, but also at the end).  


Well that can be done as well, (If done carefully) It should be easy to add
chained fragments to both the end of a given chain just as at beginning.

It only means that the last element of the appended-to chain moves to
the next fragment and it's place is replaced by a link.

If you have ready made two long segments A and C which you would like not
to reallocate and copy, you insert a two-elements segment in the middle, say
call it B.

The first element of B is the last element of A which is now used as the pointer
to B, and the 2nd element of B is a pointer to C.

> It seems to me that
> virtio would prefer to work with a struct scatterlist ** rather than a
> long sglist.
> 


That's just going backwards, and lazy. As you said if you want to enjoy
the better performance cake you better break some eggs ;-)

> Paolo


Cheers
Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] scsi: virtio-scsi: Fix address translation failure of HighMem pages used by sg list

2012-07-25 Thread Boaz Harrosh
On 07/25/2012 11:44 AM, Paolo Bonzini wrote:

> Il 25/07/2012 10:29, Wang Sen ha scritto:
>> When using the commands below to write some data to a virtio-scsi LUN of the 
>> QEMU guest(32-bit) with 1G physical memory(qemu -m 1024), the qemu will 
>> crash.
>>
>>  # sudo mkfs.ext4 /dev/sdb  (/dev/sdb is the virtio-scsi LUN.)
>>  # sudo mount /dev/sdb /mnt
>>  # dd if=/dev/zero of=/mnt/file bs=1M count=1024
>>
>> In current implementation, sg_set_buf is called to add buffers to sg list 
>> which
>> is put into the virtqueue eventually. But there are some HighMem pages in 
>> table->sgl can not get virtual address by sg_virt. So, sg_virt(sg_elem) may
>> return NULL value. This will cause QEMU exit when virtqueue_map_sg is called 
>> in QEMU because an invalid GPA is passed by virtqueue.
> 
> Heh, I was compiling (almost) the same patch as we speak. :)
> 
> I've never seen QEMU crash; the VM would more likely just fail to boot
> with a panic.  But it's the same bug anyway.
> 
>> My solution is using sg_set_page instead of sg_set_buf.
>>
>> I have tested the patch on my workstation. QEMU would not crash any more.
>>
>> Signed-off-by: Wang Sen 
>> ---
>>  drivers/scsi/virtio_scsi.c |3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/scsi/virtio_scsi.c b/drivers/scsi/virtio_scsi.c
>> index 1b38431..fc5c88a 100644
>> --- a/drivers/scsi/virtio_scsi.c
>> +++ b/drivers/scsi/virtio_scsi.c
>> @@ -198,7 +198,8 @@ static void virtscsi_map_sgl(struct scatterlist *sg, 
>> unsigned int *p_idx,
>>  int i;
>>  
>>  for_each_sg(table->sgl, sg_elem, table->nents, i)
>> -sg_set_buf(&sg[idx++], sg_virt(sg_elem), sg_elem->length);
>> +sg_set_page(&sg[idx++], sg_page(sg_elem), sg_elem->length,
>> +sg_elem->offset);
> 
> This can simply be
> 
>sg[idx++] = *sg_elem;
> 
> Can you repost it with this change, and also add sta...@vger.kernel.org
> to the Cc?  Thanks very much!
> 


No! Please use sg_set_page()! Look at sg_set_page(), which calls 
sg_assign_page().
It has all these jump over chained arrays. When you'll start using long
sg_lists (which you should) then jumping from chain to chain must go through
sg_page(sg_elem) && sg_assign_page(), As in the original patch.

Thanks
Boaz

> Paolo
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3 ver2] iscsi bidi & varlen support

2008-02-18 Thread Boaz Harrosh
On Mon, Feb 18 2008 at 19:22 +0200, James Bottomley <[EMAIL PROTECTED]> wrote:
> On Mon, 2008-02-18 at 17:08 +0200, Boaz Harrosh wrote:
>> But ... James? is
>> there any chance these can go into scsi-rc-fixes for the 2.6.25
>> kernel? The reason they are so late was mainly because of a fallout
>> in the merge process and a bug that was introduced because of that,
>> but they were intended to go together with bidi into 2.6.25. Also
>> as an important client code to the bidi-api that is introduced in
>> 2.6.25 kernel.
> 
> Well, I think you know the answer to that one under Linus' rules for non
> merge window submission.  It's not a bug fix; we haven't even put it
> into -mm for testing and it's a pretty invasive change.
> 
> James
> 
> 

It was extensively tested by all iscsi people. It has the Sign-off-by
of the iscsi maintainer. They are not new patches.

But, yes you are right. I now remember the trouble we had with Linus
last time. So it's 2.6.26 then. :-( . People that need it for 2.6.25
will just get it off the git tree.

Boaz

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] iscsi bidi & varlen support

2008-02-18 Thread Boaz Harrosh
On Tue, Feb 12 2008 at 22:17 +0200, Pete Wyckoff <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote on Tue, 12 Feb 2008 15:12 -0500:
>> [EMAIL PROTECTED] wrote on Thu, 31 Jan 2008 20:08 +0200:
>>> Cheers after 1.3 years these can go in.
>>>
>>> [PATCH 1/3] iscsi: extended cdb support
>>>The varlen support is not yet in mainline for
>>>   block and scsi-ml. But the API for drivers will
>>>   not change. All LLD need to do is max_command to
>>>   the it's maximum and be ready for bigger commands.
>>>   This is what's done here. Once these commands start
>>>   coming iscsi will be ready for them.
>>>
>>> [PATCH 2/3] iscsi: bidi support - libiscsi
>>> [PATCH 3/3] iscsi: bidi support - iscsi_tcp
>>>   bidirectional commands support in iscsi.
>>>   iSER is not yet ready, but it will not break.
>>>   There is already a mechanism in libiscsi that will
>>>   return error if bidi commands are sent iSER way.
>>>
>>> Pete please send me the iSER bits so we can port them
>>> to this latest version.
>>>
>>> Mike these patches are ontop of iscs branch of the iscsi
>>> git tree, they will apply but for compilation you will need
>>> to sync with Linus mainline. The patches are for the in-tree
>>> iscsi code. I own you the compat patch for the out-off-tree
>>> code, but this I will only be Sunday.
>> Here's the patch to add bidi support to iSER too.  It works
>> with my setup, but could use more testing.  Note that this does
>> rely on the 3 patches quoted above.
> 
> Similar, for varlen support to iSER.  Probably apply this one before
> the bidi one I just sent.
> 
>   -- Pete
> 
> 
> From: Pete Wyckoff <[EMAIL PROTECTED]>
> Subject: [PATCH] iscsi iser: varlen
> 
> Handle variable-length CDBs in iSER.
> 
> Signed-off-by: Pete Wyckoff <[EMAIL PROTECTED]>
> ---
>  drivers/infiniband/ulp/iser/iscsi_iser.c |5 +++--
>  drivers/infiniband/ulp/iser/iscsi_iser.h |2 +-
>  drivers/infiniband/ulp/iser/iser_initiator.c |   16 ++--
>  3 files changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.c 
> b/drivers/infiniband/ulp/iser/iscsi_iser.c
> index 5f2284d..9dfc310 100644
> --- a/drivers/infiniband/ulp/iser/iscsi_iser.c
> +++ b/drivers/infiniband/ulp/iser/iscsi_iser.c
> @@ -401,7 +401,8 @@ iscsi_iser_session_create(struct iscsi_transport *iscsit,
>   ctask  = session->cmds[i];
>   iser_ctask = ctask->dd_data;
>   ctask->hdr = (struct iscsi_cmd *)&iser_ctask->desc.iscsi_header;
> - ctask->hdr_max = sizeof(iser_ctask->desc.iscsi_header);
> + ctask->hdr_max = sizeof(iser_ctask->desc.iscsi_header) +
> +  sizeof(iser_ctask->desc.hdrextbuf);
>   }
>  
>   for (i = 0; i < session->mgmtpool_max; i++) {
> @@ -604,7 +605,7 @@ static struct iscsi_transport iscsi_iser_transport = {
>   .host_template  = &iscsi_iser_sht,
>   .conndata_size  = sizeof(struct iscsi_conn),
>   .max_lun= ISCSI_ISER_MAX_LUN,
> - .max_cmd_len= ISCSI_ISER_MAX_CMD_LEN,
> + .max_cmd_len= 260,

Same bug I had. .max_cmd_len is still char, before the varlen patch to scsi-ml.
So it must be at most 252, Until that patch is introduced and it can return to
the correct 260 or better yet SCSI_MAX_VARLEN_CDB_SIZE. That also is only 
defined in the scsi-ml varlen patch.

>   /* session management */
>   .create_session = iscsi_iser_session_create,
>   .destroy_session= iscsi_session_teardown,
> diff --git a/drivers/infiniband/ulp/iser/iscsi_iser.h 
> b/drivers/infiniband/ulp/iser/iscsi_iser.h
> index db8f81a..66905df 100644
> --- a/drivers/infiniband/ulp/iser/iscsi_iser.h
> +++ b/drivers/infiniband/ulp/iser/iscsi_iser.h
> @@ -90,7 +90,6 @@
>  /* FMR space for 1 MB of 4k-page transfers, plus 1 if not page aligned */
>  #define ISCSI_ISER_SG_TABLESIZE (((1<<20) >> SHIFT_4K) + 1)
>  #define ISCSI_ISER_MAX_LUN   256
> -#define ISCSI_ISER_MAX_CMD_LEN   16
>  
>  /* QP settings */
>  /* Maximal bounds on received asynchronous PDUs */
> @@ -217,6 +216,7 @@ enum iser_desc_type {
>  struct iser_desc {
>   struct iser_hdr  iser_header;
>   struct iscsi_hdr iscsi_header;
> + char hdrextbuf[ISCSI_MAX_AHS_SIZE];
>   struct iser_regd_buf hdr_regd_buf;
>   void *data; /* used by RX & TX_CONTROL 
> */
>   struct iser_regd_buf data_regd_buf; /* used by RX & TX_CONTROL 
> */
> diff --git a/drivers/infiniband/ulp/iser/iser_initiator.c 
> b/drivers/infiniband/ulp/iser/iser_initiator.c
> index 83247f1..ea3f5dc 100644
> --- a/drivers/infiniband/ulp/iser/iser_initiator.c
> +++ b/drivers/infiniband/ulp/iser/iser_initiator.c
> @@ -246,9 +246,13 @@ post_rx_kmalloc_failure:
>   return err;
>  }
>  
> -/* creates a new tx descriptor and adds header regd buffer */
> +/**
> 

[PATCH 3/3 ver2] iscsi: bidi support - iscsi_tcp

2008-02-18 Thread Boaz Harrosh

  bidi support for iscsi_tcp
  - access the right scsi_in() and/or scsi_out() side of things.
also for resid

Signed-off-by: Boaz Harrosh <[EMAIL PROTECTED]>
Reviewed-by: Pete Wyckoff <[EMAIL PROTECTED]>
Signed-off-by: Mike Christie <[EMAIL PROTECTED]>
---
 drivers/scsi/iscsi_tcp.c |   31 +--
 1 files changed, 17 insertions(+), 14 deletions(-)

diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c
index 8a17867..72b9b2a 100644
--- a/drivers/scsi/iscsi_tcp.c
+++ b/drivers/scsi/iscsi_tcp.c
@@ -528,6 +528,7 @@ iscsi_data_rsp(struct iscsi_conn *conn, struct 
iscsi_cmd_task *ctask)
struct iscsi_session *session = conn->session;
struct scsi_cmnd *sc = ctask->sc;
int datasn = be32_to_cpu(rhdr->datasn);
+   unsigned total_in_length = scsi_in(sc)->length;
 
iscsi_update_cmdsn(session, (struct iscsi_nopin*)rhdr);
if (tcp_conn->in.datalen == 0)
@@ -542,10 +543,10 @@ iscsi_data_rsp(struct iscsi_conn *conn, struct 
iscsi_cmd_task *ctask)
tcp_ctask->exp_datasn++;
 
tcp_ctask->data_offset = be32_to_cpu(rhdr->offset);
-   if (tcp_ctask->data_offset + tcp_conn->in.datalen > scsi_bufflen(sc)) {
+   if (tcp_ctask->data_offset + tcp_conn->in.datalen > total_in_length) {
debug_tcp("%s: data_offset(%d) + data_len(%d) > 
total_length_in(%d)\n",
  __FUNCTION__, tcp_ctask->data_offset,
- tcp_conn->in.datalen, scsi_bufflen(sc));
+ tcp_conn->in.datalen, total_in_length);
return ISCSI_ERR_DATA_OFFSET;
}
 
@@ -558,8 +559,8 @@ iscsi_data_rsp(struct iscsi_conn *conn, struct 
iscsi_cmd_task *ctask)
 
if (res_count > 0 &&
(rhdr->flags & ISCSI_FLAG_CMD_OVERFLOW ||
-res_count <= scsi_bufflen(sc)))
-   scsi_set_resid(sc, res_count);
+res_count <= total_in_length))
+   scsi_in(sc)->resid = res_count;
else
sc->result = (DID_BAD_TARGET << 16) |
rhdr->cmd_status;
@@ -670,11 +671,11 @@ iscsi_r2t_rsp(struct iscsi_conn *conn, struct 
iscsi_cmd_task *ctask)
r2t->data_length, session->max_burst);
 
r2t->data_offset = be32_to_cpu(rhdr->data_offset);
-   if (r2t->data_offset + r2t->data_length > scsi_bufflen(ctask->sc)) {
+   if (r2t->data_offset + r2t->data_length > scsi_out(ctask->sc)->length) {
iscsi_conn_printk(KERN_ERR, conn,
  "invalid R2T with data len %u at offset %u "
  "and total length %d\n", r2t->data_length,
- r2t->data_offset, scsi_bufflen(ctask->sc));
+ r2t->data_offset, 
scsi_out(ctask->sc)->length);
__kfifo_put(tcp_ctask->r2tpool.queue, (void*)&r2t,
sizeof(void*));
return ISCSI_ERR_DATALEN;
@@ -771,6 +772,7 @@ iscsi_tcp_hdr_dissect(struct iscsi_conn *conn, struct 
iscsi_hdr *hdr)
if (tcp_conn->in.datalen) {
struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data;
struct hash_desc *rx_hash = NULL;
+   struct scsi_data_buffer *sdb = scsi_in(ctask->sc);
 
/*
 * Setup copy of Data-In into the Scsi_Cmnd
@@ -788,8 +790,8 @@ iscsi_tcp_hdr_dissect(struct iscsi_conn *conn, struct 
iscsi_hdr *hdr)
  tcp_ctask->data_offset,
  tcp_conn->in.datalen);
return iscsi_segment_seek_sg(&tcp_conn->in.segment,
-scsi_sglist(ctask->sc),
-scsi_sg_count(ctask->sc),
+sdb->table.sgl,
+sdb->table.nents,
 tcp_ctask->data_offset,
 tcp_conn->in.datalen,
 iscsi_tcp_process_data_in,
@@ -1332,7 +1334,8 @@ iscsi_tcp_ctask_init(struct iscsi_cmd_task *ctask)
return 0;
 
/* If we have immediate data, attach a payload */
-   err = iscsi_tcp_send_data_prep(conn, scsi_sglist(sc), scsi_sg_count(sc),
+   err = iscsi_tcp_send_data_prep(conn, scsi_out(sc)->table.sgl,
+  s

  1   2   3   4   5   6   7   >