Hi Mike,
Thank you for your help again. Following the guidance in your reply, I
traced the kernel code a bit more and eventually found a possible
path by which open-iscsi gets 4K pages in the scatterlist.
Kernel version: 2.6.30
open-iscsi version: 2.0.871
Trace 1: From a SCSI write request to data written into pages
("-->" denotes a call through a function pointer)

sg_fops.write --> sg_write()
                   /       \
  sg_common_write() <------ sg_new_write()
          |
    sg_start_req()
          |
  blk_rq_map_user_iov() ---------------------------+
       /    (first)    \                           |
__bio_map_user_iov()  __bio_copy_user_iov()        |
       \               /                           |
        bio_add_pc_page()                   (then) |
              |                                    |
        __bio_add_page()                           |
                                                   |
              +------------------------------------+
              |
      blk_rq_bio_prep()
blk_rq_map_user_iov
__bio_copy_user_iov() first allocates new memory pages for the incoming
data and calls bio_add_pc_page() (and in turn __bio_add_page()) to
insert those pages into a structure named bio, which stands for
block I/O. It then calls __bio_copy_iov() to copy the user data
into those pages. __bio_map_user_iov(), unlike
__bio_copy_user_iov(), calls bio_add_pc_page() (and in turn
__bio_add_page()) to map the user pages directly into the bio
without any copying. After the bio is filled with the proper
data, blk_rq_bio_prep() is called to associate the bio with
the write request.
In __bio_add_page(), we see "bvec->bv_page = page" and "bvec->bv_len
= len". In the context of the above function calls, len should
(mostly) be PAGE_SIZE, which is 4096 on a 32-bit x86 machine. Now we
know how user data is arranged into 4K pages.
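
To illustrate the splitting described above, here is a minimal user-space
sketch (not kernel code). The names MODEL_PAGE_SIZE, model_bvec and
model_split_into_bvecs are made up for this example, and the page size of
4096 is an assumption matching x86-32. It only mimics the per-page
arithmetic (offset within the page, bytes left in the page); it does not
pin or allocate any pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: page size on x86-32 */

/* Simplified stand-in for struct bio_vec: plain numbers, no struct page. */
struct model_bvec {
    uintptr_t    page_base;  /* page-aligned address of the page        */
    unsigned int bv_len;     /* number of data bytes in this page       */
    unsigned int bv_offset;  /* offset of the data within the page      */
};

/*
 * Split the range [uaddr, uaddr + len) into one segment per page.
 * Returns the number of segments written to vecs, or -1 if more than
 * max_vecs segments would be needed.
 */
static int model_split_into_bvecs(uintptr_t uaddr, size_t len,
                                  struct model_bvec *vecs, int max_vecs)
{
    int n = 0;

    assert(vecs != NULL);

    while (len > 0) {
        unsigned int offset = (unsigned int)(uaddr & (MODEL_PAGE_SIZE - 1));
        size_t bytes = MODEL_PAGE_SIZE - offset;  /* room left in this page */

        if (bytes > len)
            bytes = len;
        if (n == max_vecs)
            return -1;

        vecs[n].page_base = uaddr - offset;
        vecs[n].bv_len    = (unsigned int)bytes;
        vecs[n].bv_offset = offset;
        n++;

        uaddr += bytes;
        len   -= bytes;
    }
    return n;
}
```

With a page-aligned buffer every segment has bv_len equal to the page
size; only an unaligned start or a short tail gives a smaller bv_len,
which is why I wrote "(mostly)" above.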
Trace 2: From a request dequeue to data read out of SCSI buffer
elv_next_request()
  | (through function pointer q->prep_rq_fn)
sr_prep_fn()
  |
scsi_setup_blk_pc_cmnd()
  |
scsi_init_io()
  |
scsi_init_sgtable()
  | (called with (req->q, req, sdb->table.sgl))
blk_rq_map_sg(struct request_queue *q, struct request *rq,
              struct scatterlist *sglist)
In blk_rq_map_sg(), the pages saved in the bio, which is
part of the request structure, are mapped to the parameter sglist,
which is the scatterlist in the structure scsi_data_buffer
(task->sc->sdb.table.sgl in open-iscsi code). There we can see
"nbytes = bvec->bv_len" and
"sg_set_page(sg, bvec->bv_page, nbytes, bvec->bv_offset)" (please note
that this is the path taken when the open-iscsi setting
".use_clustering = DISABLE_CLUSTERING" is in effect). The latter
sets the length of the scatterlist entry (sg->length) to nbytes, which
is bvec->bv_len. From the first trace we know that bvec->bv_len is
(mostly) PAGE_SIZE. Now we see why the size of the elements in the
scatterlist used in open-iscsi is 4096, which is PAGE_SIZE on x86-32
machines.
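
The effect of the clustering setting can be sketched with a small
user-space model (again not kernel code). The types model_bvec and
model_sg and the function model_map_sg are hypothetical names for this
example, pages are represented by a page frame number, and the merge rule
is deliberately simplified: it only checks that two segments are adjacent
in memory and ignores the other limits the real block layer applies.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u   /* assumption: page size on x86-32 */

/* Simplified stand-ins for struct bio_vec and struct scatterlist. */
struct model_bvec { unsigned long pfn; unsigned int bv_len; unsigned int bv_offset; };
struct model_sg   { unsigned long pfn; unsigned int offset; unsigned int length; };

/*
 * Fill sgl from bvecs and return the number of sg entries used.
 * sgl must have room for nr_bvecs entries.
 * clustering == 0: one sg entry per bvec (the iscsi_tcp case).
 * clustering != 0: memory-adjacent bvecs are merged into one entry.
 */
static int model_map_sg(const struct model_bvec *bvecs, int nr_bvecs,
                        struct model_sg *sgl, int clustering)
{
    int nsegs = 0;
    int i;

    assert(bvecs != NULL && sgl != NULL);

    for (i = 0; i < nr_bvecs; i++) {
        const struct model_bvec *bv = &bvecs[i];

        if (clustering && nsegs > 0) {
            struct model_sg *prev = &sgl[nsegs - 1];
            unsigned long long prev_end =
                (unsigned long long)prev->pfn * MODEL_PAGE_SIZE +
                prev->offset + prev->length;
            unsigned long long cur_start =
                (unsigned long long)bv->pfn * MODEL_PAGE_SIZE + bv->bv_offset;

            if (prev_end == cur_start) {
                prev->length += bv->bv_len;   /* extend the previous entry */
                continue;
            }
        }

        /* Start a new entry: length is taken straight from bv_len. */
        sgl[nsegs].pfn    = bv->pfn;
        sgl[nsegs].offset = bv->bv_offset;
        sgl[nsegs].length = bv->bv_len;
        nsegs++;
    }
    return nsegs;
}
```

For three full pages where the first two are neighbours in memory, the
model gives three 4096-byte entries with clustering off, and two entries
(8192 and 4096 bytes) with clustering on.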
And in iscsi_tcp.c, we see "r = tcp_sw_conn->sendpage(sk,
sg_page(sg), offset, copy, flags)". Since sg_page(sg) returns the single
page behind a scatterlist entry, this explains why open-iscsi tries to
send at most 4096 bytes at a time on x86-32 machines.
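
The per-entry sending can be modelled in user space as follows. This is
only a sketch under assumptions: model_sendpage_fn is a hypothetical
callback standing in for the sendpage call, model_send_record is a test
helper that records what it was asked to send, and the loop is not the
actual iscsi_tcp.c transmit code.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified scatterlist entry: only the fields the loop needs. */
struct model_sg { unsigned int offset; unsigned int length; };

/*
 * Hypothetical stand-in for sendpage: returns the number of bytes
 * accepted (at most size), 0 if nothing can be sent now, or a negative
 * value on error.
 */
typedef int (*model_sendpage_fn)(unsigned int offset, unsigned int size,
                                 void *ctx);

/* Records the call count and the largest request; optional per-call cap. */
struct model_send_stats { unsigned int max_request; unsigned int calls; unsigned int cap; };

static int model_send_record(unsigned int offset, unsigned int size, void *ctx)
{
    struct model_send_stats *s = ctx;

    (void)offset;
    s->calls++;
    if (size > s->max_request)
        s->max_request = size;
    if (s->cap != 0 && size > s->cap)
        return (int)s->cap;     /* partial send */
    return (int)size;
}

/* Walk the list one entry at a time; no request spans two entries. */
static long model_xmit_sgl(const struct model_sg *sgl, int nents,
                           model_sendpage_fn send, void *ctx)
{
    long total = 0;
    int i;

    assert(sgl != NULL && send != NULL);

    for (i = 0; i < nents; i++) {
        unsigned int copied = 0;

        while (copied < sgl[i].length) {
            unsigned int remaining = sgl[i].length - copied;
            int r = send(sgl[i].offset + copied, remaining, ctx);

            if (r < 0)
                return r;       /* error */
            if (r == 0)
                return total;   /* nothing accepted, stop for now */
            assert((unsigned int)r <= remaining);
            copied += (unsigned int)r;
            total  += r;
        }
    }
    return total;
}
```

Because each request is bounded by the entry's length, the largest
request the callback ever sees is the largest sg->length, i.e. 4096 in
the setup described above.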
On May 5, 10:47 am, Mike Christie <[email protected]> wrote:
> On 05/03/2010 06:51 AM, Jack Z wrote:
>
> > Hi group,
>
> > I have been tracing the code related to sending PDUs from iscsi
> > initiator (ver 2.0-871).
>
> > And through some printk()s I realized that starting from
> > iscsi_sw_tcp_pdu_init(), all the functions using scatterlist (struct
> > scatterlist *sg) seem to use 4096 as the length (sg->length).
>
> > But I was not able to trace down where this 4096 is initially assigned
> > to sg->length... I searched through the code for "4096" and only two
> > spots came up: ".sg_tablesize = 4096" in struct scsi_host_template
> > iscsi_sw_tcp_sht and "#define ISCSI_TOTAL_CMDS_MAX 4096". But changing
> > these two values did not affect the sg->length value, which was still
> > 4096.
>
> > I was guessing this 4096 had something to do with the fs block size
> > and this value was somewhat from "struct scsi_data_buffer *sdb =
> > scsi_out(task->sc);" in iscsi_sw_tcp_pdu_init()... but still don't
> > have a clue about how and why iscsi initiator gets this value as the
> > length for the scatterlist...
>
> > Could anyone maybe explain a bit or point me to some relevant
> > document?
>
> The fs/block layer is going to send down some struct called a bio, which
> has a mapping of pages to some sectors to read/write. The block layer's
> elevator code is then going to try and make large IO requests by merging
> bios. So if there was a bio to read sectors 0 - 7 into page0 and a bio
> to read sectors 8 - 15 into page1, then they would be merged into the
> same request to read sector 0 - 15.
>
> At some point this request is then sent to the scsi layer, which will
> use some block layer helper to create a scatterlist from the pages in
> the request's bios. The sg->page pointer points to the first page in a
> group of pages that are contiguous in memory, and sg->length is then the
> total length in bytes of all those pages. So in my example, if the 2
> pages in each bio were next to each other then they could be merged into
> 1 sg entry. This does not happen for iscsi_tcp though. In your case, you
> see sg->length as at most 4096 because iscsi_tcp only supports 1 page per
> sg entry and the page size on your arch is PAGE_SIZE=4096 (we set the
> scsi_host_template->use_clustering flag to indicate that we only want
> one page per sg entry btw).
>
> Next is where sg_tablesize comes into play. Here, we are setting it to
> 4096 to indicate that we want at most 4096 entries on the scatterlist
> that is made (it is just a coincidence that 4096 is both the page size
> and the sg_tablesize). So in your setup we can have at most 4096 sg
> entries, with each entry holding at most 4096 bytes. We could actually
> have a smaller sg list, because there are other settings that limit the
> size of the request like the sht->max_sectors.
>
> --
> You received this message because you are subscribed to the Google Groups
> "open-iscsi" group.
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/open-iscsi?hl=en.