Hi Mike,

Just got two more questions...
First one is about the scsi_host_template->use_clustering flag: is there
any specific reason why it is disabled in open-iscsi?

Second one is the way we use sendpage() in iscsi_tcp.c. I was wondering
whether we could send more than one page per sendpage() call, say by
copying a few pages into one buffer first and then sending them all at
once?

Thanks a lot!

Jack

On May 5, 9:26 pm, Jack Z <[email protected]> wrote:
> Hi Mike,
>
> Thank you for your help again. Following the guidance in your reply, I
> traced the kernel code a bit more and eventually found out a possible
> path for open-iscsi to get 4K pages in the scatterlist.
>
> Kernel version: 2.6.30
> open-iscsi version: 2.0.871
>
> Trace 1: SCSI from a write request to data written into pages
>
> --> function pointer
>
> sg_fops.write --> sg_write()
>                    /      \
>    sg_common_write() <- sg_new_write()
>            |
>      sg_start_req()
>            |
>   blk_rq_map_user_iov()------------------------
>      / (first)         \                      |
> __bio_map_user_iov()  __bio_copy_user_iov()   |
>      \                 /                      |
>       bio_add_pc_page()                 (then)|
>            |                                  |
>       __bio_add_page()                        |
>                                               |
>            -----------------------------------
>            |
>            |
>    blk_rq_bio_prep()
>
> blk_rq_map_user_iov
>
> __bio_copy_user_iov() first creates new memory pages for the incoming
> data, and then calls bio_add_pc_page() (and in turn __bio_add_page())
> to insert the pages created into a structure named bio, which stands
> for block IO. And then it calls __bio_copy_iov() to copy the user data
> into those pages. __bio_map_user_iov(), unlike __bio_copy_user_iov(),
> calls bio_add_pc_page() (and in turn __bio_add_page()) to directly map
> the user pages into the structure bio without any duplication. After
> the structure bio is filled with proper data, blk_rq_bio_prep() is
> called to associate the structure bio with the write request.
>
> In __bio_add_page(), we see "bvec->bv_page = page" and "bvec->bv_len
> = len". In the context of the above function calls, len should
> (mostly) be PAGE_SIZE, which is 4096 on an x86 32-bit machine.
> Now we know how user data is arranged into 4K size pages.
>
> Trace 2: From a request dequeue to data read out of SCSI buffer
>
> elv_next_request()
>         | (through function pointer q->prep_rq_fn)
> sr_prep_fn()
>         |
> scsi_setup_blk_pc_cmnd()
>         |
> scsi_init_io()
>         |
> scsi_init_sgtable
>         | (using (req->q, req, sdb->table.sgl))
> blk_rq_map_sg(struct request_queue *q, struct request *rq,
>               struct scatterlist *sglist)
>
> In blk_rq_map_sg(), the pages saved in the structure bio, which is
> part of the structure request, are mapped to the parameter sglist,
> which is the scatterlist in the structure scsi_data_buffer
> (task->sc.sdb.table.sgl in open-iscsi code). Also, we can see
> "nbytes = bvec->bv_len" and "sg_set_page(sg, bvec->bv_page, nbytes,
> bvec->bv_offset)" (please note that this part takes the open-iscsi
> option ".use_clustering = DISABLE_CLUSTERING" into consideration). The
> latter will set sc.sdb.table.sgl->length to nbytes, which is
> bvec->bv_len. From the first part we know that bvec->bv_len is
> PAGE_SIZE. Now we see why the size of the elements in the scatterlist
> used in open-iscsi is 4096, which is the PAGE_SIZE on x86-32 machines.
>
> And in iscsi_tcp.c, we can have "r = tcp_sw_conn->sendpage(sk,
> sg_page(sg), offset, copy, flags)". Since sg_page(sg) returns one page
> in the scatterlist, it explains why open-iscsi tries to send 4096
> bytes at one time on x86-32 machines.
>
> On May 5, 10:47 am, Mike Christie <[email protected]> wrote:
> > On 05/03/2010 06:51 AM, Jack Z wrote:
> > > Hi group,
> > >
> > > I have been tracing the code related to sending PDUs from iscsi
> > > initiator (ver 2.0-871).
> > >
> > > And through some printk()s I realize that starting from
> > > iscsi_sw_tcp_pdu_init(), all the functions using scatterlist
> > > (struct scatterlist *sg) seem to use 4096 as the length
> > > (sg->length).
> > >
> > > But I was not able to trace down where this 4096 is initially
> > > assigned to sg->length...
> > > I searched through the code for "4096" and only two spots came
> > > up: ".sg_tablesize = 4096" in struct scsi_host_template
> > > iscsi_sw_tcp_sht and "#define ISCSI_TOTAL_CMDS_MAX 4096". But
> > > changing these two values did not affect the sg->length value,
> > > which was still 4096.
> > >
> > > I was guessing this 4096 had something to do with the fs block
> > > size and this value was somewhat from "struct scsi_data_buffer
> > > *sdb = scsi_out(task->sc);" in iscsi_sw_tcp_pdu_init()... but
> > > still don't have a clue about how and why iscsi initiator gets
> > > this value as the length for the scatterlist...
> > >
> > > Could anyone maybe explain a bit or point me to some relevant
> > > document?
> >
> > The fs/block layer is going to send down some struct called a bio,
> > which has a mapping of pages to some sectors to read/write. The
> > block layer's elevator code is then going to try and make large IO
> > requests by merging bios. So if there was a bio to read sectors
> > 0 - 7 into page0 and a bio to read sectors 8 - 15 into page1, then
> > they would be merged into the same request to read sectors 0 - 15.
> >
> > At some point this request is then sent to the scsi layer, which
> > will use some block layer helper to create a scatterlist from the
> > pages in the request's bios. The sg->page pointer points to the
> > first page in a group of pages that are contiguous in memory, and
> > sg->length is then the total length in bytes of all those pages. So
> > in my example, if the pages of the 2 bios were next to each other in
> > memory then they could be merged into 1 sg entry. This does not
> > happen for iscsi_tcp though. In your case, you see sg->length as at
> > most 4096 because iscsi_tcp only supports 1 page per sg entry and
> > the page size on your arch is PAGE_SIZE=4096 (we set the
> > scsi_host_template->use_clustering flag to DISABLE_CLUSTERING to
> > indicate that we only want one page per sg entry btw).
> >
> > Next is where sg_tablesize comes into play.
> > Here, we are setting it to 4096 to indicate that at most we want
> > 4096 entries on the scatterlist that is made (the page size and
> > sg_tablesize both being 4096 is just a coincidence). So for us in
> > your setup we can have at most 4096 sg entries, with each entry
> > having 4096 bytes. We could actually have a smaller sg list, because
> > there are other settings that limit the size of the request like the
> > sht->max_sectors.
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "open-iscsi" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to
> > [email protected].
> > For more options, visit this group at
> > http://groups.google.com/group/open-iscsi?hl=en.
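
For anyone following the thread, here is a rough user-space sketch of the
merging behaviour Mike describes above: with clustering on, pages that are
adjacent in memory collapse into one sg entry; with clustering off, every
page gets its own entry of at most one page. This is not the kernel's
blk_rq_map_sg(). The names struct seg, struct sg_entry, build_sg() and
SKETCH_PAGE_SIZE are made up for illustration, and the real code also
honours limits such as the maximum segment size, which are left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, as on the x86-32 machine in this thread. */
#define SKETCH_PAGE_SIZE 4096ULL

/* Hypothetical stand-in for one bio_vec: page frame number, offset, length. */
struct seg {
    unsigned long pfn;
    unsigned int offset;
    unsigned int len;
};

/* Hypothetical stand-in for one scatterlist entry. */
struct sg_entry {
    unsigned long first_pfn;
    unsigned int offset;
    unsigned int length;
};

/*
 * Build a scatter-gather list from segs[].
 * cluster != 0: merge a segment into the previous entry when it starts
 *               exactly where that entry ends in (simulated) physical memory.
 * cluster == 0: one entry per segment (what DISABLE_CLUSTERING asks for).
 * Returns the number of entries written, or -1 if more than max_entries
 * (the analogue of sg_tablesize) would be needed.
 */
int build_sg(const struct seg *segs, size_t nsegs, int cluster,
             struct sg_entry *out, size_t max_entries)
{
    size_t n = 0;

    assert(segs != NULL && out != NULL);

    for (size_t i = 0; i < nsegs; i++) {
        if (cluster && n > 0) {
            struct sg_entry *prev = &out[n - 1];
            unsigned long long prev_end =
                prev->first_pfn * SKETCH_PAGE_SIZE + prev->offset + prev->length;
            unsigned long long cur_start =
                segs[i].pfn * SKETCH_PAGE_SIZE + segs[i].offset;

            if (prev_end == cur_start) {
                /* Physically contiguous: grow the previous entry. */
                prev->length += segs[i].len;
                continue;
            }
        }

        if (n == max_entries)
            return -1; /* table full */

        out[n].first_pfn = segs[i].pfn;
        out[n].offset = segs[i].offset;
        out[n].length = segs[i].len;
        n++;
    }

    return (int)n;
}
```

Feeding it three full pages at page frames 10, 11 and 20 gives three
4096-byte entries with cluster set to 0, and two entries (8192 bytes, then
4096 bytes) with cluster set to 1, since only frames 10 and 11 are adjacent.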
