Re: [driver-discuss] cannot mount root path on iscsi boot lun
Hi Javen,

Yes, that is right; at least that is the only thing I could understand after looking at the source code (especially the Solaris iSCSI initiator source). Below is what I am doing in my tran_bus_config():

    myxx_tran_bus_config(dev_info_t *parent, uint_t flags,
        ddi_bus_config_op_t op, void *arg, dev_info_t **childp)
    {
            return ndi_busop_bus_config(parent, flags, op, arg, childp, 0);
    }

I figured that the iSCSI initiator's implementation of BUS_CONFIG_ONE and BUS_CONFIG_ALL was more relevant to that driver because of the way it names its targets (the iqn way); I just use a target ID. On hotplug I do an ndi_devi_alloc(), update the target/lun properties, and online the node.

In my tran_tgt_init() I do the following:

    if (ndi_dev_is_persistent_node(tgt_dip) == 0) {
            (void) ndi_merge_node(tgt_dip, scsi_name_child);
            ddi_set_name_addr(tgt_dip, NULL);
            return (DDI_FAILURE);
    }

After that, my driver scans through its internal list of available targets, looking for the target number passed in via the sd (struct scsi_device *) parameter, and returns DDI_SUCCESS if it is found.

So, please let me know what I am doing wrong and, if so, how to correct it.

Thanks
Som

--- Javen Wu [EMAIL PROTECTED] wrote:

Hi Som,

From your last email, I found another potential problem, which may be unrelated to your panic. My understanding of your last email is that you use ndi_busop_bus_config() to enumerate the immediate children, but for targets added after the system has booted, you use ndi_devi_online(). In other words, you rely on the .conf files of the target drivers (sd.conf, st.conf...) to enumerate iSCSI targets during boot, but use ndi_devi_online() to handle hotplug. Is my understanding correct?

If so, the dip nodes (dev_info_t) of your iSCSI targets in the device tree end up with different attributes. As you know, Solaris has .conf nodes and persistent nodes, and they are different. A persistent node is allocated by ndi_devi_alloc() and onlined by ndi_devi_online(). But .conf nodes are different.
Re: [driver-discuss] cannot mount root path on iscsi boot lun
Hi Som,

Actually, your tran_bus_config() should allocate and online the immediate children as persistent nodes before you call ndi_busop_bus_config(). In a word, you have to use ndi_devi_alloc()/ndi_devi_online() to online all immediate children as well. I believe your HBA is able to discover whether a specific target exists, or to find all attached targets, by some vendor-specific method. Once the HBA detects targets, you use ndi_devi_alloc() and ndi_devi_online() to configure your devices.

Please refer to my pseudocode below:

    static int
    myxx_tran_bus_config(dev_info_t *parent, uint_t flags,
        ddi_bus_config_op_t op, void *arg, dev_info_t **childp)
    {
            int ret = NDI_FAILURE;
            int circ1;
            int target, lun;
            char *ptr;
            myxx_softs *myxx_softstate;

            /* get the soft state of your initiator */
            myxx_softstate = (myxx_softs *)ddi_get_soft_state(mpt_state,
                ddi_get_instance(parent));
            if (myxx_softstate == NULL) {
                    return (NDI_FAILURE);
            }

            ndi_devi_enter(parent, &circ1);
            switch (op) {
            case BUS_CONFIG_ONE:
                    /* parse the target name out of the name given */
                    if ((ptr = strchr((char *)arg, '@')) == NULL) {
                            ret = NDI_FAILURE;
                            break;
                    }
                    /* parse target and lun out of the string */
                    myxx_parse_target_lun(ptr, &target, &lun);
                    ret = myxx_config_one(myxx_softstate, target, lun, childp);
                    break;
            case BUS_CONFIG_DRIVER:
            case BUS_CONFIG_ALL:
                    myxx_config_all(myxx_softstate);
                    ret = NDI_SUCCESS;
                    break;
            default:
                    break;
            }

            /* only when the configuration above succeeds, invoke ndi_busop */
            if (ret == NDI_SUCCESS) {
                    ret = ndi_busop_bus_config(parent, flags, op, arg, childp, 0);
            }
            ndi_devi_exit(parent, circ1);
            return (ret);
    }

    int
    myxx_config_one(myxx_softs *myxx, int target, int lun, dev_info_t **childp)
    {
            /*
             * Check whether the target+lun is already in your internal list.
             * If it is, just set the childp pointer and return success;
             * otherwise go on to the next step and configure it.
             */
            ...code to search the internal list...
            if (node found in the internal list) {
                    set childp
                    return (NDI_SUCCESS);
            }

            /* your discovery process, to make sure the target is there */
            ...target existence discovery, or just scsi_hba_probe() to probe
            the target,lun...

            /* if the target,lun exists, we need to configure the device */
            scsi_hba_nodename_compatible_get()
            ndi_devi_alloc()

            if (ndi_prop_update_int(DDI_DEV_T_NONE, *lun_dip, TARGET_PROP,
                (int)target) != DDI_PROP_SUCCESS) {
                    ndi_rtn = NDI_FAILURE;
                    goto error_handle;
            }
            if (ndi_prop_update_int(DDI_DEV_T_NONE, *lun_dip, LUN_PROP,
                lun) != DDI_PROP_SUCCESS) {
                    mpt_log(mpt, CE_WARN, "mpt driver unable to create "
                        "property for target %d lun %d (LUN_PROP)",
                        target, lun);
                    ndi_rtn = NDI_FAILURE;
                    goto error_handle;
            }

            ret = ndi_devi_online();
            if (ret == NDI_SUCCESS)
                    set the dip to childp;
            return (ret);
    }

    void
    myxx_config_all()
    {
            discover all attached iSCSI targets
            for (each target) {
                    issue REPORT LUNS to get the LUNs
                    for (each lun) {
                            myxx_config_one();
                    }
            }
    }

The above is just rough pseudocode for the configuration routine; I hope it helps.

BTW, I am not sure whether you handle device offline. I mean, when an iSCSI target goes away, you can call ndi_devi_offline() to offline the target devices.

Javen
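The BUS_CONFIG_ONE branch in Javen's pseudocode depends on splitting the child name at '@' and pulling a target and LUN out of the unit address. Below is a minimal userland sketch of that one step. It assumes a "name@target,lun" address with both numbers in hex and an optional ":minor" suffix; that format, the signature, and the return convention are illustration-only assumptions, not something the thread specifies, and myxx_parse_target_lun is only the placeholder name from the pseudocode.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: split a child name such as "sd@1,0" into a
 * target number and a LUN. Assumes a hex "target,lun" unit address,
 * optionally followed by ":minor". Returns 0 on success and -1 if the
 * name is malformed.
 */
int
myxx_parse_target_lun(const char *devnm, int *target, int *lun)
{
	const char *addr;
	char *end;
	long t, l;

	assert(devnm != NULL && target != NULL && lun != NULL);

	/* the unit address starts right after the '@' */
	if ((addr = strchr(devnm, '@')) == NULL)
		return (-1);
	addr++;

	/* target number, which must be followed by a comma */
	t = strtol(addr, &end, 16);
	if (end == addr || *end != ',')
		return (-1);

	/* LUN, which must end the string or be followed by ":minor" */
	addr = end + 1;
	l = strtol(addr, &end, 16);
	if (end == addr || (*end != '\0' && *end != ':'))
		return (-1);

	*target = (int)t;
	*lun = (int)l;
	return (0);
}
```

With this sketch, "sd@1,0" yields target 1, LUN 0, while a name without '@' or without the comma is rejected, which is the case where the pseudocode sets ret to NDI_FAILURE.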
Re: [driver-discuss] cannot mount root path on iscsi boot lun
Hi Javen,

Thanks a lot for your help, but I still do not understand the semantics and exactly why we need to do all that, especially since I am registering my HBA driver with class 'scsi'. As of now, whenever I do an iSCSI login or logout, all I do is call ndi_devi_alloc()/ndi_devi_online() or devfs_clean()/ndi_devi_offline() respectively, and this works fine for me. I do NOT understand where or how exactly it might break. Could you please explain, or point me to some relevant documentation?

Thanks
Som

--- Javen Wu [EMAIL PROTECTED] wrote:

Hi Som,

Actually, your tran_bus_config() should allocate and online the immediate children as persistent nodes before you call ndi_busop_bus_config(). In a word, you have to use ndi_devi_alloc()/ndi_devi_online() to online all immediate children as well. I believe your HBA is able to discover whether a specific target exists, or to find all attached targets, by some vendor-specific method. Once the HBA detects targets, you use ndi_devi_alloc() and ndi_devi_online() to configure your devices.
Re: [driver-discuss] cannot mount root path on iscsi boot lun
Hi Javen,

Thank you, I get the problem now. So what you are saying is that I might not be able to log off any .conf nodes...

The only reason I am doing it the .conf way is to be able to enumerate and identify any targets/LUNs that were already attached to my HBA before my driver came up. That way, when my driver is being installed and obtains all the relevant information from the HBA, tran_tgt_init() is invoked, so I can report this to SCSA. I wasn't able to think of any other way of self-enumerating the devices/LUNs that are already present in my HBA while my driver is being loaded.

Oh, I am sorry, I forgot to mention that issue. Yes, it looks like it is solved: there was a bug in the code on my HBA that was not persisting the target number correctly, so it would alternately give numbers 0 and 1, which is why tran_tgt_init() would return success every other boot, and that boot would be successful!

Thanks
Som

--- Javen Wu [EMAIL PROTECTED] wrote:

Som,

Unfortunately, there is little documentation covering this. As I mentioned, a .conf node is different from a persistent node, and you mixed .conf nodes and persistent nodes. I cannot remember exactly what kind of problem I ran into during my development, but it does cause problems when handling hotplug. When a device goes offline, ndi_devi_offline() should be called with the NDI_DEVI_REMOVE flag, but the node should be a persistent node allocated by ndi_devi_alloc(). A .conf node is not allocated by ndi_devi_alloc(), so I am guessing it would cause problems.

On the other hand, an iSCSI initiator driver should be a self-identifying, auto-enumerating driver, like the FC driver. You should enumerate devices without relying on sd.conf. I am working on a new driver class named scsi-self-identify, which means the HBA driver enumerates targets itself, without the help of a .conf file. Since you asynchronously report devices as persistent nodes, you need to keep all nodes consistent. I haven't seen any driver mix .conf nodes and persistent nodes.

BTW, did you figure out the boot panic problem?

Cheers
Javen
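Som's explanation of the every-other-boot panic can be illustrated with a small userland model: tran_tgt_init() succeeds only if the target number in the root device path is found in the driver's internal list, and the HBA firmware was handing out 0 and 1 alternately. The sketch below only models that lookup. The structure, the function names, and the "boot count modulo 2" behaviour are assumptions made for illustration, not the actual driver or firmware code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry in the driver's internal list of logged-in targets. */
struct myxx_target {
	int	in_use;		/* non-zero if this slot holds a target */
	int	target_id;	/* target number as reported by the HBA */
};

/*
 * Models the lookup Som describes in tran_tgt_init(): return 1 if the
 * requested target number is in the list (the DDI_SUCCESS case) and 0
 * otherwise (the DDI_FAILURE case).
 */
int
myxx_target_known(const struct myxx_target *list, size_t n, int target_id)
{
	size_t i;

	assert(list != NULL);
	for (i = 0; i < n; i++) {
		if (list[i].in_use && list[i].target_id == target_id)
			return (1);
	}
	return (0);
}

/*
 * Models the firmware bug from the thread: the target number is not
 * persisted, so it alternates between 0 and 1 from one boot to the next.
 */
int
myxx_buggy_firmware_id(unsigned int boot_count)
{
	return ((int)(boot_count % 2));
}
```

If the root path always names target 0, the lookup succeeds on boots where the firmware reports 0 and fails on boots where it reports 1, which matches the alternating panic reported in the thread.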
Re: [driver-discuss] cannot mount root path on iscsi boot lun
Hi Som,

From your last email, I found another potential problem, which may be unrelated to your panic. My understanding of your last email is that you use ndi_busop_bus_config() to enumerate the immediate children, but for targets added after the system has booted, you use ndi_devi_online(). In other words, you rely on the .conf files of the target drivers (sd.conf, st.conf...) to enumerate iSCSI targets during boot, but use ndi_devi_online() to handle hotplug. Is my understanding correct?

If so, the dip nodes (dev_info_t) of your iSCSI targets in the device tree end up with different attributes. As you know, Solaris has .conf nodes and persistent nodes, and they are different. A persistent node is allocated by ndi_devi_alloc() and onlined by ndi_devi_online(). But .conf nodes are different. I guess that, with your implementation, the immediate children are .conf nodes, whereas the iSCSI targets reported asynchronously are persistent nodes, which would cause endless problems.

Could you show us how you implement BUS_CONFIG_ONE and BUS_CONFIG_ALL in tran_bus_config()? How do you name your iSCSI targets: by iqn name or by a small numeric target ID? How do you load your driver? What is your driver class?

--javen

Somnath kotur wrote:

Hi Javen,

Well, this is consistently reproducible exactly 50% of the time, i.e. *every other reboot*, so I'm thinking that there must be some simple explanation for it, like some setting. The prtconf output was OK. I'm still looking at the output of tran_tgt_init and probe.

BTW, I am indeed using ndi_devi_online() when I want to asynchronously report my targets online after an iSCSI login to a target.

Thanks
Som

--- Javen Wu [EMAIL PROTECTED] wrote:

Hi Som,

I hit the same panic stack during a configuration failure in my driver, but I am not sure whether your problem is exactly the same as mine. I am curious why it happens only occasionally in your case. Still, I think that is a starting point for locating and debugging the problem.
Re: [driver-discuss] cannot mount root path on iscsi boot lun
Hi Som,

I hit the same panic stack during a configuration failure in my driver, but I am not sure whether your problem is exactly the same as mine. I am curious why it happens only occasionally in your case. Still, I think that is a starting point for locating and debugging the problem.

Is it possible that the configure routine occasionally fails for some reason? I previously thought you implemented dynamic enumeration by calling ndi_devi_online(), but it seems your driver does not. Anyway, my guess is that before the panic in mount root, your tran_bus_config() failed and returned DDI_FAILURE. You can verify this by watching the return value of your tran_bus_config() with kmdb.

If that guess is correct, then after you invoke ndi_busop_bus_config(), please trace your tran_tgt_init() and tran_tgt_probe() routines and watch their return values. Most likely, I think, your tran_tgt_probe() fails occasionally because of some odd situation. If you did not implement tran_tgt_probe() yourself, SCSA sets scsi_hba_probe() as the default, so you can trace the return value of scsi_hba_probe(). If scsi_hba_probe() fails as well, that means that, at least *this time*, your I/O had a problem sending the INQUIRY, so you can investigate more deeply.

Cheers
Javen

Somnath kotur wrote:

Javen,

Thank you once again, but one thing I do not understand: if that is the case, then why does it boot fine after the reboot? I mean, every time there is a panic, the subsequent reboot that it initiates always leads to a successful boot (and a reboot from that stage leads to a panic again, alternating).

My tran_bus_config() implementation just calls ndi_busop_bus_config(parent, flags, op, arg, childp, 0), passing down the rest of the arguments as they are, with a timeout of 0. Do you think that should be changed?

Thanks
som

--- Javen Wu [EMAIL PROTECTED] wrote:

During the boot phase, the system configures the root device with BUS_CONFIG_ONE, passing the root device path as the argument. I don't know how you implement your tran_bus_config() routine, but you can trace or debug its handling of BUS_CONFIG_ONE with kmdb. I think the panic could be caused by tran_bus_config() failing while configuring the root device.

Another way to check: I guess you invoke ndi_devi_online() in your tran_bus_config(), so you can watch the return value of ndi_devi_online() during boot with kmdb. I suppose you are on an x86 system, so %eax or %rax holds the return value after you step out of the function with the :u command in kmdb.

Javen

Javen Wu wrote:

Hi Som,

I have met a similar problem before. The root cause in my case was that the root disk was not configured correctly. Are you sure your boot disk (the iSCSI target) was enumerated correctly by your iSCSI initiator driver and is already attached?

You can boot your system with the -k option; when the panic happens, the system will drop into kmdb directly. Then you can use ::prtconf to check whether the dip node of your iSCSI target under your initiator instance was created and attached correctly.

Cheers
Javen

Somnath kotur wrote:

Hi Juergen,

I have a strange issue with the iSCSI boot: whenever I do a 'reboot' after the iSCSI LUN boot has come up, the kernel panics at the first stage on the way back up, with the dump shown below. It then automatically reboots, and the second time the OS comes up properly without the panic. I am wondering if there is some simple explanation for this, like making some entry in a system file?

    panic[cpu0]/thread=fbc21d00: cannot mount root path

    fbc44ab0 genunix:rootconf+0xea()
    fbc44ae0 genunix:vfs_mountroot+0x51()
    fbc44b20 genunix:main+0x86()

    0xfe800342(fe80) kobj_init + 0x1c3(100ff88,..)
    skipping system dump - no dump device configured
    rebooting...

Thanks
Som

--- Somnath kotur [EMAIL PROTECTED] wrote:

Hi Juergen/Javen,

Finally the issue got fixed last night; iSCSI boot is working!!!

The problem was that when the SCSA stack was doing a partial DMA and breaking a request into multiple windows, my driver's private structure, which has a field called cdb_length, was relying on the 'cmdlen' parameter passed to tran_init_pkt(). This parameter happens to be set to 0 for all windows after the first one. I am not sure whether we are expected to pick up cmdlen from the first window and use it for the rest (which is how I worked around this), or whether it is a genuine bug.

I know that the OpenSolaris source code is different from the OS in the CD release, as the scsi_pkt structure has
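The cmdlen workaround Som describes (take the CDB length from the first DMA window and reuse it for the later ones) can be modelled in a few lines. This is a userland sketch only: struct myxx_cmd and myxx_note_cmdlen() are hypothetical names, and the sketch simply takes the behaviour reported in the thread (cmdlen arriving as 0 on every window after the first) as given rather than asserting anything about SCSA itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-command private state kept by the HBA driver. */
struct myxx_cmd {
	int	cdb_length;	/* CDB length recorded for this command */
};

/*
 * Called once per tran_init_pkt()-style invocation with the cmdlen
 * argument that was passed in. A non-zero cmdlen (first window) is
 * recorded; a zero cmdlen (later windows, as reported in the thread)
 * leaves the previously recorded value untouched.
 */
void
myxx_note_cmdlen(struct myxx_cmd *cmd, int cmdlen)
{
	assert(cmd != NULL);

	if (cmdlen > 0)
		cmd->cdb_length = cmdlen;
	/* cmdlen == 0: keep the length seen on the first window */
}
```

The design choice here is simply never to let a zero overwrite a known-good length, which is the workaround from the thread; whether the zero is intended behaviour or a bug is left open there as well.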
Re: [driver-discuss] cannot mount root path on iscsi boot lun
Hi Javen,

Well, this is consistently reproducible exactly 50% of the time, i.e. *every other reboot*, so I'm thinking that there must be some simple explanation for it, like some setting. The prtconf output was OK. I'm still looking at the output of tran_tgt_init and probe.

BTW, I am indeed using ndi_devi_online() when I want to asynchronously report my targets online after an iSCSI login to a target.

Thanks
Som

___
driver-discuss mailing list
driver-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/driver-discuss
Re: [driver-discuss] cannot mount root path on iscsi boot lun
Hi Juergen,

I tried it out, and yes, both fields are filled with valid values (ufs for the fs type, and bo_name has the valid device name).

Thanks
Som

--- Juergen Keil [EMAIL PROTECTED] wrote:

Hi Som,

> I have a strange issue with the iSCSI boot: whenever I do a 'reboot' after
> the iSCSI LUN boot has come up, the kernel panics at the first stage on the
> way back up, with the dump shown below. It then automatically reboots, and
> the second time the OS comes up properly without the panic. I am wondering
> if there is some simple explanation for this, like making some entry in a
> system file?
>
>     panic[cpu0]/thread=fbc21d00: cannot mount root path
>
>     fbc44ab0 genunix:rootconf+0xea()
>     fbc44ae0 genunix:vfs_mountroot+0x51()
>     fbc44b20 genunix:main+0x86()
>
>     0xfe800342(fe80) kobj_init + 0x1c3(100ff88,..)
>     skipping system dump - no dump device configured
>     rebooting...

Hmm, the panic message is printed from the function rootconf() in usr/src/uts/common/fs/vfs.c:

http://cvs.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/uts/common/fs/vfs.c#4291

    error = VFS_MOUNTROOT(rootvfs, ROOT_INIT);
    vfs_unrefvfssw(vsw);
    rootdev = rootvfs->vfs_dev;

    if (error)
            panic("cannot mount root path %s", rootfs.bo_name);

There should be some text following "cannot mount root path", but I don't see that in your panic messages. That is, rootfs.bo_name must be an empty string.

I'd verify the contents of the rootfs structure by booting the kernel with the options -kd, setting a breakpoint at rootconf, and printing the contents of the rootfs structure. Something like this (at the kmdb prompt):

    rootconf:b
    :c
    rootfs::print

In the boot-from-local-IDE-drive case, I see something like this in rootfs:

    [0]> rootfs::print
    {
        bo_fstyp = [ "ufs" ]
        bo_name = [ "/[EMAIL PROTECTED],0/[EMAIL PROTECTED],1/[EMAIL PROTECTED]/[EMAIL PROTECTED],0:a" ]
        bo_flags = 0x1
    }

On your system, are bo_fstyp and bo_name filled in?
Re: [driver-discuss] cannot mount root path on iscsi boot lun
Juergen,

Yes, I did, because except for the address everything else was identical. There was NO name following the word "path" in the message; in fact there is a blank line. But when I stepped through the code, even after the panic happened, rootfs::print showed the correct value for bo_name. However, there is a field called bo_devname that showed 0's throughout.

Another thing: is it possible that the OpenSolaris source code is not the same as the one running off the CD? I know it is different in some of the other layers, like the SCSA stack.

Thanks
Som

--- Juergen Keil [EMAIL PROTECTED] wrote:

Did you copy and paste the following text?

    panic[cpu0]/thread=fbc21d00: cannot mount root path

I don't see the bo_name in the panic message.

And bo_name is filled with the physical device path to your iSCSI volume when you're in kmdb at the start of function rootconf? Is there perhaps a 0-byte right at the start of bo_name?

Otherwise, it seems as if rootfs.bo_name gets corrupted between entering rootconf() and the point where |panic("cannot mount root path %s", rootfs.bo_name)| is called.

Try to use the ::step over command to step through rootconf, and print the rootfs structure after each subroutine call. It should be quite easy to find out which subroutine call changes rootfs.bo_name...
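Juergen's question about a 0-byte at the start of bo_name can be illustrated with a small user-space sketch. It is not kernel code: the structure `bootobj_sketch`, its field sizes, and both helper functions are hypothetical stand-ins, assumed only for illustration. It shows that a `%s` format prints nothing after "path " when the first byte is NUL, even if the rest of the buffer still holds a device path.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's rootfs structure; sizes are arbitrary. */
struct bootobj_sketch {
    char bo_fstype[16];
    char bo_name[128];
};

/* Formats the message the same way the quoted panic() call does. */
int format_panic(const struct bootobj_sketch *bo, char *buf, size_t len)
{
    assert(bo != NULL && buf != NULL);
    return snprintf(buf, len, "cannot mount root path %s", bo->bo_name);
}

/* Returns 1 if non-zero bytes are hidden behind a leading NUL in bo_name. */
int name_hidden_by_nul(const struct bootobj_sketch *bo)
{
    size_t i;

    assert(bo != NULL);
    if (bo->bo_name[0] != '\0')
        return 0;
    for (i = 1; i < sizeof(bo->bo_name); i++) {
        if (bo->bo_name[i] != '\0')
            return 1;
    }
    return 0;
}
```

A check like `name_hidden_by_nul()` corresponds to dumping the raw bytes of bo_name in the debugger rather than printing it as a string, which would distinguish an empty buffer from a path whose first byte was overwritten.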
Re: [driver-discuss] cannot mount root path on iscsi boot lun
Hi Som,

I have met a similar problem before. The root cause of my problem was the root disk not being configured correctly. Are you sure your boot disk (the iSCSI target) was enumerated correctly by your iSCSI initiator driver and is already attached?

You can boot your system with the -k option; when the panic happens, the system will enter kmdb directly. Then you can use ::prtconf to check whether the dip node of your iSCSI target under your initiator instance was generated and attached correctly.

Cheers
Javen

Somnath kotur wrote:

Hi Juergen,

I have a strange issue with the iSCSI boot: whenever I do a 'reboot' after the iSCSI LUN boot has come up, the kernel panics at the first stage with the dump below:
...
It then automatically reboots, and the second time the OS comes up properly without the panic. I am wondering if there is some simple explanation for this, like making some entry in a system file?

##
    panic[cpu0]/thread=fbc21d00: cannot mount root path
    fbc44ab0 genunix:rootconf+0xea()
    fbc44ae0 genunix:vfs_mountroot+0x51()
    fbc44b20 genunix:main+0x86()
    0xfe800342(fe80) kobj_init + 0x1c3(100ff88,..)
    skipping system dump - no dump device configured
    rebooting...
###

Thanks
Som

--- Somnath kotur [EMAIL PROTECTED] wrote:

Hi Juergen/Javen,

Finally the issue got fixed last night; iSCSI boot is working!!!

The problem was this: when the SCSA stack was doing a partial DMA and breaking a request into multiple windows, my driver's private structure, which has a field called cdb_length, was relying on the 'cmdlen' parameter passed in tran_init_pkt(). This parameter happens to be set to '0' for all windows after the first one. I am not sure whether we are expected to pick up cmdlen from the first window and use it for the rest (which is how I worked around this), or whether it is a genuine bug.

I know that the OpenSolaris source code is different from the OS in the CD release, as the scsi_pkt structure has changed and the CDB length is embedded in it. I hope this behaviour is addressed there?
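The cmdlen workaround described above can be sketched in user space as follows. This is only a simplified model, assuming (as I understand tran_init_pkt(9E)) that the entry point is called with no existing packet to allocate one and with the existing packet for each later DMA window. `struct my_cmd` and `my_init_pkt()` are hypothetical names, not the real driver's code or the SCSA signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-command private data; not the real driver's layout. */
struct my_cmd {
    int cdb_length;   /* CDB length captured when the packet was allocated */
    int windows_seen; /* number of DMA windows set up so far */
};

/*
 * Simplified stand-in for tran_init_pkt(): cmd == NULL means "allocate a
 * new packet"; cmd != NULL means "set up the next DMA window for it".
 */
struct my_cmd *my_init_pkt(struct my_cmd *cmd, int cmdlen)
{
    assert(cmdlen >= 0);
    if (cmd == NULL) {
        cmd = calloc(1, sizeof(*cmd));
        if (cmd == NULL)
            return NULL;
        cmd->cdb_length = cmdlen; /* trust cmdlen only on the first call */
    }
    /* On later windows cmdlen may be 0, so cdb_length is left untouched. */
    cmd->windows_seen++;
    return cmd;
}
```

The design choice is simply to treat cmdlen as an allocation-time parameter: it is recorded once and never overwritten when the same packet comes back for another window.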
FINALLY having got it to work, I STILL see a couple of issues that haven't gone away: newfs/mkfs still gives the same error as before, and my install_log still gives the same post-installation errors as before. Any ideas, please?

I also see a strange behaviour: when I reboot, just when GRUB is loading the kernel, the system does a warm boot by itself, and only the second time around is the actual OS booted. So the actual time between OS reboot and the system being up turns out to be around 4-5 minutes!!

Thanks
Som

--- Somnath kotur [EMAIL PROTECTED] wrote:

Hi Juergen,

The second experiment went fine; the network trace showed that the I/Os were not really in parallel (rather async).

The first experiment, creating the ramdisk, definitely recreated the issue (though there was no amd64 directory under i86pc). I saw nearly 3200 commands being sent before the initiator sent the task management command to the target.

So there is definitely some problem with this async I/O scenario. Any suggestions?

Thanks
Som

--- Somnath kotur [EMAIL PROTECTED] wrote:

Hi Juergen,

You might be spot on with your analysis, which also seems to concur with my observation. I just captured a detailed network trace of the entire installation process using a good hardware tool. I found that initially a LOT of I/Os go out from the initiator, with the iSCSI command sequence number going up to around 21,000, and there is still no response from the target. Soon after that my initiator seems to have sent a Task Management command. The target seems to respond much later, after a whole bunch of I/Os (mainly SCSI writes) have been sent out by the initiator.

I was wondering what more I have to do in my HBA driver to maintain concurrency. All commands first arrive at tran_tgt_init, followed by tran_init_pkt, where I allocate the scsi_pkt along with memory for my HBA tran structure... Do I have to internally queue the SCSI packets arriving in tran_init_pkt?
Or is there some parameter, like a 'queue depth', that I should register with Solaris while registering my driver?

Thanks
Som

--- Juergen Keil [EMAIL PROTECTED] wrote:

I think the newfs/mkfs "mkfs: close failed on write disk: I/O error" error is more interesting than testing mkfile.

newfs / mkfs -F ufs is using async I/O. Your iSCSI driver should be receiving new I/O requests while the current I/O request is still busy.

And the script that is constructing the
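On the question of limiting concurrency, one common pattern is to cap the number of commands in flight and report "busy" once the cap is reached. The sketch below is a user-space model of that idea only; `sketch_hba`, `sketch_start()`, `sketch_complete()` and the depth of 4 are all hypothetical. In SCSA the analogous hook would be tran_start(9E), which may return TRAN_BUSY so the target driver retries later, but whether that resolves the newfs/mkfs error discussed here is not established in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_OUTSTANDING 4 /* arbitrary illustrative queue depth */

/* Hypothetical return codes, loosely modelled on accept/busy semantics. */
enum sketch_rc { SKETCH_ACCEPT = 0, SKETCH_BUSY = 1 };

/* Hypothetical per-HBA state: just a count of commands in flight. */
struct sketch_hba {
    int outstanding;
};

/* Accepts a command unless the in-flight limit has been reached. */
enum sketch_rc sketch_start(struct sketch_hba *hba)
{
    assert(hba != NULL);
    if (hba->outstanding >= SKETCH_MAX_OUTSTANDING)
        return SKETCH_BUSY;
    hba->outstanding++;
    return SKETCH_ACCEPT;
}

/* Called when the target has answered a command, freeing one slot. */
void sketch_complete(struct sketch_hba *hba)
{
    assert(hba != NULL && hba->outstanding > 0);
    hba->outstanding--;
}
```

A real driver would protect the counter with a mutex and complete commands from its interrupt or receive path; both are omitted to keep the sketch short.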
Re: [driver-discuss] cannot mount root path on iscsi boot lun
Javen,

Thank you once again, but one thing I do not understand: if that is the case, then why does it boot fine after the reboot? Every time there is a panic, the subsequent reboot that it initiates always leads to a successful boot (and a reboot from that stage again leads to a panic, so they alternate).

My tran_bus_config() implementation just calls ndi_busop_bus_config(parent, flags, op, arg, childp, 0), passing down the rest of the arguments as they are, with a timeout of 0. Do you think that should be changed?

Thanks
Som

--- Javen Wu [EMAIL PROTECTED] wrote:

During the boot phase, the system configures the root device via BUS_CONFIG_ONE, with the root device path as the argument. I don't know how you implemented your tran_bus_config() routine, but you can trace or debug its handling of BUS_CONFIG_ONE with kmdb. I think the panic could be caused by a tran_bus_config failure while configuring the root device.

Another way to prove the point: I guess you invoke ndi_devi_online() in your tran_bus_config(), so you can watch the return value of ndi_devi_online() during boot with kmdb. I suppose you are using an x86 system; %eax or %rax holds the return value after you step out of the function with the :u command in kmdb.

Javen
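A driver that handles BUS_CONFIG_ONE itself usually has to work out which child is being asked for before it can create and online the node. The sketch below models only that parsing step in user space, assuming the child is named in a "name@target,lun" form with hexadecimal target and LUN, and that any ":minor" suffix has already been stripped. `parse_child_addr()` is a hypothetical helper, not a DDI function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parses "name@target,lun" (hex target and lun) into its numeric parts.
 * Returns 0 on success, -1 if the string does not match that shape.
 */
int parse_child_addr(const char *devnm, unsigned long *tgt, unsigned long *lun)
{
    const char *p;
    char *end;

    assert(devnm != NULL && tgt != NULL && lun != NULL);

    p = strchr(devnm, '@');
    if (p == NULL || p == devnm)
        return -1;              /* no '@', or empty driver name */
    p++;

    *tgt = strtoul(p, &end, 16);
    if (end == p || *end != ',')
        return -1;              /* missing target or missing comma */

    p = end + 1;
    *lun = strtoul(p, &end, 16);
    if (end == p || *end != '\0')
        return -1;              /* missing lun or trailing characters */

    return 0;
}
```

With the target number in hand, the driver could look it up in its internal target list, the same list Som describes scanning in tran_tgt_init(), and fail the request if the target is not known yet.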