I've been doing some digging into a reported scenario where a node update fails with an "iSCSI database failure" error.
The short of it is that it's possible to create node records for a target portal in both the old-style portal configuration file format and the new-style configuration directory with tpgt. If both exist when an update command runs, a database failure may occur, depending on the unspecified return order of readdir.

If the old-style record is created first, then the update/rewrite logic will remove it when a request to create a node with a specified tpgt comes in. But if the new-style record exists first, you can still create a static old-style record without tpgt.

On an update, when walking through the files in the node database, the failure can occur if readdir returns the new-style record first. The new-style entry is read and passed to idbm_node_set_param and then idbm_rec_write. In idbm_rec_write the existence of the old-style file is detected, and it's removed in order to update to the new-style. We have now gone from two node records to one. But because readdir can cache directory contents from the last opendir/rewinddir call (in idbm_for_each_portal), it will still return the path to the now-deleted old-style file. Attempting to read that in (in idbm_for_each_iface) results in a failed stat call and the "iSCSI database failure."

The actual set of calls that seems to be happening in what was reported to me is running SendTargets discovery on a portal, then for some reason running additional node --op=new commands specifying the target name and portal address (without tpgt) of what was discovered, and finally an update to change node.startup. To reproduce this reliably, I've been using LD_PRELOAD with Ted Ts'o's spd_readdir.c code to cache and sort the directory entries.

OK, arguably that may be a case of "if it hurts when you do that, then stop doing that." But it's easy, subtle, and weird enough that it might be worth addressing. It's just not clear how best to address it.
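
To make the failure mode concrete, here is a minimal standalone sketch. It is not idbm code: walk_with_removal and its parameters are hypothetical names, and the up-front snapshot plus sort only stands in for a readdir that caches entries. It stats each cached name and, after the first visit, unlinks a sibling file the way the rewrite removes the old-style record. A flag selects whether a vanished entry is an error or is skipped.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_ENTRIES 64

/* Compare two fixed-size name slots for qsort. */
static int cmp_names(const void *a, const void *b)
{
    return strcmp((const char *)a, (const char *)b);
}

/*
 * Hypothetical helper (an assumption for illustration, not an open-iscsi
 * function). Snapshots and sorts the names in dir, then stats each one.
 * After the first successful visit, doomed is unlinked to mimic a record
 * rewrite that deletes a sibling record mid-walk.
 *
 * Returns the number of entries visited, or -1 if the directory cannot
 * be opened or a stat failure is not tolerated.
 */
int walk_with_removal(const char *dir, const char *doomed, int tolerate_enoent)
{
    char names[MAX_ENTRIES][256];
    char path[4096];
    int count = 0, visited = 0;
    struct dirent *de;
    struct stat st;
    DIR *d;

    assert(dir != NULL && doomed != NULL);

    d = opendir(dir);
    if (!d)
        return -1;

    /* Cache every name first, as a caching readdir effectively does. */
    while (count < MAX_ENTRIES && (de = readdir(d)) != NULL) {
        if (!strcmp(de->d_name, ".") || !strcmp(de->d_name, ".."))
            continue;
        snprintf(names[count++], sizeof(names[0]), "%s", de->d_name);
    }
    closedir(d);

    /* Sort so the visiting order is deterministic for this demo. */
    qsort(names, count, sizeof(names[0]), cmp_names);

    for (int i = 0; i < count; i++) {
        snprintf(path, sizeof(path), "%s/%s", dir, names[i]);
        if (stat(path, &st) < 0) {
            if (errno == ENOENT && tolerate_enoent)
                continue;       /* entry vanished mid-walk: skip it */
            return -1;          /* strict mode: treat as a failure */
        }
        visited++;
        if (visited == 1) {
            /* Mimic the rewrite removing the other record. */
            snprintf(path, sizeof(path), "%s/%s", dir, doomed);
            unlink(path);
        }
    }
    return visited;
}
```

With two files in a scratch directory, where the surviving name sorts before the doomed one, the strict call returns -1 and the tolerant call returns 1. Skipping only ENOENT keeps other stat errors visible, which seems like the safer default for a loop like this.
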
The idbm code could be more forgiving of stat failures in these loops, either ignoring them or restarting the command if a filesystem change like this happens.

Alternatively, when a static record without a tpgt is requested, the code could check for the existence of a matching new-style entry and convert that to a static record preserving the tpgt, instead of adding a new static record without one.

I'm slightly confused by the entire point of having the tpgt be in the pathname (and therefore part of a unique record key), given that it allows the same portal to have multiple tags under a single target, which seems wrong.

Anyway, end of long description. I'm looking at possible improvements and I'd love any other opinions on this one.

- Chris
