Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
Nigel, I have sent you an email with the output that you were looking for. Once a solution has been discovered I'll post it here so everyone can see. Tano -- This message posted from opensolaris.org ___ zfs-discuss mailing list zfs-discuss@opensolaris.org http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
Hi Tano

Great to hear that you've now got this working! I understand you are using a Broadcom network card; from your previous posts I can see you are using the 'bnx' driver. I will raise this as a bug, but first please would you run '/usr/X11/bin/scanpci' to identify the exact 'vendor id' and 'device id' for the Broadcom network chipset, and report that back here.

I must admit that this is the first I have heard of 'I/OAT DMA', so I did some Googling on it, and found this link:
http://opensolaris.org/os/community/arc/caselog/2008/257/onepager/
To quote from that ARC case:

  All new Sun Intel based platforms have Intel I/OAT (I/O Acceleration
  Technology) hardware. The first such hardware is an on-systemboard
  asynchronous DMA engine code named Crystal Beach. Through a set of RFEs
  Solaris will use this hardware to implement TCP receive side zero CPU
  copy via a socket.

Ok, so I think that makes some sense in the context of the problem we were seeing. It's referring to how the network adaptor transfers the data it has received out of the buffer and on to the rest of the operating system.

I've just looked to see if I can find the source code for the bnx driver, but I cannot find it. Digging deeper, we find on this page:
http://www.opensolaris.org/os/about/no_source/
...on the 'ON' tab, that:

  Components for which there are currently no plans to release source:
  bnx driver (B) Broadcom NetXtreme II Gigabit Ethernet driver

So the bnx driver is closed source :-(
Regards
Nigel Smith
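A minimal sketch of pulling the vendor/device IDs out of scanpci output. The sample text below is taken from later in this thread; on a live system you would pipe the real command (`/usr/X11/bin/scanpci`) into the awk filter instead of the printf:

```shell
# Sample scanpci-style output (copied from this thread); the awk filter
# prints the vendor and device ID fields from each 'vendor ... device ...' line.
sample='pci bus 0x0005 cardnum 0x00 function 0x00: vendor 0x14e4 device 0x164c
 Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet'
ids=$(printf '%s\n' "$sample" | awk '/ vendor /{ print $9, $11 }')
echo "$ids"
```

On a real system the same filter applies: `/usr/X11/bin/scanpci | awk '/ vendor /{ print $9, $11 }'`.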
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
ns == Nigel Smith [EMAIL PROTECTED] writes:
ns  the bnx driver is closed source :-(

The GPL'd Linux driver is contributed by Broadcom:
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=blob;f=drivers/net/bnx2.c;h=2486a656f12d9f47ff27ead587e084a3c337a1a3;hb=HEAD
and I believe the chip itself is newer than the Solaris 10 ``all new bits will be open-source'' pitch.
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
ns  I will raise this as a bug, but first please would you run
ns  '/usr/X11/bin/scanpci' to identify the exact 'vendor id' and 'device id'
ns  for the Broadcom network chipset, and report that back here

Primary network interface (embedded NIC):

pci bus 0x0005 cardnum 0x00 function 0x00: vendor 0x14e4 device 0x164c
 Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet

Plus the two external add-on Broadcom cards (currently not in use):

pci bus 0x000b cardnum 0x00 function 0x00: vendor 0x1166 device 0x0103
 Broadcom EPB PCI-Express to PCI-X Bridge

pci bus 0x000c cardnum 0x00 function 0x00: vendor 0x14e4 device 0x164c
 Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet

pci bus 0x000d cardnum 0x00 function 0x00: vendor 0x1166 device 0x0103
 Broadcom EPB PCI-Express to PCI-X Bridge

pci bus 0x000e cardnum 0x00 function 0x00: vendor 0x14e4 device 0x164c
 Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet

I will submit the information that you asked for by email very soon.
Tano
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
So it's finally working: nothing special was done to get it working either, which is extremely vexing! I disabled the I/OAT DMA feature in the BIOS that apparently assists the network card, and enabled the TPGT option on the iscsi target.

I have two iscsi targets: one 100G on a mirror on the internal SATA controller, and a 1TB block on a RAIDZ partition. I have confirmed that with I/OAT DMA disabled I can READ/WRITE to the raidz via iSCSI. With I/OAT DMA enabled I can only read from the disks; writes will LAG/FAIL within 10 megabytes.

Based on the wiki, I/OAT DMA only provides about a 10% speed improvement on the network card. It seems that the Broadcom drivers supplied with Solaris may be the culprit? I hope all those individuals who were experiencing this problem can try turning off the I/OAT DMA (or a similar option) to see whether their problems go away.

Transferred 100 gigs of data from the local store to the iscsi target on OpenSolaris in 26 minutes. Local store = 1 SATA 1.5 Gb/s drive pushing a 65 MB/s read average; not too bad! The I/OAT DMA feature works fine under Debian Linux, which serves iscsi targets without any issues.

Thanks, Nigel, for all your help and patience. I will post on this topic some more if I get anything new (basically, if I have been getting extremely lucky and the problem returns all of a sudden).
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
So I found some more information and have been at it diligently. Checking my hardware BIOS, Dell likes to share a lot of its IRQs with other peripherals. Back in the old days, when we were limited to just 15 IRQs, it was imperative that certain critical hardware had its own IRQ. It seems to be the same case here. I have disabled everything that I can from the BIOS and removed all additional RAID or boot cards. Also, I have turned off the I/OAT DMA settings (http://en.wikipedia.org/wiki/Direct_memory_access). I also changed the network card from the Broadcom TOE adapter to an Intel EtherExpress PRO/1000 card with its own IRQs.

I reinstalled the server and have started to try vmotion again. It's copying! Vmotion is actually working, but at a snail's pace: in 1 hour it has copied only 28% of a 15 GIG VMDK folder. That's slow, but I don't know if it is my disk subsystem (using the internal SATA controller) or TCP having issues. Going to be sitting on the logs and watching it. 'iostat -xn 1' reports activity only every 10 to 15 seconds.

More information soon, but it seems that the IRQ conflicts or I/OAT DMA may be the culprit.
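One way to quantify those 10-to-15-second idle gaps is to filter the `iostat -xn 1` stream down to samples that show actual activity. A sketch, using captured sample lines in the `iostat -xn` column layout (r/s, w/s, kr/s, kw/s, wait, actv, wsvc_t, asvc_t, %w, %b, device); on the server you would pipe the live command in instead:

```shell
# Print the device name only for samples with non-zero read or write rates.
# The sample lines are illustrative, not real measurements.
sample='    0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0 c5t0d0
  120.0  300.0  512.3 4096.7  0.0  1.2    0.0    2.9   0  35 c5t0d0'
busy=$(printf '%s\n' "$sample" | awk '$1 + $2 > 0 { print $NF }')
echo "$busy"
```

Live usage would be `iostat -xn 1 | awk '$1 + $2 > 0'`, which makes the bursty write pattern stand out in the timestamps of what survives the filter.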
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
Around 30 to 40% it really starts to slow down, but no disconnection or timeout yet. The speed is unacceptable, and therefore I will continue with the notion that something is wrong in the tcp stack/iscsi.

Following the snoop logs, the window size on the iscsi end is 0, and on the vmware side it is 32969. It seems that the window size is negotiated for quite some time, then finally about 10 to 20 megs of data is allowed to pass (based on the iostat -xn 1 report at the time of negotiation), and then rinse and repeat.

I tried to use a desktop gigabit adapter for a network card, but it doesn't seem to want to get enabled in opensolaris even though drivers are available and installed. Maybe wrong drivers?

I'll continue some more, but at this time I'm also out of options.
Tano
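Zero-window advertisements like the ones described above can be counted straight from a snoop capture. A sketch; the sample line mimics snoop's one-line TCP summary format and is illustrative only - on the target you would run something like `snoop -d bnx0 port 3260 | grep 'Win=0'` instead:

```shell
# Count TCP segments advertising a zero receive window in a capture summary.
sample='138.23.117.32 -> 138.23.117.21 TCP D=52370 S=3260 Ack=1951 Seq=1 Len=0 Win=0'
zero_win=$(printf '%s\n' "$sample" | grep -c 'Win=0$')
echo "$zero_win"
```

A steadily climbing count while writes stall would confirm that the target side is the one closing its window.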
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
Hi Tano
Please check out my post on the storage-forum for another idea to try, which may give further clues:
http://mail.opensolaris.org/pipermail/storage-discuss/2008-October/006458.html
Best Regards
Nigel Smith
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
[EMAIL PROTECTED]:/tmp# cat /var/svc/log/system-iscsitgt\:default.log
[ Oct 21 09:17:49 Enabled. ]
[ Oct 21 09:17:49 Executing start method (/lib/svc/method/svc-iscsitgt start). ]
[ Oct 21 09:17:49 Method start exited with status 0. ]
[ Oct 21 17:02:12 Disabled. ]
[ Oct 21 17:02:12 Rereading configuration. ]
[ Oct 22 12:40:13 Disabled. ]
[ Oct 22 12:40:34 Rereading configuration. ]
[ Oct 22 12:53:35 Enabled. ]
[ Oct 22 12:53:35 Executing start method (/lib/svc/method/svc-iscsitgt start). ]
[ Oct 22 12:53:35 Method start exited with status 0. ]
[ Oct 22 12:54:02 Rereading configuration. ]
[ Oct 22 12:54:02 No 'refresh' method defined. Treating as :true. ]
[ Oct 22 12:54:06 Stopping because service restarting. ]
[ Oct 22 12:54:06 Executing stop method (/lib/svc/method/svc-iscsitgt stop 90). ]
[ Oct 22 12:54:06 Method stop exited with status 0. ]
[ Oct 22 12:54:06 Executing start method (/lib/svc/method/svc-iscsitgt start). ]
[ Oct 22 12:54:06 Method start exited with status 0. ]
[ Oct 22 12:59:15 Rereading configuration. ]
[ Oct 22 12:59:15 No 'refresh' method defined. Treating as :true. ]
[ Oct 22 12:59:19 Rereading configuration. ]
[ Oct 22 12:59:19 No 'refresh' method defined. Treating as :true. ]
[EMAIL PROTECTED]:/tmp#

CPU  REMOTE IP      EVENT           BYTES   ITT         SCSIOP
0    138.23.117.29  login-response  0       0           -
2    138.23.117.29  login-command   587     0           -
2    138.23.117.29  login-command   587     0           -
0    138.23.117.29  login-response  0       0           -
2    138.23.117.29  login-command   587     0           -
0    138.23.117.29  login-response  0       0           -
0    138.23.117.29  login-command   587     0           -
0    138.23.117.29  login-response  0       0           -
0    138.23.117.29  login-command   600     0           -
3    138.23.117.29  scsi-command    131072  2201616384  write(10)
3    138.23.117.29  scsi-command    131072  2218393600  write(10)
0    138.23.117.29  login-response  466     0           -
3    138.23.117.29  scsi-command    0       3226992640  0x0
0    138.23.117.29  scsi-response   0       3226992640  -
3    138.23.117.29  scsi-command    0       3243769856  0x12
3    138.23.117.29  data-send       8       3243769856  -
3    138.23.117.29  scsi-response   8       3243769856  -
0    138.23.117.29  scsi-command    0       3260547072  0x12
3    138.23.117.29  data-send       152     3260547072  -
3    138.23.117.29  scsi-response   152     3260547072  -
0    138.23.117.29  scsi-command    0       3277324288  0x12
3    138.23.117.29  data-send       8       3277324288  -
3    138.23.117.29  scsi-response   8       3277324288  -
0    138.23.117.29  nop-receive     0       2268725248  -
3    138.23.117.29  nop-send        0       2268725248  -
0    138.23.117.29  scsi-command    131072  2285502464  write(10)
3    138.23.117.29  login-command   587     0           -
3    138.23.117.29  login-response  0       0           -
3    138.23.117.29  login-command   587     0           -
3    138.23.117.29  login-response  0       0           -
0    138.23.117.29  login-command   587     0           -
0    138.23.117.29  login-response  0       0           -
2    138.23.117.29  login-command   587     0           -
2    138.23.117.29  login-response  0       0           -
2    138.23.117.29  login-command   587     0           -
0    138.23.117.29  login-response  0       0           -
2    138.23.117.29  scsi-command    131072  2302279680  write(10)
1    138.23.117.29  login-command   600     0           -
2    138.23.117.29  scsi-command    131072  2319056896  write(10)
1    138.23.117.29  login-response  466     0           -
1    138.23.117.29  scsi-command    0       3294101504  0x0
1    138.23.117.29  scsi-response   0       3294101504  -
1    138.23.117.29  scsi-command    0       3310878720  0x12
1    138.23.117.29  data-send       8       3310878720  -
1    138.23.117.29  scsi-response   8       3310878720  -
1    138.23.117.29  scsi-command    0       3327655936  0x12
0    138.23.117.29  scsi-command    0       3344433152  0x12
2    138.23.117.29  data-send       152     3327655936  -
2    138.23.117.29  scsi-response   152     3327655936  -
0    138.23.117.29  data-send       8       3344433152  -
0    138.23.117.29  scsi-response   8       3344433152  -
0    138.23.117.29  nop-receive     0       2369388544  -
1    138.23.117.29  nop-send        0       2369388544  -
0
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
Well, the '/var/svc/log/system-iscsitgt\:default.log' is NOT showing any core dumps, which is good, but it means that we need to look deeper for the answer.

The 'iscsisnoop.d' output does look similar to that captured by Eugene over on the storage forum, but Eugene only showed a short sequence:
http://mail.opensolaris.org/pipermail/storage-discuss/2008-October/006414.html
Here we have a longer sequence of 'iscsisnoop.d' output clearly showing the looping as the error occurs, causing the initiator and target to try to re-establish the session. The question is: what is the root cause, and what is just a consequential effect?

Tano, if you could also get some debug log messages from the iscsi target (/tmp/target_log), that would help to confirm whether this is the same as what Eugene is seeing:
http://mail.opensolaris.org/pipermail/storage-discuss/2008-October/006428.html

It would be useful to modify 'iscsisnoop.d' to give timestamps, as this would help to show if there are any unusual delays. And the DTrace iscsi probes have an 'args[1]' which can give further details on sequence numbers and tags.

Having seen your 'iscsisnoop.d' output, and the '/tmp/target_log' from Eugene, I'm now going back to thinking this IS an iscsi issue, with the initiator and target mis-interacting in some way, and NOT a driver/hardware issue. I know that Sun have recently been doing a lot of stress testing with the iscsi target and various initiators, including Linux. I have found the snv_93 and snv_97 iscsi target to work well with the VMware ESX and Microsoft initiators, so it is a surprise to see these problems occurring. Maybe some of the more recent builds, snv_98 and 99, have 'fixes' that have caused the problem...
Regards
Nigel Smith
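The timestamping suggested above can be done inside the D script (printing `walltimestamp` in each probe action), or without touching the script at all by piping its output through a small read loop. A sketch of the pipe approach; here the pipe is fed by printf for illustration, where in practice you would feed it with `./iscsisnoop.d`:

```shell
# Prepend a wall-clock timestamp to each line of a stream.
stamped=$(printf 'scsi-command\nscsi-response\n' |
  while IFS= read -r line; do
    printf '%s %s\n' "$(date +%H:%M:%S)" "$line"
  done)
echo "$stamped"
```

Real usage: `./iscsisnoop.d | while IFS= read -r line; do printf '%s %s\n' "$(date +%H:%M:%S)" "$line"; done`. Second-level resolution is enough to spot the 10-to-15-second stalls reported in this thread.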
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
Well, my colleague and I have recently had a basic VMware ESX cluster working with the Solaris iscsi target in the lab at work, so I know it does work. We used ESX 3.5i on two Dell Precision 390 workstations, booted from USB memory sticks. We used snv_97, and no special tweaks were required. We used Vmotion to move a running Windows XP guest from one ESX host to the other; Windows XP was playing a video feed at the time. It all worked fine, and we repeated the operation three times. My colleague is the ESX expert, but I believe it was update 2 with all the latest patches applied. But we only had a single iscsi target set up on the Solaris box. The target size was 200Gb, formatted with VMFS.

Ok, another thing you could try, which may give a clue to what is going wrong, is to run the 'iscsisnoop.d' script on the Solaris box:
http://www.solarisinternals.com/wiki/index.php/DTrace_Topics_iSCSI
This is a DTrace script which shows what iscsi target events are happening, so it would be interesting if it shows anything unusual at the point of failure.

But I'm beginning to think it could be one of your hardware components that is playing up, though no clue so far; it could be anywhere on the path. Maybe you could check that the Solaris iscsi target works ok under stress from something other than ESX, like, say, the Windows iscsi initiator.
Regards
Nigel Smith
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
I have a very similar problem with snv_99 and Virtual Iron (http://www.opensolaris.org/jive/thread.jspa?threadID=79831&tstart=0). I am using an IBM x3650 server with 6 SAS drives. And what we have in common is Broadcom network cards (bnx driver). From previous experience I know these cards had a driver problem in Linux. So, as a wild guess, maybe the problem is here? Can you try another card in your server? Unfortunately I don't have a compatible spare card to check it.
Regards, Eugene
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
The Poweredge 1850 has an Intel EtherExpress PRO/1000 internal card in it. However, with some new updates, even the Microsoft initiator hung writing a 1.5 gigabyte file to the iscsi target on the opensolaris box. I've installed a Linux iscsi target on the same box and will reattempt the iscsi targets to the Microsoft and ESX servers. I'll also get the DTrace output from the iscsi box later this afternoon. Sorry for the delay.
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
One more update: the common hardware between all my machines so far has been the PERC (Poweredge Raid Controller), also known as the LSI MegaRAID controller. The 1850 has a PERC 4d/i; the 1900 has a PERC 5/i. I'll be testing the iscsi target with a SATA controller to test my hypothesis.
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
Hi Tano
I hope you can try with the 'iscsisnoop.d' script, so we can see if your problem is the same as what Eugene is seeing. Please can you also check the contents of the file:
/var/svc/log/system-iscsitgt\:default.log
...just to make sure that the iscsi target is not core dumping and restarting. I've also done a post on the storage-forum on how to enable a debug log on the iscsi target, which may also give some clues:
http://mail.opensolaris.org/pipermail/storage-discuss/2008-October/006423.html
It may also be worth trying with a smaller target size, just to see if that is a factor. (There have in the past been bugs, now fixed, which triggered with 'large' targets.) As I said, it worked ok for me with a 200Gb target.
Many thanks for all your testing. Please bear with us on this one. If it is a problem with the Solaris iscsi target, we need to get to the bottom of the root cause. Following Eugene's report, I'm beginning to fear that some sort of regression has been introduced into the iscsi target code...
Regards
Nigel Smith
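A quick way to spot the core-dump/restart pattern described above is to count start-method executions in the SMF log; more than one per intended service start means the restarter is cycling the daemon. A sketch with sample log lines; on the server you would read /var/svc/log/system-iscsitgt:default.log instead:

```shell
# Count how many times SMF ran the start method (illustrative sample lines).
log='[ Oct 22 12:53:35 Executing start method (/lib/svc/method/svc-iscsitgt start). ]
[ Oct 22 12:54:06 Executing start method (/lib/svc/method/svc-iscsitgt start). ]'
starts=$(printf '%s\n' "$log" | grep -c 'Executing start method')
echo "$starts"
```

Live usage: `grep -c 'Executing start method' /var/svc/log/system-iscsitgt:default.log`, repeated over time to see whether the count keeps climbing on its own.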
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
A couple of updates:

Installed Opensolaris on a Poweredge 1850 with a single network card and the default iscsi target configuration (no special tweaks or tpgt settings); vmotion was about 10 percent successful before I received write errors on disk. 10 percent better than the Poweredge 1900 iscsi target.

The GUIDs are set by VMware when the iscsi initiator connects to the Opensolaris target. Therefore I have no control over what the GUIDs are, and from my observations it doesn't matter if the GUIDs are identical. Unless there is a bug in VMware and GUIDs.

I have followed the instructions to delete the backing stores and the zfs partitions and start anew. I even went as far as rebooting the machine after I created a single LUN and connected it to the vmware initiator. I then repeated the same steps when creating the second LUN. Overall, VMware determined the GUID # of the iscsi target.

Right now I am applying a ton of VMware patches that have iscsi connectivity repairs and other security updates. I will resort back to a Linux iscsi target model if the patches do not work, to check whether the physical machines have an abnormality or a networking problem that may be causing this.

I'll be submitting more updates as I continue testing!
Cliff notes: nothing has worked so far :(
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
According to the svccfg(1M) man page:
http://docs.sun.com/app/docs/doc/819-2240/svccfg-1m?a=view
...it should be just 'export', without a leading '-' or '--'.

I've been googling on NAA: this is the 'Network Address Authority'. It seems to be yet another way of uniquely identifying a target LUN, and is apparently there to be compatible with the way that Fibre Channel and SAS do this. For further details, see:
http://tools.ietf.org/html/rfc3980
T11 Network Address Authority (NAA) Naming Format for iSCSI Node Names
I also found this blog post:
http://timjacobs.blogspot.com/2008/08/matching-luns-between-esx-hosts-and-vcb.html
...which talks about VMware ESX and NAA.

For anyone interested in the code fixes to the Solaris iscsi target to support VMware ESX server, take a look at these links:
http://hg.genunix.org/onnv-gate.hg/rev/29862a7558ef
http://hg.genunix.org/onnv-gate.hg/rev/5b422642546a

Tano, based on the above, I would say you need unique GUIDs for two separate targets/LUNs.
Best Regards
Nigel Smith
http://nwsmith.blogspot.com/
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
Ciao,
Your GUIDs must not be the same, since an NAA is already established on the targets, and if you previously tried to initialize the LUN with VMware it would have assigned the value in the VMFS header, which is now stored on your raw ZFS backing store. This will confuse VMware, and it will now remember it somewhere in its definitions. You need to remove the second datastore from VMware and delete the target definition and ZFS backing store. Once you recreate the backing store and target you should have a new GUID and iqn, which should cure the issue.
Regards, Mike
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
> Do you have an active interface on the OpenSolaris box that is configured
> for 0.0.0.0 right now?

Not anymore.

> By default, since you haven't configured the tpgt on the iscsi target,
> solaris will broadcast all active interfaces in its SendTargets response.
> On the ESX side, ESX will attempt to log into all addresses in that
> SendTargets response, even though you may only put 1 address in the sw
> initiator config.

This made a lot of sense; I was flirting with the TPGT idea and this motivated me to try it.

> If that is the case, you have a few options
> a) disable that bogus interface

It was a physical interface that has been removed.

> b) fully configure it and also create a vmkernel interface that can
> connect to it

Disabled and removed.

> c) configure a tpgt mask on the iscsi target (iscsitadm create tpgt) to
> only use the valid address

Configured... see information below.

> Also, I never see target 40 log into anything... is that still a valid
> target number? You may want to delete everything in /var/lib/iscsi and
> reboot the host. The vmkbinding and vmkdiscovery files will be rebuilt and
> it will start over with target 0. Sometimes, things get a bit crufty.

Deleted the /var/lib/iscsi contents.

Now for more information. This is the result of what I have tried that is new: I removed all extra interfaces from both the ESX host and the Solaris iscsi machine.

ESX is now configured with a single interface:
Virtual Switch: vSwitch0
Server: Poweredge 850
Service Console IP: 138.23.117.20
VMKERNEL (ISCSI) IP: 138.23.117.21
VMNETWORK (VLAN ID 25) VMNIC 0 1000 FULL

ISCSI target server: Poweredge 1900
Broadcom Gigabit TOE interface bnx0: 138.23.117.32

Steps below:

ISCSI TARGET SERVER INTERFACE LIST:

[EMAIL PROTECTED]:~# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
bnx0: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 2
        inet 138.23.117.32 netmask ffffff00 broadcast 138.23.117.255
        ether 0:1e:c9:d5:75:d2
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
        inet6 ::1/128
[EMAIL PROTECTED]:~#

ESX HOST INTERFACE LIST:

[EMAIL PROTECTED] log]# ifconfig -a
lo      Link encap:Local Loopback
        inet addr:127.0.0.1 Mask:255.0.0.0
        UP LOOPBACK RUNNING MTU:16436 Metric:1
        RX packets:21265 errors:0 dropped:0 overruns:0 frame:0
        TX packets:21265 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:0
        RX bytes:11963350 (11.4 Mb) TX bytes:11963350 (11.4 Mb)

vmnic0  Link encap:Ethernet HWaddr 00:19:B9:F7:ED:DD
        UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
        RX packets:212272 errors:0 dropped:0 overruns:0 frame:0
        TX packets:3354 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:1000
        RX bytes:16606213 (15.8 Mb) TX bytes:2131622 (2.0 Mb)
        Interrupt:97

vswif0  Link encap:Ethernet HWaddr 00:50:56:40:0D:17
        inet addr:138.23.117.20 Bcast:138.23.117.255 Mask:255.255.255.0
        UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
        RX packets:2027 errors:0 dropped:0 overruns:0 frame:0
        TX packets:3336 errors:0 dropped:0 overruns:0 carrier:0
        collisions:0 txqueuelen:1000
        RX bytes:495940 (484.3 Kb) TX bytes:2123882 (2.0 Mb)

Target and TPGT setup on the Solaris box:

iscsitadm create target -b /dev/zvol/rdsk/vdrive/LUNA iscsi
iscsitadm create target -b /dev/zvol/rdsk/vdrive/LUNB vscsi
iscsitadm create tpgt 1
iscsitadm modify tpgt -i 138.23.117.32 1

[EMAIL PROTECTED]:~# iscsitadm list tpgt -v
TPGT: 1
    IP Address: 138.23.117.32

[EMAIL PROTECTED]:~# iscsitadm modify target -p 1 iscsi
[EMAIL PROTECTED]:~# iscsitadm modify target -p 1 vscsi

After assigning a TPGT value to an iscsi target it seemed a little promising, but no luck.

[EMAIL PROTECTED]:~# iscsitadm list target -v
Target: vscsi
    iSCSI Name: iqn.1986-03.com.sun:02:35ec26d8-f173-6dd5-b239-93a9690ffe46.vscsi
    Connections: 0
    ACL list:
    TPGT list:
        TPGT: 1
    LUN information:
        LUN: 0
            GUID: 0
            VID: SUN
            PID: SOLARIS
            Type: disk
            Size: 1.3T
            Backing store: /dev/zvol/rdsk/vdrive/LUNB
            Status: online
Target: iscsi
    iSCSI Name: iqn.1986-03.com.sun:02:4d469663-2304-4796-87a5-dffa03cd14ea.iscsi
    Connections: 0
    ACL list:
    TPGT list:
        TPGT: 1
    LUN information:
        LUN: 0
            GUID: 0
            VID: SUN
            PID: SOLARIS
            Type: disk
            Size: 750G
            Backing store: /dev/zvol/rdsk/vdrive/LUNA
            Status: online
[EMAIL PROTECTED]:~#

Now test and logs: logged into ESX Infrastructure and added the iscsi target IP in Dynamic Discovery for the iscsi server in ESX.

LOGS: Oct 17 06:49:18 vmware-860-1 vmkernel: 0:02:03:01.499 cpu1:1035)iSCSI: bus 0
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
Hello Tano,
The issue here is not the target or VMware, but a missing GUID on the target. Observe the target smf properties using 'iscsitadm list target -v'. You have:

iSCSI Name: iqn.1986-03.com.sun:02:35ec26d8-f173-6dd5-b239-93a9690ffe46.vscsi
Connections: 0
ACL list:
TPGT list:
    TPGT: 1
LUN information:
    LUN: 0
        GUID: 0
        VID: SUN
        PID: SOLARIS
        Type: disk
        Size: 1.3T
        Backing store: /dev/zvol/rdsk/vdrive/LUNB
        Status: online
Target: iscsi
iSCSI Name: iqn.1986-03.com.sun:02:4d469663-2304-4796-87a5-dffa03cd14ea.iscsi
Connections: 0
ACL list:
TPGT list:
    TPGT: 1
LUN information:
    LUN: 0
        GUID: 0
        VID: SUN
        PID: SOLARIS
        Type: disk
        Size: 750G
        Backing store: /dev/zvol/rdsk/vdrive/LUNA
        Status: online

Both targets have the same invalid GUID of zero, and this will prevent NAA from working properly. To fix this you can create two new temporary targets and export the smf props to an xml file, e.g.:

svccfg -export iscsitgt /iscsibackup/myiscsitargetbu.xml

Then edit the xml file, switching the newly generated GUIDs onto your valid targets and zeroing the temp ones. Now you can import the file with:

svccfg import /iscsibackup/myiscsitargetbu.xml

When you restart your iscsitgt server you should have the GUIDs in place and it should work with vmware. Then you can delete the temp targets.
http://blog.laspina.ca
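The XML edit step in the workflow above can be scripted rather than done by hand. A sketch: swap the invalid zero GUID for a generated one with sed. The property line below is illustrative of an smf export fragment, not the exact iscsitgt schema, and the GUID value is one quoted later in this thread:

```shell
# Replace a zero GUID value in an smf-export-style property line (sample only).
line='<propval name="guid" type="astring" value="0"/>'
newguid='600144f048f8fa2a1ec9d575d200'
fixed=$(printf '%s\n' "$line" | sed "s/value=\"0\"/value=\"$newguid\"/")
echo "$fixed"
```

Against a real export you would run sed over the whole file (after backing it up) and then import the result, which avoids hand-editing mistakes in a long XML document.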
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
Hi,
I rebooted the server after I submitted the information, to release the locks set up on my ESX host. After the reboot I reran 'iscsitadm list target -v' and the GUIDs showed up. Only one interesting problem: the GUIDs are nearly identical, differing by a single hex digit (any problems with that?)

[EMAIL PROTECTED]:~# iscsitadm list target -v
Target: iscsi
    iSCSI Name: iqn.1986-03.com.sun:02:4d469663-2304-4796-87a5-dffa03cd14ea.iscsi
    Connections: 1
        Initiator:
            iSCSI Name: iqn.1998-01.com.vmware:vmware-860-1-4403d26f
            Alias: vmware-860-1.ucr.edu
    ACL list:
    TPGT list:
        TPGT: 1
    LUN information:
        LUN: 0
            GUID: 600144f048f8fa2a1ec9d575d200
            VID: SUN
            PID: SOLARIS
            Type: disk
            Size: 750G
            Backing store: /dev/zvol/rdsk/vdrive/LUNA
            Status: online
Target: vscsi
    iSCSI Name: iqn.1986-03.com.sun:02:35ec26d8-f173-6dd5-b239-93a9690ffe46.vscsi
    Connections: 1
        Initiator:
            iSCSI Name: iqn.1998-01.com.vmware:vmware-860-1-4403d26f
            Alias: vmware-860-1.ucr.edu
    ACL list:
    TPGT list:
        TPGT: 1
    LUN information:
        LUN: 0
            GUID: 600144f048f8fa2b1ec9d575d200
            VID: SUN
            PID: SOLARIS
            Type: disk
            Size: 1.3T
            Backing store: /dev/zvol/rdsk/vdrive/LUNB
            Status: online
[EMAIL PROTECTED]:~#

When I attempted to run 'svccfg -export iscsitgt /iscsibackup/myiscsibackup.xml', it told me that -e is an illegal option. svccfg does not have an '--export' or '-export' option; I checked the man pages.
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
Tano wrote:
> I'm not sure if this is a problem with the iscsitarget or zfs. I'd greatly
> appreciate it if it gets moved to the proper list. Well, I'm just about
> out of ideas on what might be wrong.
>
> Quick history: I installed OS 2008.05 when it was snv_86 to try out ZFS
> with VMware. Found out that multi-LUNs were being treated as multipaths,
> so I waited till snv_94 came out to fix the issues with VMware and
> iscsitadm/zfs shareiscsi=on. I installed OS 2008.05 on a virtual machine
> as a test bed, pkg image-updated to snv_94 a month ago, made some thin
> provisioned partitions, shared them with iscsitadm, and mounted them on
> VMware without any problems. Ran Storage VMotion and all went well.
>
> So with this success I purchased a Dell 1900 with a PERC 5/i controller
> and 6 x 15K SAS drives in a ZFS RAIDZ1 configuration. I shared the zfs
> partitions and mounted them on VMware. Everything is great till I have to
> write to the disks. It won't write!

What's the error exactly? What step are you performing to get the error? Creating the vmfs3 filesystem? Accessing the mountpoint?

> Steps I took creating the disks:
> 1) Installed mega_sas drivers.
> 2) zpool create tank raidz c5t0d0 c5t1d0 c5t2d0 c5t3d0 c5t4d0 c5t5d0
> 3) zfs create -V 1TB tank/disk1
> 4) zfs create -V 1TB tank/disk2
> 5) iscsitadm create target -b /dev/zvol/rdsk/tank/disk1 LABEL1
> 6) iscsitadm create target -b /dev/zvol/rdsk/tank/disk2 LABEL2
>
> Now both drives are lun 0 but with unique VMHBA device identifiers, so
> they are detected as separate drives. I then redid (deleted) steps 5 and 6
> and changed them to:
>
> 5) iscsitadm create target -u 0 -b /dev/zvol/rdsk/tank/disk1 LABEL1
> 6) iscsitadm create target -u 1 -b /dev/zvol/rdsk/tank/disk2 LABEL1
>
> VMware discovers the separate LUNs on the device identifier, but is still
> unable to write to the iscsi LUNs. Why is it that the steps I've conducted
> in snv_94 work but in snv_97, 98, or 99 don't? Any ideas? Any log files I
> can check? I am still an ignorant linux user so I only know to look in
> /var/log :)

The relevant errors from /var/log/vmkernel on the ESX server would be helpful. Also, 'iscsitadm list target -v'. Also, I blogged a bit on OpenSolaris iSCSI and VMware ESX; I was using b98 on an X4500:
http://blogs.sun.com/rarneson/entry/zfs_clones_iscsi_and_vmware

--
Ryan Arneson
Sun Microsystems, Inc.
303-223-6264
[EMAIL PROTECTED]
http://blogs.sun.com/rarneson
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
Thank you, Ryan, for your response. I have included all the information you requested inline in this document. I will be testing SNV_86 again to see whether the problem persists; maybe it's my hardware. I will confirm that soon enough.

On Thu, October 16, 2008 10:31 am, Ryan Arneson wrote:

Tano wrote: I'm not sure if this is a problem with the iscsitarget or zfs. I'd greatly appreciate it if it gets moved to the proper list. Well, I'm just about out of ideas on what might be wrong.

Quick history: I installed OS 2008.05 when it was SNV_86 to try out ZFS with VMware. Found out that multiple LUNs were being treated as multipaths, so I waited until SNV_94 came out to fix the issues with VMware and iscsitadm/zfs shareiscsi=on. I installed OS 2008.05 on a virtual machine as a test bed, pkg image-update to SNV_94 a month ago, made some thin-provisioned partitions, shared them with iscsitadm, and mounted them on VMware without any problems. Ran Storage VMotion and all went well. So with this success I purchased a Dell 1900 with a PERC 5/i controller and 6 x 15K SAS drives in a ZFS RAIDZ1 configuration. I shared the zfs partitions and mounted them on VMware. Everything is great until I have to write to the disks. It won't write!

What's the error exactly?

From the VMware Infrastructure front end, everything looks like it is in order. I send targets to the iscsi IP, then rescan the HBA, and it detects all the LUNs and targets.

What step are you performing to get the error? Creating the vmfs3 filesystem? Accessing the mountpoint?

The error occurs when attempting to write large data sets to the mount point. Formatting the drive VMFS3 works; manually copying 5 megabytes of data to the target works. Running cp -a of the VM folder or a cold VM migration will hang the Infrastructure Client, and the ESX host lags. No timeouts of any sort occur. I waited up to an hour.

Steps I took creating the disks:
1) Installed mega_sas drivers.
2) zpool create tank raidz c5t0d0 c5t1d0 c5t2d0 c5t3d0 c5t4d0 c5t5d0
3) zfs create -V 1TB tank/disk1
4) zfs create -V 1TB tank/disk2
5) iscsitadm create target -b /dev/zvol/rdsk/tank/disk1 LABEL1
6) iscsitadm create target -b /dev/zvol/rdsk/tank/disk2 LABEL2

Now both drives are LUN 0 but with unique VMHBA device identifiers, so they are detected as separate drives. I then redid (deleted) steps 5 and 6 and changed them to:

5) iscsitadm create target -u 0 -b /dev/zvol/rdsk/tank/disk1 LABEL1
6) iscsitadm create target -u 1 -b /dev/zvol/rdsk/tank/disk2 LABEL1

VMware discovers the separate LUNs on the device identifier, but is still unable to write to the iscsi LUNs. Why is it that the steps I've conducted in SNV_94 work, but in SNV_97, 98, or 99 don't? Any ideas? Any log files I can check?

I am still an ignorant Linux user, so I only know to look in /var/log :)

The relevant errors from /var/log/vmkernel on the ESX server would be helpful.

So I weeded out the logs from /var/log/vmkernel as best I could. Basically, every time I initiated a command from VMware, I captured the logs. I have broken down what I was doing at each point in the logs. Again, the complete breakdown of both systems:

[b]VMware ESX 3.5 Update 2[/b]

[EMAIL PROTECTED] log]# uname -a
Linux vmware-860-1.ucr.edu 2.4.21-57.ELvmnix #1 Tue Aug 12 17:28:03 PDT 2008 i686 i686 i386 GNU/Linux
[EMAIL PROTECTED] log]# arch
i686

[b]OpenSolaris:[/b]

Dell PowerEdge 1900
PERC 5/i
6 disks, 450GB each, SAS 15K RPM
Broadcom BNX driver: no conflicts.
Quad-core 1600 MHz, 1066 FSB
8 GB RAM

[EMAIL PROTECTED]:~# uname -a
SunOS iscsi-sas 5.11 snv_99 i86pc i386 i86pc Solaris

[EMAIL PROTECTED]:~# isainfo -v
64-bit amd64 applications
        ssse3 cx16 mon sse3 sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu
32-bit i386 applications
        ssse3 ahf cx16 mon sse3 sse2 sse fxsr mmx cmov sep cx8 tsc fpu

[EMAIL PROTECTED]:~# zpool status -v
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c3t0d0s0  ONLINE       0     0     0
            c3t1d0    ONLINE       0     0     0

errors: No known data errors

  pool: vdrive
 state: ONLINE
 scrub: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        vdrive      ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c5t0d0  ONLINE       0     0     0
            c5t1d0  ONLINE       0     0     0
            c5t2d0  ONLINE       0     0     0
            c5t3d0  ONLINE       0     0     0
            c5t4d0  ONLINE       0     0     0
            c5t5d0  ONLINE       0     0     0

errors: No known data errors

[EMAIL PROTECTED]:~# zfs create -V 750G vdrive/LUNA
[EMAIL PROTECTED]:~# zfs create -V 1250G vdrive/LUNB
[EMAIL PROTECTED]:~# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
rpool
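[Editor's note: the zvol and target creation steps quoted in this thread, written out as one sequence. This is a sketch using the pool and zvol names shown above; the target label 'luna' is taken from the ESX log excerpt later in the thread, and the explicit -u LUN numbers put both zvols under one target, as Tano describes. Run as root on the OpenSolaris box; not verified on this hardware.]

```shell
# Create the two zvols on the existing raidz pool (names from the thread)
zfs create -V 750G vdrive/LUNA
zfs create -V 1250G vdrive/LUNB

# Export them as LUN 0 and LUN 1 of a single iSCSI target:
# same target label, different -u LUN numbers
iscsitadm create target -u 0 -b /dev/zvol/rdsk/vdrive/LUNA luna
iscsitadm create target -u 1 -b /dev/zvol/rdsk/vdrive/LUNB luna

# Verify; after the ESX initiator logs in, 'Connections:' should be > 0
iscsitadm list target -v
```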
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
Also, I had read your blog post previously. I will be taking advantage of the cloning/snapshot section of your blog once I am successful in writing to the targets. Thanks again!
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
I googled some sub-strings from your ESX logs and found these threads on the VMware forum, which list similar error messages and suggest some actions to try on the ESX server:

http://communities.vmware.com/message/828207

Also, see this thread:

http://communities.vmware.com/thread/131923

Are you using multiple Ethernet connections between the OpenSolaris box and the ESX server?

Your 'iscsitadm list target -v' is showing 'Connections: 0', so run that command after the ESX server initiator has successfully connected to the OpenSolaris iscsi target, and post that output.

The log files seem to show the iscsi session has dropped out, and the initiator is automatically retrying to connect to the target, but failing. It may help to get a packet capture at this stage to try to see why the login is failing.

Regards
Nigel Smith
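[Editor's note: the packet capture Nigel suggests can be taken on the OpenSolaris box with snoop. A sketch; the interface name bnx0 is an assumption (substitute the actual interface from 'ifconfig -a'), and the capture file can also be opened in Wireshark, which decodes iSCSI.]

```shell
# Capture all iSCSI traffic (port 3260) on the assumed bnx0 interface
snoop -d bnx0 -o /tmp/iscsi-login.cap port 3260

# ...reproduce the failing login/write from ESX, then Ctrl-C the capture
# and inspect the saved packets:
snoop -i /tmp/iscsi-login.cap
```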
Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!
Nigel Smith wrote: I googled some sub-strings from your ESX logs and found these threads on the VMware forum, which list similar error messages and suggest some actions to try on the ESX server: http://communities.vmware.com/message/828207 Also, see this thread: http://communities.vmware.com/thread/131923 Are you using multiple Ethernet connections between the OpenSolaris box and the ESX server?

Indeed, I think there might be some notion of 2 separate interfaces. I see the 0.0.0.0 and the 138.xx.xx.xx networks:

Oct 16 06:38:29 vmware-860-1 vmkernel: 0:02:03:00.166 cpu1:1080)iSCSI: bus 0 target 40 trying to establish session 0x9a684e0 to portal 0, address 0.0.0.0 port 3260 group 1
Oct 16 06:16:30 vmware-860-1 vmkernel: 0:01:41:01.021 cpu1:1076)iSCSI: bus 0 target 38 established session 0x9a402c0 #1 to portal 0, address 138.23.117.32 port 3260 group 1, alias luna

Do you have an active interface on the OpenSolaris box that is configured for 0.0.0.0 right now? By default, since you haven't configured a tpgt on the iscsi target, Solaris will broadcast all active interfaces in its SendTargets response. On the ESX side, ESX will attempt to log into all addresses in that SendTargets response, even though you may only put one address in the sw initiator config. If that is the case, you have a few options:

a) disable that bogus interface
b) fully configure it, and also create a vmkernel interface that can connect to it
c) configure a tpgt mask on the iscsi target (iscsitadm create tpgt) to only use the valid address

Also, I never see target 40 log into anything. Is that still a valid target number? You may want to delete everything in /var/lib/iscsi and reboot the host. The vmkbinding and vmkdiscovery files will be rebuilt, and it will start over with target 0. Sometimes things get a bit crufty.
-ryan

Your 'iscsitadm list target -v' is showing 'Connections: 0', so run that command after the ESX server initiator has successfully connected to the OpenSolaris iscsi target, and post that output. The log files seem to show the iscsi session has dropped out, and the initiator is automatically retrying to connect to the target, but failing. It may help to get a packet capture at this stage to try to see why the login is failing.

Regards
Nigel Smith
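[Editor's note: Ryan's option (c), restricting the target to the one valid address via a target portal group, can be sketched as below. The IP is the one from the ESX log excerpt, the tpgt label '1' is arbitrary, and the target label 'luna' is the alias seen in the logs; adjust all three to match the actual setup.]

```shell
# Create a target portal group and bind it to the valid interface address
iscsitadm create tpgt 1
iscsitadm modify tpgt -i 138.23.117.32 1

# Attach the tpgt to the target so the SendTargets response only
# advertises that one address to the ESX initiator
iscsitadm modify target -p 1 luna

# Confirm the TPGT now appears on the target
iscsitadm list target -v luna
```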