Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-11-24 Thread Tano
Nigel,

I have sent you an email with the output that you were looking for.

Once a solution has been discovered I'll post it on here so everyone can see.

Tano


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-29 Thread Nigel Smith
Hi Tano
Great to hear that you've now got this working!!

I understand you are using a Broadcom network card,
from your previous posts I can see you are using the 'bnx' driver.

I will raise this as a bug, but first please would you run '/usr/X11/bin/scanpci'
to identify the exact 'vendor id' and 'device id' for the Broadcom network chipset,
and report that back here.
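
If it helps, something like this should pull out just the Broadcom entries
(assuming scanpci is in its default location on your build; use ggrep or
/usr/gnu/bin/grep if the default grep lacks -B):

/usr/X11/bin/scanpci | grep -B1 -i 'netxtreme'

The line above each match is the one carrying the vendor and device IDs.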

I must admit that this is the first I have heard of 'I/OAT DMA',
so I did some Googling on it, and found this link:

http://opensolaris.org/os/community/arc/caselog/2008/257/onepager/

To quote from that ARC case:

  All new Sun Intel based platforms have Intel I/OAT (I/O Acceleration
   Technology) hardware.

   The first such hardware is an on-systemboard asynchronous DMA engine
   code named Crystal Beach.

   Through a set of RFEs Solaris will use this hardware to implement
   TCP receive side zero CPU copy via a socket.

Ok, so I think that makes some sense in the context of
the problem we were seeing. It's referring to how the network
adaptor transfers the data it has received out of its buffer
and on to the rest of the operating system.
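
As a rough cross-check (only a sketch; the driver name may differ by platform
and build), it may be possible to see whether Solaris has detected an I/OAT DMA
engine by looking for an 'ioat' entry in the loaded modules and the device tree:

modinfo | grep -i ioat
prtconf -D | grep -i ioat

If nothing shows up once the BIOS option is disabled, that would at least
confirm the setting is taking effect.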

I've just looked to see if I can find the source code for 
the BNX driver, but I cannot find it.

Digging deeper we find on this page:
http://www.opensolaris.org/os/about/no_source/
..on the 'ON' tab, that:

Components for which there are currently no plans to release source
bnx driver (B)  Broadcom NetXtreme II Gigabit Ethernet driver

So the bnx driver is closed source :-(
Regards
Nigel Smith


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-29 Thread Miles Nordin
 ns == Nigel Smith [EMAIL PROTECTED] writes:

ns> the bnx driver is closed source :-(

The GPL'd Linux driver is contributed by Broadcom: 

 
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=blob;f=drivers/net/bnx2.c;h=2486a656f12d9f47ff27ead587e084a3c337a1a3;hb=HEAD

and I believe the chip itself is newer than the Solaris 10 ``all new
bits will be open-source'' pitch.




Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-29 Thread Tano
ns I will raise this as a bug, but first please would you run '/usr/X11/bin/scanpci'
to identify the exact 'vendor id' and 'device id' for the Broadcom network chipset,
and report that back here

Primary network interface Embedded NIC:  
pci bus 0x0005 cardnum 0x00 function 0x00: vendor 0x14e4 device 0x164c
 Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet


Plus the two external add-on Broadcom cards (CURRENTLY NOT IN USE):
pci bus 0x000b cardnum 0x00 function 0x00: vendor 0x1166 device 0x0103
 Broadcom EPB PCI-Express to PCI-X Bridge

pci bus 0x000c cardnum 0x00 function 0x00: vendor 0x14e4 device 0x164c
 Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet

pci bus 0x000d cardnum 0x00 function 0x00: vendor 0x1166 device 0x0103
 Broadcom EPB PCI-Express to PCI-X Bridge

pci bus 0x000e cardnum 0x00 function 0x00: vendor 0x14e4 device 0x164c
 Broadcom Corporation NetXtreme II BCM5708 Gigabit Ethernet

I will submit the information that you asked for in email very soon.

Tano


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-28 Thread Tano
So it's finally working; nothing special was done to get it working either, which is extremely vexing!

I disabled the I/OAT DMA feature in the BIOS that apparently assists the network
card and enabled the TPGT option on the iscsi target. I have two iscsi targets:
one 100G on a mirror on the internal SATA controller, and a 1TB block on a RAIDZ
partition.

I have confirmed that by disabling I/OAT DMA I can READ/WRITE to the raidz via
iSCSI. With I/OAT DMA enabled I can only read from the disks; writes will
LAG/FAIL within 10 megabytes.


Based on the wiki, I/OAT DMA only provides about a 10% speed improvement on the
network card. It seems that the Broadcom drivers supplied with Solaris may be
the culprit?

I hope all those individuals who were experiencing this problem can try turning
off the I/OAT DMA or a similar option to see whether their problems go away.

I transferred 100 gigs of data from the local store to the iscsi target on
OpenSolaris in 26 minutes.

Local store = 1 SATA 1.5 Gb/s drive pushing a 65 MB/s read average; not too bad!

The I/OAT DMA feature works fine under Debian Linux and serves iscsi targets 
without any issues. 

Thanks, Nigel, for all your help and patience. I will post on this topic some more
if I get anything new (basically, if I have been getting extremely lucky and
the problem returns all of a sudden).


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-27 Thread Tano
So I found some more information and have been at it diligently.

Checking my hardware BIOS, Dell likes to share a lot of its IRQs with other
peripherals.

Back in the old days, when we were limited to just 15 IRQs, it was imperative
that certain critical hardware had its own IRQ. It seems to be the same
case here.


I have disabled everything that I can from the BIOS and removed all additional
RAID or boot cards. I have also turned off the I/OAT DMA settings
(http://en.wikipedia.org/wiki/Direct_memory_access), and changed the
network card from the Broadcom TOE adapter to an Intel EtherExpress Pro
1000 card with its own IRQs.

I reinstalled the server and have started to try VMotion again.

It's copying! VMotion is actually working, but at a snail's pace. In one hour it
has copied only 28% of a 15 GB VMDK folder.

That's slow, but I don't know if it is my disk subsystem (using the internal
SATA controller) or whether TCP is having issues. Going to be sitting on the logs
and watching it.

iostat -xn 1 reports activity only every 10 to 15 seconds.
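
For what it's worth, I plan to watch the pool and the individual devices side by
side to see whether the disks are actually idle between those bursts; roughly
something like this (the pool name here is just from my earlier output and may
differ on this box):

zpool iostat -v vdrive 1
iostat -xnz 1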

More information soon, but it seems that the IRQ conflicts or I/OAT DMA may be
the culprit.


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-27 Thread Tano
Around 30 to 40% it really starts to slow down, but no disconnection or timeout
yet. The speed is unacceptable, so I will continue with the notion that
something is wrong in the TCP stack/iscsi.

Following the snoop logs, the TCP window size on the iscsi end is 0, and on the
VMware side it is 32969. It seems that the window size is negotiated for quite
some time, then finally about 10 to 20 megs of data are allowed to pass (based
on the iostat -xn 1 report at the time of negotiation), and then rinse and repeat.
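
If it helps, I can take a capture of that behaviour on the Solaris box for later
analysis; roughly something like this (interface and addresses are the ones from
my earlier posts, so they may need adjusting):

snoop -d bnx0 -o /tmp/iscsi.cap host 138.23.117.21 and port 3260
snoop -i /tmp/iscsi.cap -t a tcp | grep 'Win=0'

The second command should list the packets where the window has collapsed to
zero, with absolute timestamps.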

I tried to use a desktop gigabit adapter for a network card, but it doesn't
seem to want to get enabled in OpenSolaris even though drivers are available
and installed. Maybe wrong drivers?

I'll continue some more, but at this time I'm also out of options.

Tano


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-27 Thread Nigel Smith
Hi Tano
Please check out my post on the storage-forum for another idea
to try which may give further clues:
http://mail.opensolaris.org/pipermail/storage-discuss/2008-October/006458.html
Best Regards
Nigel Smith


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-22 Thread Tano
[EMAIL PROTECTED]:/tmp# cat /var/svc/log/system-iscsitgt\:default.log 
[ Oct 21 09:17:49 Enabled. ]
[ Oct 21 09:17:49 Executing start method (/lib/svc/method/svc-iscsitgt 
start). ]
[ Oct 21 09:17:49 Method start exited with status 0. ]
[ Oct 21 17:02:12 Disabled. ]
[ Oct 21 17:02:12 Rereading configuration. ]
[ Oct 22 12:40:13 Disabled. ]
[ Oct 22 12:40:34 Rereading configuration. ]
[ Oct 22 12:53:35 Enabled. ]
[ Oct 22 12:53:35 Executing start method (/lib/svc/method/svc-iscsitgt 
start). ]
[ Oct 22 12:53:35 Method start exited with status 0. ]
[ Oct 22 12:54:02 Rereading configuration. ]
[ Oct 22 12:54:02 No 'refresh' method defined.  Treating as :true. ]
[ Oct 22 12:54:06 Stopping because service restarting. ]
[ Oct 22 12:54:06 Executing stop method (/lib/svc/method/svc-iscsitgt stop 
90). ]
[ Oct 22 12:54:06 Method stop exited with status 0. ]
[ Oct 22 12:54:06 Executing start method (/lib/svc/method/svc-iscsitgt 
start). ]
[ Oct 22 12:54:06 Method start exited with status 0. ]
[ Oct 22 12:59:15 Rereading configuration. ]
[ Oct 22 12:59:15 No 'refresh' method defined.  Treating as :true. ]
[ Oct 22 12:59:19 Rereading configuration. ]
[ Oct 22 12:59:19 No 'refresh' method defined.  Treating as :true. ]
[EMAIL PROTECTED]:/tmp# 





CPU  REMOTE IP      EVENT           BYTES       ITT         SCSIOP
  0  138.23.117.29  login-response  0  0  -
  2  138.23.117.29  login-command 587  0  -
  2  138.23.117.29  login-command 587  0  -
  0  138.23.117.29  login-response  0  0  -
  2  138.23.117.29  login-command 587  0  -
  0  138.23.117.29  login-response  0  0  -
  0  138.23.117.29  login-command 587  0  -
  0  138.23.117.29  login-response  0  0  -
  0  138.23.117.29  login-command 600  0  -
  3  138.23.117.29  scsi-command   131072 2201616384  write(10)
  3  138.23.117.29  scsi-command   131072 2218393600  write(10)
  0  138.23.117.29  login-response466  0  -
  3  138.23.117.29  scsi-command0 3226992640  0x0
  0  138.23.117.29  scsi-response   0 3226992640  -
  3  138.23.117.29  scsi-command0 3243769856  0x12
  3  138.23.117.29  data-send   8 3243769856  -
  3  138.23.117.29  scsi-response   8 3243769856  -
  0  138.23.117.29  scsi-command0 3260547072  0x12
  3  138.23.117.29  data-send 152 3260547072  -
  3  138.23.117.29  scsi-response 152 3260547072  -
  0  138.23.117.29  scsi-command0 3277324288  0x12
  3  138.23.117.29  data-send   8 3277324288  -
  3  138.23.117.29  scsi-response   8 3277324288  -
  0  138.23.117.29  nop-receive 0 2268725248  -
  3  138.23.117.29  nop-send0 2268725248  -
  0  138.23.117.29  scsi-command   131072 2285502464  write(10)
  3  138.23.117.29  login-command 587  0  -
  3  138.23.117.29  login-response  0  0  -
  3  138.23.117.29  login-command 587  0  -
  3  138.23.117.29  login-response  0  0  -
  0  138.23.117.29  login-command 587  0  -
  0  138.23.117.29  login-response  0  0  -
  2  138.23.117.29  login-command 587  0  -
  2  138.23.117.29  login-response  0  0  -
  2  138.23.117.29  login-command 587  0  -
  0  138.23.117.29  login-response  0  0  -
  2  138.23.117.29  scsi-command   131072 2302279680  write(10)
  1  138.23.117.29  login-command 600  0  -
  2  138.23.117.29  scsi-command   131072 2319056896  write(10)
  1  138.23.117.29  login-response466  0  -
  1  138.23.117.29  scsi-command0 3294101504  0x0
  1  138.23.117.29  scsi-response   0 3294101504  -
  1  138.23.117.29  scsi-command0 3310878720  0x12
  1  138.23.117.29  data-send   8 3310878720  -
  1  138.23.117.29  scsi-response   8 3310878720  -
  1  138.23.117.29  scsi-command0 3327655936  0x12
  0  138.23.117.29  scsi-command0 3344433152  0x12
  2  138.23.117.29  data-send 152 3327655936  -
  2  138.23.117.29  scsi-response 152 3327655936  -
  0  138.23.117.29  data-send   8 3344433152  -
  0  138.23.117.29  scsi-response   8 3344433152  -
  0  138.23.117.29  nop-receive 0 2369388544  -
  1  138.23.117.29  nop-send0 2369388544  -
  0 

Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-22 Thread Nigel Smith
Well, the '/var/svc/log/system-iscsitgt\:default.log'
is NOT showing any core dumps, which is good, but
means that we need to look and think deeper for the answer.

The 'iscsisnoop.d' output does look similar to that
captured by Eugene over on the storage forum, but
Eugene only showed a short sequence.
http://mail.opensolaris.org/pipermail/storage-discuss/2008-October/006414.html

Here we have a longer sequence of 'iscsisnoop.d' output
clearly showing the looping, as the error occurs, 
causing the initiator and target to try to re-establish the session.

The question is: what is the root cause, and what is just a consequential effect.

Tano, if you could also get some debug log messages
from the iscsi target (/tmp/target_log), that would help to
confirm whether this is the same as what Eugene is seeing:
http://mail.opensolaris.org/pipermail/storage-discuss/2008-October/006428.html

It would be useful to modify 'iscsisnoop.d' to give
timestamps, as this would help to show if there are
any unusual delays.
And the DTrace iscsi probes have an 'args[1]' which can
give further details on sequence numbers and tags.
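
As a rough illustration (untested, and going only from the documented iscsi
provider arguments, so the member names may need checking), a one-liner along
these lines would stamp each scsi-command with the wall-clock time, the remote
address and the initiator task tag:

dtrace -qn 'iscsi*:::scsi-command
{
        printf("%Y %s itt=%u len=%u\n", walltimestamp,
            args[0]->ci_remote, args[1]->ii_itt, args[1]->ii_datalen);
}'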

Having seen your 'iscsisnoop.d' output, and the '/tmp/target_log'
from Eugene, I am now going back to thinking this IS an iscsi issue,
with the initiator and target mis-interacting in some way,
and NOT a driver/hardware issue.

I know that Sun has recently been doing a lot of stress testing
of the iscsi target with various initiators, including Linux.
I have found the snv_93 and snv_97 iscsi target to work
well with the VMware ESX and Microsoft initiators,
so it is a surprise to see these problems occurring.
Maybe some of the more recent builds (snv_98, 99) have
'fixes' that have caused the problem...
Regards
Nigel Smith


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-21 Thread Nigel Smith
Well, my colleague and I have recently had a basic VMware ESX cluster working
with the Solaris iscsi target in the lab at work, so I know it does work.

We used ESX 3.5i on two Dell Precision 390 workstations,
booted from USB memory sticks.
We used snv_97 and no special tweaks required.
We used Vmotion to move a running Windows XP guest
from one ESX host to the another.
Windows XP was playing a video feed at the time.  
It all worked fine.  We repeated the operation three times.
My colleague is the ESX expert, but I believe it was
Update 2 with all the latest patches applied.
But we only had a single iscsi target set up on the Solaris box.
The target size was 200 GB, formatted with VMFS.

Ok, another thing you could try, which may give a clue
to what is going wrong, is to run the 'iscsisnoop.d'
script on the Solaris box:
http://www.solarisinternals.com/wiki/index.php/DTrace_Topics_iSCSI
This is a DTrace script which shows what iscsi target events are happening,
so it would be interesting if it shows anything unusual at the point of failure.
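
If it helps, the script can simply be made executable and left running into a
file while the failure is reproduced (the paths here are only an example):

chmod +x ./iscsisnoop.d
./iscsisnoop.d | tee /tmp/iscsisnoop.log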

But I'm beginning to think it could be one of your hardware components
that is playing up, though there is no clue so far. It could be anywhere on the path.
Maybe you could check that the Solaris iscsi target works ok under stress
from something other than ESX, like, say, the Windows iscsi initiator.
Regards
Nigel Smith


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-21 Thread Eugene Chupriyanov
I have a very similar problem with snv_99 and Virtual Iron
(http://www.opensolaris.org/jive/thread.jspa?threadID=79831&tstart=0).

I am using an IBM x3650 server with 6 SAS drives. And what we have in common is
Broadcom network cards (bnx driver). From previous experience I know these cards
had driver problems in Linux. So, as a wild guess, maybe the problem is here?
Can you try another card in your server? Unfortunately I don't have a compatible
spare card to check it...

Regards,
Eugene


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-21 Thread Tano
The PowerEdge 1850 has an Intel EtherExpress Pro 1000 internal card in it.

However, some new updates: even the Microsoft initiator hung writing a 1.5
gigabyte file to the iscsi target on the OpenSolaris box.

I've installed a Linux iscsi target on the same box and will reattempt the iscsi
targets to the Microsoft and ESX servers.

I'll also get the DTrace output from the iscsi box later this afternoon.

Sorry for the delay.


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-21 Thread Tano
One more update:

The common hardware between all my machines so far has been the PERC (PowerEdge
RAID Controller), also known as the LSI MegaRAID controller.

The 1850 has a PERC 4d/i;
the 1900 has a PERC 5/i.

I'll be testing the iscsitarget with a SATA controller to test my hypothesis.


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-21 Thread Nigel Smith
Hi Tano
I hope you can try with the 'iscsisnoop.d' script, so 
we can see if your problem is the same as what Eugene is seeing.

Please can you also check the contents of the file:
/var/svc/log/system-iscsitgt\:default.log
.. just to make sure that the iscsi target is not core dumping and restarting.
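
A quick way to cross-check (just a sketch) is to ask SMF directly and to grep
the log for restarts:

svcs -xv iscsitgt
egrep -i 'core|signal|stop' /var/svc/log/system-iscsitgt:default.log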

I've also done a post on the storage-forum on how to
enable a debug log on the iscsi target, which may also give some clues.
http://mail.opensolaris.org/pipermail/storage-discuss/2008-October/006423.html

It may also be worth trying with a smaller target size,
just to see if that is a factor.
(There have in the past been bugs, now fixed, which were triggered by 'large'
targets.)
As I said, it worked ok for me with a 200 GB target.
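
For example, carving a deliberately small test LUN out of the same pool only
takes a couple of commands (the names are just placeholders, following the
syntax from your earlier posts; bind it to your TPGT afterwards if you are
using one):

zfs create -V 100G vdrive/testlun
iscsitadm create target -b /dev/zvol/rdsk/vdrive/testlun testtarget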

Many thanks for all your testing. Please bear with us on this one.
If it is a problem with the Solaris iscsi target, we need to get to
the bottom of the root cause.
Following Eugene's report, I'm beginning to fear that some sort of regression
has been introduced into the iscsi target code...
Regards
Nigel Smith


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-20 Thread Tano
A couple of updates:

I installed OpenSolaris on a PowerEdge 1850 with a single network card and the
default iscsi target configuration (no special tweaks or TPGT settings); VMotion
was about 10 percent successful before I received write errors on disk.

10 percent better than the PowerEdge 1900 iscsi target.

The GUIDs are set by VMware when the iscsi initiator connects to the
OpenSolaris target. Therefore I have no control over what the GUIDs are, and from
my observations it doesn't matter whether the GUIDs are identical, unless there
is a bug in VMware related to GUIDs.

I have followed the instructions to delete the backing stores and the zfs
partitions and start anew. I even went as far as rebooting the machine after I
created a single LUN and connected to the VMware initiator. I then repeated the
same steps when creating the second LUN. Overall, VMware determined the GUID #
of the iscsi target.
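
Roughly, the tear-down and rebuild looked like this, using the names from my
original post (exact options quoted from memory, so check iscsitadm(1M) before
copying):

iscsitadm delete target --lun 0 LABEL1
zfs destroy tank/disk1
zfs create -V 1TB tank/disk1
iscsitadm create target -u 0 -b /dev/zvol/rdsk/tank/disk1 LABEL1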

Right now I am applying a ton of VMware patches that have iscsi connectivity
repairs and other security updates.

I will be resorting back to a Linux iscsi target model if the patches do not
work, to check whether the physical machines have an abnormality or a networking
issue that may be causing problems.

I'll be submitting more updates as I continue testing!

Cliff notes: nothing has worked so far :(


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-18 Thread Nigel Smith
According to the svccfg(1M) man page:
http://docs.sun.com/app/docs/doc/819-2240/svccfg-1m?a=view
...it should be just 'export' without a leading '-' or '--'.
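
So the working form should be something like this (the paths are just the ones
from the earlier mail):

svccfg export iscsitgt > /iscsibackup/myiscsitargetbu.xml
svccfg import /iscsibackup/myiscsitargetbu.xml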

I've been googling on NAA, and this is the 'Network Address Authority'.
It seems to be yet another way of uniquely identifying a target and LUN,
and is apparently intended to be compatible with the way that Fibre Channel and
SAS do this. For further details, see:
http://tools.ietf.org/html/rfc3980
T11 Network Address Authority (NAA) Naming Format for iSCSI Node Names

I also found this blog post:
http://timjacobs.blogspot.com/2008/08/matching-luns-between-esx-hosts-and-vcb.html
...which talks about Vmware ESX and NAA.

For anyone interested in the code fixes to the Solaris
iscsi target to support VMware ESX server, take a look
at these links:
http://hg.genunix.org/onnv-gate.hg/rev/29862a7558ef
http://hg.genunix.org/onnv-gate.hg/rev/5b422642546a

Tano, based on the above, I would say you need
unique GUIDs for two separate targets/LUNs.
Best Regards
Nigel Smith
http://nwsmith.blogspot.com/


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-18 Thread Mike La Spina
Ciao,

Your GUIDs must not be the same; an NAA is already established on the targets,
and if you previously tried to initialize the LUN with VMware it would have
assigned the value in the VMFS header, which is now stored on your raw ZFS
backing store. This will confuse VMware, and it will now remember it somewhere
in its definitions. You need to remove the second datastore from VMware and
delete the target definition and ZFS backing store.

Once you recreate the backing store and target you should have a new GUID and
IQN, which should cure the issue.

Regards,

Mike


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-17 Thread Tano
Do you have an active interface on the OpenSolaris box that is configured for 
0.0.0.0 right now?  

Not anymore:

By default, since you haven't configured the tpgt on the iscsi target, solaris 
will broadcast all active interfaces in its SendTargets response. On the ESX 
side, ESX will attempt to log into all addresses in that SendTargets 
response, even though you may only put 1 address in the sw initiator config.

This made a lot of sense; I was flirting with the TPGT idea and it motivated me
to try it.

If that is the case, you have a few options

a) disable that bogus interface: 
it was a physical interface that has been removed

b) fully configure it and and also create a vmkernel interface that can 
connect to it

disable and removed.

c) configure a tpgt mask on the iscsi target (iscsitadm create tpgt) to 
only use the valid address

Configured... see information below:

Also, I never see target 40 log into anything...is that still a valid 
target number?
You may want to delete everything in /var/lib/iscsi and reboot the host. 
The vmkbinding and vmkdiscovery files will be rebuilt and it will start 
over with target 0. Sometimes, things get a bit crufty

Deleted /var/lib/iscsi contents

=
Now for more information:

This is the result of what I have tried:

I removed all extra interfaces from both the ESX host and the Solaris iscsi
machine.


ESX is now configured with a single interface:

Virtual Switch: vSwitch0
Server: Poweredge 850 
Service Console IP: 138.23.117.20
VMKERNEL (ISCSI)IP: 138.23.117.21
VMNETWORK (VLAN ID 25)

 VMNIC 0 1000 FULL

ISCSI target server
Poweredge 1900
Broadcom Gigabit TOE interface bnx0: 138.23.117.32

Steps below:

ISCSI TARGET SERVER INTERFACE LIST:
[EMAIL PROTECTED]:~# ifconfig -a
lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
bnx0: flags=201000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4,CoS> mtu 1500 index 2
        inet 138.23.117.32 netmask ffffff00 broadcast 138.23.117.255
        ether 0:1e:c9:d5:75:d2
lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1
        inet6 ::1/128
[EMAIL PROTECTED]:~#

ESX HOST INTERFACE LIST
[EMAIL PROTECTED] log]# ifconfig -a
loLink encap:Local Loopback
  inet addr:127.0.0.1  Mask:255.0.0.0
  UP LOOPBACK RUNNING  MTU:16436  Metric:1
  RX packets:21265 errors:0 dropped:0 overruns:0 frame:0
  TX packets:21265 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:0
  RX bytes:11963350 (11.4 Mb)  TX bytes:11963350 (11.4 Mb)

vmnic0Link encap:Ethernet  HWaddr 00:19:B9:F7:ED:DD
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:212272 errors:0 dropped:0 overruns:0 frame:0
  TX packets:3354 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:16606213 (15.8 Mb)  TX bytes:2131622 (2.0 Mb)
  Interrupt:97

vswif0Link encap:Ethernet  HWaddr 00:50:56:40:0D:17
  inet addr:138.23.117.20  Bcast:138.23.117.255  Mask:255.255.255.0
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:2027 errors:0 dropped:0 overruns:0 frame:0
  TX packets:3336 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:495940 (484.3 Kb)  TX bytes:2123882 (2.0 Mb)

[EMAIL PROTECTED] log]#




iscsitadm create target -b /dev/zvol/rdsk/vdrive/LUNA iscsi
iscsitadm create target -b /dev/zvol/rdsk/vdrive/LUNB vscsi
iscsitadm create tpgt 1
iscsitadm modify tpgt -i 138.23.117.32 1

[EMAIL PROTECTED]:~# iscsitadm list tpgt -v
TPGT: 1
IP Address: 138.23.117.32
[EMAIL PROTECTED]:~#

iscsitadm modify target -p 1 iscsi
iscsitadm modify target -p 1 vscsi

After assigning a TPGT value to each iscsi target it seemed a little promising,
but no luck.

[EMAIL PROTECTED]:~# iscsitadm list target -v
Target: vscsi
iSCSI Name: 
iqn.1986-03.com.sun:02:35ec26d8-f173-6dd5-b239-93a9690ffe46.vscsi
Connections: 0
ACL list:
TPGT list:
TPGT: 1
LUN information:
LUN: 0
GUID: 0
VID: SUN
PID: SOLARIS
Type: disk
Size: 1.3T
Backing store: /dev/zvol/rdsk/vdrive/LUNB
Status: online
Target: iscsi
iSCSI Name: 
iqn.1986-03.com.sun:02:4d469663-2304-4796-87a5-dffa03cd14ea.iscsi
Connections: 0
ACL list:
TPGT list:
TPGT: 1
LUN information:
LUN: 0
GUID: 0
VID: SUN
PID: SOLARIS
Type: disk
Size:  750G
Backing store: /dev/zvol/rdsk/vdrive/LUNA
Status: online
[EMAIL PROTECTED]:~#


Now the test and logs:

Logged into ESX Infrastructure:

Added the iscsi target IP to the Dynamic Discovery iscsi server list in ESX:

LOGS:
Oct 17 06:49:18 vmware-860-1 vmkernel: 0:02:03:01.499 cpu1:1035)iSCSI: bus 0 

Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-17 Thread Mike La Spina
Hello Tano,

The issue here is not the target or VMware, but a missing GUID on the target.

Observe the target SMF properties using:

iscsitadm list target -v

You have

iSCSI Name: iqn.1986-03.com.sun:02:35ec26d8-f173-6dd5-b239-93a9690ffe46.vscsi
Connections: 0
ACL list:
TPGT list:
TPGT: 1
LUN information:
LUN: 0
GUID: 0
VID: SUN
PID: SOLARIS
Type: disk
Size: 1.3T
Backing store: /dev/zvol/rdsk/vdrive/LUNB
Status: online
Target: iscsi
iSCSI Name: iqn.1986-03.com.sun:02:4d469663-2304-4796-87a5-dffa03cd14ea.iscsi
Connections: 0
ACL list:
TPGT list:
TPGT: 1
LUN information:
LUN: 0
GUID: 0
VID: SUN
PID: SOLARIS
Type: disk
Size: 750G
Backing store: /dev/zvol/rdsk/vdrive/LUNA
Status: online
 
Both targets have the same invalid GUID of zero, and this will prevent NAA from
working properly.

To fix this you can create two new temporary targets and export the SMF
properties to an XML file.

e.g. 

svccfg -export iscsitgt > /iscsibackup/myiscsitargetbu.xml

then edit the XML file, switching the newly generated GUIDs to your valid
targets and zeroing the temp ones.

Now you can import the file with

svccfg import /iscsibackup/myiscsitargetbu.xml

When you restart your iscsitgt service you should have the GUIDs in place and it
should work with VMware.

Then you can delete the temporary targets.

http://blog.laspina.ca


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-17 Thread Tano
Hi,

I rebooted the server after I submitted the information, to release the locks
set up on my ESX host.

After the reboot I reran 'iscsitadm list target -v' and the GUIDs showed up.

The only interesting thing: the GUIDs are nearly identical (any problems with that?)

[EMAIL PROTECTED]:~# iscsitadm list target -v
Target: iscsi
iSCSI Name: 
iqn.1986-03.com.sun:02:4d469663-2304-4796-87a5-dffa03cd14ea.iscsi
Connections: 1
Initiator:
iSCSI Name: iqn.1998-01.com.vmware:vmware-860-1-4403d26f
Alias: vmware-860-1.ucr.edu
ACL list:
TPGT list:
TPGT: 1
LUN information:
LUN: 0
GUID: 600144f048f8fa2a1ec9d575d200
VID: SUN
PID: SOLARIS
Type: disk
Size:  750G
Backing store: /dev/zvol/rdsk/vdrive/LUNA
Status: online
Target: vscsi
iSCSI Name: 
iqn.1986-03.com.sun:02:35ec26d8-f173-6dd5-b239-93a9690ffe46.vscsi
Connections: 1
Initiator:
iSCSI Name: iqn.1998-01.com.vmware:vmware-860-1-4403d26f
Alias: vmware-860-1.ucr.edu
ACL list:
TPGT list:
TPGT: 1
LUN information:
LUN: 0
GUID: 600144f048f8fa2b1ec9d575d200
VID: SUN
PID: SOLARIS
Type: disk
Size: 1.3T
Backing store: /dev/zvol/rdsk/vdrive/LUNB
Status: online
[EMAIL PROTECTED]:~#


When I attempted to run 'svccfg -export iscsitgt > /iscsibackup/myiscsibackup.xml',
it told me that -e is an illegal option.

svccfg does not have an '--export' or '-export' option;
I checked the man pages.


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-16 Thread Ryan Arneson
Tano wrote:
 I'm not sure if this is a problem with the iscsitarget or zfs. I'd greatly 
 appreciate it if it gets moved to the proper list.

 Well I'm just about out of ideas on what might be wrong..

 Quick history:

 I installed OS 2008.05 when it was SNV_86 to try out ZFS with VMWare. Found 
 out that multilun's were being treated as multipaths so waited till SNV_94 
 came out to fix the issues with VMWARE and iscsitadm/zfs shareiscsi=on.

 I Installed OS2008.05 on a virtual machine as a test bed, pkg image-update to 
 SNV_94 a month ago, made some thin provisioned partitions, shared them with 
 iscsitadm and mounted on VMWare without any problems. Ran storage VMotion and 
 all went well.

 So with this success I purchased a Dell 1900 with a PERC 5/i controller 6 x 
 15K SAS DRIVEs with ZFS RAIDZ1 configuration. I shared the zfs partitions and 
 mounted them on VMWare. Everything is great till I have to write to the disks.

 It won't write!
   

What's the error exactly?
What step are you performing to get the error? Creating the vmfs3 
filesystem? Accessing the mountpoint?


 Steps I took creating the disks

 1) Installed mega_sas drivers.
 2) zpool create tank raidz c5t0d0 c5t1d0 c5t2d0 c5t3d0 c5t4d0 c5t5d0
 3) zfs create -V 1TB tank/disk1
 4) zfs create -V 1TB tank/disk2
 5) iscsitadm create target -b /dev/zvol/rdsk/tank/disk1 LABEL1
 6) iscsitadm create target -b /dev/zvol/rdsk/tank/disk2 LABEL2

 Now both drives are LUN 0 but with unique VMHBA device identifiers, so they
 are detected as separate drives.

 I then redid (deleted) steps 5 and 6 and changed them to:

 5) iscsitadm create target -u 0 -b /dev/zvol/rdsk/tank/disk1 LABEL1
 6) iscsitadm create target -u 1 -b /dev/zvol/rdsk/tank/disk2 LABEL1

 VMware discovers the separate LUNs on the device identifier, but is still unable
 to write to the iscsi LUNs.

 Why is it that the steps I've conducted in snv_94 work, but in snv_97, 98, or
 99 they don't?

 Any ideas?? any log files I can check? I am still an ignorant linux user so I 
 only know to look in /var/log :)
   
The relevant errors from /var/log/vmkernel on the ESX server would be 
helpful.

Also, iscsitadm list target -v

Also, I blogged a bit on OpenSolaris iSCSI and VMware ESX; I was using
b98 on an X4500.

http://blogs.sun.com/rarneson/entry/zfs_clones_iscsi_and_vmware



-- 
Ryan Arneson
Sun Microsystems, Inc.
303-223-6264
[EMAIL PROTECTED]
http://blogs.sun.com/rarneson



Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-16 Thread Tano
Thank you, Ryan, for your response. I have included all the information you
requested inline in this message:

I will be testing snv_86 again to see whether the problem persists; maybe it's
my hardware. I will confirm that soon enough.

On Thu, October 16, 2008 10:31 am, Ryan Arneson wrote:
 Tano wrote:
 I'm not sure if this is a problem with the iscsitarget or zfs. I'd greatly
 appreciate it if it gets moved to the proper list.

 Well I'm just about out of ideas on what might be wrong..

 Quick history:

 I installed OS 2008.05 when it was SNV_86 to try out ZFS with VMWare. Found
 out that multilun's were being treated as multipaths so waited till SNV_94
 came out to fix the issues with VMWARE and iscsitadm/zfs shareiscsi=on.

 I Installed OS2008.05 on a virtual machine as a test bed, pkg image-update
 to SNV_94 a month ago, made some thin provisioned partitions, shared them
 with iscsitadm and mounted on VMWare without any problems. Ran storage
 VMotion and all went well.

 So with this success I purchased a Dell 1900 with a PERC 5/i controller 6 x
 15K SAS DRIVEs with ZFS RAIDZ1 configuration. I shared the zfs partitions
 and mounted them on VMWare. Everything is great till I have to write to the
 disks.

 It won't write!

 
 What's the error exactly?

From the VMware Infrastructure front end, everything looks like it is in order.
I send targets to the iscsi IP, then rescan the HBA, and it detects all the
LUNs and targets.

 What step are you performing to get the error? Creating the vmfs3
 filesystem? Accessing the mountpoint?

The error occurs when attempting to write large data sets to the mount point.
Formatting the drive as VMFS3 works, and manually copying 5 megabytes of data to
the target works. Running 'cp -a' on the VM folder or a cold VM migration will
hang the Infrastructure Client and the ESX host lags. No timeouts of any sort
occur; I waited up to an hour.
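
One test I can try, to take VMware's migration machinery out of the picture, is
to push a large sequential write at the datastore directly from the ESX service
console (the path is hypothetical; it is whatever the VMFS volume is mounted as
under /vmfs/volumes):

dd if=/dev/zero of=/vmfs/volumes/LABEL1/ddtest.bin bs=1M count=2048

If that also stalls after a few tens of megabytes, it would point at the
iscsi/TCP path rather than anything VMotion-specific.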

 

 Steps I took creating the disks

 1) Installed mega_sas drivers.
 2) zpool create tank raidz c5t0d0 c5t1d0 c5t2d0 c5t3d0 c5t4d0 c5t5d0
 3) zfs create -V 1TB tank/disk1
 4) zfs create -V 1TB tank/disk2
 5) iscsitadm create target -b /dev/zvol/rdsk/tank/disk1 LABEL1
 6) iscsitadm create target -b /dev/zvol/rdsk/tank/disk2 LABEL2

 Now both drives are LUN 0 but with unique VMHBA device identifiers, so they
 are detected as separate drives.

 I then redid (deleted) steps 5 and 6 and changed them to:

 5) iscsitadm create target -u 0 -b /dev/zvol/rdsk/tank/disk1 LABEL1
 6) iscsitadm create target -u 1 -b /dev/zvol/rdsk/tank/disk2 LABEL1

 VMWARE discovers the seperate LUNs on the Device identifier, but still
 unable to write to the iscsi luns.

 Why is it that the steps I've conducted in snv_94 work, but in snv_97, 98, or
 99 they don't?

 Any ideas?? any log files I can check? I am still an ignorant linux user so
 I only know to look in /var/log :)

 The relevant errors from /var/log/vmkernel on the ESX server would be
 helpful.
 

I weeded out the logs from /var/log/vmkernel as best I could. Basically, every
time I initiated a command from VMware I captured the logs. I have broken down
what I was doing at each point in the logs.


Again the complete breakdown of both systems: 

[b]VMware ESX 3.5 Update 2[/b]
[EMAIL PROTECTED] log]# uname -a
Linux vmware-860-1.ucr.edu 2.4.21-57.ELvmnix #1 Tue Aug 12 17:28:03 PDT 2008 
i686 i686 i386 GNU/Linux

[EMAIL PROTECTED] log]# arch
i686

[b]OpenSolaris:[/b]
Dell PowerEdge 1900, PERC 5/i, 6 disks (450 GB each, SAS, 15K RPM)
Broadcom bnx driver: no conflicts. Quad-core 1600 MHz, 1066 FSB, 8 GB RAM
 
[EMAIL PROTECTED]:~# uname -a
SunOS iscsi-sas 5.11 snv_99 i86pc i386 i86pc Solaris

[EMAIL PROTECTED]:~# isainfo -v
64-bit amd64 applications
ssse3 cx16 mon sse3 sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu
32-bit i386 applications
ssse3 ahf cx16 mon sse3 sse2 sse fxsr mmx cmov sep cx8 tsc fpu


[EMAIL PROTECTED]:~# zpool status -v
  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
rpool ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c3t0d0s0  ONLINE   0 0 0
c3t1d0ONLINE   0 0 0

errors: No known data errors

  pool: vdrive
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
vdrive  ONLINE   0 0 0
  raidz1ONLINE   0 0 0
c5t0d0  ONLINE   0 0 0
c5t1d0  ONLINE   0 0 0
c5t2d0  ONLINE   0 0 0
c5t3d0  ONLINE   0 0 0
c5t4d0  ONLINE   0 0 0
c5t5d0  ONLINE   0 0 0

errors: No known data errors
[EMAIL PROTECTED]:~#

[EMAIL PROTECTED]:~# zfs create -V 750G vdrive/LUNA
[EMAIL PROTECTED]:~# zfs create -V 1250G vdrive/LUNB

[EMAIL PROTECTED]:~# zfs list
NAME   USED  AVAIL  REFER  MOUNTPOINT
rpool  

Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-16 Thread Tano
Also, I had read your blog post previously.

I will be taking advantage of the cloning/snapshot section of your blog once I
am successful at writing to the targets.

Thanks again!


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-16 Thread Nigel Smith
I googled some sub-strings from your ESX logs
and found these threads on the VMware forum,
which list similar error messages
and suggest some actions to try on the ESX server:

http://communities.vmware.com/message/828207

Also, see this thread:

http://communities.vmware.com/thread/131923

Are you using multiple Ethernet connections between the OpenSolaris box
and the ESX server?
Your 'iscsitadm list target -v' is showing 'Connections: 0',
so run that command after the ESX server initiator has
successfully connected to the OpenSolaris iscsi target,
and post that output.
The log files seem to show the iscsi session has dropped out,
and the initiator is auto-retrying to connect to the target,
but failing. It may help to get a packet capture at this stage
to try and see why the logon is failing.
Regards
Nigel Smith


Re: [zfs-discuss] HELP! SNV_97, 98, 99 zfs with iscsitadm and VMWare!

2008-10-16 Thread Ryan Arneson
Nigel Smith wrote:
 I googled some sub-strings from your ESX logs
 and found these threads on the VMware forum,
 which list similar error messages
 and suggest some actions to try on the ESX server:

 http://communities.vmware.com/message/828207

 Also, see this thread:

 http://communities.vmware.com/thread/131923

 Are you using multiple Ethernet connections between the OpenSolaris box
 and the ESX server?
   
Indeed, I think there might be some notion of 2 separate interfaces. I 
see 0.0.0.0 and the 138.xx.xx.xx networks.

Oct 16 06:38:29 vmware-860-1 vmkernel: 0:02:03:00.166 cpu1:1080)iSCSI: bus 0 
target 40 trying to establish session 0x9a684e0 to portal 0, address 0.0.0.0 
port 3260 group 1

Oct 16 06:16:30 vmware-860-1 vmkernel: 0:01:41:01.021 cpu1:1076)iSCSI: bus 0 
target 38 established session 0x9a402c0 #1 to portal 0, address 138.23.117.32 
port 3260 group 1, alias luna


Do you have an active interface on the OpenSolaris box that is configured for 
0.0.0.0 right now? By default, since you haven't configured the tpgt on the 
iscsi target, solaris will broadcast all active interfaces in its SendTargets 
response. On the ESX side, ESX will attempt to log into all addresses in that 
SendTargets response, even though you may only put 1 address in the sw 
initiator config.

If that is the case, you have a few options

a) disable that bogus interface
b) fully configure it and and also create a vmkernel interface that can 
connect to it
c) configure a tpgt mask on the iscsi target (iscsitadm create tpgt) to 
only use the valid address

Also, I never see target 40 log into anything...is that still a valid 
target number?
You may want to delete everything in /var/lib/iscsi and reboot the host. 
The vmkbinding and vmkdiscovery files will be rebuilt and it will start 
over with target 0. Sometimes, things get a bit crufty.


-ryan

 Your 'iscsitadm list target -v' is showing Connections: 0,
 so run that command after the  ESX server initiator has
 successfully connected to the OpenSolaris iscsi target,
 and post that output.
 The log files seem to show the iscsi session has dropped out,
 and the initiator is auto retrying to connect to the target, 
 but failing. It may help to get a packet capture at this stage
 to try  see why the logon is failing.
 Regards
 Nigel Smith
