Ubuntu server + Open-iscsi + multipath + ocfs2. Connectivity loss causes immediate server reboot.

2012-06-14 Thread Jiří Červenka
Hi,
on Ubuntu Server 12.04 (3.2.0-24-generic) I use open-iscsi (2.0-871), 
multipath-tools (v0.4.9) and ocfs2 (1.6.3-4ubuntu1) to access shared 
HP P2000 G3 iSCSI storage. Even a short loss of network connectivity causes 
an immediate server crash and reboot. I cannot find any clue in syslog about 
what might have happened.

Configuration:

iscsid.conf:

node.conn[0].startup = automatic
node.startup = automatic
node.session.timeo.replacement_timeout = 180
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 30
node.session.initial_login_retry_max = 4
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.xmit_thread_priority = -20
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
node.session.iscsi.FastAbort = No

multipath.conf:

defaults {
   udev_dir                /dev
   polling_interval        10
   selector                "round-robin 0"
   path_grouping_policy    multibus
   getuid_callout          "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
   prio                    const
   path_checker            directio
   rr_min_io               100
   flush_on_last_del       no
   max_fds                 8192
   rr_weight               priorities
   failback                immediate
   no_path_retry           fail
   queue_without_daemon    no
   user_friendly_names     no
   mode                    644
   uid                     0
   gid                     disk
}
blacklist {
   wwid 26353900f02796769
   devnode "^(ram|raw|loop|fd|md|dm-|sr|scd|st)[0-9]*"
   devnode "^hd[a-z][[0-9]*]"
   devnode "^sda1"
   device {
      vendor  "DEC.*"
      product "MSA[15]00"
   }
}
multipaths {
   multipath {
      wwid 3600c0ff000127311ab8dcc4f0100
   }
   multipath {
      wwid 3600c0ff0001273712d8dcc4f0100
   }
   multipath {
      wwid 3600c0ff000127311cd8dcc4f0100
   }
}
devices {
   device {
      vendor                  "HP"
      product                 "P2000 G3 FC|P2000 G3 iSCSI"
      path_grouping_policy    group_by_prio
      getuid_callout          "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
      path_checker            tur
      path_selector           "round-robin 0"
      hardware_handler        "0"
      prio                    alua
      failback                immediate
      rr_weight               uniform
      no_path_retry           18
      rr_min_io               100
   }
}

cluster.conf:

node:
   name = node1
   cluster = ocfs2
   number = 0
   ip_address = 192.168.1.11
   ip_port = 

node:
   name = node2
   cluster = ocfs2
   number = 1
   ip_address = 192.168.1.12
   ip_port = 

cluster:
   name = ocfs2
   node_count = 2

How can I troubleshoot this issue? 
Thanks for any help.

-- 
You received this message because you are subscribed to the Google Groups 
open-iscsi group.
To view this discussion on the web visit 
https://groups.google.com/d/msg/open-iscsi/-/N4tSTbooQm0J.
To post to this group, send email to open-iscsi@googlegroups.com.
To unsubscribe from this group, send email to 
open-iscsi+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/open-iscsi?hl=en.



re: [SCSI] qla4xxx: support iscsiadm session mgmt

2012-06-14 Thread Dan Carpenter
Hi Manish,

The patch b3a271a94d00: "[SCSI] qla4xxx: support iscsiadm session 
mgmt" from Jul 25, 2011, leads to the following warning:
drivers/scsi/qla4xxx/ql4_os.c:4479 qla4xxx_get_ep_fwdb()
 warn: casting from 16 to 28 bytes

(Sort of).

drivers/scsi/qla4xxx/ql4_os.c qla4xxx_ep_connect()
   705          qla_ep = ep->dd_data;
   706          memset(qla_ep, 0, sizeof(struct qla_endpoint));
   707          if (dst_addr->sa_family == AF_INET) {
   708                  memcpy(&qla_ep->dst_addr, dst_addr, sizeof(struct sockaddr_in));
   709                  addr = (struct sockaddr_in *)&qla_ep->dst_addr;
   710                  DEBUG2(ql4_printk(KERN_INFO, ha, "%s: %pI4\n", __func__,
   711                                    (char *)&addr->sin_addr));
   712          } else if (dst_addr->sa_family == AF_INET6) {
   713                  memcpy(&qla_ep->dst_addr, dst_addr,
                               ^^^^^^^^^^^^^^^^^
   714                         sizeof(struct sockaddr_in6));

Both qla_ep->dst_addr and dst_addr are type struct sockaddr.  We are
copying sizeof(struct sockaddr_in6) bytes, which is 12 bytes larger.  I
don't know the actual size of qla_ep->dst_addr but dst_addr is allocated
in qla4xxx_get_ep_fwdb() as a struct sockaddr.  So we are copying past
the end of the struct here and it's possibly an information leak or even
a memory corruption issue depending on how much space ep->dd_data has.

   715                  addr6 = (struct sockaddr_in6 *)&qla_ep->dst_addr;
   716                  DEBUG2(ql4_printk(KERN_INFO, ha, "%s: %pI6\n", __func__,
   717                                    (char *)&addr6->sin6_addr));
   718          }

regards,
dan carpenter




Re: Ubuntu server + Open-iscsi + multipath + ocfs2. Connectivity loss causes immediate server reboot.

2012-06-14 Thread Mike Christie
On 06/14/2012 06:22 AM, Jiří Červenka wrote:
> Hi,
> on Ubuntu server 12.04 (3.2.0-24-generic) I use open-iscsi (2.0-871),
> multipath-tools (v0.4.9) and ocfs2 (1.6.3-4ubuntu1) to access shared
> storage HP P2000 G3 iscsi. Even short network connectivity loss is
> causing immediate server crash and reboot. In syslog I can not found any
> clue what might hapened.
>

Are you doing iSCSI root (or iSCSI + multipath root)? Does Ubuntu have
iscsid running?

ps -u root | grep iscsid

?

Are you able to get an oops trace or anything?




Re: Ubuntu server + open-iscsi + ocfs2. Connectivity loss causes immediate server reboot.

2012-06-14 Thread Mark Lehrer
> storage HP P2000 G3 iscsi. Even short network connectivity loss is 
> causing immediate server crash and reboot. In syslog I can not found any 


Does ocfs2 have a fencing arrangement like Red Hat's GFS? If so, the reboot 
is actually a feature: clustered filesystems don't want rogue machines 
writing straight to the block device.


If this is the problem, there should be a way to tweak the timeouts.  Red 
Hat's cluster system has an overwhelming amount of options available.


FWIW, we gave up on Red Hat Cluster & GFS, which is similar to OCFS, due to 
this type of outage.  We switched to ASM and NFS.  ASM works great with 
iSCSI.


Mark
