[Bug 211990] iscsi fails to reconnect and does not release devices

2017-04-11 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

Edward Tomasz Napierala  changed:

   What|Removed |Added

 Status|Open|Closed
 Resolution|--- |Works As Intended

--- Comment #44 from Edward Tomasz Napierala  ---
>From what I see, it's a networking problem, not a bug in iSCSI.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2017-01-06 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #43 from Julien Cigar  ---
(In reply to Ben RUBSON from comment #41)

as soon as I have a little time .. :)

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2017-01-06 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #42 from Julien Cigar  ---
also, looks similar to bug #215727

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2017-01-06 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #41 from Ben RUBSON  ---
OK, so same issue as mine (for which I opened this bug report), with 10.3.
Which I was not able to reproduce with 11.
Any plan to upgrade to 11 to be safe ?

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2017-01-06 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #40 from Julien Cigar  ---
I had to reboot..

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2017-01-05 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #39 from Ben RUBSON  ---
Strange, as if mbufs were "consumed" but not released.
And when devices are disconnected, can you reconnect to them properly ?
Or do you need to change their target name or to reboot, as explained in my
first message of this bug report ?

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2017-01-05 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #38 from Julien Cigar  ---
(In reply to Ben RUBSON from comment #37)

it only delayed the problem, .. but it would be interesting if you could test
on your side..!

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2017-01-05 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #37 from Ben RUBSON  ---
And no improvements ?

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2017-01-05 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #36 from Julien Cigar  ---
(In reply to Ben RUBSON from comment #35)

yes, .. but only slightly as the machine has only 8GB of RAM

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2017-01-04 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #35 from Ben RUBSON  ---
Did you try to increase some sysctls with jumbo frames on, as proposed by
Steven ?

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-12-15 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #34 from Julien Cigar  ---
No problems for more than a month now.. I tend to conclude that the MTU 9000
was the root cause of this problem (?)

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-11-30 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #33 from Julien Cigar  ---
an update on this: I turned off jumbo frames and TSO on the replication
interface and the issue hasn't been repeated until now...!

filer1% uptime
10:15AM  up 18 days, 13:35, 1 user, load averages: 0.07, 0.09, 0.08

filer1% vmstat -z|head -1;vmstat -z|grep -i 'mbuf'
ITEM   SIZE  LIMIT USED FREE  REQ FAIL SLEEP
mbuf_packet:256, 3200070,1024,3403,527205862,   0,   0
mbuf:   256, 3200070,   6,2752,456285246,   0,   0
mbuf_cluster:  2048, 500010,4427, 355,18287308,   0,   0
mbuf_jumbo_page:   4096, 250005,   0,  71, 2682330,   0,   0
mbuf_jumbo_9k: 9216,  74075,   0,   0,   0,   0,   0
mbuf_jumbo_16k:   16384,  41667,   0,   0,   0,   0,   0
mbuf_ext_refcnt:  4,  0,   0,   0,   0,   0,   0

filer1% netstat -m
1031/6154/7185 mbufs in use (current/cache/total)
1024/3758/4782/500010 mbuf clusters in use (current/cache/total/max)
1024/3403 mbuf+clusters out of packet secondary zone in use (current/cache)
0/71/71/250005 4k (page size) jumbo clusters in use (current/cache/total/max)
0/0/0/74075 9k jumbo clusters in use (current/cache/total/max)
0/0/0/41667 16k jumbo clusters in use (current/cache/total/max)
2305K/9338K/11644K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/0/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile

filer1% netstat -I bge1
NameMtu Network   Address  Ipkts Ierrs IdropOpkts Oerrs
 Coll
bge1   1500   xx:xx:xx:xx:xx:xx 497577171 0 0 262946639
0 0
bge1  - 10.20.30.010.20.30.31   497575590 - - 262942638
- -

filer1% ifconfig bge1
bge1: flags=8843 metric 0 mtu 1500
   
options=c009b
ether xx:xx:xx:xx:xx:xx
inet 10.20.30.31 netmask 0xff00 broadcast 10.20.30.255 
nd6 options=29
media: Ethernet autoselect (1000baseT )
status: active

(I'll let you know if the problem occurs again)

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-11-10 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #32 from Julien Cigar  ---
(In reply to Steven Hartland from comment #30)

Yes I already checked the error counters, but it's 0:

root@filer1:/var/log # netstat -I bge1
NameMtu Network   Address  Ipkts Ierrs IdropOpkts Oerrs
 Coll
bge1   9000   d0:bf:9c:3a:c6:e1 77755195 0   152 46072898 0
0
bge1  - 10.20.30.010.20.30.31   77753810 - - 74150354 -
-

... so I guess it's not a driver issue

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-11-10 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #31 from Julien Cigar  ---
Y(In reply to Steven Hartland from comment #29)

Yes it seems so:

root@filer1:/var/log # vmstat -z | head -n 1; vmstat -z | grep mbuf
ITEM   SIZE  LIMIT USED FREE  REQ FAIL SLEEP
mbuf_packet:256, 3200070,1024,3431,94881716,   0,   0
mbuf:   256, 3200070, 257,3103,566359041,   0,   0
mbuf_cluster:  2048, 500010,4455, 133,15367798,   0,   0
mbuf_jumbo_page:   4096, 250005,   0, 161,102798544,   0,   0
mbuf_jumbo_9k: 9216,  74075, 256, 120,52036055, 304,   0
mbuf_jumbo_16k:   16384,  41667,   0,   0,   0,   0,   0
mbuf_ext_refcnt:  4,  0,   0,   0,   0,   0,   0

> You'd have to catch it when the issue is occurring to confirm

not so easy :) but every time I have a WARNING: 10.20.30.32
(iqn.2016-08.lan.prod:target0): no ping reply (NOP-In) after 30 seconds;
reconnecting

I have a small increase in the "requests for 9k jumbo clusters denied" counter

> If you have plenty of ram you could just trying doubling kern.ipc.nmbjumbo9 
> to see if the problem goes away?

The machine has only 8GB of RAM (and ZFS), I think I'll first test with a 1500
MTU and -tso

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-11-10 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #30 from Steven Hartland  ---
If its dropping in the driver due to failure go get a jumbo mbuf you should see
this in the output netstat -i.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-11-10 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #29 from Steven Hartland  ---
I'm assuming vmstat agrees with netstat -m e.g.
vmstat -z | head -n 1; vmstat -z | grep mbuf

It could be that the disconnect is the cause of mbuf issue and not the other
way round. You'd have to catch it when the issue is occurring to confirm.

If you have plenty of ram you could just trying doubling kern.ipc.nmbjumbo9 to
see if the problem goes away?

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-11-10 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #28 from Steven Hartland  ---
checksums a likely just due to hw checksum offloading.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-11-10 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #27 from Steven Hartland  ---
checksums a likely just due to hw checksum offloading.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-11-10 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #26 from Julien Cigar  ---
I don't know if it's related but I also noticed some incorrect cksum:

00:14:03.380734 IP (tos 0x0, ttl 64, id 19369, offset 0, flags [DF], proto TCP
(6), length 100, bad cksum 0 (->9e84)!)
10.20.30.31.24995 > 10.20.30.32.iscsi-target: Flags [P.], cksum 0x50bd
(incorrect -> 0xfff9), seq 1205926059:1205926107, ack 1518444144, win 16384,
options [nop,nop,TS val 1388742470 ecr 2474468733], length 48

00:14:03.380742 IP (tos 0x0, ttl 64, id 19370, offset 0, flags [DF], proto TCP
(6), length 100, bad cksum 0 (->9e83)!)
10.20.30.31.55645 > 10.20.30.32.iscsi-target: Flags [P.], cksum 0x50bd
(incorrect -> 0x3912), seq 3104128225:3104128273, ack 1948017382, win 16384,
options [nop,nop,TS val 1388742470 ecr 4001288677], length 48

00:14:03.381060 IP (tos 0x0, ttl 64, id 54596, offset 0, flags [DF], proto TCP
(6), length 100)
10.20.30.32.iscsi-target > 10.20.30.31.55645: Flags [P.], cksum 0x4c6a
(correct), seq 1:49, ack 48, win 16384, options [nop,nop,TS val 4001289988 ecr
1388742470], length 48

00:14:03.381070 IP (tos 0x0, ttl 64, id 54597, offset 0, flags [DF], proto TCP
(6), length 100)
10.20.30.32.iscsi-target > 10.20.30.31.24995: Flags [P.], cksum 0x0d62
(correct), seq 1:49, ack 48, win 16384, options [nop,nop,TS val 2474470019 ecr
1388742470], length 48

... but I guess it's because TSO is enabled on the interfaces ?

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-11-10 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #25 from Julien Cigar  ---
(In reply to Steven Hartland from comment #24)

I don't have any mention of nmbjumbo9 in /var/log/messages, I haven't raised
kern.ipc.nmbjumbo9 yet, currently it's:

root@filer1:/var/log # sysctl kern.ipc.nmbjumbo9
kern.ipc.nmbjumbo9: 25

but it does not seem necessary to raise it if I can trust netstat -m:

root@filer1:/var/log # netstat -m
1282/6533/7815 mbufs in use (current/cache/total)
1024/3564/4588/500010 mbuf clusters in use (current/cache/total/max)
1024/3431 mbuf+clusters out of packet secondary zone in use (current/cache)
0/161/161/250005 4k (page size) jumbo clusters in use (current/cache/total/max)
256/120/376/74075 9k jumbo clusters in use (current/cache/total/max)
0/0/0/41667 16k jumbo clusters in use (current/cache/total/max)
4672K/10485K/15157K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/304/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile

(the box isn't heavily loaded)

what I don't understand if why there are requests for 4k jumbo clusters denied
as there are plenty free .. ?

could it be related to this change:
https://svnweb.freebsd.org/base/head/sys/dev/bge/if_bge.c?r1=276299=276750 ?

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-11-10 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

Steven Hartland  changed:

   What|Removed |Added

 CC||s...@freebsd.org

--- Comment #24 from Steven Hartland  ---
Any mention of nmbjumbo9 in /var/log/messages?

Specifically do you see:
kern.ipc.nmbjumbo9 limit reached

If so have you tried bumping kern.ipc.nmbjumbo9?

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-11-10 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #23 from Julien Cigar  ---
Ok.. I definitively think it's an issue with jumbo frames, I got another
disconnection and again the "requests for jumbo clusters denied" (9k) counter
increased:

0/304/0 requests for jumbo clusters denied (4k/9k/16k)

any idea?

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-11-10 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #22 from Julien Cigar  ---
Yep it's strange, I will definitively upgrade to 11 when possible ..

Other infos:

root@filer1:/home/jcigar # netstat -m
1285/6575/7860 mbufs in use (current/cache/total)
1024/3526/4550/500010 mbuf clusters in use (current/cache/total/max)
1024/3178 mbuf+clusters out of packet secondary zone in use (current/cache)
0/267/267/250005 4k (page size) jumbo clusters in use (current/cache/total/max)
256/181/437/74075 9k jumbo clusters in use (current/cache/total/max)
0/0/0/41667 16k jumbo clusters in use (current/cache/total/max)
4673K/11392K/16066K bytes allocated to network (current/cache/total)
0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
0/214/0 requests for jumbo clusters denied (4k/9k/16k)
0 requests for sfbufs denied
0 requests for sfbufs delayed
0 requests for I/O initiated by sendfile

root@filer1:/home/jcigar # sysctl kern.ipc | fgrep nmb
kern.ipc.nmbufs: 3200070
kern.ipc.nmbjumbo16: 18
kern.ipc.nmbjumbo9: 25
kern.ipc.nmbjumbop: 250005
kern.ipc.nmbclusters: 500010

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-11-10 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #21 from Ben RUBSON  ---
I was also using jumbo frames with 10.3, but unfortunately I did not look at
dropped packets when I had the issues.
Note that with 11, I still use jumbo frames and never encounter this issue
again.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-11-10 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #20 from Julien Cigar  ---
Also I noticed some Idrop packets on the iscsi interface of the initiator side:

root@filer1:/home/jcigar # netstat -I bge1
NameMtu Network   Address  Ipkts Ierrs IdropOpkts Oerrs
 Coll
bge1   9000   d0:bf:9c:3a:c6:e1 77396222 0   107 45800806 0
0
bge1  - 10.20.30.010.20.30.31   77394879 - - 73726273 -
-

and:

root@filer1:/home/jcigar # vmstat -z | grep -i 'mbuf_jumbo_9k:'
mbuf_jumbo_9k: 9216,  74075, 256, 161,51922039, 214,   0

could it be related to jumbo frames?

(on the target side counters are 0)

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-11-10 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #19 from Julien Cigar  ---
this is what I have on the target:

root@filer2:/home/jcigar # sysctl -a|grep -i 'iscsi'
kern.iscsi.fail_on_shutdown: 1
kern.iscsi.fail_on_disconnection: 1
kern.iscsi.maxtags: 255
kern.iscsi.login_timeout: 30
kern.iscsi.iscsid_timeout: 30
kern.iscsi.ping_timeout: 30
kern.iscsi.debug: 1
kern.cam.ctl.iscsi.maxcmdsn_delta: 256
kern.cam.ctl.iscsi.login_timeout: 60
kern.cam.ctl.iscsi.ping_timeout: 5
kern.cam.ctl.iscsi.debug: 1

and the following in the error logs:
WARNING: 10.20.30.31 (iqn.1994-09.org.freebsd:filer1.prod.lan): connection
error; dropping connection
WARNING: 10.20.30.31 (iqn.1994-09.org.freebsd:filer1.prod.lan): connection
error; dropping connection
WARNING: 10.20.30.31 (iqn.1994-09.org.freebsd:filer1.prod.lan): connection
error; dropping connection

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-11-10 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #18 from Julien Cigar  ---
Problem appeared again today, after ~15 days of uptime, always on FreeBSD
filer1.prod.lan 10.3-RELEASE-p11 FreeBSD 10.3-RELEASE-p11 #0: Mon Oct 24
18:49:24 UTC 2016
r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

WARNING: 10.20.30.32 (iqn.2016-08.lan.prod:target0): no ping reply (NOP-In)
after 5 seconds; reconnecting
WARNING: 10.20.30.32 (iqn.2016-08.lan.prod:target1): no ping reply (NOP-In)
after 5 seconds; reconnecting
(da3:iscsi1:0:0:0): READ(10). CDB: 28 00 01 ef ec 90 00 00 01 00 
(da3:iscsi1:0:0:0): CAM status: CCB request aborted by the host
(da2:iscsi2:0:0:0): READ(10). CDB: 28 00 01 ef ec 8e 00 00 01 00 
(da3:(da2:iscsi2:0:0:0): CAM status: CCB request aborted by the host
iscsi1:0:(da2:0:iscsi2:0:0): 0:Retrying command
0): Retrying command
da3 at iscsi1 bus 0 scbus4 target 0 lun 0
da3:  s/n MYSERIAL   0 detached
da2 at iscsi2 bus 0 scbus3 target 0 lun 0
da2:  s/n MYSERIAL   1 detached
(da2:iscsi2:0:0:0): Periph destroyed
(da3:iscsi1:0:0:0): Periph destroyed
da2 at iscsi2 bus 0 scbus3 target 0 lun 0
da2:  Fixed Direct Access SPC-4 SCSI device
da2: Serial Number MYSERIAL   1
da2: 150.000MB/s transfers
da2: Command Queueing enabled
da2: 1840144MB (471076881 4096 byte sectors)
WARNING: 10.20.30.32 (iqn.2016-08.lan.prod:target1): no ping reply (NOP-In)
after 5 seconds; reconnecting
(da2:iscsi2:0:0:0): READ(10). CDB: 28 00 1c 14 10 0f 00 00 01 00 
(da2:iscsi2:0:0:0): CAM status: CCB request aborted by the host
(da2:iscsi2:0:0:0): Retrying command
da2 at iscsi2 bus 0 scbus3 target 0 lun 0
da2:  s/n MYSERIAL   1 detached
(da2:iscsi2:0:0:0): Periph destroyed
da2 at iscsi2 bus 0 scbus3 target 0 lun 0
da2:  Fixed Direct Access SPC-4 SCSI device
da2: Serial Number MYSERIAL   1
da2: 150.000MB/s transfers
da2: Command Queueing enabled
da2: 1840144MB (471076881 4096 byte sectors)
WARNING: 10.20.30.32 (iqn.2016-08.lan.prod:target0): login timed out after 6
seconds; reconnecting
da3 at iscsi1 bus 0 scbus4 target 0 lun 0
da3:  Fixed Direct Access SPC-4 SCSI device
da3: Serial Number MYSERIAL   0
da3: 150.000MB/s transfers
da3: Command Queueing enabled
da3: 1840144MB (471076881 4096 byte sectors)
WARNING: 10.20.30.32 (iqn.2016-08.lan.prod:target1): no ping reply (NOP-In)
after 5 seconds; reconnecting
(da2:iscsi2:0:0:0): READ(10). CDB: 28 00 1c 14 10 0f 00 00 01 00 
(da2:iscsi2:0:0:0): CAM status: CCB request aborted by the host
(da2:iscsi2:0:0:0): Retrying command
da2 at iscsi2 bus 0 scbus3 target 0 lun 0
da2:  s/n MYSERIAL   1 detached
WARNING: 10.20.30.32 (iqn.2016-08.lan.prod:target1): no ping reply (NOP-In)
after 5 seconds; reconnecting
(da2:iscsi2:0:0:0): Periph destroyed
da2 at iscsi2 bus 0 scbus3 target 0 lun 0
da2:  Fixed Direct Access SPC-4 SCSI device
da2: Serial Number MYSERIAL   1
da2: 150.000MB/s transfers
da2: Command Queueing enabled
da2: 1840144MB (471076881 4096 byte sectors)
WARNING: 10.20.30.32 (iqn.2016-08.lan.prod:target0): handoff on already
connected session
WARNING: 10.20.30.32 (iqn.2016-08.lan.prod:target0): connection error;
reconnecting
da3 at iscsi1 bus 0 scbus4 target 0 lun 0
da3:  s/n MYSERIAL   0 detached
(da3:iscsi1:0:0:0): Periph destroyed
da3 at iscsi1 bus 0 scbus4 target 0 lun 0
da3:  Fixed Direct Access SPC-4 SCSI device
da3: Serial Number MYSERIAL   0
da3: 150.000MB/s transfers
da3: Command Queueing enabled
da3: 1840144MB (471076881 4096 byte sectors)

After a zpool online, and with vfs.zfs.scrub_delay = 0 and
vfs.zfs.resilver_delay = 0 I issued a zpool scrub and again I had a timeout:

WARNING: 10.20.30.32 (iqn.2016-08.lan.prod:target0): no ping reply (NOP-In)
after 5 seconds; reconnecting
WARNING: 10.20.30.32 (iqn.2016-08.lan.prod:target1): no ping reply (NOP-In)
after 5 seconds; reconnecting
(da3:iscsi1:0:0:0): READ(10). CDB: 28 00 00 b9 9a 67 00 00 01 00 
(da3:iscsi1:0:0:0): CAM status: CCB request aborted by the host
(da2:iscsi2:0:0:0): READ(10). CDB: 28 00 00 b9 98 3c 00 00 01 00 
(da3:(da2:iscsi2:0:0:0): CAM status: CCB request aborted by the host
iscsi1:0:(da2:0:iscsi2:0:0): 0:Retrying command
0): Retrying command
(da3:iscsi1:0:0:0): READ(10). CDB: 28 00 00 b9 9f e4 00 00 01 00 
(da2:iscsi2:0:0:0): READ(10). CDB: 28 00 00 b9 a8 cc 00 00 01 00 
(da3:iscsi1:0:0:0): CAM status: CCB request aborted by the host
(da2:iscsi2:0:0:0): CAM status: CCB request aborted by the host
(da3:(da2:iscsi1:0:iscsi2:0:0:0:0): 0): Retrying command
Retrying command
(da3:iscsi1:0:0:0): READ(10). CDB: 28 00 00 b9 a2 42 00 00 01 00 
(da2:iscsi2:0:0:0): READ(10). CDB: 28 00 00 b9 95 6e 00 00 20 00 
(da3:iscsi1:0:0:0): CAM status: CCB request aborted by the host
(da2:iscsi2:0:0:0): CAM status: CCB request aborted by the host
(da3:(da2:iscsi1:0:iscsi2:0:0:0:0): 0): Retrying command
Retrying command
(da3:iscsi1:0:0:0): READ(10). CDB: 28 00 00 b9 96 4e 00 00 20 00 
(da2:iscsi2:0:0:0): READ(10). CDB: 28 00 00 b9 95 8e 00 00 20 00 
(da3:iscsi1:0:0:0): CAM 

[Bug 211990] iscsi fails to reconnect and does not release devices

2016-10-18 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #17 from Ben RUBSON  ---
I switched to 11 at the time of 11-RC1 as I did not manage to reproduce the bug
with it.
I did not try to explicitly reproduce it with RC2, RC3, 11.0-RELEASE and
11.0-RELEASE-p1, but it did not occur by itself in these versions :)
Sounds like this bug only affects 10.3.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-10-18 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #16 from Julien Cigar  ---
Thank you Ben. Can you confirm (once again) that you haven't had this problem
anymore in 11.0-RELEASE? If so I could just upgrade then .. :)

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-10-18 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

Ben RUBSON  changed:

   What|Removed |Added

 Status|Closed  |Open
 Resolution|Feedback Timeout|---

--- Comment #15 from Ben RUBSON  ---
I re-open this bug for Julien.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-10-18 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #14 from Julien Cigar  ---
This is also what I thought at first, but there is no error messages related to
the network interfaces in the logs, and all Ierrs, Idrop, and Coll counters are
zeros.

There is a dedicated interface for the iSCSI traffic which I turned on/off
quite a lot to test a failover script, and the symptoms are exactly the ones
described by Ben Rubson in the post above (altough I only have 4 disks in
total).

One more thing, I'm managing everything with SaltStack and in some cases ctld
on the target and iscsid on the initiator are restarted at the same time, could
it be because of that?

I forgot to paste my rc.conf filers, but I don't think there is something wrong
..:

root@filer1:/home/jcigar # cat /etc/rc.conf
dumpdev="AUTO"
ifconfig_bge0="DHCP"
ip6addrctl_policy="ipv4_prefer"
ipv6_network_interfaces="none"
keymap="be.iso"
mountd_enable="YES"
mountd_flags="-r -S -h 192.168.10.15"
nfs_server_enable="YES"
nfs_server_flags="-t -h 192.168.10.15"
nfsuserd_enable="YES"
nfsuserd_flags="-domain prod.lan"
nfsv4_server_enable="YES"
ntpdate_flags="-b -4"
ntpdate_hosts="ntp.belnet.be"
salt_minion_enable="YES"
sendmail_enable="NO"
sendmail_msp_queue_enable="NO"
sendmail_outbound_enable="NO"
sendmail_submit_enable="NO"
zfs_enable="YES"

hostname="filer1.prod.lan"

ifconfig_bge1="inet 10.20.30.31 netmask 255.255.255.0 mtu 9000"

ifconfig_bge0_alias0="inet vhid 54 advskew 50 pass xxx alias 192.168.10.15/32"

sshd_enable="YES"

ntpdate_enable="YES"

iscsid_enable="YES"
iscsictl_enable="YES"
iscsictl_flags="-Aa"

root@filer2:/home/jcigar # cat /etc/rc.conf
dumpdev="AUTO"
ifconfig_bge0="DHCP"
ip6addrctl_policy="ipv4_prefer"
ipv6_network_interfaces="none"
keymap="be.iso"
ntpdate_flags="-b -4"
ntpdate_hosts="ntp.belnet.be"
salt_minion_enable="YES"
sendmail_enable="NO"
sendmail_msp_queue_enable="NO"
sendmail_outbound_enable="NO"
sendmail_submit_enable="NO"
zfs_enable="YES"

hostname="filer2.prod.lan"

ifconfig_bge1="inet 10.20.30.32 netmask 255.255.255.0 mtu 9000"

ifconfig_bge0_alias0="inet vhid 54 advskew 100 pass xxx alias 192.168.10.15/32"

sshd_enable="YES"

ntpdate_enable="YES"
ctld_enable="YES"

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-10-18 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #13 from Edward Tomasz Napierala  ---
The "WARNING: 10.20.30.31 (iqn.1994-09.org.freebsd:filer1.prod.lan): no ping
reply (NOP-Out) after 5 seconds; dropping connection" and "WARNING: 10.20.30.32
(iqn.2016-08.lan.prod:target0): connection error; reconnecting" messages
suggest a network problem.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-10-17 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #12 from Julien Cigar  ---
Some more info:

on filer2:

root@filer2:/home/jcigar # cat /etc/ctl.conf 

auth-group ag0 {
chap xxx xxx
}

portal-group pg0 {
discovery-auth-group no-authentication
listen 10.20.30.32
}

target iqn.2016-08.lan.prod:target0 {
auth-group ag0
portal-group pg0

lun 0 {
blocksize 4096
path /dev/gpt/f2data0
}
}

target iqn.2016-08.lan.prod:target1 {
auth-group ag0
portal-group pg0

lun 0 {
blocksize 4096
path /dev/gpt/f2data1
}
}

root@filer2:/home/jcigar # ifconfig bge1
bge1: flags=8843 metric 0 mtu 9000
   
options=c019b
ether d0:bf:9c:3a:b7:25
inet 10.20.30.32 netmask 0xff00 broadcast 10.20.30.255 
nd6 options=29
media: Ethernet autoselect (1000baseT )
status: active

root@filer2:/home/jcigar # pciconf -lv bge1
bge1@pci0:2:0:1:class=0x02 card=0x22e8103c chip=0x165f14e4 rev=0x00
hdr=0x00
vendor = 'Broadcom Corporation'
device = 'NetXtreme BCM5720 Gigabit Ethernet PCIe'
class  = network
subclass   = ethernet

root@filer2:/home/jcigar # grep 'bge1' /var/run/dmesg.boot 
bge1:  mem
0x92b0-0x92b0,0x92b1-0x92b1,0x92b2-0x92b2 at device 0.1
on pci2
bge1: APE FW version: NCSI v1.3.12.0
bge1: CHIP ID 0x0572; ASIC REV 0x5720; CHIP REV 0x57200; PCI-E
miibus1:  on bge1
bge1: Using defaults for TSO: 65518/35/2048
bge1: Ethernet address: d0:bf:9c:3a:b7:25
bge1: link state changed to DOWN
bge1: link state changed to UP

on filer1.prod.lan:

root@filer1:/home/jcigar # cat /etc/iscsi.conf 

t0 {
TargetAddress   = 10.20.30.32
TargetName  = iqn.2016-08.lan.prod:target0
AuthMethod  = CHAP
chapIName   = xxx
chapSecret  = xxx
}

t1 {
TargetAddress   = 10.20.30.32
TargetName  = iqn.2016-08.lan.prod:target1
AuthMethod  = CHAP
chapIName   = xxx
chapSecret  = xxx
}

root@filer1:/home/jcigar # ifconfig bge1
bge1: flags=8843 metric 0 mtu 9000
   
options=c019b
ether d0:bf:9c:3a:c6:e1
inet 10.20.30.31 netmask 0xff00 broadcast 10.20.30.255 
nd6 options=29
media: Ethernet autoselect (1000baseT )
status: active

root@filer1:/home/jcigar # pciconf -lv bge1
bge1@pci0:2:0:1:class=0x02 card=0x22e8103c chip=0x165f14e4 rev=0x00
hdr=0x00
vendor = 'Broadcom Corporation'
device = 'NetXtreme BCM5720 Gigabit Ethernet PCIe'
class  = network
subclass   = ethernet

root@filer1:/home/jcigar # grep 'bge1' /var/run/dmesg.boot 
bge1:  mem
0x92b0-0x92b0,0x92b1-0x92b1,0x92b2-0x92b2 at device 0.1
on pci2
bge1: APE FW version: NCSI v1.3.12.0
bge1: CHIP ID 0x0572; ASIC REV 0x5720; CHIP REV 0x57200; PCI-E
miibus1:  on bge1
bge1: Using defaults for TSO: 65518/35/2048
bge1: Ethernet address: d0:bf:9c:3a:c6:e1
bge1: link state changed to DOWN
bge1: link state changed to UP

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-10-17 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

Julien Cigar  changed:

   What|Removed |Added

 CC||jul...@perdition.city

--- Comment #11 from Julien Cigar  ---
I got a similar issue on:

FreeBSD filer1.prod.lan 10.3-RELEASE-p7 FreeBSD 10.3-RELEASE-p7 #0: Thu Aug 11
18:38:15 UTC 2016
r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC  amd64

with:

root@filer1:/etc/rc.d # sysctl -a|grep -i 'iscsi'
kern.iscsi.fail_on_shutdown: 1
kern.iscsi.fail_on_disconnection: 1
kern.iscsi.maxtags: 255
kern.iscsi.login_timeout: 5
kern.iscsi.iscsid_timeout: 5
kern.iscsi.ping_timeout: 5
kern.iscsi.debug: 1

On the initiator side:

WARNING: 10.20.30.31 (iqn.1994-09.org.freebsd:filer1.prod.lan): no ping reply
(NOP-Out) after 5 seconds; dropping connection
WARNING: 10.20.30.31 (iqn.1994-09.org.freebsd:filer1.prod.lan): connection
error; dropping connection
WARNING: 10.20.30.31 (iqn.1994-09.org.freebsd:filer1.prod.lan): connection
error; dropping connection

On the target side:

WARNING: 10.20.30.32 (iqn.2016-08.lan.prod:target1): no ping reply (NOP-In)
after 5 seconds; reconnecting
WARNING: 10.20.30.32 (iqn.2016-08.lan.prod:target0): no ping reply (NOP-In)
after 5 seconds; reconnecting
(da3:iscsi2:0:0:0): READ(10). CDB: 28 00 08 03 02 7a 00 00 01 00 
(da3:iscsi2:0:0:0): CAM status: CCB request aborted by the host
(da3:(da2:iscsi1:0:0:0): READ(10). CDB: 28 00 02 d7 49 e7 00 00 20 00 
iscsi2:0:(da2:iscsi1:0:0:0): CAM status: CCB request aborted by the host
0:(da2:0): iscsi1:0:Retrying command
0:0): Retrying command
(da3:iscsi2:0:0:0): READ(10). CDB: 28 00 02 d7 4a e7 00 00 20 00 
(da2:iscsi1:0:0:0): READ(10). CDB: 28 00 02 d7 4a 07 00 00 20 00 
(da3:iscsi2:0:0:0): CAM status: CCB request aborted by the host
(da2:iscsi1:0:0:0): CAM status: CCB request aborted by the host
(da3:(da2:iscsi2:0:iscsi1:0:0:0:0): 0): Retrying command
Retrying command
da3 at iscsi2 bus 0 scbus4 target 0 lun 0
da3:  s/n MYSERIAL   1 detached
da2 at iscsi1 bus 0 scbus3 target 0 lun 0
da2:  s/n MYSERIAL   0 detached
(da3:iscsi2:0:0:0): Periph destroyed
(da2:iscsi1:0:0:0): Periph destroyed
da2 at iscsi2 bus 0 scbus3 target 0 lun 0
da2:  Fixed Direct Access SPC-4 SCSI device
da2: Serial Number MYSERIAL   1
da2: 150.000MB/s transfers
da2: Command Queueing enabled
da2: 1840144MB (471076881 4096 byte sectors)
da3 at iscsi1 bus 0 scbus4 target 0 lun 0
da3:  Fixed Direct Access SPC-4 SCSI device
da3: Serial Number MYSERIAL   0
da3: 150.000MB/s transfers
da3: Command Queueing enabled
da3: 1840144MB (471076881 4096 byte sectors)
WARNING: 10.20.30.32 (iqn.2016-08.lan.prod:target0): no ping reply (NOP-In)
after 5 seconds; reconnecting
WARNING: 10.20.30.32 (iqn.2016-08.lan.prod:target1): no ping reply (NOP-In)
after 5 seconds; reconnecting
(da3:iscsi1:0:0:0): READ(10). CDB: 28 00 02 d7 4c a7 00 00 20 00 
(da3:iscsi1:0:0:0): CAM status: CCB request aborted by the host
(da2:iscsi2:0:0:0): READ(10). CDB: 28 00 02 e0 f3 d4 00 00 04 00 
(da3:(da2:iscsi2:0:0:0): CAM status: CCB request aborted by the host
iscsi1:0:(da2:0:iscsi2:0:0): 0:Retrying command
0): Retrying command
(da3:iscsi1:0:0:0): READ(10). CDB: 28 00 02 d7 4d 07 00 00 20 00 
(da2:iscsi2:0:0:0): READ(10). CDB: 28 00 02 d7 4b c7 00 00 20 00 
(da3:iscsi1:0:0:0): CAM status: CCB request aborted by the host
(da2:iscsi2:0:0:0): CAM status: CCB request aborted by the host
(da3:(da2:iscsi1:0:iscsi2:0:0:0:0): 0): Retrying command
Retrying command
da3 at iscsi1 bus 0 scbus4 target 0 lun 0
da3:  s/n MYSERIAL   0 detached
da2 at iscsi2 bus 0 scbus3 target 0 lun 0
da2:  s/n MYSERIAL   1 detached
(da2:iscsi2:0:0:0): Periph destroyed
(da3:iscsi1:0:0:0): Periph destroyed
da2 at iscsi1 bus 0 scbus4 target 0 lun 0
da2:  Fixed Direct Access SPC-4 SCSI device
da2: Serial Number MYSERIAL   0
da2: 150.000MB/s transfers
da2: Command Queueing enabled
da2: 1840144MB (471076881 4096 byte sectors)
da3 at iscsi2 bus 0 scbus3 target 0 lun 0
da3:  Fixed Direct Access SPC-4 SCSI device
da3: Serial Number MYSERIAL   1
da3: 150.000MB/s transfers
da3: Command Queueing enabled
da3: 1840144MB (471076881 4096 byte sectors)
WARNING: 10.20.30.32 (iqn.2016-08.lan.prod:target0): no ping reply (NOP-In)
after 5 seconds; reconnecting
(da2:iscsi1:0:0:0): READ(6). CDB: 08 00 00 00 20 00 
(da2:iscsi1:0:0:0): CAM status: CCB request aborted by the host
(da2:iscsi1:0:0:0): Retrying command
(da2:iscsi1:0:0:0): READ(6). CDB: 08 00 00 10 02 00 
(da2:iscsi1:0:0:0): CAM status: CCB request aborted by the host
(da2:iscsi1:0:0:0): Retrying command
da2 at iscsi1 bus 0 scbus4 target 0 lun 0
da2:  s/n MYSERIAL   0 detached
(da2:iscsi1:0:0:0): Periph destroyed
da2 at iscsi1 bus 0 scbus4 target 0 lun 0
da2:  Fixed Direct Access SPC-4 SCSI device
da2: Serial Number MYSERIAL   0
da2: 150.000MB/s transfers
da2: Command Queueing enabled
da2: 1840144MB 

[Bug 211990] iscsi fails to reconnect and does not release devices

2016-09-07 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

Ben RUBSON  changed:

   What|Removed |Added

 Status|New |Closed
 Resolution|--- |Feedback Timeout

--- Comment #10 from Ben RUBSON  ---
If I'm the only one interested in this bug report, it can then be closed, as I
now run FreeBSD 11.
Feel free then to re-open it if needed.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-09-06 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

Ben RUBSON  changed:

   What|Removed |Added

Version|11.0-STABLE |10.3-RELEASE

--- Comment #9 from Ben RUBSON  ---
Mmmmhh these 2 devices were spares in a zpool, but strangely seen as available.
I removed these 2 spares from the zpool and then the 2 iscsi devices
disappeared properly from the system.
OK, not related then to the 10.3 bug above.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-09-06 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

Ben RUBSON  changed:

   What|Removed |Added

Version|10.3-RELEASE|11.0-STABLE
   Severity|Affects Only Me |Affects Some People
 CC||freebsd-s...@freebsd.org,
   ||tr...@freebsd.org

--- Comment #8 from Ben RUBSON  ---
I'm facing this issue in 11-RC2 where I have 2 iscsi devices which are
disconnected, don't want to reconnect, and I can't manage to remove them :

# iscsictl 
Target name  Target portalState
iqn.2012-06.g2:rT3   192.168.2.2  Disconnected
iqn.2012-06.g2:ch1   192.168.2.2  Disconnected

# iscsictl -Ra

# iscsictl -v
Session ID:   6
Initiator name:   iqn.2012-06.b2g1
Initiator portal: 
Initiator alias:  
Target name:  iqn.2012-06.g2:rT3
Target portal:192.168.2.2
Target alias: G22020PY1EZHArT
User: 
Secret:   
Mutual user:  
Mutual secret:
Session type: Normal
Enable:   Yes
Session state:Disconnected
Failure reason:   
Header digest:None
Data digest:  None
DataSegmentLen:   131072
MaxBurstLen:  262144
FirstBurstLen:131072
ImmediateData:Yes
iSER (RDMA):  No
Offload driver:   None
Device nodes: da11 

Session ID:   10
Initiator name:   iqn.2012-06.b2g1
Initiator portal: 
Initiator alias:  
Target name:  iqn.2012-06.g2:ch1
Target portal:192.168.2.2
Target alias: G22000PY1EVVAch
User: 
Secret:   
Mutual user:  
Mutual secret:
Session type: Normal
Enable:   Yes
Session state:Disconnected
Failure reason:   
Header digest:None
Data digest:  None
DataSegmentLen:   131072
MaxBurstLen:  262144
FirstBurstLen:131072
ImmediateData:Yes
iSER (RDMA):  No
Offload driver:   None
Device nodes: da12 

# ls -l /dev/da11 /dev/da12
ls: /dev/da11: No such file or directory
ls: /dev/da12: No such file or directory

Any help please ?

Thank you very much,

Ben

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-08-21 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #7 from Ben RUBSON  ---
I tried 11-RC1 and was not able to reproduce the issue.

With 10.3, after about 10 NIC toggles, some devices fail.
With 11-RC1, more than one thousand toggles later, all my devices are still
reconnecting correctly.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-08-21 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #6 from Ben RUBSON  ---
(In reply to Ben RUBSON from comment #5)
>And the most important, regarding the bug :
>when devices do not want to correctly reconnect, I found that it is because
>iscsi is stuck in the following :
>cam_sim_free(is->is_sim, TRUE /*free_devq*/);
>However I don't know why.
I put some printf in cam_sim.c and found that iscsi is stuck by :
msleep(sim, sim->mtx, PRIBIO, "simfree", 0);
cam_sim_release() is never fired, so msleep is blocking iscsi.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-08-20 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #5 from Ben RUBSON  ---
I made further troubleshooting.

Regarding the number of iscsid processes which increases, I've found timeout
settings which permit to always have only one process per target :
sysctl kern.iscsi.login_timeout=85
sysctl kern.iscsi.iscsid_timeout=5
sysctl kern.iscsi.ping_timeout=5
These are based on net.inet.tcp.keepinit which is 75000 by default (75
seconds).

Regarding the "30" processes limit, it can be tuned with the "-d" option to
iscsid (through /etc/rc.d/iscsid for example).

And the most important, regarding the bug :
when devices do not want to correctly reconnect, I found that it is because
iscsi is stuck in the following :
cam_sim_free(is->is_sim, TRUE /*free_devq*/);
However I don't know why.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-08-19 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #4 from Ben RUBSON  ---
One strange thing I noticed.
(I put all things that could be interesting from my troubleshooting)

As soon as I put the network interface down, I get the following message on
target side, one per target :
17:01:00 srv2 kernel: WARNING: 192.168.2.1 (iqn.1994-09.org.freebsd:srv1): no
ping reply (NOP-Out) after 5 seconds; dropping connection

Then, on initiator side, I get these messages for each target :
Aug 19 17:01:07 srv1 kernel: iscsi_maintenance_thread_reconnect: 192.168.2.2
(iqn.2012-06.srv2:hm4): connection failed, destroying devices
Aug 19 17:01:07 srv1 kernel: iscsi_session_cleanup: 192.168.2.2
(iqn.2012-06.srv2:hm4): freezing
Aug 19 17:01:07 srv1 kernel: iscsi_session_cleanup: 192.168.2.2
(iqn.2012-06.srv2:hm4): deregistering SIM

At this moment, on initiator side, one iscsid process per target appears.

10 seconds later, on initiator side, I get these messages for each target :
Aug 19 17:01:18 srv1 kernel: WARNING: 192.168.2.2 (iqn.2012-06.srv2:hm4): login
timed out after 11 seconds; reconnecting
Aug 19 17:01:18 srv1 kernel: iscsi_maintenance_thread_reconnect: 192.168.2.2
(iqn.2012-06.srv2:hm4): connection failed, destroying devices

And at the same time, a second iscsid process per target appears, so that I get
2 iscsid processes per target :
# ps auxxw | grep iscsid:
root  8660.0  0.0 16632  2144  -  I 4:58pm   0:00.00 iscsid:
192.168.2.2 (iqn.2012-06.srv2:hm4) (iscsid)
root  8810.0  0.0 16632  2144  -  I 4:58pm   0:00.00 iscsid:
192.168.2.2 (iqn.2012-06.srv2:hm4) (iscsid)
(...)
However sounds like there is a limit to 30 processes, as for 17 targets I would
have expected 34 processes, but I only get 30.

If I put the NIC up before the second process is created, I only get one
reconnection message per target in target logs.
If I put the NIC up after the second process is created, I get a lot more
reconnection messages in target logs, between 40 and 50 for 17 targets.

Do we expect these additional processes ?
I think we would only expect one process / one reconnection message per target
?
Seems strange to have all these "duplicated" connection retries.

Another related question to the "30" processes found :
Is there any limit to 30 targets ?
I found a maxproc option in ctl.conf (default to 30) but I don't exactly know
what it means (I tested values of 1 to 50 without seeing any change).
No option found however on initiator side.

I noticed that we can reproduce this bug easier when we "stress" the devices :
disconnect network as soon as targets are reconnected, and reconnect it as soon
as they are disconnected.



Additionally to this, I had 8 kernel crashes, initator or target, each time
with the same address / pointer :
kernel: Fatal trap 12: page fault while in kernel mode
kernel: fault virtual address   = 0x1e8
kernel: instruction pointer = 0x20:0x80936933

I also got a stacktrace, but did not get it's pointer address.
http://img4.hostingpics.net/pics/707217211990.png

I'm also trying to get a full dump.
However I'm not sure this kernel crash issue is related to the reconnection
issue, perhaps there are 2 issues.

# uname -v
FreeBSD 10.3-RELEASE-p7 #0: Thu Aug 11 18:38:15 UTC 2016
r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC



A lot of info !
I hope we will be able to correct these issues.

Many thanks,

Ben

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-08-19 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #3 from Ben RUBSON  ---
(In reply to Ben RUBSON from comment #0)
>The only workaround I found is to reboot

I even have to hard-reboot, because of this never ending :
iscsi_terminate_sessions: waiting for sessions to terminate

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-08-19 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #2 from Ben RUBSON  ---
Quite difficult to reproduce, but I managed to do it again.
This time the problematic target (only one /17) is seen as disconnected.

### Initiator :
# iscsictl -Ra
# iscsictl -L
Target name  Target portalState
iqn.2012-06.srv2:ch3 192.168.2.2  Disconnected
# netstat -an | grep 3260
// nothing returned

### Initiator logs :
12:38:02 srv1 kernel: WARNING: 192.168.2.2 (iqn.2012-06.srv2:ch3): login timed
out after 11 seconds; reconnecting
12:38:06 srv1 kernel: WARNING: 192.168.2.2 (iqn.2012-06.srv2:ch3): handoff on
already connected session
12:38:06 srv1 iscsid[7395]: 192.168.2.2 (iqn.2012-06.srv2:ch3): ISCSIDHANDOFF:
Device busy
12:38:06 srv1 iscsid[581]: child process 7395 terminated with exit status 1
12:38:06 srv1 kernel: WARNING: 192.168.2.2 (iqn.2012-06.srv2:ch3): connection
error; reconnecting
12:38:06 srv1 kernel: da8 at iscsi63 bus 0 scbus1 target 0 lun 0
12:38:06 srv1 kernel: da8:  s/n G22020PY1EZHAch detached
12:38:16 srv1 kernel: WARNING: 192.168.2.2 (iqn.2012-06.srv2:ch3): no ping
reply (NOP-In) after 10 seconds; reconnecting

### Target logs :
12:37:41 srv2 kernel: mlxen1: link state changed to DOWN
12:37:41 srv2 kernel: mlx4_en: mlxen1: Link Down
12:37:46 srv2 kernel: WARNING: 192.168.2.1 (iqn.1994-09.org.freebsd:srv1): no
ping reply (NOP-Out) after 5 seconds; dropping connection
12:37:46 srv2 last message repeated 16 times
12:38:05 srv2 kernel: mlx4_en: mlxen1: Link Up
12:38:05 srv2 kernel: mlxen1: link state changed to UP
12:38:05 srv2 devd: Executing '/etc/rc.d/dhclient quietstart mlxen1'
12:38:09 srv2 kernel: WARNING: 192.168.2.1 (iqn.1994-09.org.freebsd:srv1):
connection error; dropping connection

On initiator side, I also get the same log messages as above for the other
devices, but they correctly reconnect.

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-08-19 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

--- Comment #1 from Ben RUBSON  ---
On initiator, netstat also returns the following :
tcp4   0  0 192.168.2.1.49715  192.168.2.2.3260   CLOSED
tcp4   0  0 192.168.2.1.49708  192.168.2.2.3260   CLOSED
tcp4   0  0 192.168.2.1.49707  192.168.2.2.3260   CLOSED
tcp4   0  0 192.168.2.1.49706  192.168.2.2.3260   CLOSED

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"


[Bug 211990] iscsi fails to reconnect and does not release devices

2016-08-19 Thread bugzilla-noreply
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=211990

Bug ID: 211990
   Summary: iscsi fails to reconnect and does not release devices
   Product: Base System
   Version: 10.3-RELEASE
  Hardware: Any
OS: Any
Status: New
  Severity: Affects Only Me
  Priority: ---
 Component: kern
  Assignee: freebsd-bugs@FreeBSD.org
  Reporter: ben.rub...@gmail.com

Hello,

I'm facing an issue where iscsictl does not want to remove devices.
Here is how I can reproduce this.

### Initiator :
# iscsictl -Aa
iscsictl then reports the 17 targets as connected, perfect.

### Target :
Let's switch down the network interface
# ifconfig mlxen1 down

### Initiator :
iscsictl reports the 17 targets as disconnected, perfect.

### Target :
Let's switch up the network interface
# ifconfig mlxen1 up

### Initiator :
iscsictl reports the 17 targets as connected, however, for 4 devices, I get the
following :
09:59:43 srv1 kernel: WARNING: 192.168.2.2 (iqn.2012-06.srv2:lg2): timed out
waiting for iscsid(8) for 11 seconds; reconnecting
09:59:54 srv1 kernel: WARNING: 192.168.2.2 (iqn.2012-06.srv2:lg2): timed out
waiting for iscsid(8) for 11 seconds; reconnecting
09:59:57 srv1 kernel: WARNING: 192.168.2.2 (iqn.2012-06.srv2:lg2): handoff on
already connected session
07:59:57 srv1 iscsid[1372]: 192.168.2.2 (iqn.2012-06.srv2:lg2): ISCSIDHANDOFF:
Device busy
09:59:57 srv1 iscsid[581]: child process 1372 terminated with exit status 1
09:59:57 srv1 kernel: WARNING: 192.168.2.2 (iqn.2012-06.srv2:lg2): connection
error; reconnecting
09:59:57 srv1 kernel: (da21:iscsi8:0:0:0): got CAM status 0x8
09:59:57 srv1 kernel: (da21:iscsi8:0:0:0): fatal error, failed to attach to
device
10:00:07 srv1 kernel: WARNING: 192.168.2.2 (iqn.2012-06.srv2:lg2): no ping
reply (NOP-In) after 10 seconds; reconnecting
10:00:08 srv1 kernel: WARNING: 192.168.2.2 (iqn.2012-06.srv2:lg2): no ping
reply (NOP-In) after 10 seconds; reconnecting

### Target :
09:58:50 srv2 kernel: mlxen1: link state changed to DOWN
09:58:50 srv2 kernel: mlx4_en: mlxen1: Link Down
09:58:53 srv2 kernel: WARNING: 192.168.2.1 (iqn.1994-09.org.freebsd:srv1): no
ping reply (NOP-Out) after 5 seconds; dropping connection
09:58:53 srv2 last message repeated 16 times
09:59:49 srv2 kernel: mlx4_en: mlxen1: Link Up
09:59:49 srv2 kernel: mlxen1: link state changed to UP
09:59:49 srv2 devd: Executing '/etc/rc.d/dhclient quietstart mlxen1'
09:59:59 srv2 kernel: WARNING: 192.168.2.1 (iqn.1994-09.org.freebsd:srv1):
connection error; dropping connection
09:59:59 srv2 last message repeated 3 times

### Initiator :
# iscsictl -Ra
# iscsictl -L
Target nameTarget portalState
iqn.2012-06.srv2:sW1   192.168.2.2  Connected: da18 
iqn.2012-06.srv2:sW2   192.168.2.2  Connected: da23 
iqn.2012-06.srv2:rT3   192.168.2.2  Connected: da17 
iqn.2012-06.srv2:lg2   192.168.2.2  Connected: da21 

As you can see, the 4 problematic devices remain "connected", nodes exist in
/dev/, but they are unusable.

Each time I "iscsictl -Ra", I get the following on initiator side :
10:09:35 srv1 kernel: WARNING: 192.168.2.2 (iqn.2012-06.srv2:lg2): connection
error; reconnecting
10:09:35 srv1 kernel: WARNING: 192.168.2.2 (iqn.2012-06.srv2:sW1): connection
error; reconnecting
10:09:35 srv1 kernel: WARNING: 192.168.2.2 (iqn.2012-06.srv2:rT3): connection
error; reconnecting
10:09:35 srv1 kernel: WARNING: 192.168.2.2 (iqn.2012-06.srv2:sW2): connection
error; reconnecting

No logs however on target side, even if I start ctld with -d.

The only workaround I found is to reboot, or to change the target name to
properly reconnect...

# uname -v
FreeBSD 10.3-RELEASE-p7 #0: Thu Aug 11 18:38:15 UTC 2016
r...@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC 

Thank you for your support,

Best regards,

Ben

-- 
You are receiving this mail because:
You are the assignee for the bug.
___
freebsd-bugs@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-bugs
To unsubscribe, send any mail to "freebsd-bugs-unsubscr...@freebsd.org"