from:"Daniel Braniss"

Re: current make world breaks if HESIOD enabled

2021-04-03 Thread Daniel Braniss




> On 3 Apr 2021, at 20:22, Warner Losh  wrote:
> 
> What's the error if you don't have these extra uintptr_t casts?

--- getgrent.o ---
*** [getgrent.o] Error code 1

make[4]: stopped in /h/rnd/git/stable/13/lib/libc
--- getpwent.o ---
/h/rnd/git/stable/13/lib/libc/gen/getpwent.c::8: error: cast to smaller
integer type 'enum nss_lookup_type' from 'void *'
[-Werror,-Wvoid-pointer-to-enum-cast]
        how = (enum nss_lookup_type)mdata;
              ^~~
1 error generated.
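
For anyone hitting the same error, here is a minimal sketch of what the extra cast does. It assumes a 64-bit target where an enum is narrower than a pointer; lookup_type, pack_mdata and unpack_mdata are made-up names for illustration, not the libc code:

```c
#include <stdint.h>

/* Hypothetical stand-in for libc's enum nss_lookup_type. */
enum lookup_type { lt_name = 1, lt_id = 2, lt_all = 3 };

/* Pack a small enum value into the void * cookie handed to a callback. */
static void *
pack_mdata(enum lookup_type how)
{
	return ((void *)(uintptr_t)how);
}

/*
 * Unpack it again. Converting void * directly to the enum is what clang
 * rejects under -Werror; going through uintptr_t first makes it a
 * pointer-to-integer conversion followed by an ordinary integer-to-enum one.
 */
static enum lookup_type
unpack_mdata(void *mdata)
{
	return ((enum lookup_type)(uintptr_t)mdata);
}
```

The value round-trips unchanged as long as it was stored through the same uintptr_t conversion in the first place.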
> 
> Warner
> 
> On Sat, Apr 3, 2021 at 12:18 AM Daniel Braniss  <mailto:da...@cs.huji.ac.il>> wrote:
> I must be the last person on earth to use Hesiod :-)
> these are the diffs:
> 
> diff --git a/lib/libc/gen/getgrent.c b/lib/libc/gen/getgrent.c
> index afb89cab3..5832cb8c6 100644
> --- a/lib/libc/gen/getgrent.c
> +++ b/lib/libc/gen/getgrent.c
> @@ -971,7 +971,7 @@ dns_group(void *retval, void *mdata, va_list ap)
>     hes = NULL;
>     name = NULL;
>     gid = (gid_t)-1;
> -   how = (enum nss_lookup_type)mdata;
> +   how = (enum nss_lookup_type)(uintptr_t)mdata;
>     switch (how) {
>     case nss_lt_name:
>         name = va_arg(ap, const char *);
> diff --git a/lib/libc/gen/getpwent.c b/lib/libc/gen/getpwent.c
> index a07ee109e..bc1d341fd 100644
> --- a/lib/libc/gen/getpwent.c
> +++ b/lib/libc/gen/getpwent.c
> @@ -1108,7 +1108,7 @@ dns_passwd(void *retval, void *mdata, va_list ap)
>     hes = NULL;
>     name = NULL;
>     uid = (uid_t)-1;
> -   how = (enum nss_lookup_type)mdata;
> +   how = (enum nss_lookup_type)(uintptr_t)mdata;
>     switch (how) {
>     case nss_lt_name:
>         name = va_arg(ap, const char *);
> 
> 
> ___
> freebsd-stable@freebsd.org <mailto:freebsd-stable@freebsd.org> mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable 
> <https://lists.freebsd.org/mailman/listinfo/freebsd-stable>
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org 
> <mailto:freebsd-stable-unsubscr...@freebsd.org>"

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

current make world breaks if HESIOD enabled

2021-04-03 Thread Daniel Braniss

I must be the last person on earth to use Hesiod :-)
these are the diffs:

diff --git a/lib/libc/gen/getgrent.c b/lib/libc/gen/getgrent.c
index afb89cab3..5832cb8c6 100644
--- a/lib/libc/gen/getgrent.c
+++ b/lib/libc/gen/getgrent.c
@@ -971,7 +971,7 @@ dns_group(void *retval, void *mdata, va_list ap)
    hes = NULL;
    name = NULL;
    gid = (gid_t)-1;
-   how = (enum nss_lookup_type)mdata;
+   how = (enum nss_lookup_type)(uintptr_t)mdata;
    switch (how) {
    case nss_lt_name:
        name = va_arg(ap, const char *);
diff --git a/lib/libc/gen/getpwent.c b/lib/libc/gen/getpwent.c
index a07ee109e..bc1d341fd 100644
--- a/lib/libc/gen/getpwent.c
+++ b/lib/libc/gen/getpwent.c
@@ -1108,7 +1108,7 @@ dns_passwd(void *retval, void *mdata, va_list ap)
    hes = NULL;
    name = NULL;
    uid = (uid_t)-1;
-   how = (enum nss_lookup_type)mdata;
+   how = (enum nss_lookup_type)(uintptr_t)mdata;
    switch (how) {
    case nss_lt_name:
        name = va_arg(ap, const char *);



Re: kqueue and NFS

2020-11-11 Thread Daniel Braniss

hi,


> On 11 Nov 2020, at 12:45, Ronald Klop  wrote:
> 
> Hi,
> 
> I don't think NFS has the possibility to push notifications about changes in 
> the filesystem to the clients. NFSv3 is stateless so the server does not even 
> know about the clients. NFSv4 I don't know much about, but I have never heard 
> of notifications.
> 
I now remember having a similar chat with Rick some years ago.

> So for NFS kqueue would only trigger if the change is on the same client as 
> where the kqueue is lurking.
> 
> Otherwise you could run some daemon on the server which pushes the 
> notifications out of band of the NFS protocol to the clients. Which probably 
> gives interesting results together with the caching of the NFS client. But 
> that is another story we see at work. (postfix -> you have mail! -> NFS -> 
> imap -> no you don't -> O yes, you have. :-) )
> 

in my case it was a Python app (Flask-RESTful) that, when run in debug mode,
would restart if some file changed, but some days ago that stopped working.
Since I had updated the kernel and the ports, it took me some time to find out
what had happened: it had nothing to do with the upgrades. Instead, I had
installed 'watchdog.py', which flask -> werkzeug -> reloader decided to use :-(
rabbit hole indeed.

thanks,
danny
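
Since kqueue on an NFS mount only reports changes made on the same client, a reloader that has to notice remote edits ends up polling. A minimal sketch of that fallback in C, assuming plain stat() polling is acceptable; file_changed is a made-up helper name, and NFS attribute caching can still delay the moment a remote change becomes visible:

```c
#include <stdbool.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper: returns true when the modification time of path
 * differs from *last and records the new value in *last.
 * Returns false if nothing changed or if stat() fails.
 */
static bool
file_changed(const char *path, time_t *last)
{
	struct stat sb;

	if (stat(path, &sb) != 0)
		return (false);
	if (sb.st_mtime == *last)
		return (false);
	*last = sb.st_mtime;
	return (true);
}
```

Called from a loop with a short sleep, it reports true on the first look at an existing file and again whenever the modification time moves.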

> Regards,
> Ronald.
> 
> From: Daniel Braniss 
> Date: Wednesday, 11 November 2020 09:40
> To: sta...@freebsd.org
> Subject: kqueue and NFS
>> Hi,
>> I have a vague recollection that kqueue does not work for NFS files,
>> any chance that this will be made possible?
>> cheers,
>>danny

kqueue and NFS

2020-11-11 Thread Daniel Braniss

Hi,
I have a vague recollection that kqueue does not work for NFS files,
any chance that this will be made possible?

cheers,
danny


Re: vmware/vmx causing problems

2020-08-10 Thread Daniel Braniss




> On 10 Aug 2020, at 09:46, Rainer Duffner  wrote:
> 
> 
> 
>> On 10.08.2020 at 07:27, Daniel Braniss wrote:
>> 
>> hi,
>> suspend/resume/migrate works fine up to 11.3,
>> in 12.1 it usually becomes very unresponsive, ping can take several minutes 
>> after a suspend/migrate.
>> switching to em works fine.
>> 
>> any ideas on how to solve this?
>> 
> 
> 
> You need to disable snapshotting the memory.

who is snapshotting the memory? and how can I stop it?

> 
> 
> Migrating still works, I think.
> 
> 
> 
> 


vmware/vmx causing problems

2020-08-09 Thread Daniel Braniss

hi,
suspend/resume/migrate works fine up to 11.3;
in 12.1 it usually becomes very unresponsive, and ping can take several minutes
after a suspend/migrate.
switching to em works fine.

any ideas on how to solve this?

danny


no output from lua boot

2020-05-22 Thread Daniel Braniss

Hi,
my last kernel where all works OK is 357067; somewhere since lua boot appeared,
I'm not seeing its messages. At the moment I'm on release 361071; I'll try to
update later, but
what am I missing?
hints?

danny

Re: limit process memory usage

2020-01-30 Thread Daniel Braniss



> On 31 Jan 2020, at 08:10, Gerrit Kühn  wrote:
> 
> Hello all,
> 
> I have an application that sometimes develops some kind of memory leak
> or similar and eats up all RAM within a few minutes until the system is
> running out of memory and swap so the kernel starts randomly killing other
> processes and finally the system crashes.
> Is there a way to limit the memory available to an (or any) application so
> that something like this doesn't tear down the whole system every time it
> happens but just kills the culprit? I found the rctl tool, but I couldn't
> make out how to use it for this purpose so far.
> 
> 
limit (a csh/tcsh builtin) shows you the current settings,
and to change one:
limit memoryuse some-value
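
The same kind of limit can be set from inside a program with setrlimit(2). A minimal sketch, assuming the goal is to cap the address space (RLIMIT_AS, which csh calls vmemoryuse) so that a leaking process gets allocation failures instead of exhausting swap; csh's memoryuse corresponds to RLIMIT_RSS instead. cap_address_space is a made-up helper name:

```c
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: lower the soft address-space limit of the calling
 * process to the given number of bytes. Returns 0 on success, -1 on error.
 */
static int
cap_address_space(rlim_t bytes)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_AS, &rl) != 0)
		return (-1);
	/* The soft limit may never exceed the hard limit. */
	if (rl.rlim_max != RLIM_INFINITY && bytes > rl.rlim_max)
		bytes = rl.rlim_max;
	rl.rlim_cur = bytes;
	return (setrlimit(RLIMIT_AS, &rl));
}
```

A wrapper would call it just before exec'ing the suspect application, since resource limits are inherited across fork and exec.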

> cu
>  Gerrit

Re: usb QR reader

2020-01-16 Thread Daniel Braniss




> On 16 Jan 2020, at 12:26, Hans Petter Selasky  wrote:
> 
> On 2020-01-16 11:23, Daniel Braniss wrote:
>> 63549 a.out    CALL  openat(AT_FDCWD,0x1045b,0x2)
>> 63549 a.out    NAMI  "/dev/ttyU0"
>> 63549 a.out    RET   openat -1 errno 6 Device not configured
>> 63549 a.out    CALL  nanosleep(0xbfbfec20,0xbfbfec10)
>> 63549 a.out    RET   nanosleep 0
>> 63549 a.out    CALL  writev(0x2,0xbfbfe408,0x4)
>> 63549 a.out    GIO   fd 2 wrote 28 bytes
>>   "open: Device not configured
>> and the console:
>> Jan 16 12:17:14 neo-black-1 kernel: umodem_cfg_set_dtr: onoff=1
>> Jan 16 12:17:14 neo-black-1 kernel: umodem_cfg_set_rts: onoff=1
>> Jan 16 12:17:15 neo-black-1 kernel: umodem0: at uhub5, port 1, addr 2 
>> (disconnected)
>> Jan 16 12:17:15 neo-black-1 kernel: umodem_detach: sc=0xd494c400
>> Jan 16 12:17:16 neo-black-1 kernel: umodem_cfg_set_break: onoff=0
>> Jan 16 12:17:16 neo-black-1 kernel: umodem0: detached
>> Jan 16 12:17:23 neo-black-1 kernel: umodem_probe:
>> Jan 16 12:17:23 neo-black-1 syslogd: last message repeated 1 times
>> Jan 16 12:17:23 neo-black-1 kernel: umodem0 on uhub5
>> Jan 16 12:17:23 neo-black-1 kernel: umodem0: > class 2/0, rev 1.10/1.00, addr 2> on usbus5
>> Jan 16 12:17:23 neo-black-1 kernel: umodem0: data interface 1, has no CM 
>> over data, has no break
> 
> Also do:
> 
> ls /dev/cuaU*
looks fine
> 
> To make sure there are not multiple devices there!
> 
> Can you try to open up /dev/cuaU0 instead?
> 
> Same result?
> 

the open succeeds, but read returns 0, and the device is detached again.

I will now try the whole termios stuff (speed, raw mode, etc.)

danny
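
For reference, a minimal sketch of the speed and raw-mode setup mentioned above. It assumes 8N1 framing and that modem control lines should be ignored (CLOCAL); make_raw_8n1 is a made-up helper name, and nothing here is known to fix the detach:

```c
#include <termios.h>

/*
 * Hypothetical helper: turn *t into raw 8N1 settings at the given speed.
 * Returns 0 on success, -1 if the speed is rejected.
 */
static int
make_raw_8n1(struct termios *t, speed_t speed)
{
	/* No input translation, no output processing, no echo or line editing. */
	t->c_iflag &= ~(IGNBRK | BRKINT | PARMRK | ISTRIP | INLCR | IGNCR | ICRNL | IXON);
	t->c_oflag &= ~OPOST;
	t->c_lflag &= ~(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
	/* 8 data bits, no parity, 1 stop bit, receiver on, ignore modem lines. */
	t->c_cflag &= ~(CSIZE | PARENB | CSTOPB);
	t->c_cflag |= CS8 | CLOCAL | CREAD;
	/* read() blocks until at least one byte arrives. */
	t->c_cc[VMIN] = 1;
	t->c_cc[VTIME] = 0;
	if (cfsetispeed(t, speed) != 0 || cfsetospeed(t, speed) != 0)
		return (-1);
	return (0);
}
```

Typical use is tcgetattr(fd, &t), then make_raw_8n1(&t, B9600), then tcsetattr(fd, TCSANOW, &t) on the descriptor returned by opening /dev/cuaU0; the 9600 speed is only a placeholder.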




Re: usb QR reader

2020-01-16 Thread Daniel Braniss




> On 16 Jan 2020, at 12:26, Hans Petter Selasky  wrote:
> 
> On 2020-01-16 11:23, Daniel Braniss wrote:
>> 63549 a.out    CALL  openat(AT_FDCWD,0x1045b,0x2)
>> 63549 a.out    NAMI  "/dev/ttyU0"
>> 63549 a.out    RET   openat -1 errno 6 Device not configured
>> 63549 a.out    CALL  nanosleep(0xbfbfec20,0xbfbfec10)
>> 63549 a.out    RET   nanosleep 0
>> 63549 a.out    CALL  writev(0x2,0xbfbfe408,0x4)
>> 63549 a.out    GIO   fd 2 wrote 28 bytes
>>   "open: Device not configured
>> and the console:
>> Jan 16 12:17:14 neo-black-1 kernel: umodem_cfg_set_dtr: onoff=1
>> Jan 16 12:17:14 neo-black-1 kernel: umodem_cfg_set_rts: onoff=1
>> Jan 16 12:17:15 neo-black-1 kernel: umodem0: at uhub5, port 1, addr 2 
>> (disconnected)
>> Jan 16 12:17:15 neo-black-1 kernel: umodem_detach: sc=0xd494c400
>> Jan 16 12:17:16 neo-black-1 kernel: umodem_cfg_set_break: onoff=0
>> Jan 16 12:17:16 neo-black-1 kernel: umodem0: detached
>> Jan 16 12:17:23 neo-black-1 kernel: umodem_probe:
>> Jan 16 12:17:23 neo-black-1 syslogd: last message repeated 1 times
>> Jan 16 12:17:23 neo-black-1 kernel: umodem0 on uhub5
>> Jan 16 12:17:23 neo-black-1 kernel: umodem0: > class 2/0, rev 1.10/1.00, addr 2> on usbus5
>> Jan 16 12:17:23 neo-black-1 kernel: umodem0: data interface 1, has no CM 
>> over data, has no break
> 
> Also do:
> 
> ls /dev/cuaU*
> 
> To make sure there are not multiple devices there!
> 
> Can you try to open up /dev/cuaU0 instead?
> 
> Same result?
> 
> --HPS


Re: usb QR reader

2020-01-16 Thread Daniel Braniss



> On 16 Jan 2020, at 12:12, Hans Petter Selasky  wrote:
> 
> On 2020-01-16 11:07, Daniel Braniss wrote:
>>> On 16 Jan 2020, at 11:16, Hans Petter Selasky  wrote:
>>> 
>>> ktracing
>> wrote a small c program, that just opens and tries to read, the open fails 
>> with
>>  ‘’open: Device not configured”
>> and the console shows:
>> Jan 16 11:59:09 neo-black-1 kernel: umodem0: detached
>> Jan 16 11:59:14 neo-black-1 kernel: ugen5.2:  at 
>> usbus5
>> Jan 16 11:59:14 neo-black-1 kernel: umodem_probe:
>> Jan 16 11:59:14 neo-black-1 syslogd: last message repeated 1 times
>> Jan 16 11:59:14 neo-black-1 kernel: umodem0 on uhub5
>> Jan 16 11:59:14 neo-black-1 kernel: umodem0: > class 2/0, rev 1.10/1.00, addr 2> on usbus5
>> Jan 16 11:59:14 neo-black-1 kernel: umodem0:
>> Jan 16 11:59:14 neo-black-1 kernel: data interface 1, has no CM over data, 
>> has no break
>> Jan 16 12:01:52 neo-black-1 kernel: umodem_cfg_set_dtr: onoff=1
>> Jan 16 12:01:52 neo-black-1 kernel: umodem_cfg_set_rts: onoff=1
>> Jan 16 12:01:53 neo-black-1 kernel: umodem0: at uhub5, port 1, addr 2 
>> (disconnected)
>> Jan 16 12:01:53 neo-black-1 kernel: umodem_detach: sc=0xd494c400
>> Jan 16 12:01:54 neo-black-1 kernel: umodem_cfg_set_break: onoff=0
>> Jan 16 12:01:54 neo-black-1 kernel: umodem0: detached
>> Jan 16 12:02:01 neo-black-1 kernel: umodem_probe:
>> Jan 16 12:02:01 neo-black-1 syslogd: last message repeated 1 times
>> Jan 16 12:02:01 neo-black-1 kernel: umodem0 on uhub5
>> Jan 16 12:02:01 neo-black-1 kernel: umodem0: > class 2/0, rev 1.10/1.00, addr 2> on usbus5
>> Jan 16 12:02:01 neo-black-1 kernel: umodem0: data interface 1, has no CM 
>> over data, has no break
> 
> Can you put a sleep call in your c-program, like 1 second?
> 
> --HPS
63549 a.out    CALL  openat(AT_FDCWD,0x1045b,0x2)
63549 a.out    NAMI  "/dev/ttyU0"
63549 a.out    RET   openat -1 errno 6 Device not configured
63549 a.out    CALL  nanosleep(0xbfbfec20,0xbfbfec10)
63549 a.out    RET   nanosleep 0
63549 a.out    CALL  writev(0x2,0xbfbfe408,0x4)
63549 a.out    GIO   fd 2 wrote 28 bytes
  "open: Device not configured

and the console:
Jan 16 12:17:14 neo-black-1 kernel: umodem_cfg_set_dtr: onoff=1
Jan 16 12:17:14 neo-black-1 kernel: umodem_cfg_set_rts: onoff=1
Jan 16 12:17:15 neo-black-1 kernel: umodem0: at uhub5, port 1, addr 2 
(disconnected)
Jan 16 12:17:15 neo-black-1 kernel: umodem_detach: sc=0xd494c400
Jan 16 12:17:16 neo-black-1 kernel: umodem_cfg_set_break: onoff=0
Jan 16 12:17:16 neo-black-1 kernel: umodem0: detached
Jan 16 12:17:23 neo-black-1 kernel: umodem_probe: 
Jan 16 12:17:23 neo-black-1 syslogd: last message repeated 1 times
Jan 16 12:17:23 neo-black-1 kernel: umodem0 on uhub5
Jan 16 12:17:23 neo-black-1 kernel: umodem0:  on usbus5
Jan 16 12:17:23 neo-black-1 kernel: umodem0: data interface 1, has no CM over 
data, has no break


Re: usb QR reader

2020-01-16 Thread Daniel Braniss



> On 16 Jan 2020, at 11:16, Hans Petter Selasky  wrote:
> 
> ktracing

I wrote a small C program that just opens the device and tries to read;
the open fails with
"open: Device not configured"

and the console shows:
Jan 16 11:59:09 neo-black-1 kernel: umodem0: detached
Jan 16 11:59:14 neo-black-1 kernel: ugen5.2:  at 
usbus5
Jan 16 11:59:14 neo-black-1 kernel: umodem_probe: 
Jan 16 11:59:14 neo-black-1 syslogd: last message repeated 1 times
Jan 16 11:59:14 neo-black-1 kernel: umodem0 on uhub5
Jan 16 11:59:14 neo-black-1 kernel: umodem0:  on usbus5
Jan 16 11:59:14 neo-black-1 kernel: umodem0: 
Jan 16 11:59:14 neo-black-1 kernel: data interface 1, has no CM over data, has 
no break
Jan 16 12:01:52 neo-black-1 kernel: umodem_cfg_set_dtr: onoff=1
Jan 16 12:01:52 neo-black-1 kernel: umodem_cfg_set_rts: onoff=1
Jan 16 12:01:53 neo-black-1 kernel: umodem0: at uhub5, port 1, addr 2 
(disconnected)
Jan 16 12:01:53 neo-black-1 kernel: umodem_detach: sc=0xd494c400
Jan 16 12:01:54 neo-black-1 kernel: umodem_cfg_set_break: onoff=0
Jan 16 12:01:54 neo-black-1 kernel: umodem0: detached
Jan 16 12:02:01 neo-black-1 kernel: umodem_probe: 
Jan 16 12:02:01 neo-black-1 syslogd: last message repeated 1 times
Jan 16 12:02:01 neo-black-1 kernel: umodem0 on uhub5
Jan 16 12:02:01 neo-black-1 kernel: umodem0:  on usbus5
Jan 16 12:02:01 neo-black-1 kernel: umodem0: data interface 1, has no CM over 
data, has no break

Re: usb QR reader

2020-01-16 Thread Daniel Braniss




> On 6 Jan 2020, at 00:32, Hans Petter Selasky  wrote:
> 
> On 2020-01-05 18:03, Daniel Braniss wrote:
>>> On 5 Jan 2020, at 17:08, Hans Petter Selasky  wrote:
>>> 
>>> On 2020-01-05 15:32, Daniel Braniss wrote:
>>>> status 0x6a1a3 
>>>> 
>>>> 16:25:17.790304 usbus5.2 SUBM-CTRL-EP=,SPD=FULL,NFR=1,SLEN=8,IVAL=0
>>>> frame[0] WRITE 8 bytes
>>>>   21 22 03 00 00 00 00 00  -- -- -- -- -- -- -- --  |!"..|
>>>> flags 0x10 
>>>> status 0x4a1a3 
>>>> 
>>>> 16:25:17.790346 usbus5.2 
>>>> DONE-CTRL-EP=,SPD=FULL,NFR=0,SLEN=0,IVAL=50,ERR=TIMEOUT
>>>> flags 0 <0>
>>>> status 0x8a1a5 
>>>> 
>>>> 16:25:18.797312 usbus5.2 
>>>> DONE-CTRL-EP=,SPD=FULL,NFR=0,SLEN=0,IVAL=0,ERR=TIMEOUT
>>>> flags 0x10 
>>> 
>>> Hi,
>>> 
>>> There are some USB requests your USB device doesn't respond to. ERR=TIMEOUT
>>> 
>>> You might find the corresponding driver and enable .debug=16 under :
>>> 
>>> sysctl hw.usb | grep debug
>>> 
>>> To get more information what exactly goes wrong.
>>> 
> 
> Did you compile the kernel with options USB_DEBUG ?
> 
> Not hw.usb.debug, but hw.usb.umodem.debug, for example.
> 
> --HPS
real events took over, so now:

connecting the device:
Jan 16 11:05:28 neo-black-1 kernel: ugen5.2:  at 
usbus5 (disconnected)
Jan 16 11:05:28 neo-black-1 kernel: umodem0: at uhub5, port 1, addr 2 
(disconnected)
Jan 16 11:05:28 neo-black-1 kernel: umodem_detach: sc=0xd494c400
Jan 16 11:05:28 neo-black-1 kernel: umodem0: detached
Jan 16 11:05:36 neo-black-1 kernel: ugen5.2:  at 
usbus5
Jan 16 11:05:36 neo-black-1 kernel: umodem_probe: 
Jan 16 11:05:36 neo-black-1 syslogd: last message repeated 1 times
Jan 16 11:05:36 neo-black-1 kernel: umodem0 on uhub5
Jan 16 11:05:36 neo-black-1 kernel: umodem0:  on usbus5
Jan 16 11:05:36 neo-black-1 kernel: umodem0: 
Jan 16 11:05:36 neo-black-1 kernel: data interface 1, has no CM over data, has 
no break

trying to open it:
Jan 16 11:06:34 neo-black-1 kernel: umodem_cfg_set_dtr: onoff=1
Jan 16 11:06:34 neo-black-1 kernel: umodem_ioctl: cmd=0x402c7413
Jan 16 11:06:34 neo-black-1 kernel: umodem_ioctl: unknown
Jan 16 11:06:34 neo-black-1 kernel: umodem_cfg_set_rts: onoff=1
Jan 16 11:06:34 neo-black-1 kernel: umodem_ioctl: cmd=0x802c7416
Jan 16 11:06:34 neo-black-1 kernel: umodem_ioctl: unknown
Jan 16 11:06:34 neo-black-1 kernel: umodem_ioctl: cmd=0x2000740d
Jan 16 11:06:34 neo-black-1 kernel: umodem_ioctl: unknown
Jan 16 11:06:34 neo-black-1 kernel: umodem_ioctl: cmd=0x402c7413
Jan 16 11:06:34 neo-black-1 kernel: umodem_ioctl: unknown
Jan 16 11:06:34 neo-black-1 kernel: umodem_ioctl: cmd=0x802c7416
Jan 16 11:06:34 neo-black-1 kernel: umodem_ioctl: unknown
Jan 16 11:06:34 neo-black-1 kernel: umodem_ioctl: cmd=0x8004667e
Jan 16 11:06:34 neo-black-1 kernel: umodem_ioctl: unknown
Jan 16 11:06:34 neo-black-1 kernel: umodem_ioctl: cmd=0x8004667d
Jan 16 11:06:34 neo-black-1 kernel: umodem_ioctl: unknown
Jan 16 11:06:35 neo-black-1 kernel: umodem0: at uhub5, port 1, addr 2 
(disconnected)
Jan 16 11:06:35 neo-black-1 kernel: umodem_detach: sc=0xd494c400
Jan 16 11:06:36 neo-black-1 kernel: umodem_cfg_set_break: onoff=0
Jan 16 11:06:36 neo-black-1 kernel: umodem0: detached
Jan 16 11:06:42 neo-black-1 kernel: umodem_probe: 
Jan 16 11:06:42 neo-black-1 syslogd: last message repeated 1 times
Jan 16 11:06:42 neo-black-1 kernel: umodem0 on uhub5
Jan 16 11:06:42 neo-black-1 kernel: umodem0:  on usbus5
Jan 16 11:06:42 neo-black-1 kernel: umodem0: data interface 1, has no CM over 
data, has no break

cheers,
danny
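
The cmd= values in the umodem_ioctl lines follow the BSD ioctl encoding: direction in the top three bits, parameter length in bits 16 to 28, group letter in bits 8 to 15 and command number in the low byte. A minimal decoder sketch, with the bit layout hard-coded rather than taken from sys/ioccom.h so it builds anywhere; ioc_fields and decode_ioc are made-up names:

```c
#include <stdint.h>

struct ioc_fields {
	int      is_void;	/* no parameter (IOC_VOID) */
	int      copies_out;	/* kernel to user (IOC_OUT) */
	int      copies_in;	/* user to kernel (IOC_IN) */
	unsigned len;		/* parameter size in bytes */
	char     group;		/* e.g. 't' for tty, 'f' for file */
	unsigned num;		/* command number within the group */
};

/* Split a 32-bit ioctl command word into its fields. */
static struct ioc_fields
decode_ioc(uint32_t cmd)
{
	struct ioc_fields f;

	f.is_void = (cmd & 0x20000000u) != 0;
	f.copies_out = (cmd & 0x40000000u) != 0;
	f.copies_in = (cmd & 0x80000000u) != 0;
	f.len = (cmd >> 16) & 0x1fffu;
	f.group = (char)((cmd >> 8) & 0xffu);
	f.num = cmd & 0xffu;
	return (f);
}
```

Decoded this way, 0x402c7413 is group 't', number 19 with a 44-byte result, which by my reading of sys/ttycom.h is TIOCGETA; 0x802c7416 would be TIOCSETAF, 0x2000740d TIOCEXCL, 0x8004667e FIONBIO and 0x8004667d FIOASYNC. Treat those names as my assumption. They look like ordinary tty requests rather than anything specific to the device.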



Re: nfs lockd errors after NetApp software upgrade.

2020-01-09 Thread Daniel Braniss



> On 9 Jan 2020, at 05:24, Rick Macklem  wrote:
> 
> The attached patch changes the xid to be a global for all "connections" for
> the krpc UDP client.
> 
> You could try it if you'd like. It passed a trivial test, but I don't know what
> that "misfeature" comment means, so I don't know if this breaks that.
> 
> I can't think of why "xid" would have been per-connection (especially since a
> connection is a questionable concept for UDP), except that this might have
> originated in a userland library and carried into the kernel during porting.
> 
> rick


I will try it ASAP. In the meantime the new behavior of the NetApp has been
disabled, and since I still don't know what is causing the unexplained huge
number of unlock requests, it's going to be a long debug process.
Also, I will see how to switch to TCP for the NLM protocol with minor
disruption.

thanks,
danny
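
A rough sketch of the idea behind a global xid as described in the quoted text: every UDP client handle draws from one shared counter instead of keeping its own. This is not the actual patch; global_xid, udp_client and next_xid are made-up names:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical: one counter shared by every UDP client handle. */
static _Atomic uint32_t global_xid;

/* Hypothetical per-handle state; note there is no private xid field. */
struct udp_client {
	int sock;
};

/* Hand out the next transaction ID, unique across all handles. */
static uint32_t
next_xid(struct udp_client *cl)
{
	(void)cl;	/* the handle no longer influences the xid */
	return (atomic_fetch_add(&global_xid, 1) + 1);
}
```

With a shared counter, two handles that happen to use the same source port cannot hand out the same xid until the 32-bit counter wraps.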

[…] 

Re: nfs lockd errors after NetApp software upgrade.

2020-01-08 Thread Daniel Braniss

top posting NetApp reply:
…
Here you can see transaction ID (0x5e15f77a) being used over port 886 and the 
NFS server successfully responds.
 
4480695  2020-01-08 12:20:54  132.65.116.111  132.65.60.56    NLM  0x5e15f77a (1578497914)  886
    V4 UNLOCK Call (Reply In 4480696) FH:0x54b075a0 svid:13629 pos:0-0
4480696  2020-01-08 12:20:54  132.65.60.56    132.65.116.111  NLM  0x5e15f77a (1578497914)  4045
    V4 UNLOCK Reply (Call In 4480695)
 
Here you see that 2 minutes later the client uses the same transaction ID 
(0x5e15f77a) and the same port again, but the file handle is different, so the 
client is unlocking a different file.
 
4591136  2020-01-08 12:22:54  132.65.116.111  132.65.60.56    NLM  0x5e15f77a (1578497914)  886
    [RPC retransmission of #4480695] V4 UNLOCK Call (Reply In 4480696) FH:0xb14b75a8 svid:13629 pos:0-0
4592588  2020-01-08 12:22:57  132.65.116.111  132.65.60.56    NLM  0x5e15f77a (1578497914)  886
    [RPC retransmission of #4480695] V4 UNLOCK Call (Reply In 4480696) FH:0xb14b75a8 svid:13629 pos:0-0
4598862  2020-01-08 12:23:03  132.65.116.111  132.65.60.56    NLM  0x5e15f77a (1578497914)  886
    [RPC retransmission of #4480695] V4 UNLOCK Call (Reply In 4480696) FH:0xb14b75a8 svid:13629 pos:0-0
4608871  2020-01-08 12:23:21  132.65.116.111  132.65.60.56    NLM  0x5e15f77a (1578497914)  886
    [RPC retransmission of #4480695] V4 UNLOCK Call (Reply In 4480696) FH:0xb14b75a8 svid:13629 pos:0-0
4635984  2020-01-08 12:23:59  132.65.116.111  132.65.60.56    NLM  0x5e15f77a (1578497914)  886
    [RPC retransmission of #4480695] V4 UNLOCK Call (Reply In 4480696) FH:0xb14b75a8 svid:13629 pos:0-0
 
transaction ID reuse is also seen for a number of other transaction IDs 
starting at the same time.
 
Within ONTAP 9.3 we have changed the way our Replay-Cache tracks requests by 
including a checksum of the RPC request. Both in this and earlier releases 
ONTAP would cache the call in frame 4480695, but starting in 9.3 we then cache 
the checksum as part of that.
 
When the client sends the request in frame 4591136 it uses the same transaction 
ID (0x5e15f77a) and same port again. Here the problem is that we already hold a 
checksum in cache for the “same transaction”
 …

this seems to be happening after the client did not receive the response and 
re-transmits the request.

danny
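
To make the NetApp explanation concrete, here is a toy model of a replay cache that keys on (xid, source port) and also remembers a checksum of the request. It is only a sketch of the behaviour described above, not ONTAP's implementation; rc_entry, toy_sum and rc_check are made-up names and the checksum is deliberately simplistic:

```c
#include <stdint.h>

enum rc_result { RC_MISS, RC_REPLAY, RC_MISMATCH };

/* Hypothetical one-slot replay cache keyed on (xid, source port). */
struct rc_entry {
	int      used;
	uint32_t xid;
	uint16_t port;
	uint32_t sum;
};

/* Toy checksum over the request fields that vary in the trace. */
static uint32_t
toy_sum(uint32_t fh, uint32_t svid)
{
	return ((fh * 2654435761u) ^ svid);
}

static enum rc_result
rc_check(struct rc_entry *e, uint32_t xid, uint16_t port, uint32_t fh,
    uint32_t svid)
{
	uint32_t sum = toy_sum(fh, svid);

	if (!e->used || e->xid != xid || e->port != port) {
		/* First sighting of this (xid, port): remember the request. */
		e->used = 1;
		e->xid = xid;
		e->port = port;
		e->sum = sum;
		return (RC_MISS);
	}
	/* Same xid and port: a true retransmission has the same checksum. */
	return (e->sum == sum ? RC_REPLAY : RC_MISMATCH);
}
```

Feeding it frame 4480695 (FH 0x54b075a0) and then frame 4591136 (same xid and port, FH 0xb14b75a8) gives a miss followed by a mismatch, which is the situation the reply describes.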


> On 24 Dec 2019, at 5:02, Rick Macklem  wrote:
> 
> Richard P Mackerras wrote:
>> Hi,
>> 
>> We had some bully type workloads emerge when we moved a lot of block
>> storage from old XIV to new all flash 3PAR. I wonder if your IMAP issue
>> might have emerged just because suddenly there was the opportunity with all
>> flash. QOS is good on 9.x ONTAP. If anyone says it’s not then they last
>> looked on 8.x. So I suggest you QOS the IMAP workload.
>> 
>> Nobody should be using UDP with NFS unless they have a very specific set
>> of circumstances. TCP was a real step forward.
> Well, I can't argue with this, considering I did the first working 
> implementation
> of NFS over TCP. It was actually Mike Karels that suggested I try doing so,
> There's a paper in a very old Usenix Conference Proceedings, but it is so old
> that it isn't on the Usenix web page (around 1988 in Denver, if I recall).  I 
> don't
> even have a copy myself, although I was the author.
> 
> Now, having said that, I must note that the Network Lock Manager (NLM) and
> Network Status Monitor (NSM) were not NFS. They were separate stateful
> protocols (poorly designed imho) that Sun never published.
> 
> NFS as Sun designed it (NFSv2 and NFSv3) were "stateless server" protocols,
> so that they could work reliably without server crash recovery.
> However, the NLM was inherently stateful, since it was dealing with file 
> locks.
> 
> So, you can't really lump the NLM with NFS (and you should avoid use of the
> NLM over any transport imho).
> 
> NFSv4 tackled the difficult problem of having a "stateful server" and crash 
> recovery,
> which resulted in a much more complex protocol (compare the size of RFC-1813
> vs RFC-5661 to get some idea of this).
> 
> rick
> 
> Cheers
> 
> Richard

Re: usb QR reader

2020-01-05 Thread Daniel Braniss



> On 5 Jan 2020, at 17:08, Hans Petter Selasky  wrote:
> 
> On 2020-01-05 15:32, Daniel Braniss wrote:
>> status 0x6a1a3 
>> 
>> 16:25:17.790304 usbus5.2 SUBM-CTRL-EP=,SPD=FULL,NFR=1,SLEN=8,IVAL=0
>> frame[0] WRITE 8 bytes
>>   21 22 03 00 00 00 00 00  -- -- -- -- -- -- -- --  |!"..|
>> flags 0x10 
>> status 0x4a1a3 
>> 
>> 16:25:17.790346 usbus5.2 
>> DONE-CTRL-EP=,SPD=FULL,NFR=0,SLEN=0,IVAL=50,ERR=TIMEOUT
>> flags 0 <0>
>> status 0x8a1a5 
>> 
>> 16:25:18.797312 usbus5.2 
>> DONE-CTRL-EP=,SPD=FULL,NFR=0,SLEN=0,IVAL=0,ERR=TIMEOUT
>> flags 0x10 
> 
> Hi,
> 
> There are some USB requests your USB device doesn't respond to. ERR=TIMEOUT
> 
> You might find the corresponding driver and enable .debug=16 under :
> 
> sysctl hw.usb | grep debug
> 
> To get more information what exactly goes wrong.
> 
> —HPS

did that:
sysctl hw.usb.debug=16
but I can't see any changes; /var/log/messages seems the same as before.
BTW, the device works fine when I connect it to a Mac, so it must be something
in the umodem driver.

cheers,
danny
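
The 8 bytes shown under each SUBM-CTRL frame in the quoted dump are a standard USB setup packet: bmRequestType, bRequest, then little-endian wValue, wIndex and wLength. A minimal decoder sketch; usb_setup, parse_setup and is_set_line_state are made-up names, and the request code 0x22 is SET_CONTROL_LINE_STATE as I read the CDC specification:

```c
#include <stdint.h>

struct usb_setup {
	uint8_t  bmRequestType;
	uint8_t  bRequest;
	uint16_t wValue;
	uint16_t wIndex;
	uint16_t wLength;
};

/* Decode the 8 setup bytes; multi-byte fields are little-endian. */
static struct usb_setup
parse_setup(const uint8_t b[8])
{
	struct usb_setup s;

	s.bmRequestType = b[0];
	s.bRequest = b[1];
	s.wValue = (uint16_t)(b[2] | (b[3] << 8));
	s.wIndex = (uint16_t)(b[4] | (b[5] << 8));
	s.wLength = (uint16_t)(b[6] | (b[7] << 8));
	return (s);
}

/* Class request to an interface (0x21) with bRequest 0x22. */
static int
is_set_line_state(const struct usb_setup *s)
{
	return (s->bmRequestType == 0x21 && s->bRequest == 0x22);
}
```

Read this way, the frame 21 22 03 00 00 00 00 00 is a line-state request with wValue 0x0003 (bit 0 DTR, bit 1 RTS), and it appears to be one of the requests that ends in ERR=TIMEOUT.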

Re: usb QR reader

2020-01-05 Thread Daniel Braniss



> On 3 Jan 2020, at 13:05, Hans Petter Selasky  wrote:
> 
> On 2020-01-03 11:56, Daniel Braniss wrote:
>> Hi Hans,
>> can you shed some light/help?
>> thanks,
>>  danny
>>> On 2 Jan 2020, at 11:11, Daniel Braniss  wrote:
>>> 
>>> Hi,
>>> after connecting this QR reader I see a new /dev/ttyU but as soon as I try 
>>> tip,
>>> the device disconnects. (BTW, it’s configured as a ‘Virtual Serial Port’)
>>> 
>>> dmesg:
>>> …
>>> Jan  2 10:54:57 pampero kernel: umodem0 on uhub1
>>> Jan  2 10:54:57 pampero kernel: umodem0: >> 2/0, rev 1.10/1.00, addr 38> on usbus0
>>> Jan  2 10:54:57 pampero kernel: umodem0: data interface 1, has no CM over 
>>> data, has no break
>>> Jan  2 10:56:01 pampero kernel: umodem0: at uhub1, port 2, addr 38 
>>> (disconnected)
>>> Jan  2 10:56:02 pampero kernel: umodem0:
>>> Jan  2 10:56:02 pampero kernel: detached
>>> Jan  2 10:56:03 pampero kernel: umodem0 on uhub1
>>> Jan  2 10:56:03 pampero kernel: umodem0: >> 2/0, rev 1.10/1.00, addr 38> on usbus0
>>> Jan  2 10:56:03 pampero kernel: umodem0: data interface 1, has no CM over 
>>> data, has no break
>>> …
>>> 
>>> and usbconfig:
>>> pampero# usbconfig
>>> ugen0.1: <0x8086 XHCI root HUB> at usbus0, cfg=0 md=HOST spd=SUPER 
>>> (5.0Gbps) pwr=SAVE (0mA)
>>> ugen0.2:  at usbus0, cfg=0 md=HOST spd=HIGH 
>>> (480Mbps) pwr=SAVE (500mA)
>>> ugen0.4:  at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) 
>>> pwr=ON (98mA)
>>> ugen0.5:  at usbus0, cfg=0 
>>> md=HOST spd=FULL (12Mbps) pwr=ON (100mA)
>>> ugen0.6:  at usbus0, cfg=0 md=HOST spd=SUPER 
>>> (5.0Gbps) pwr=SAVE (0mA)
>>> ugen0.3:  at usbus0, 
>>> cfg=0 md=HOST spd=FULL (12Mbps) pwr=ON (100mA)
>>> ugen0.7:  at usbus0, cfg=0 md=HOST spd=FULL 
>>> (12Mbps) pwr=ON (100mA) <— this is the QR
>>> 
>>> any ideas?
> 
> Can you run:
> 
> usbdump -i usbus0 -f 7 -s 65536 -vvv
> 
> Before attaching the device. Make sure numbers after ugen are 0 and 7. Look 
> for non ERR=0 .
> 
> —HPS


so I connected the QR reader to another host, running 12.1 stable.

neo-black-2# usbconfig
ugen0.1:  at usbus0, cfg=0 md=HOST spd=HIGH (480Mbps) 
pwr=SAVE (0mA)
ugen1.1:  at usbus1, cfg=0 md=HOST spd=FULL (12Mbps) 
pwr=SAVE (0mA)
ugen2.1:  at usbus2, cfg=0 md=HOST spd=HIGH (480Mbps) 
pwr=SAVE (0mA)
ugen3.1:  at usbus3, cfg=0 md=HOST spd=FULL (12Mbps) 
pwr=SAVE (0mA)
ugen4.1:  at usbus4, cfg=0 md=HOST spd=HIGH (480Mbps) 
pwr=SAVE (0mA)
ugen5.1:  at usbus5, cfg=0 md=HOST spd=FULL (12Mbps) 
pwr=SAVE (0mA)
ugen5.2:  at usbus5, cfg=0 md=HOST spd=FULL (12Mbps) 
pwr=ON (100mA)


and so I did
neo-black-2# usbdump -i usbus5 -f2 -s 65536 -vvv

nothing happens, so this is what I get after typing 'tip usb' and nothing else.
I don't know who is doing the chitchat, and after a very short while it
disconnects.

16:25:16.753606 usbus5.2 SUBM-INTR-EP=0082,SPD=FULL,NFR=1,SLEN=0,IVAL=5
frame[0] READ 64 bytes
flags 0x8a 
status 0xeb023 

16:25:16.753645 usbus5.2 SUBM-CTRL-EP=,SPD=FULL,NFR=1,SLEN=8,IVAL=0
frame[0] WRITE 8 bytes
  21 22 01 00 00 00 00 00  -- -- -- -- -- -- -- --  |!"..|
flags 0x10 
status 0xea1a3 

16:25:16.755271 usbus5.2 
DONE-CTRL-EP=,SPD=FULL,NFR=1,SLEN=0,IVAL=0,ERR=0
frame[0] WRITE 8 bytes
flags 0x10 
status 0xca1a1 

16:25:16.808281 usbus5.2 SUBM-CTRL-EP=,SPD=FULL,NFR=1,SLEN=8,IVAL=50
frame[0] WRITE 8 bytes
  02 01 00 00 81 00 00 00  -- -- -- -- -- -- -- --  ||
flags 0 <0>
status 0x6a1a3 

16:25:17.790304 usbus5.2 SUBM-CTRL-EP=,SPD=FULL,NFR=1,SLEN=8,IVAL=0
frame[0] WRITE 8 bytes
  21 22 03 00 00 00 00 00  -- -- -- -- -- -- -- --  |!"..|
flags 0x10 
status 0x4a1a3 

16:25:17.790346 usbus5.2 
DONE-CTRL-EP=,SPD=FULL,NFR=0,SLEN=0,IVAL=50,ERR=TIMEOUT
flags 0 <0>
status 0x8a1a5 

16:25:18.797312 usbus5.2 
DONE-CTRL-EP=,SPD=FULL,NFR=0,SLEN=0,IVAL=0,ERR=TIMEOUT
flags 0x10 
status 0xaa1a5 

16:25:18.850220 usbus5.2 SUBM-CTRL-EP=,SPD=FULL,NFR=1,SLEN=8,IVAL=50
frame[0] WRITE 8 bytes
  02 01 00 00 81 00 00 00  -- -- -- -- -- -- -- --  ||
flags 0 <0>
status 0x4a1a3 

16:25:18.867314 usbus5.2 
DONE-BULK-EP=0081,SPD=FULL,NFR=0,SLEN=0,IVAL=0,ERR=CANCELLED
flags 0xa 
status 0xab00c 

16:25:18.867332 usbus5.2 
DONE-INTR-EP=0082,SPD=FULL,NFR=0,SLEN=0,IVAL=5,ERR=CANCELLED
flags 0x8a 
status 0x8b01c 

16:25:19.854309 usbus5.2 SUBM-CTRL-EP=,SPD=FULL,NFR=1,SLEN=8,IVAL=0
frame[0] WRITE 8 bytes
  00 09 00 00 00 00 00 00  -- -- -- -- -- -- -- --  ||
flags 0x10 
status 0x6a1a3 

16:25:19.854355 usbus5.

Re: usb QR reader

2020-01-03 Thread Daniel Braniss

Hi Hans, 
can you shed some light/help?

thanks,
danny


> On 2 Jan 2020, at 11:11, Daniel Braniss  wrote:
> 
> Hi, 
> after connecting this QR reader I see a new /dev/ttyU but as soon as I try 
> tip,
> the device disconnects. (BTW, it’s configured as a ‘Virtual Serial Port’)
> 
> dmesg:
> …
> Jan  2 10:54:57 pampero kernel: umodem0 on uhub1
> Jan  2 10:54:57 pampero kernel: umodem0:  2/0, rev 1.10/1.00, addr 38> on usbus0
> Jan  2 10:54:57 pampero kernel: umodem0: data interface 1, has no CM over 
> data, has no break
> Jan  2 10:56:01 pampero kernel: umodem0: at uhub1, port 2, addr 38 
> (disconnected)
> Jan  2 10:56:02 pampero kernel: umodem0: 
> Jan  2 10:56:02 pampero kernel: detached
> Jan  2 10:56:03 pampero kernel: umodem0 on uhub1
> Jan  2 10:56:03 pampero kernel: umodem0:  2/0, rev 1.10/1.00, addr 38> on usbus0
> Jan  2 10:56:03 pampero kernel: umodem0: data interface 1, has no CM over 
> data, has no break
> …
> 
> and usbconfig:
> pampero# usbconfig
> ugen0.1: <0x8086 XHCI root HUB> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) 
> pwr=SAVE (0mA)
> ugen0.2:  at usbus0, cfg=0 md=HOST spd=HIGH 
> (480Mbps) pwr=SAVE (500mA)
> ugen0.4:  at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) 
> pwr=ON (98mA)
> ugen0.5:  at usbus0, cfg=0 
> md=HOST spd=FULL (12Mbps) pwr=ON (100mA)
> ugen0.6:  at usbus0, cfg=0 md=HOST spd=SUPER 
> (5.0Gbps) pwr=SAVE (0mA)
> ugen0.3:  at usbus0, cfg=0 
> md=HOST spd=FULL (12Mbps) pwr=ON (100mA)
> ugen0.7:  at usbus0, cfg=0 md=HOST spd=FULL 
> (12Mbps) pwr=ON (100mA) <— this is the QR
> 
> any ideas?
> 
> thanks,
>   danny
> 
> ___
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

usb QR reader

2020-01-02 Thread Daniel Braniss

Hi, 
after connecting this QR reader I see a new /dev/ttyU but as soon as I try tip,
the device disconnects. (BTW, it’s configured as a ‘Virtual Serial Port’)

dmesg:
…
Jan  2 10:54:57 pampero kernel: umodem0 on uhub1
Jan  2 10:54:57 pampero kernel: umodem0:  on usbus0
Jan  2 10:54:57 pampero kernel: umodem0: data interface 1, has no CM over data, 
has no break
Jan  2 10:56:01 pampero kernel: umodem0: at uhub1, port 2, addr 38 
(disconnected)
Jan  2 10:56:02 pampero kernel: umodem0: 
Jan  2 10:56:02 pampero kernel: detached
Jan  2 10:56:03 pampero kernel: umodem0 on uhub1
Jan  2 10:56:03 pampero kernel: umodem0:  on usbus0
Jan  2 10:56:03 pampero kernel: umodem0: data interface 1, has no CM over data, 
has no break
…

and usbconfig:
pampero# usbconfig
ugen0.1: <0x8086 XHCI root HUB> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) 
pwr=SAVE (0mA)
ugen0.2:  at usbus0, cfg=0 md=HOST spd=HIGH (480Mbps) 
pwr=SAVE (500mA)
ugen0.4:  at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) 
pwr=ON (98mA)
ugen0.5:  at usbus0, cfg=0 
md=HOST spd=FULL (12Mbps) pwr=ON (100mA)
ugen0.6:  at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) 
pwr=SAVE (0mA)
ugen0.3:  at usbus0, cfg=0 
md=HOST spd=FULL (12Mbps) pwr=ON (100mA)
ugen0.7:  at usbus0, cfg=0 md=HOST spd=FULL (12Mbps) 
pwr=ON (100mA) <— this is the QR

any ideas?

thanks,
danny


Re: nfs lockd errors after NetApp software upgrade.

2019-12-21 Thread Daniel Braniss



> On 21 Dec 2019, at 19:32, Rick Macklem  wrote:
> 
> Daniel Braniss wrote:
>>> On 20 Dec 2019, at 19:19, Rick Macklem <rmack...@uoguelph.ca> wrote:
>>> 
>>> Adam McDougall wrote:
>>>> Try changing bool_t do_tcp = FALSE; to TRUE in
>>>> /usr/src/sys/nlm/nlm_prot_impl.c, recompile the kernel and try again. I
>>>> think this makes it match Linux client behavior. I suspect I ran into
>>>> the same issue as you. I do think I used nolockd as a workaround
>>>> temporarily. I can provide some more details if it works.
>>> If this fixes the problem, please let me know.
>>> 
>>> I'm not sure I'd want to change the default, since it might break things for
>>> others, but I can definitely make it a tunable, so that people don't need to
>>> recompile a kernel to deal with it.
>>> 
>>> 
>> great! I was just about to see how it can be done (tunable) but need to check
>> if it can be done at any time, or just at boot time.
> I haven't looked at the code, but I suspect changing it on the fly could 
> cause problems,
> so I am inclined to make it a tunable (boot time only).
my feelings too.
> 
>> thanks.
>> btw, currently, from several hours of analysing the traffic, it seems that 
>> nlm is UDP.
> I assume that means you haven't tried flipping it to TCP yet.
I will soon, but I have my doubts: the problem seems to be caused by multiple events,
i.e. it happened once while I was doing an svn checkout, but I have done that several
times since with no issues. So it must be an aggregation of factors. Other hosts are
reporting lock timeouts too.

danny

> 
> Please let us know how it goes, rick
> 
> danny
> 
> 
> rick
> 
> On 12/19/19 9:21 AM, Daniel Braniss wrote:
> 
> 
> On 19 Dec 2019, at 16:09, Rick Macklem <rmack...@uoguelph.ca> wrote:
> 
> Daniel Braniss wrote:
> [stuff snipped]
> all mounts are nfsv3/tcp
> This doesn't affect what the NLM code (rpc.lockd) uses. I honestly don't know 
> when
> the NLM uses tcp vs udp. I think rpc.statd still uses IP broadcast at times.
> can the replay cache have any influence here? I tend to remember way back 
> issues
> with it,
> 
> To me, it looks like a network configuration issue.
> that was/is my gut feelings too, but, as far as we can tell, nothing has 
> changed in the network infrastructure,
> the problems appeared after the NetAPP’s software was updated, it was working 
> fine till then.
> 
> the problems are also happening on freebsd 12.1
> 
> You could capture packets (maybe when a client first starts rpc.statd and 
> rpc.lockd)
> and then look at them in wireshark. I'd disable startup of rpc.lockd and
> rpc.statd
> at boot for a test client and then run something like:
> # tcpdump -s 0 -w out.pcap host 
> - and then start rpc.statd and rpc.lockd
> Then I'd look at out.pcap in wireshark (much better at decoding this stuff 
> than
> tcpdump). I'd look for things like different reply IP addresses from the 
> Netapp,
> which might confuse this tired old NLM protocol Sun devised in the mid-1980s.
> 
> it’s going to be an interesting week end :-(
> 
> the error is also appearing on freebsd-11.2-stable, I’m now checking if it’s 
> also
> happening on 12.1
> btw, the NetApp version is 9.3P17
> Yes. I wasn't the author of the NSM and NLM code (long ago I refused to even
> try to implement it, because I knew the protocol was badly broken) and I avoid
> fiddling with it. As such, it won't have changed much since around FreeBSD 7.
> and we haven’t had any issues with it for years, so you must have done 
> something good
> 
> cheers,
> danny
> 
> 
> rick
> 
> cheers,
>  danny
> 
> rick
> 
> Cheers
> 
> Richard
> (NetApp admin)
> 
> On Wed, 18 Dec 2019 at 15:46, Daniel Braniss <da...@cs.huji.ac.il> wrote:
> 
> 
> On 18 Dec 2019, at 16:55, Rick Macklem <rmack...@uoguelph.ca> wrote:
> 
> Daniel Braniss wrote:
> 
> Hi,
> The server with the problems is running FreeBSD 11.1 stable, it was working
> fine for several months,
> but after a software upgrade of our NetAPP server it’s reporting many lockd
> errors and becomes catatonic,
> ...
> Dec 18 13:11:02 moo-09 kernel: nfs server fr-06:/web/www: lockd not responding
> Dec 18 13:11:45 moo-09 last message repeated 7 times
> Dec 18 13:12:55 moo-09 last message repeated 8 times
> Dec 18 13:13:10 moo-09 kernel: nfs server fr-06:/web/www: lockd is alive again
> Dec 18 13:13:10 moo-09 la

Re: nfs lockd errors after NetApp software upgrade.

2019-12-20 Thread Daniel Braniss



> On 20 Dec 2019, at 19:19, Rick Macklem  wrote:
> 
> Adam McDougall wrote:
>> Try changing bool_t do_tcp = FALSE; to TRUE in
>> /usr/src/sys/nlm/nlm_prot_impl.c, recompile the kernel and try again. I
>> think this makes it match Linux client behavior. I suspect I ran into
>> the same issue as you. I do think I used nolockd as a workaround
>> temporarily. I can provide some more details if it works.
> If this fixes the problem, please let me know.
> 
> I'm not sure I'd want to change the default, since it might break things for
> others, but I can definitely make it a tunable, so that people don't need to
> recompile a kernel to deal with it.
> 

great! I was just about to see how it can be done(tunable) but need to check if 
it can be done
at any time, or just at boot time.
thanks.
btw, currently, from several hours of analysing the traffic, it seems that nlm 
is UDP.
danny


> rick
> 
> On 12/19/19 9:21 AM, Daniel Braniss wrote:
>> 
>> 
>>> On 19 Dec 2019, at 16:09, Rick Macklem  wrote:
>>> 
>>> Daniel Braniss wrote:
>>> [stuff snipped]
>>>> all mounts are nfsv3/tcp
>>> This doesn't affect what the NLM code (rpc.lockd) uses. I honestly don't 
>>> know when
>>> the NLM uses tcp vs udp. I think rpc.statd still uses IP broadcast at times.
>> can the replay cache have any influence here? I tend to remember way back 
>> issues
>> with it,
>>> 
>>> To me, it looks like a network configuration issue.
>> that was/is my gut feelings too, but, as far as we can tell, nothing has 
>> changed in the network infrastructure,
>> the problems appeared after the NetAPP’s software was updated, it was 
>> working fine till then.
>> 
>> the problems are also happening on freebsd 12.1
>> 
>>> You could capture packets (maybe when a client first starts rpc.statd and 
>>> rpc.lockd)
>>> and then look at them in wireshark. I'd disable startup of rpc.lockd and
>>> rpc.statd
>>> at boot for a test client and then run something like:
>>> # tcpdump -s 0 -w out.pcap host 
>>> - and then start rpc.statd and rpc.lockd
>>> Then I'd look at out.pcap in wireshark (much better at decoding this stuff 
>>> than
>>> tcpdump). I'd look for things like different reply IP addresses from the 
>>> Netapp,
>>> which might confuse this tired old NLM protocol Sun devised in the 
>>> mid-1980s.
>>> 
>> it’s going to be an interesting week end :-(
>> 
>>>> the error is also appearing on freebsd-11.2-stable, I’m now checking if 
>>>> it’s also
>>>> happening on 12.1
>>>> btw, the NetApp version is 9.3P17
>>> Yes. I wasn't the author of the NSM and NLM code (long ago I refused to even
>>> try to implement it, because I knew the protocol was badly broken) and I 
>>> avoid
>>> fiddling with it. As such, it won't have changed much since around FreeBSD 7.
>> and we haven’t had any issues with it for years, so you must have done 
>> something good
>> 
>> cheers,
>>  danny
>> 
>>> 
>>> rick
>>> 
>>> cheers,
>>>   danny
>>> 
>>>> rick
>>>> 
>>>> Cheers
>>>> 
>>>> Richard
>>>> (NetApp admin)
>>>> 
>>>> On Wed, 18 Dec 2019 at 15:46, Daniel Braniss <da...@cs.huji.ac.il> wrote:
>>>> 
>>>> 
>>>>> On 18 Dec 2019, at 16:55, Rick Macklem <rmack...@uoguelph.ca> wrote:
>>>>> 
>>>>> Daniel Braniss wrote:
>>>>> 
>>>>>> Hi,
>>>>>> The server with the problems is running FreeBSD 11.1 stable, it was
>>>>>> working fine for several months,
>>>>>> but after a software upgrade of our NetAPP server it’s reporting many
>>>>>> lockd errors and becomes catatonic,
>>>>>> ...
>>>>>> Dec 18 13:11:02 moo-09 kernel: nfs server fr-06:/web/www: lockd not 
>>>>>> responding
>>>>>> Dec 18 13:11:45 moo-09 last message repeated 7 times
>>>>>> Dec 18 13:12:55 moo-09 last message repeated 8 times
>>>>>> Dec 18 13:13:10 moo-09 kernel: nfs server fr-06:/web/www: lockd is alive 
>>>>>> again
>>>>>> Dec 18 13:13:10 moo-09 last message repeated 8 times
>>>>>> Dec 18 13:13:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen 
>>>>>> q

Re: nfs lockd errors after NetApp software upgrade.

2019-12-19 Thread Daniel Braniss



> On 19 Dec 2019, at 16:09, Rick Macklem  wrote:
> 
> Daniel Braniss wrote:
> [stuff snipped]
>> all mounts are nfsv3/tcp
> This doesn't affect what the NLM code (rpc.lockd) uses. I honestly don't know 
> when
> the NLM uses tcp vs udp. I think rpc.statd still uses IP broadcast at times.
can the replay cache have any influence here? I tend to remember way back issues
with it,
> 
> To me, it looks like a network configuration issue.
that was/is my gut feelings too, but, as far as we can tell, nothing has 
changed in the network infrastructure,
the problems appeared after the NetAPP’s software was updated, it was working 
fine till then.

the problems are also happening on freebsd 12.1

> You could capture packets (maybe when a client first starts rpc.statd and 
> rpc.lockd)
> and then look at them in wireshark. I'd disable startup of rpc.lockd and
> rpc.statd
> at boot for a test client and then run something like:
> # tcpdump -s 0 -w out.pcap host 
> - and then start rpc.statd and rpc.lockd
> Then I'd look at out.pcap in wireshark (much better at decoding this stuff 
> than
> tcpdump). I'd look for things like different reply IP addresses from the 
> Netapp,
> which might confuse this tired old NLM protocol Sun devised in the mid-1980s.
> 
it’s going to be an interesting week end :-(
 
>> the error is also appearing on freebsd-11.2-stable, I’m now checking if it’s 
>> also
>> happening on 12.1
>> btw, the NetApp version is 9.3P17
> Yes. I wasn't the author of the NSM and NLM code (long ago I refused to even
> try to implement it, because I knew the protocol was badly broken) and I avoid
> fiddling with it. As such, it won't have changed much since around FreeBSD 7.
and we haven’t had any issues with it for years, so you must have done 
something good

cheers,
danny

> 
> rick
> 
> cheers,
>    danny
> 
>> rick
>> 
>> Cheers
>> 
>> Richard
>> (NetApp admin)
>> 
>> On Wed, 18 Dec 2019 at 15:46, Daniel Braniss <da...@cs.huji.ac.il> wrote:
>> 
>> 
>>> On 18 Dec 2019, at 16:55, Rick Macklem <rmack...@uoguelph.ca> wrote:
>>> 
>>> Daniel Braniss wrote:
>>> 
>>>> Hi,
>>>> The server with the problems is running FreeBSD 11.1 stable, it was
>>>> working fine for several months,
>>>> but after a software upgrade of our NetAPP server it’s reporting many
>>>> lockd errors and becomes catatonic,
>>>> ...
>>>> Dec 18 13:11:02 moo-09 kernel: nfs server fr-06:/web/www: lockd not 
>>>> responding
>>>> Dec 18 13:11:45 moo-09 last message repeated 7 times
>>>> Dec 18 13:12:55 moo-09 last message repeated 8 times
>>>> Dec 18 13:13:10 moo-09 kernel: nfs server fr-06:/web/www: lockd is alive 
>>>> again
>>>> Dec 18 13:13:10 moo-09 last message repeated 8 times
>>>> Dec 18 13:13:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen
>>>> queue overflow: 194 already in queue awaiting acceptance (1 occurrences)
>>>> Dec 18 13:14:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen
>>>> queue overflow: 193 already in queue awaiting acceptance (3957
>>>> occurrences)
>>>> Dec 18 13:15:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen
>>>> queue overflow: 193 already in queue awaiting acceptance …
>>> Seems like their software upgrade didn't improve handling of NLM RPCs?
>>> Appears to be handling RPCs slowly and/or intermittently. Note that no one
>>> tests it with IPv6, so at least make sure you are still using IPv4 for the 
>>> mounts and
>>> try and make sure IP broadcast works between client and Netapp. I think the 
>>> NLM
>>> and NSM (rpc.statd) still use IP broadcast sometimes.
>>> 
>> we are ipv4 - we have our own class c :-)
>>> Maybe the network guys can suggest more w.r.t. why, but as I've stated 
>>> before,
>>> the NLM is a fundamentally broken protocol which was never published by Sun,
>>> so I suggest you avoid using it if at all possible.
>> well, at the moment the ball is on NetAPP court, and switching to NFSv4 at 
>> the moment is out of the question, it’s
>> a production server used by several thousand students.
>> 
>>> 
>>> - If the locks don't need to be seen by other clients, you can just use the 
>>> "nolockd"
>>> mount option.
>>> or
>>> - If locks need to be seen by other clients, try NFSv4 mounts. Netapp filers
>>> should support NFSv4.1, whi

Re: nfs lockd errors after NetApp software upgrade.

2019-12-19 Thread Daniel Braniss



> On 19 Dec 2019, at 02:22, Rick Macklem  wrote:
> 
> Richard P Mackerras wrote:
> 
>> Hi,
>> What software version is the NetApp using?
>> Is the exported volume big?
>> Is the vserver configured for 64bit identifiers?
>> 
>> If you enable NFS V4.0 or 4.1 other NFS clients using defaults might mount
>> NFSv4.x unexpectedly after a reboot so you need to watch that.
> The FreeBSD client always uses NFSv3 mounts by default. To get NFSv4 you must
> explicitly specify the "nfsv4" or "vers=4" mount option. For NFSv4.1, you must
> also specify "minorversion=1".
> 
> The Linux distros I am familiar with will use the highest NFS version 
> supported by
> the server by default. (I suspect some are using NFSv4.1 without realizing it,
> which isn't necessarily bad.)
> 
> nfsstat -m
> will show you which version is actually in use for both FreeBSD and Linux.
> 
all mounts are nfsv3/tcp
the error is also appearing on freebsd-11.2-stable, I’m now checking if it’s 
also
happening on 12.1
btw, the NetApp version is 9.3P17

cheers,
danny

> rick
> 
> Cheers
> 
> Richard
> (NetApp admin)
> 
> On Wed, 18 Dec 2019 at 15:46, Daniel Braniss <da...@cs.huji.ac.il> wrote:
> 
> 
>> On 18 Dec 2019, at 16:55, Rick Macklem <rmack...@uoguelph.ca> wrote:
>> 
>> Daniel Braniss wrote:
>> 
>>> Hi,
>>> The server with the problems is running FreeBSD 11.1 stable, it was working
>>> fine for several months,
>>> but after a software upgrade of our NetAPP server it’s reporting many lockd
>>> errors and becomes catatonic,
>>> ...
>>> Dec 18 13:11:02 moo-09 kernel: nfs server fr-06:/web/www: lockd not 
>>> responding
>>> Dec 18 13:11:45 moo-09 last message repeated 7 times
>>> Dec 18 13:12:55 moo-09 last message repeated 8 times
>>> Dec 18 13:13:10 moo-09 kernel: nfs server fr-06:/web/www: lockd is alive 
>>> again
>>> Dec 18 13:13:10 moo-09 last message repeated 8 times
>>> Dec 18 13:13:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen
>>> queue overflow: 194 already in queue awaiting acceptance (1 occurrences)
>>> Dec 18 13:14:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen
>>> queue overflow: 193 already in queue awaiting acceptance (3957 occurrences)
>>> Dec 18 13:15:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen
>>> queue overflow: 193 already in queue awaiting acceptance …
>> Seems like their software upgrade didn't improve handling of NLM RPCs?
>> Appears to be handling RPCs slowly and/or intermittently. Note that no one
>> tests it with IPv6, so at least make sure you are still using IPv4 for the 
>> mounts and
>> try and make sure IP broadcast works between client and Netapp. I think the 
>> NLM
>> and NSM (rpc.statd) still use IP broadcast sometimes.
>> 
> we are ipv4 - we have our own class c :-)
>> Maybe the network guys can suggest more w.r.t. why, but as I've stated 
>> before,
>> the NLM is a fundamentally broken protocol which was never published by Sun,
>> so I suggest you avoid using it if at all possible.
> well, at the moment the ball is on NetAPP court, and switching to NFSv4 at 
> the moment is out of the question, it’s
> a production server used by several thousand students.
> 
>> 
>> - If the locks don't need to be seen by other clients, you can just use the 
>> "nolockd"
>>  mount option.
>> or
>> - If locks need to be seen by other clients, try NFSv4 mounts. Netapp filers
>>  should support NFSv4.1, which is a much better protocol than NFSv4.0.
>> 
>> Good luck with it, rick
> thanks
>danny
> 
>> …
>> any ideas?
>> 
>> thanks,
>>   danny
>> 


Re: nfs lockd errors after NetApp software upgrade.

2019-12-18 Thread Daniel Braniss



> On 18 Dec 2019, at 17:58, Richard P Mackerras  wrote:
> 
> Hi,
> What software version is the NetApp using?
the very latest :-), but will try and find out later.

> Is the exported volume big?
about 500G, but many files
as far as I know, only accessed by one host running the web app - moodle.

> Is the vserver configured for 64bit identifiers
what's the issue here?

> ?
> 
> If you enable NFS V4.0 or 4.1 other NFS clients using defaults might mount 
> NFSv4.x unexpectedly after a reboot so you need to watch that. 
> 
> Cheers 
> 
> Richard 
> (NetApp admin)
> 
> On Wed, 18 Dec 2019 at 15:46, Daniel Braniss <da...@cs.huji.ac.il> wrote:
> 
> 
> > On 18 Dec 2019, at 16:55, Rick Macklem <rmack...@uoguelph.ca> wrote:
> > 
> > Daniel Braniss wrote:
> > 
> >> Hi,
> >> The server with the problems is running FreeBSD 11.1 stable, it was
> >> working fine for several months,
> >> but after a software upgrade of our NetAPP server it’s reporting many
> >> lockd errors and becomes catatonic,
> >> ...
> >> Dec 18 13:11:02 moo-09 kernel: nfs server fr-06:/web/www: lockd not 
> >> responding
> >> Dec 18 13:11:45 moo-09 last message repeated 7 times
> >> Dec 18 13:12:55 moo-09 last message repeated 8 times
> >> Dec 18 13:13:10 moo-09 kernel: nfs server fr-06:/web/www: lockd is alive 
> >> again
> >> Dec 18 13:13:10 moo-09 last message repeated 8 times
> >> Dec 18 13:13:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen
> >> queue overflow: 194 already in queue awaiting acceptance (1 occurrences)
> >> Dec 18 13:14:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen
> >> queue overflow: 193 already in queue awaiting acceptance (3957
> >> occurrences)
> >> Dec 18 13:15:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen
> >> queue overflow: 193 already in queue awaiting acceptance …
> > Seems like their software upgrade didn't improve handling of NLM RPCs?
> > Appears to be handling RPCs slowly and/or intermittently. Note that no one
> > tests it with IPv6, so at least make sure you are still using IPv4 for the 
> > mounts and
> > try and make sure IP broadcast works between client and Netapp. I think the 
> > NLM
> > and NSM (rpc.statd) still use IP broadcast sometimes.
> > 
> we are ipv4 - we have our own class c :-)
> > Maybe the network guys can suggest more w.r.t. why, but as I've stated 
> > before,
> > the NLM is a fundamentally broken protocol which was never published by Sun,
> > so I suggest you avoid using it if at all possible.
> well, at the moment the ball is on NetAPP court, and switching to NFSv4 at 
> the moment is out of the question, it’s
> a production server used by several thousand students.
> 
> > 
> > - If the locks don't need to be seen by other clients, you can just use the 
> > "nolockd"
> >   mount option.
> > or
> > - If locks need to be seen by other clients, try NFSv4 mounts. Netapp filers
> >   should support NFSv4.1, which is a much better protocol than NFSv4.0.
> > 
> > Good luck with it, rick
> thanks
> danny
> 
> > …
> > any ideas?
> > 
> > thanks,
> >danny
> > 


Re: nfs lockd errors after NetApp software upgrade.

2019-12-18 Thread Daniel Braniss



> On 18 Dec 2019, at 16:55, Rick Macklem  wrote:
> 
> Daniel Braniss wrote:
> 
>> Hi,
>> The server with the problems is running FreeBSD 11.1 stable, it was working
>> fine for several months,
>> but after a software upgrade of our NetAPP server it’s reporting many lockd
>> errors and becomes catatonic,
>> ...
>> Dec 18 13:11:02 moo-09 kernel: nfs server fr-06:/web/www: lockd not 
>> responding
>> Dec 18 13:11:45 moo-09 last message repeated 7 times
>> Dec 18 13:12:55 moo-09 last message repeated 8 times
>> Dec 18 13:13:10 moo-09 kernel: nfs server fr-06:/web/www: lockd is alive 
>> again
>> Dec 18 13:13:10 moo-09 last message repeated 8 times
>> Dec 18 13:13:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen
>> queue overflow: 194 already in queue awaiting acceptance (1 occurrences)
>> Dec 18 13:14:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen
>> queue overflow: 193 already in queue awaiting acceptance (3957 occurrences)
>> Dec 18 13:15:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen
>> queue overflow: 193 already in queue awaiting acceptance …
> Seems like their software upgrade didn't improve handling of NLM RPCs?
> Appears to be handling RPCs slowly and/or intermittently. Note that no one
> tests it with IPv6, so at least make sure you are still using IPv4 for the 
> mounts and
> try and make sure IP broadcast works between client and Netapp. I think the 
> NLM
> and NSM (rpc.statd) still use IP broadcast sometimes.
> 
we are ipv4 - we have our own class c :-)
> Maybe the network guys can suggest more w.r.t. why, but as I've stated before,
> the NLM is a fundamentally broken protocol which was never published by Sun,
> so I suggest you avoid using it if at all possible.
well, at the moment the ball is on NetAPP court, and switching to NFSv4 at the 
moment is out of the question, it’s
a production server used by several thousand students.

> 
> - If the locks don't need to be seen by other clients, you can just use the 
> "nolockd"
>   mount option.
> or
> - If locks need to be seen by other clients, try NFSv4 mounts. Netapp filers
>   should support NFSv4.1, which is a much better protocol than NFSv4.0.
> 
> Good luck with it, rick
thanks
danny

> …
> any ideas?
> 
> thanks,
>danny
> 


nfs lockd errors after NetApp software upgrade.

2019-12-18 Thread Daniel Braniss

Hi,
The server with the problems is running FreeBSD 11.1 stable, it was working 
fine for several months,
but after a software upgrade of our NetAPP server it’s reporting many lockd 
errors and becomes catatonic,
...
Dec 18 13:11:02 moo-09 kernel: nfs server fr-06:/web/www: lockd not responding
Dec 18 13:11:45 moo-09 last message repeated 7 times
Dec 18 13:12:55 moo-09 last message repeated 8 times
Dec 18 13:13:10 moo-09 kernel: nfs server fr-06:/web/www: lockd is alive again
Dec 18 13:13:10 moo-09 last message repeated 8 times
Dec 18 13:13:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen queue 
overflow: 194 already in queue awaiting acceptance (1 occurrences)
Dec 18 13:14:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen queue 
overflow: 193 already in queue awaiting acceptance (3957 occurrences)
Dec 18 13:15:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen queue 
overflow: 193 already in queue awaiting acceptance (3404 occurrences)
Dec 18 13:16:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen queue 
overflow: 196 already in queue awaiting acceptance (3553 occurrences)
Dec 18 13:17:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen queue 
overflow: 194 already in queue awaiting acceptance (3661 occurrences)
Dec 18 13:18:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen queue 
overflow: 197 already in queue awaiting acceptance (4030 occurrences)
Dec 18 13:19:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen queue 
overflow: 193 already in queue awaiting acceptance (2560 occurrences)
Dec 18 13:20:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen queue 
overflow: 193 already in queue awaiting acceptance (1495 occurrences)
Dec 18 13:21:32 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen queue 
overflow: 193 already in queue awaiting acceptance (817 occurrences)
Dec 18 14:54:43 moo-09 kernel: nfs server fr-06:/mdlbck: lockd not responding
Dec 18 14:55:19 moo-09 last message repeated 2 times
Dec 18 14:55:34 moo-09 kernel: nfs server fr-06:/mdlbck: lockd is alive again
…
any ideas?

thanks,
danny
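
To get a feel for how bad the overflow is, the sonewconn lines above can be totalled per socket. This is a rough Python sketch; it assumes each log entry has been joined back onto one line (the archive wraps them), and `PATTERN` and `summarize` are illustrative names only.

```python
import re

# Matches the kernel message once the archive's line wrapping has been undone.
PATTERN = re.compile(
    r"sonewconn: pcb (0x[0-9a-f]+): Listen queue overflow: "
    r"(\d+) already in queue awaiting acceptance \((\d+) occurrences\)"
)

def summarize(lines):
    """Return {pcb: (largest queue length seen, total overflow occurrences)}."""
    summary = {}
    for line in lines:
        m = PATTERN.search(line)
        if not m:
            continue  # skip lockd messages and anything else
        pcb, queued, occurrences = m.group(1), int(m.group(2)), int(m.group(3))
        max_q, total = summary.get(pcb, (0, 0))
        summary[pcb] = (max(max_q, queued), total + occurrences)
    return summary

sample = [
    "Dec 18 13:13:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen queue "
    "overflow: 194 already in queue awaiting acceptance (1 occurrences)",
    "Dec 18 13:14:29 moo-09 kernel: sonewconn: pcb 0xf8004cc051d0: Listen queue "
    "overflow: 193 already in queue awaiting acceptance (3957 occurrences)",
]
print(summarize(sample))
```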


Re: linker.hints not being update for ARMs

2019-11-12 Thread Daniel Braniss



> On 12 Nov 2019, at 18:41, Ian Lepore  wrote:
> 
> On Tue, 2019-11-12 at 12:03 +0200, Daniel Braniss wrote:
>>> On 12 Nov 2019, at 11:32, Peter Jeremy  wrote:
>>> 
>>> On 2019-Nov-12 10:30:21 +0200, Daniel Braniss 
>>> wrote:
>>>>warning: KLD '/boot/kernel/wlan.ko' is newer than the
>>>> linker.hints file
>>>>warning: KLD '/boot/kernel/rtwn.ko' is newer than the
>>>> linker.hints file
>>> 
>>> ...
>>>> the linker.hints is indeed very old :
>>>> neo-000# ls -ls /boot/kernel/linker.hints 
>>>> 224 -rw-r--r--  1 root  wheel  228972 Jan  1  2010
>>>> /boot/kernel/linker.hints
>>> 
>>> Well, that's a nonsense timestamp because FreeBSD didn't support
>>> AllWinner
>>> in 2010.  My guess is that your system clock was wrong.
>>> 
>>>> how can this be fixed?
>>> 
>>> Try rerunning kldxref (with the clock set correctly).
>> 
>> the file is created before the date gets updated!
>> it seems that on allwinner epoch is 1st of Jan 2010! (after a power
>> cycle)
>> so ‘kldxref’ runs but the time stamp is wrong,
>> probably touch /boot/kernel/linker.hints after the clock is corrected
>> solves the problem
>> 
>> thanks,
>>  danny
>> 
> 
> There doesn't appear to be anything date-sensitive about creating the
> xref files.  If there is no file in a given directory that contains
> modules, the file is created.  It's also created if you've set
> kldxref_clobber=YES in rc.conf (it will rebuild on every boot). 
> Otherwise if an xref file exists, no work is done.
> 
> Since a normal installkernel renames the old directory then populates a
> clean new directory, there's no need for dates to be involved…

I’m cross building and in the arm case no file is created (I did the same for
amd64 and one was created)
> the
> first boot after installing a new kernel should see no xref file and
> build one.  If you do something like hand-install just the kernel or
> just an updated module, then the file won't get rebuilt.
> 
> I've always wished that "service kldxref restart" would just force-
> rebuild the files.
I have kldxref_enable=yes,
so it got rebuilt on the first boot after a power cycle, and it got created with an old
date; a reboot did not help since I didn’t know about the clobber stuff.
> 
> Even better, if we could make kldxref usable as a cross-tool, the xrefs
> could be generated during a crossbuild rather than on firstboot.
I thought this was too tricky; in any case it is rather fast to re-create it,
so I guess the clobber stuff should work. In any case the modules
were loaded, so the complaint was just that, a warning.
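
For reference, the two rc.conf settings mentioned in this exchange can be sketched as below. Both variable names are taken from the thread; whether a rebuild on every boot is acceptable on a given board is an assumption to check locally.

```sh
# /etc/rc.conf fragment (sketch): run kldxref at boot and, per Ian's note,
# rebuild the xref (linker.hints) files on every boot even if old ones exist.
kldxref_enable="YES"
kldxref_clobber="YES"
```

With the clobber knob set, a linker.hints file carrying a stale timestamp should simply be regenerated at the next boot.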

danny

> 
> -- Ian
> 
> 

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

Re: linker.hints not being update for ARMs

2019-11-12 Thread Daniel Braniss



> On 12 Nov 2019, at 11:32, Peter Jeremy  wrote:
> 
> On 2019-Nov-12 10:30:21 +0200, Daniel Braniss  wrote:
>>  warning: KLD '/boot/kernel/wlan.ko' is newer than the linker.hints file
>>  warning: KLD '/boot/kernel/rtwn.ko' is newer than the linker.hints file
> ...
>> the link.hints is indeed very old :
>> neo-000# ls -ls /boot/kernel/linker.hints 
>> 224 -rw-r--r--  1 root  wheel  228972 Jan  1  2010 /boot/kernel/linker.hints
> 
> Well, that's a nonsense timestamp because FreeBSD didn't support AllWinner
> in 2010.  My guess is that your system clock was wrong.
> 
>> how can this be fixed?
> 
> Try rerunning kldxref (with the clock set correctly).

the file is created before the date gets updated!
it seems that on allwinner the epoch is 1st of Jan 2010 (after a power cycle),
so ‘kldxref’ runs but the time stamp is wrong.
probably running touch /boot/kernel/linker.hints after the clock is corrected
solves the problem

thanks,
danny
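
A minimal sketch of how one might check for the condition behind the warning, assuming it is purely a modification-time comparison between each module and linker.hints. The helper name stale_modules is hypothetical; find's -newer test is standard.

```sh
# Print every kernel module in a directory whose modification time is newer
# than the linker.hints file in that same directory.
# The directory argument defaults to /boot/kernel.
stale_modules() {
    dir=${1:-/boot/kernel}
    find "$dir" -name '*.ko' -newer "$dir/linker.hints"
}
```

Running `stale_modules /boot/kernel` once the clock is correct should list wlan.ko and rtwn.ko in the situation described above, and print nothing after linker.hints has been touched or rebuilt.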

> 
> -- 
> Peter Jeremy


Re: linker.hints not being update for ARMs

2019-11-12 Thread Daniel Braniss



> On 12 Nov 2019, at 10:30, Daniel Braniss  wrote:
> 
> hi,
> well, at least for allwinner, even running /etc/rc.d/kldxref has no effect,
> I noticed this when trying out a wifi dongle, 
> ...
>   Starting devd.
>   Autoloading module: if_rtwn_usb.ko
>   warning: KLD '/boot/kernel/wlan.ko' is newer than the linker.hints file
>   warning: KLD '/boot/kernel/rtwn.ko' is newer than the linker.hints file
> …
> 
> the link.hints is indeed very old :
> neo-000# ls -ls /boot/kernel/linker.hints 
> 224 -rw-r--r--  1 root  wheel  228972 Jan  1  2010 /boot/kernel/linker.hints
> 
> and strangely, other modules loaded did not complain.
> 
> how can this be fixed?

by removing the old file :-)

> 
> thanks,
>   danny

linker.hints not being update for ARMs

2019-11-12 Thread Daniel Braniss

hi,
well, at least for allwinner, even running /etc/rc.d/kldxref has no effect,
I noticed this when trying out a wifi dongle, 
...
Starting devd.
Autoloading module: if_rtwn_usb.ko
warning: KLD '/boot/kernel/wlan.ko' is newer than the linker.hints file
warning: KLD '/boot/kernel/rtwn.ko' is newer than the linker.hints file
…

the linker.hints is indeed very old:
neo-000# ls -ls /boot/kernel/linker.hints 
224 -rw-r--r--  1 root  wheel  228972 Jan  1  2010 /boot/kernel/linker.hints

and strangely, other modules loaded did not complain.

how can this be fixed?

thanks,
danny

12.1 weirdness

2019-10-16 Thread Daniel Braniss

hi,
just trying out 12.1 on a DELL PowerEdge R710, and I see:
...
Oct 16 22:52:12 store-08 kernel: bce3: bce_pulse(): Warning: bootcode thinks driver is absent! (bc_state = 0x0023600E)
Oct 16 22:52:13 store-08 kernel: Limiting icmp unreach response from 3244 to 200 packets/sec
Oct 16 22:52:13 store-08 kernel: bce3: bce_pulse(): Bootcode found the driver pulse! (bc_state = 0x0003610E)
Oct 17 06:04:29 store-08 kernel: Limiting icmp unreach response from 564 to 200 packets/sec

there is nothing connected to bce3, only bce0, 
rev is r353486, and it’s diskless.

BTW: is there a way of knowing which port is being reported as unreachable? 

any ideas?

thanks,
danny

Re: missing date from version

2019-10-11 Thread Daniel Braniss



> On 11 Oct 2019, at 11:47, Jamie Landeg-Jones  wrote:
> 
> Daniel Braniss  wrote:
> 
>> Hi,
>> I just compiled r353429 for amd64, and noticed that the compilation date is
>> missing from ‘uname -a’:
>> FreeBSD hp-600 12.1-STABLE FreeBSD 12.1-STABLE r353429 HUJI  amd64
>> 
>> there is now (maybe for a long time?) a -R option to newvers.sh.
>> is there an option to change this?
> 
> This is to do with reproducible builds: 
> https://wiki.freebsd.org/ReproducibleBuilds
> 
> Add to /etc/src.conf:
> 
> WITHOUT_REPRODUCIBLE_BUILD=YES

THANKS!!!
 
> 
> cheers, Jamie


missing date from version

2019-10-11 Thread Daniel Braniss

Hi,
I just compiled r353429 for amd64, and noticed that the compilation date is
missing from ‘uname -a’:
FreeBSD hp-600 12.1-STABLE FreeBSD 12.1-STABLE r353429 HUJI  amd64

there is now (maybe for a long time?) a -R option to newvers.sh.
is there an option to change this?

cheers,
danny


Re: hw.vga.acpi_ignore_no_vga=1 for installation media

2019-03-18 Thread Daniel Braniss




> On 17 Mar 2019, at 19:34, Konstantin Belousov  wrote:
> 
> On Sun, Mar 17, 2019 at 10:10:45AM -0600, Warner Losh wrote:
>> I generally like this idea... But two caveats...
>> 
>> First, we'd need to update the docs so that folks doing serial installs can
>> unset it Though serial installs are a weird beast
>> Second, if it's really needed, we should have the installer generate it.
>> alas, only vt can tell us that, but it should be easy to add a sysctl to it
>> that says that it has done video by ignoring the absence of the vga node...
> It is not about VGA node (what is that ?).
> It is about ignoring FACP flag IAPC_BOOT_ARCH={NO_VGA}, and there are
> machines which actually break when trying to access VGA hardware despite
> the flag is set.
> Can anybody provide an example of machine where the flag is set but VGA
> works ?  For me, it is set on headless NUC when there is no monitor
> attached, and then BIOS does not configure framebuffer at all.
> 
> So the proposal is about reversing the set of broken machines, but only
> in installer ?  In other words, if it worked for installer, the installed
> system would be broken (again) ?
> 
>> 
>> Warner
>> 
>> On Sun, Mar 17, 2019 at 6:58 AM Leon Christopher Dietrich <
>> dorali...@chaotikum.org> wrote:
>> 
>>> Sound's like solid idea.
>>> 
>>> A lot of systems out there lack propper ACPI description for VGA and it
>>> would definitly make the installation on such a system much more easy.
>>> 
>>> As far as I can tell it doesn't seam to break other things and even low
>>> power system without VGA (like a pcengines apu2) don't seam to suffer.
> What apu2 reports in FACP flags ?  Do
>   acpidump -dt | grep IAPC_BOOT_ARCH


mine reports:
IAPC_BOOT_ARCH=
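
A small sketch of turning Konstantin's grep into a yes/no check. It assumes the flag shows up in the dump in the `IAPC_BOOT_ARCH={NO_VGA}` form quoted above; the helper name facp_no_vga is hypothetical.

```sh
# Succeed (exit 0) if the FACP dump read from stdin has NO_VGA set in
# IAPC_BOOT_ARCH, e.g. a line such as "IAPC_BOOT_ARCH={NO_VGA}".
# An empty flag list, as on this apu2, makes the function fail (exit 1).
facp_no_vga() {
    grep -q 'IAPC_BOOT_ARCH=.*NO_VGA'
}
```

It could be used as `acpidump -dt | facp_no_vga && echo "firmware claims no VGA"`.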

> 
>>> 
>>> On 17.03.19 13:00, freebsd-stable-requ...@freebsd.org wrote:
>>>> Date: Sun, 17 Mar 2019 02:59:12 +0700
>>>> From: Eugene Grosbein 
>>>> To: FreeBSD stable 
>>>> Subject: hw.vga.acpi_ignore_no_vga=1 for installation media
>>>> Message-ID: <912fc95d-5a5e-012b-7385-0f43f50dc...@grosbein.net>
>>>> Content-Type: text/plain; charset=koi8-r
>>>> 
>>>> Hi!
>>>> 
>>>> Since 11.2-RELEASE, default console driver vt(4) checks ACPI table for
>>>> presence of VGA in the system.
>>>> It does not initialize console (no input, no output) if ACPI states
>>>> there is no VGA adapter.
>>>> 
>>>> There are PRs describing many cases when VGA is present but ACPI lies
>>>> and we have a regression compared with 11.1 and earlier:
>>>> FreeBSD cannot be installed interactively onto such a system, leaving
>>>> aside serial console.
>>>> 
>>>> vt(4) has loader knob to restore pre-11.2 behaviour and ignore ACPI:
>>>> 
>>>> hw.vga.acpi_ignore_no_vga=1
>>>> 
>>>> Should we add this unconditionally to the installation media designed
>>>> for interactive VGA-based installation?
>>>> 
>>>> 
>>>> --
>>>> 
>>> 
>>> 

Re: APU2, legacy firmware 4.0.22, FreeBSD 12.0 hangs in boot

2019-01-07 Thread Daniel Braniss

Hi Michael,


> On 7 Jan 2019, at 10:52, Michael Steinmann  wrote:
> 
> Hi,
> 
> iPXE works fine since many years on the apu1.
> 

it was not working for apu2 till I got a patched version from PCEngines a while 
back.



> Best regards,
> Michael Steinmann
> 
> 
> 
> On Sun, 6 Jan 2019 at 08:18, Daniel Braniss <da...@cs.huji.ac.il> wrote:
> 
> 
> > On 5 Jan 2019, at 19:40, Ruben <m...@osfux.nl> wrote:
> > 
> > Hi,
> > 
> > Just to follow up, i've upgraded (freebsd-update) one of my apu2c4 today
> > 
> > - firmware upgrade to v4.8.0.7 (so switched from legacy to mainline)
> > - FreeBSD upgrade from 11.2 to 12.0
> > 
> > smooth sailing so far.
> > 
> > Ill try updating an apu1 ( running 11.2 ) with its current (mainline) 
> > firmware to 12.0 one of these days as well.
> > 
> 
> before I go down this road, does this boot/firmware support iPXE boot? (or 
> does it also work?)
> 
> danny
> 
> 
> > Regards,
> > 
> > Ruben

Re: APU2, legacy firmware 4.0.22, FreeBSD 12.0 hangs in boot

2019-01-06 Thread Daniel Braniss




> On 5 Jan 2019, at 19:40, Ruben  wrote:
> 
> Hi,
> 
> Just to follow up, i've upgraded (freebsd-update) one of my apu2c4 today
> 
> - firmware upgrade to v4.8.0.7 (so switched from legacy to mainline)
> - FreeBSD upgrade from 11.2 to 12.0
> 
> smooth sailing so far.
> 
> Ill try updating an apu1 ( running 11.2 ) with its current (mainline) 
> firmware to 12.0 one of these days as well.
> 

before I go down this road, does this boot/firmware support iPXE boot? (or does 
it also work?)

danny


> Regards,
> 
> Ruben

pxeboot stuck

2018-12-07 Thread Daniel Braniss

Hi,
today’s latest 11.2 rev 341671, when booting off local disk all is fine, but
pxeboot gets stuck after printing
FreeBSD/x86 bootstrap loader. Revision 1.1
(Fri Dec  7 09:45:34 IST 2018 danny-pe-44)
-

older pxeboot gets slightly further, but hangs too.

older root images  work fine

so what magic is now needed?

thanks,
danny


annoying panic on boot

2018-03-13 Thread Daniel Braniss

Stopped at  vga_bitblt_one_text_pixels_block+0x13e: movl(%rax,%r13,4),%d
db> bt
Tracing pid 4 tid 100090 td 0xf8000c522620
vga_bitblt_one_text_pixels_block() at vga_bitblt_one_text_pixels_block+0x13e/fr0
vga_bitblt_text() at vga_bitblt_text+0xc0/frame 0xfe2d5160
vt_flush() at vt_flush+0x38f/frame 0xfe2d51b0
termcn_cnputc() at termcn_cnputc+0xbe/frame 0xfe2d51e0
cnputc() at cnputc+0x181/frame 0xfe2d5210
cnputs() at cnputs+0x78/frame 0xfe2d5230
putchar() at putchar+0x14d/frame 0xfe2d52b0
kvprintf() at kvprintf+0x113d/frame 0xfe2d53b0
vprintf() at vprintf+0x84/frame 0xfe2d5500
printf() at printf+0x43/frame 0xfe2d5560
cddone() at cddone+0x210/frame 0xfe2d5b20
xpt_done_process() at xpt_done_process+0x697/frame 0xfe2d5b60
xpt_done_td() at xpt_done_td+0x196/frame 0xfe2d5bb0
fork_exit() at fork_exit+0x82/frame 0xfe2d5bf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe2d5bf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---

the above happens on boot, sometimes; the host is a Dell PowerEdge R710 running
a very recent stable.
any help?

thanks,
danny

Re: panic when loading mlxen

2018-02-03 Thread Daniel Braniss



> On 3 Feb 2018, at 12:16, Hans Petter Selasky  wrote:
> 
> Hi,
> 
> I think Alexander came ahead of me:
> 
> https://svnweb.freebsd.org/base?view=revision&revision=328805
> 
> Can you try r328805 ?
> 
> --HPS

yup, it works, well it doesn’t panic.

thanks
danny


Re: panic when loading mlxen

2018-02-03 Thread Daniel Braniss



> On 3 Feb 2018, at 11:34, Hans Petter Selasky <h...@selasky.org> wrote:
> 
> On 02/03/18 08:34, Daniel Braniss wrote:
>>> On 2 Feb 2018, at 20:47, K. Macy <km...@freebsd.org> wrote:
>>> 
>>> That's odd since it doesn't use any of taskqgroup stuff. I take it you
>>> can't get a core?
>> no core but some more info:
>> db> bt
>> Tracing pid 0 tid 10 td 0x81e0e500
>> taskqgroup_attach_cpu() at taskqgroup_attach_cpu+0x4f/frame 
>> 0x822e4c30
>> tasklet_subsystem_init() at tasklet_subsystem_init+0xde/frame 
>> 0x822e4c90
>> mi_startup() at mi_startup+0x9c/frame 0x822e4cb0
>> btext() at btext+0x2c
>>> 
>>> Also, why are you loading it in loader.conf (slower) as opposed to rc.conf?
>> sometimes it’s booted diskless, and the driver is needed early.
>> and btw, this box doesn’t even have a mellanox card.
>>> -M
>>> 
>>> 
>>> 
>>> On Fri, Feb 2, 2018 at 4:46 AM, Daniel Braniss <da...@cs.huji.ac.il> wrote:
>>>> with latest stable (r328769) when I have
>>>>mlxen_load="YES"
>>>> in my loader.conf it panics:
>>>> 
>>>> KDB: debugger backends: ddbsize 0x4638 at 0x22d6000
>>>>  f
>>>> KDB: current backend: ddb
>>>> Copyright (c) 1992-2018 The FreeBSD Project.
>>>> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>>>>   The Regents of the University of California. All rights reserved.
>>>> FreeBSD is a registered trademark of The FreeBSD Foundation.
>>>> FreeBSD 11.1-STABLE #18: Fri Feb  2 10:46:12 IST 2018
>>>>   danny@pe-44:/home/obj/pe-44/net/rnd/r+d/stable/11/sys/HUJI amd64
>>>> FreeBSD clang version 5.0.1 (tags/RELEASE_501/final 320880) (based on LLVM 
>>>> 5.0.1)
>>>> VT(vga): resolution 640x480
>>>> CPU: Intel(R) Xeon(R) CPU   E5507  @ 2.27GHz (2261.04-MHz K8-class 
>>>> CPU)
>>>> Origin="GenuineIntel"  Id=0x106a5  Family=0x6  Model=0x1a  Stepping=5
>>>> Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,C>
>>>> Features2=0x9ce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA>
>>>> AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
>>>> AMD Features2=0x1
>>>> VT-x: PAT,HLT,MTF,PAUSE,EPT,VPID
>>>> TSC: P-state invariant, performance statistics
>>>> real memory  = 25769803776 (24576 MB)
>>>> avail memory = 24931561472 (23776 MB)
>>>> Event timer "LAPIC" quality 100
>>>> ACPI APIC Table: 
>>>> FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
>>>> FreeBSD/SMP: 2 package(s) x 4 core(s)
>>>> ioapic1: Changing APIC ID to 1
>>>> ioapic0  irqs 0-23 on motherboard
>>>> ioapic1  irqs 32-55 on motherboard
>>>> 
>>>> 
>>>> Fatal trap 12: page fault while in kernel mode
>>>> cpuid = 0; apic id = 10
>>>> fault virtual address   = 0x1818
>>>> fault code  = supervisor write data, page not present
>>>> instruction pointer = 0x20:0x80ad427f
>>>> stack pointer   = 0x28:0x822e3be0
>>>> frame pointer   = 0x28:0x822e3c30
>>>> code segment= base 0x0, limit 0xf, type 0x1b
>>>>   = DPL 0, pres 1, long 1, def32 0, gran 1
>>>> processor eflags= interrupt enabled, resume, IOPL = 0
>>>> current process = 0 (swapper)
>>>> [ thread pid 0 tid 10 ]
>>>> Stopped at  taskqgroup_attach_cpu+0x4f: lock cmpxchgq   %r12,(%rdi)
> 
> Hi,
> 
> It should work if you "kldload mlxen" after boot or add it to kld_list in 
> /etc/rc.conf. Looks like I have one more combination to test after the 
> LinuxKPI upgrade in 11-stable. Thanks for notifying me.
> 
Hi,
it’s ok, i don’t need it yet, I was just surprised that a simple upgrade got 
the panic.
let me know if you need me to test.

thanks,

danny
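
The workaround Hans Petter describes can be sketched as an rc.conf fragment. This assumes the module is not needed before rc(8) runs, which, as noted earlier in the thread, does not hold for the diskless case.

```sh
# /etc/rc.conf fragment (sketch): load mlxen from rc(8) after the kernel is up,
# instead of mlxen_load="YES" in /boot/loader.conf.
kld_list="mlxen"
```

The one-off equivalent on a running system is `kldload mlxen`.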


Re: panic when loading mlxen

2018-02-03 Thread Daniel Braniss



> On 2 Feb 2018, at 20:47, K. Macy <km...@freebsd.org> wrote:
> 
> That's odd since it doesn't use any of taskqgroup stuff. I take it you
> can't get a core?

no core but some more info:
db> bt
Tracing pid 0 tid 10 td 0x81e0e500
taskqgroup_attach_cpu() at taskqgroup_attach_cpu+0x4f/frame 0x822e4c30
tasklet_subsystem_init() at tasklet_subsystem_init+0xde/frame 0x822e4c90
mi_startup() at mi_startup+0x9c/frame 0x822e4cb0
btext() at btext+0x2c

> 
> Also, why are you loading it in loader.conf (slower) as opposed to rc.conf?
sometimes it’s booted diskless, and the driver is needed early.
and btw, this box doesn’t even have a mellanox card.


> -M
> 
> 
> 
> On Fri, Feb 2, 2018 at 4:46 AM, Daniel Braniss <da...@cs.huji.ac.il> wrote:
>> with latest stable (r328769) when I have
>>mlxen_load="YES"
>> in my loader.conf it panics:
>> 
>> KDB: debugger backends: ddbsize 0x4638 at 0x22d6000  
>>f
>> KDB: current backend: ddb
>> Copyright (c) 1992-2018 The FreeBSD Project.
>> Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
>>   The Regents of the University of California. All rights reserved.
>> FreeBSD is a registered trademark of The FreeBSD Foundation.
>> FreeBSD 11.1-STABLE #18: Fri Feb  2 10:46:12 IST 2018
>>   danny@pe-44:/home/obj/pe-44/net/rnd/r+d/stable/11/sys/HUJI amd64
>> FreeBSD clang version 5.0.1 (tags/RELEASE_501/final 320880) (based on LLVM 
>> 5.0.1)
>> VT(vga): resolution 640x480
>> CPU: Intel(R) Xeon(R) CPU   E5507  @ 2.27GHz (2261.04-MHz K8-class 
>> CPU)
>> Origin="GenuineIntel"  Id=0x106a5  Family=0x6  Model=0x1a  Stepping=5
>> Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,C>
>> Features2=0x9ce3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,DCA>
>> AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
>> AMD Features2=0x1
>> VT-x: PAT,HLT,MTF,PAUSE,EPT,VPID
>> TSC: P-state invariant, performance statistics
>> real memory  = 25769803776 (24576 MB)
>> avail memory = 24931561472 (23776 MB)
>> Event timer "LAPIC" quality 100
>> ACPI APIC Table: 
>> FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
>> FreeBSD/SMP: 2 package(s) x 4 core(s)
>> ioapic1: Changing APIC ID to 1
>> ioapic0  irqs 0-23 on motherboard
>> ioapic1  irqs 32-55 on motherboard
>> 
>> 
>> Fatal trap 12: page fault while in kernel mode
>> cpuid = 0; apic id = 10
>> fault virtual address   = 0x1818
>> fault code  = supervisor write data, page not present
>> instruction pointer = 0x20:0x80ad427f
>> stack pointer   = 0x28:0x822e3be0
>> frame pointer   = 0x28:0x822e3c30
>> code segment= base 0x0, limit 0xf, type 0x1b
>>   = DPL 0, pres 1, long 1, def32 0, gran 1
>> processor eflags= interrupt enabled, resume, IOPL = 0
>> current process = 0 (swapper)
>> [ thread pid 0 tid 10 ]
>> Stopped at  taskqgroup_attach_cpu+0x4f: lock cmpxchgq   %r12,(%rdi)

random 11.1 boot panic

2018-02-02 Thread Daniel Braniss

this has been happening randomly for some time now,
…
cd0: Attempt to query device size failed: NOT READY, Medium not present - tray ]
Stopped at  vga_bitblt_one_text_pixels_block+0x13e: movl(%rax,%r13,4),%d
db> bt
Tracing pid 4 tid 100042 td 0xf8000a4f2620
vga_bitblt_one_text_pixels_block() at vga_bitblt_one_text_pixels_block+0x13e/fr0
vga_bitblt_text() at vga_bitblt_text+0xc0/frame 0xfe05d950f160
vt_flush() at vt_flush+0x38f/frame 0xfe05d950f1b0
termcn_cnputc() at termcn_cnputc+0xbe/frame 0xfe05d950f1e0
cnputc() at cnputc+0x181/frame 0xfe05d950f210
cnputs() at cnputs+0x78/frame 0xfe05d950f230
putchar() at putchar+0x14d/frame 0xfe05d950f2b0
kvprintf() at kvprintf+0x113d/frame 0xfe05d950f3b0
vprintf() at vprintf+0x84/frame 0xfe05d950f500
printf() at printf+0x43/frame 0xfe05d950f560
cddone() at cddone+0x210/frame 0xfe05d950fb20
xpt_done_process() at xpt_done_process+0x697/frame 0xfe05d950fb60
xpt_done_td() at xpt_done_td+0x196/frame 0xfe05d950fbb0
fork_exit() at fork_exit+0x82/frame 0xfe05d950fbf0
fork_trampoline() at fork_trampoline+0xe/frame 0xfe05d950fbf0
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
db>

danny


panic when loading mlxen

2018-02-02 Thread Daniel Braniss

with latest stable (r328769) when I have
mlxen_load="YES"
in my loader.conf it panics:

KDB: debugger backends: ddbsize 0x4638 at 0x22d6000 
f
KDB: current backend: ddb
Copyright (c) 1992-2018 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
   The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 11.1-STABLE #18: Fri Feb  2 10:46:12 IST 2018
   danny@pe-44:/home/obj/pe-44/net/rnd/r+d/stable/11/sys/HUJI amd64
FreeBSD clang version 5.0.1 (tags/RELEASE_501/final 320880) (based on LLVM 
5.0.1)
VT(vga): resolution 640x480
CPU: Intel(R) Xeon(R) CPU   E5507  @ 2.27GHz (2261.04-MHz K8-class CPU)
 Origin="GenuineIntel"  Id=0x106a5  Family=0x6  Model=0x1a  Stepping=5
 Features=0xbfebfbff
 Features2=0x9ce3bd
 AMD Features=0x28100800
 AMD Features2=0x1
 VT-x: PAT,HLT,MTF,PAUSE,EPT,VPID
 TSC: P-state invariant, performance statistics
real memory  = 25769803776 (24576 MB)
avail memory = 24931561472 (23776 MB)
Event timer "LAPIC" quality 100
ACPI APIC Table: 
FreeBSD/SMP: Multiprocessor System Detected: 8 CPUs
FreeBSD/SMP: 2 package(s) x 4 core(s)
ioapic1: Changing APIC ID to 1
ioapic0  irqs 0-23 on motherboard
ioapic1  irqs 32-55 on motherboard


Fatal trap 12: page fault while in kernel mode
cpuid = 0; apic id = 10
fault virtual address   = 0x1818
fault code  = supervisor write data, page not present
instruction pointer = 0x20:0x80ad427f
stack pointer   = 0x28:0x822e3be0
frame pointer   = 0x28:0x822e3c30
code segment= base 0x0, limit 0xf, type 0x1b
   = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 0 (swapper)
[ thread pid 0 tid 10 ]
Stopped at  taskqgroup_attach_cpu+0x4f: lock cmpxchgq   %r12,(%rdi)

Re: iscsi target and VMware/esxi timeouts -- SOLVED

2017-11-17 Thread Daniel Braniss



> On 14 Nov 2017, at 11:28, Patrick M. Hausen <hau...@punkt.de> wrote:
> 
> Hello,
> 
>> On 14.11.2017 at 10:08, Daniel Braniss <da...@cs.huji.ac.il> wrote:
>> 
>> Hi,
>> we are experimenting issues with several esxi’s servers that use freebsd 
>> 10.2 stable as a iscsi target.
>> ie:
>> Nov 11 17:58:16 store-07 kernel: WARNING: 132.65.11.201 
>> (iqn.1998-01.com.vmware:pe-02-2fa7cd9e): no ping reply (NOP-Out) after 5 
>> seconds; dropping connection
>> Nov 11 17:58:16 store-07 kernel: WARNING: 132.65.11.201 
>> (iqn.1998-01.com.vmware:pe-02-2fa7cd9e): no ping reply (NOP-Out) after 5 
>> seconds; dropping connection
>> Nov 11 17:58:16 store-07 kernel: WARNING: 132.65.11.205 
>> (iqn.1998-01.com.vmware:pe-03-13e8b52d): no ping reply (NOP-Out) after 5 
>> seconds; dropping connection
>> Nov 11 17:58:17 store-07 kernel: WARNING: 132.65.11.203 
>> (iqn.1998-01.com.vmware:pe-13-60e87d06): no ping reply (NOP-Out) after 5 
>> seconds; dropping connection
>> Nov 11 17:58:17 store-07 kernel: WARNING: 132.65.11.205 
>> (iqn.1998-01.com.vmware:pe-03-13e8b52d): no ping reply (NOP-Out) after 5 
>> seconds; dropping connection
>> 
>> these are 3 different esxis that almost at the same time the target looses 
>> connection to the initiators.
>> at the moment most ‘clients’  recover from the scsi error, but older 
>> freebsds don’t.
>> 
>> in any case, increasing the timeout is not helping.
>> 
>> any clues are welcome :-)
>> 
>> over the weekend i’m planning to upgrade the target to 11.1 and take for the 
>> hills.
> 
> Are you using istgt or ctld?
> 
> We did experience similar occasional problems with the former
> but never with the latter.
> 
> Patrick

the iscsi initiator of the esxi’s (VMware) does answer to NOP’s once in a blue 
moon!
this was checked by sniffing the network.

setting kern.cam.ctl.iscsi.ping_timeout=0 solved this.
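
To see which initiators are affected before disabling the ping check, the kernel warnings can be tallied per address. This is only a sketch: it assumes each warning sits on a single log line in the format quoted in this thread, and the helper name count_nop_drops is hypothetical.

```sh
# Read syslog lines on stdin and print "<count> <initiator address>" for every
# "no ping reply (NOP-Out)" warning, sorted by address.
count_nop_drops() {
    awk '/no ping reply \(NOP-Out\)/ {
        # the initiator address is the field right after "WARNING:"
        for (i = 1; i < NF; i++)
            if ($i == "WARNING:") { n[$(i + 1)]++; break }
    }
    END { for (ip in n) print n[ip], ip }' | sort -k2
}
```

For example, `count_nop_drops < /var/log/messages` on the target. The fix itself is the sysctl named above, applied with `sysctl kern.cam.ctl.iscsi.ping_timeout=0`.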

thanks,
danny



iscsi target and VMware/esxi timeouts

2017-11-14 Thread Daniel Braniss

Hi,
we are experiencing issues with several ESXi servers that use FreeBSD 10.2
stable as an iSCSI target.
i.e.:
Nov 11 17:58:16 store-07 kernel: WARNING: 132.65.11.201 (iqn.1998-01.com.vmware:pe-02-2fa7cd9e): no ping reply (NOP-Out) after 5 seconds; dropping connection
Nov 11 17:58:16 store-07 kernel: WARNING: 132.65.11.201 (iqn.1998-01.com.vmware:pe-02-2fa7cd9e): no ping reply (NOP-Out) after 5 seconds; dropping connection
Nov 11 17:58:16 store-07 kernel: WARNING: 132.65.11.205 (iqn.1998-01.com.vmware:pe-03-13e8b52d): no ping reply (NOP-Out) after 5 seconds; dropping connection
Nov 11 17:58:17 store-07 kernel: WARNING: 132.65.11.203 (iqn.1998-01.com.vmware:pe-13-60e87d06): no ping reply (NOP-Out) after 5 seconds; dropping connection
Nov 11 17:58:17 store-07 kernel: WARNING: 132.65.11.205 (iqn.1998-01.com.vmware:pe-03-13e8b52d): no ping reply (NOP-Out) after 5 seconds; dropping connection

these are 3 different ESXi hosts, and at almost the same time the target loses
its connection to the initiators.
at the moment most ‘clients’ recover from the SCSI error, but older FreeBSDs
don’t.

in any case, increasing the timeout is not helping.

any clues are welcome :-)

over the weekend I’m planning to upgrade the target to 11.1 and head for the
hills.

danny


Re: issues with powerd/freq_levels

2017-08-02 Thread Daniel Braniss


> On 1 Aug 2017, at 20:45, Ian Smith <smi...@nimnet.asn.au> wrote:
> 
> On Mon, 31 Jul 2017 12:03:27 -0700, Kevin Oberman wrote:
>> On Mon, Jul 31, 2017 at 3:48 AM, Ian Smith <smi...@nimnet.asn.au> wrote:
>> 
>>> On Mon, 31 Jul 2017 10:09:11 +0300, Daniel Braniss wrote:
>>> 
>>>> I am trying out PCengines latest apu2 boards, and I just noticed that
>>> with different Freebsd versions I get
>>>> different freq_levels, and so when idling, each box (have 5) has a
>>> different freq/temperature value, ranging
>>>> from 125/69.1C, 600/59.0C to 75/56.0C
>>>> 
>>>> FreeBSD apu-4 11.1-STABLE FreeBSD 11.1-STABLE #5 f565b5a06ab3 (11) tip:
>>> Mon Jul 31 09:36:33 IDT 2017
>>>> apu-4# sysctl dev.cpu.0.freq_levels
>>>> dev.cpu.0.freq_levels: 1000/980 800/807 600/609
>>> 
>>> That looks about right.  On a Core2Duo (still on 9.3) I get:
>>> dev.est.1.freq_settings: 2401/35000 2400/35000 1600/15000 800/12000
>>> dev.est.0.freq_settings: 2401/35000 2400/35000 1600/15000 800/12000
>>> dev.cpu.0.freq_levels: 2401/35000 2400/35000 1600/15000 800/12000
>>> dev.cpu.0.freq: 800
>>> 
>>> But only because I'd added to /boot/loader.conf:
>>> 
>>> hint.p4tcc.0.disabled=1
>>> hint.acpi_throttle.0.disabled=1
>>> 
>>> which became the defaults sometime, maybe not before 11.0?  Otherwise
>>> mine would look more similar to the one below, with all 12.5% increments
>>> in frequency enabled, which doesn't actually save any power at all.
>>> 
>>>> FreeBSD apu-5 11.1-PRERELEASE FreeBSD 11.1-PRERELEASE #0 21e9d1ca9b80
>>> (11) tip: Tue May 30 11:51:48 IDT 2017
>>>> apu-5# sysctl dev.cpu.0.freq_levels
>>>> dev.cpu.0.freq_levels: 1000/966 875/845 800/795 700/695 600/600 525/525
>>> 450/450 375/375 300/300 225/225 150/150 75/75
>>> 
>>> Looks like either p4tcc or acpi_throttle is enabled?  See cpufreq(4).
>>> As above, these don't buy you anything but extra busyness for powerd.
>>> 
>>> Also noticed that the (nice, low!) milliwatt figures for 1000/800/600
>>> freqs are a bit different to the -stable one.  Slightly Different model?
>>> 
>>>> FreeBSD apu-1 10.3-STABLE FreeBSD 10.3-STABLE #4 267788fd852c (10) tip:
>>> Tue Jan 10 09:09:00 IST 2017
>>>> apu-1# sysctl dev.cpu.0.freq_levels
>>>> dev.cpu.0.freq_levels: 1000/-1 875/-1 750/-1 625/-1 500/-1 375/-1
>>> 250/-1 125/-1
>>> 
>>> And that looks like est(4) isn't enabled/attaching at all .. see dmesg
>>> on all of these for clues.
>>> 
>>>> so, any ideas as to what is going on?
>>> 
>>> Pure guesswork on experience with older versions, I'm not up to date.
>>> 
>> 
>> Very odd. Are all systems running identical CPUs and BIOSes? Identical
>> loader and sysctl configurations? Look at /var/run/dmesg.boot for CPU
>> information. Is EST being detected? It used to be early in the boot
>> process, but is now fairly late. (In my case, about 2/3 through the
>> dmesg.boot file.)
> 
> Hi Kevin, it's been a while ..
> 
> Danny, can you put up a verbose boot dmesg.boot of one(?) for a browse? 
> Or maybe apu-4 and -1, if not all.  I'd expect error msgs on -1 anyway.
they are now available at:
http://www.cs.huji.ac.il/~danny/pcengines/
> 
>> I have p4tcc and throttling explicitly turned off (which should now be the
>> default), but my Sandy Bridge Core i5 still shows:
>> dev.cpu.0.freq_levels: 2501/35000 2500/35000 2000/26426 1800/23233
>> 1600/20164 1400/17226 1200/14408 1000/11713 800/9140
> 
> All truly available I see on more recent processors.  Certainly not 1/8 
> duty-cycle multipliers as p4tcc and maybe? acpi_throttle (not seen here)
> 
>> The first is really bogus to indicate "turbo" mode.
> 
> Usefully bogus, in that you can flag powerd to (in your case) -M 2500 to 
> prevent it engaging "turbo" mode, as I do on my old Core2Duo, as advised 
> by Warner years ago to avoid overheating on buildworlds and such - but 
> more recent incarnations of "turbo" are supposedly far more functional.
> 
> Admittedly a digression .. mostly coming from wondering about data Karl
> posted in response, indicating different Cx levels available and so used 
> by the latter 3 AP cores, which was news to me.  I'd like to know more, 
> if only for gratuitous curiosity.  Others can tick their TL;DR box :)
> 
>> Temperature is a totally separate issue. It is

Re: issues with powerd/freq_levels

2017-08-01 Thread Daniel Braniss

all boards are identical, purchased at the same time.

> On 31 Jul 2017, at 13:48, Ian Smith <smi...@nimnet.asn.au> wrote:
> 
> On Mon, 31 Jul 2017 10:09:11 +0300, Daniel Braniss wrote:
> 
>> I am trying out PCengines latest apu2 boards, and I just noticed that with 
>> different Freebsd versions I get
>> different freq_levels, and so when idling, each box (have 5) has a different 
>> freq/temperature value, ranging
>> from 125/69.1C, 600/59.0C to 75/56.0C
>> 
>> FreeBSD apu-4 11.1-STABLE FreeBSD 11.1-STABLE #5 f565b5a06ab3 (11) tip: Mon 
>> Jul 31 09:36:33 IDT 2017
>> apu-4# sysctl dev.cpu.0.freq_levels
>> dev.cpu.0.freq_levels: 1000/980 800/807 600/609
> 
> That looks about right.  On a Core2Duo (still on 9.3) I get:
> dev.est.1.freq_settings: 2401/35000 2400/35000 1600/15000 800/12000
> dev.est.0.freq_settings: 2401/35000 2400/35000 1600/15000 800/12000
> dev.cpu.0.freq_levels: 2401/35000 2400/35000 1600/15000 800/12000
> dev.cpu.0.freq: 800
> 
> But only because I'd added to /boot/loader.conf:
> 
> hint.p4tcc.0.disabled=1
> hint.acpi_throttle.0.disabled=1
> 

the above are in my device.hints, so I assume they are now standard.

> which became the defaults sometime, maybe not before 11.0?  Otherwise 
> mine would look more similar to the one below, with all 12.5% increments 
> in frequency enabled, which doesn't actually save any power at all.
> 
>> FreeBSD apu-5 11.1-PRERELEASE FreeBSD 11.1-PRERELEASE #0 21e9d1ca9b80 (11) 
>> tip: Tue May 30 11:51:48 IDT 2017
>> apu-5# sysctl dev.cpu.0.freq_levels
>> dev.cpu.0.freq_levels: 1000/966 875/845 800/795 700/695 600/600 525/525 
>> 450/450 375/375 300/300 225/225 150/150 75/75
> 
> Looks like either p4tcc or acpi_throttle is enabled?  See cpufreq(4).
> As above, these don't buy you anything but extra busyness for powerd.
> 
> Also noticed that the (nice, low!) milliwatt figures for 1000/800/600 
> freqs are a bit different to the -stable one.  Slightly Different model?
> 
>> FreeBSD apu-1 10.3-STABLE FreeBSD 10.3-STABLE #4 267788fd852c (10) tip: Tue 
>> Jan 10 09:09:00 IST 2017
>> apu-1# sysctl dev.cpu.0.freq_levels
>> dev.cpu.0.freq_levels: 1000/-1 875/-1 750/-1 625/-1 500/-1 375/-1 250/-1 
>> 125/-1
> 
> And that looks like est(4) isn't enabled/attaching at all .. see dmesg 
> on all of these for clues.
> 
>> so, any ideas as to what is going on?
> 
> Pure guesswork on experience with older versions, I'm not up to date.
> 
> cheers, Ian

well, since I’m mostly interested in 11.1 at the moment, and what you are
saying is that it looks ok, that’s fine by me,

thanks,
danny

___
freebsd-stable@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"

issues with powerd/freq_levels

2017-07-31 Thread Daniel Braniss

I am trying out PCengines latest apu2 boards, and I just noticed that with
different FreeBSD versions I get different freq_levels, and so when idling,
each box (I have 5) has a different freq/temperature value, ranging from
125/69.1C and 600/59.0C to 75/56.0C

FreeBSD apu-4 11.1-STABLE FreeBSD 11.1-STABLE #5 f565b5a06ab3 (11) tip: Mon Jul 31 09:36:33 IDT 2017
apu-4# sysctl dev.cpu.0.freq_levels
dev.cpu.0.freq_levels: 1000/980 800/807 600/609

FreeBSD apu-5 11.1-PRERELEASE FreeBSD 11.1-PRERELEASE #0 21e9d1ca9b80 (11) tip: Tue May 30 11:51:48 IDT 2017
apu-5# sysctl dev.cpu.0.freq_levels
dev.cpu.0.freq_levels: 1000/966 875/845 800/795 700/695 600/600 525/525 450/450 375/375 300/300 225/225 150/150 75/75

FreeBSD apu-1 10.3-STABLE FreeBSD 10.3-STABLE #4 267788fd852c (10) tip: Tue Jan 10 09:09:00 IST 2017
apu-1# sysctl dev.cpu.0.freq_levels
dev.cpu.0.freq_levels: 1000/-1 875/-1 750/-1 625/-1 500/-1 375/-1 250/-1 125/-1

so, any ideas as to what is going on?

thanks,
danny
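
For comparing the boxes side by side, here is a small sketch (not part of the
original post) that splits a freq_levels value into one line per level. The
function name parse_freq_levels is made up for illustration, and the reading
of -1 as "no power estimate reported" is an assumption; on a live system the
argument would presumably come from "sysctl -n dev.cpu.0.freq_levels".

```sh
# Split a dev.cpu.N.freq_levels value into one "MHz / mW" line per level.
# A power figure of -1 is shown as "unknown" (assumed to mean that the
# driver did not report a power estimate).
parse_freq_levels() {
    printf '%s\n' "$1" | tr ' ' '\n' | awk -F/ 'NF == 2 {
        power = ($2 == "-1") ? "unknown" : $2 " mW"
        printf "%s MHz %s\n", $1, power
    }'
}

# Value reported by apu-4 in this thread.
parse_freq_levels "1000/980 800/807 600/609"
# prints:
#   1000 MHz 980 mW
#   800 MHz 807 mW
#   600 MHz 609 mW
```

Piping the result through wc -l gives the number of levels, which makes the
3-level, 12-level and 8-level outputs above easy to tell apart in a script.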


Re: 11.1-Beta panics as vmware guest

2017-06-13 Thread Daniel Braniss


> On 13 Jun 2017, at 11:14, Daniel Braniss <da...@cs.huji.ac.il> wrote:
> 
> hi,
> this only happens sometimes, and on boot:
> sorry, at the moment I don’t have a serial console, so:
hrumph, I guess it was cut out, so here is the url:
http://www.cs.huji.ac.il/~danny/Screen%20Shot%202017-06-13%20at%2011.09.21.png



11.1-Beta panics as vmware guest

2017-06-13 Thread Daniel Braniss

hi,
this only happens sometimes, and on boot:
sorry, at the moment I don’t have a serial console, so:

Re: strange cat behaviour

2017-01-20 Thread Daniel Braniss


> On 20 Jan 2017, at 10:20, Marko Cupać  wrote:
> 
> Hi,
> 
> I noticed strange behaviour when listing log file which contains
> non-ascii characters with cat. It appears to hang at certain non-ascii
> character. Issuing ctrl+c un-hangs it and displays the rest of the log.
> 
> Here's where it hangs (I redacted non-relevant private information):
> 
> [94098] [Tue Jan 17 07:25:27 2017] [info]:
> RT::User::CanonicalizeUserInfoFromExternalAuth returning EmailAddress:
> aleksandra.surn...@example.org, Name: aleksandra.surname,
> Organization: Example, RealName: Aleksandra Ä
> 
> less doesn't seem to have this problem, here's how above line looks
> there:
> 
> [94098] [Tue Jan 17 07:25:27 2017] [info]:
> RT::User::CanonicalizeUserInfoFromExternalAuth returning EmailAddress:
> aleksandra.surn...@example.org, Name: aleksandra.surname,
> Organization: Example, RealName: Aleksandra
> ÄorÄeviÄ, WorkPhone: +381 66 666 666
> (/usr/local/lib/perl5/site_perl/RT/User.pm:811)
> 
> Any advice appreciated.

some control characters will confuse the terminal emulator, so try cat -v.

danny
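
A minimal sketch of what cat -v does with such bytes. The sample strings are
made up, not taken from the log above, and LC_ALL=C is set only so that every
byte is handled individually. The second example assumes the name in the log
is UTF-8 encoded, in which case the letter Đ is the byte pair 0xC4 0x90, and
0x90 is a C1 control code that some terminals treat as the start of a control
string; whether that is what happened here is a guess.

```sh
# Render control and 8-bit bytes visibly instead of sending them to the
# terminal. The helper name "visible" is hypothetical.
visible() {
    LC_ALL=C cat -v
}

# ESC (octal 033) is shown as ^[
printf 'a\033b\n' | visible
# prints: a^[b

# The UTF-8 bytes of "Đ" (octal 304 220) are shown as M-D followed by M-^P
printf 'Aleksandra \304\220\n' | visible
```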

> -- 
> Before enlightenment - chop wood, draw water.
> After  enlightenment - chop wood, draw water.
> 
> Marko Cupać
> https://www.mimar.rs/
> ___
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: NFS and amd on older FreeBSD

2017-01-13 Thread Daniel Braniss


> On 12 Jan 2017, at 21:01, Karl Young <ka...@kipshouse.org> wrote:
> 
> Daniel Braniss(da...@cs.huji.ac.il)@2017.01.12 10:25:03 +0200:
>> 
>>> On 12 Jan 2017, at 9:49 AM, Daniel Braniss <da...@cs.huji.ac.il> wrote:
>>> 
>>> 
>>>> On 12 Jan 2017, at 1:47 AM, Karl Young <ka...@kipshouse.org> wrote:
>>>> 
>>>> I inherited a lab that has a few hundred hosts running FreeBSD 7.2.
>>>> These hosts run test scripts that access files that are stored on
>>>> FreeBSD 6.3 host.  The 6.3 host exports a /data directory with NFS
>>>> 
>>>> 
>>>> On the 7.2 hosts, I can see the exported directory:
>>>> 
>>>> $ showmount -e 6.3-host
>>>> Exports list on 6.3-host
>>>> /data  Everyone
>>>> 
>>>> And access it with amd
>>>> 
>>>> $ ls -l /net/6-3.host/data
>>>> 
>>>> drwxr-xr-x 5 root  wheel  512 Jun  4  2009 git
>>>> drwxr-xr-x  4586 root  wheel83968 Nov  2 04:50 home
>>>> 
>>>> I'm trying to retire the 6.3 host and replace it with 9.3 (I know it's
>>>> old, but it's the best I can do for now).
>>>> 
>>>> I export the /data directory on the 9.3 system, and I can see it on my
>>>> 7.2 hosts.
>>>> 
>>>> $ showmount -e  9.3-host
>>>> Exports list on 9.3-host:
>>>> /data   Everyone
>>>> 
>>>> But I can't automount it:
>>>> 
>>>> $ ls -l /net/9.3-host/data
>>>> ls: /net/9.3-host/data: No such file or directory
>>>> 
>>>> If I manually mount the exported directory, it works:
>>>> 
>>>> $ sudo mount -t nfs 9.3-host:/data /mnt/data/
>>>> $ mount | grep nfs
>>>> 9.3-host:/data on /mnt/data (nfs)
>>>> 
>>>> $ ls -l /mnt/data
>>>> total 4
>>>> drwxr-xr-x  9 root  wheel  512 Dec 20 17:41 iaf2
>>>> 
>>>> I've spent some time on Google, but haven't found a solution.  I realize
>>>> these are very old versions, but I'm not in a position to upgrade them
>>>> right now.  My last resort will be to use /etc/fstab to do the NFS
>>>> mount, but I'd rather avoid that if I can.
>>>> 
>>>> Thanks for any pointers on how to resolve this.
>>>> 
>>>> -karl
>>>> 
>>>> 
>>> 
>>> if you changed the export file on the server after you tried to mount in on 
>>> the client,
>>> and will not realise this, if that’s the case, usually rebooting the client 
>>> helps.
>>> 
>> s/and/amd/ ^%$# hate spell checkers
>> 
> 
> Thanks Danny
> 
> I did try rebooting the client (and server) multiple times to no avail.


what does amq say?
you can, from another host do: amq -h client-host

btw, I think that nfs_server must also run on the client …
I have nfs_server_enable=YES

danny


Re: NFS and amd on older FreeBSD

2017-01-12 Thread Daniel Braniss


> On 12 Jan 2017, at 9:49 AM, Daniel Braniss <da...@cs.huji.ac.il> wrote:
> 
> 
>> On 12 Jan 2017, at 1:47 AM, Karl Young <ka...@kipshouse.org> wrote:
>> 
>> I inherited a lab that has a few hundred hosts running FreeBSD 7.2.
>> These hosts run test scripts that access files that are stored on
>> FreeBSD 6.3 host.  The 6.3 host exports a /data directory with NFS
>> 
>> 
>> On the 7.2 hosts, I can see the exported directory:
>> 
>> $ showmount -e 6.3-host
>> Exports list on 6.3-host
>> /data  Everyone
>> 
>> And access it with amd
>> 
>> $ ls -l /net/6-3.host/data
>> 
>> drwxr-xr-x 5 root  wheel  512 Jun  4  2009 git
>> drwxr-xr-x  4586 root  wheel83968 Nov  2 04:50 home
>> 
>> I'm trying to retire the 6.3 host and replace it with 9.3 (I know it's
>> old, but it's the best I can do for now).
>> 
>> I export the /data directory on the 9.3 system, and I can see it on my
>> 7.2 hosts.
>> 
>> $ showmount -e  9.3-host
>> Exports list on 9.3-host:
>> /data   Everyone
>> 
>> But I can't automount it:
>> 
>> $ ls -l /net/9.3-host/data
>> ls: /net/9.3-host/data: No such file or directory
>> 
>> If I manually mount the exported directory, it works:
>> 
>> $ sudo mount -t nfs 9.3-host:/data /mnt/data/
>> $ mount | grep nfs
>> 9.3-host:/data on /mnt/data (nfs)
>> 
>> $ ls -l /mnt/data
>> total 4
>> drwxr-xr-x  9 root  wheel  512 Dec 20 17:41 iaf2
>> 
>> I've spent some time on Google, but haven't found a solution.  I realize
>> these are very old versions, but I'm not in a position to upgrade them
>> right now.  My last resort will be to use /etc/fstab to do the NFS
>> mount, but I'd rather avoid that if I can.
>> 
>> Thanks for any pointers on how to resolve this.
>> 
>> -karl
>> 
>> 
> 
> if you changed the export file on the server after you tried to mount in on 
> the client,
> and will not realise this, if that’s the case, usually rebooting the client 
> helps.
> 
s/and/amd/ ^%$# hate spell checkers

> my .5 cents
> 
>   danny
> 
>> 
>> ___
>> freebsd-stable@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
> 
> ___
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: NFS and amd on older FreeBSD

2017-01-11 Thread Daniel Braniss


> On 12 Jan 2017, at 1:47 AM, Karl Young  wrote:
> 
> I inherited a lab that has a few hundred hosts running FreeBSD 7.2.
> These hosts run test scripts that access files that are stored on
> FreeBSD 6.3 host.  The 6.3 host exports a /data directory with NFS
> 
> 
> On the 7.2 hosts, I can see the exported directory:
> 
> $ showmount -e 6.3-host
> Exports list on 6.3-host
> /data  Everyone
> 
> And access it with amd
> 
> $ ls -l /net/6-3.host/data
> 
> drwxr-xr-x 5 root  wheel  512 Jun  4  2009 git
> drwxr-xr-x  4586 root  wheel83968 Nov  2 04:50 home
> 
> I'm trying to retire the 6.3 host and replace it with 9.3 (I know it's
> old, but it's the best I can do for now).
> 
> I export the /data directory on the 9.3 system, and I can see it on my
> 7.2 hosts.
> 
> $ showmount -e  9.3-host
> Exports list on 9.3-host:
> /data   Everyone
> 
> But I can't automount it:
> 
> $ ls -l /net/9.3-host/data
> ls: /net/9.3-host/data: No such file or directory
> 
> If I manually mount the exported directory, it works:
> 
> $ sudo mount -t nfs 9.3-host:/data /mnt/data/
> $ mount | grep nfs
> 9.3-host:/data on /mnt/data (nfs)
> 
> $ ls -l /mnt/data
> total 4
> drwxr-xr-x  9 root  wheel  512 Dec 20 17:41 iaf2
> 
> I've spent some time on Google, but haven't found a solution.  I realize
> these are very old versions, but I'm not in a position to upgrade them
> right now.  My last resort will be to use /etc/fstab to do the NFS
> mount, but I'd rather avoid that if I can.
> 
> Thanks for any pointers on how to resolve this.
> 
> -karl
> 
> 

if you changed the export file on the server after you tried to mount it on the
client, amd will not realise this; if that’s the case, rebooting the client
usually helps.

my .5 cents

danny

> 
> ___
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Problems with piped tar

2016-08-23 Thread Daniel Braniss


> On 23 Aug 2016, at 10:35, Gerhard Schmidt <esta...@ze.tum.de> wrote:
> 
> Am 23.08.2016 um 09:18 schrieb Daniel Braniss:
>> 
>>> On 23 Aug 2016, at 10:06, Gerhard Schmidt <esta...@ze.tum.de> wrote:
>>> 
>>> Hi,
>>> 
>>> i'm quite often use tar to copy files using
>>> 
>>> tar cf - /some/dir | (cd /dest/dir; tar xvvf - )
>> the ‘new’ way:
>>  tar cf - /some/dir | tar xvvd - -C /dest/dir
>> which of course does not explain way your version hangs, but this one is 
>> cleaner, and btw, don’t
>> include /.
> 
> that's very strange. It's working, but doesn't solve another related
> problem. When i pipe the tar thru nc a have the same problem as my
> version. And it's no difference if there is a tar c an the receiving end
> of nc or just a '> file.tar’
> 

try with different shells. (sh/bash/csh/zsh/…)


> Regards
>   Estartu
> 
> 
> 
> -- 
> -
> Gerhard Schmidt       | E-Mail: schm...@ze.tum.de
> TU-München            | Jabber: esta...@ze.tum.de
> WWW & Online Services |
> Tel: 089/289-25270|
> Fax: 089/289-25257| PGP-Publickey auf Anfrage


Re: Problems with piped tar

2016-08-23 Thread Daniel Braniss


> On 23 Aug 2016, at 10:18, Daniel Braniss <da...@cs.huji.ac.il> wrote:
> 
>> 
>> On 23 Aug 2016, at 10:06, Gerhard Schmidt <esta...@ze.tum.de> wrote:
>> 
>> Hi,
>> 
>> i'm quite often use tar to copy files using
>> 
>> tar cf - /some/dir | (cd /dest/dir; tar xvvf - )
> the ‘new’ way:
>   tar cf - /some/dir | tar xvvd - -C /dest/dir
s/xvvd/xvvf/

> which of course does not explain way your version hangs, but this one is 
> cleaner, and btw, don’t
> include /.
> 
>> 
>> the files are copied without a problem but the reading tar never
>> terminates and so the whole command never terminates.
>> 
>> This is new since FreeBSD 10.
>> 
>> Regards
>>  Estartu
>> 
>> 
>> -- 
>> --
>> Gerhard Schmidt| E-Mail: schm...@ze.tum.de
>> Technische Universität München | Jabber: esta...@ze.tum.de
>> WWW & Online Services  |
>> Tel: +49 89 289-25270  | PGP-PublicKey
>> Fax: +49 89 289-25257  | on request
>> ___
>> freebsd-stable@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
> 
> ___
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: Problems with piped tar

2016-08-23 Thread Daniel Braniss


> On 23 Aug 2016, at 10:06, Gerhard Schmidt  wrote:
> 
> Hi,
> 
> i'm quite often use tar to copy files using
> 
> tar cf - /some/dir | (cd /dest/dir; tar xvvf - )
the ‘new’ way:
tar cf - /some/dir | tar xvvf - -C /dest/dir
which of course does not explain why your version hangs, but this one is
cleaner, and btw, don’t include /.
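
A self-contained sketch of the -C form, for illustration only. The function
name copy_tree and the throwaway directories are hypothetical. It also uses -C
on the creating side and archives ".", which is one way to keep absolute paths
out of the archive; that is an interpretation of the "don't include /" remark,
not something stated in the thread.

```sh
# Copy the contents of one directory into another through a tar pipe.
# -C makes each tar change directory itself, so no subshell or cd is needed.
copy_tree() {
    src=$1
    dest=$2
    tar cf - -C "$src" . | tar xf - -C "$dest"
}

# Demo in throwaway directories.
demo_src=$(mktemp -d /tmp/tarpipe-src.XXXXXX)
demo_dest=$(mktemp -d /tmp/tarpipe-dest.XXXXXX)
mkdir -p "$demo_src/sub"
echo hello > "$demo_src/sub/file.txt"

copy_tree "$demo_src" "$demo_dest"
cat "$demo_dest/sub/file.txt"
# prints: hello
```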

> 
> the files are copied without a problem but the reading tar never
> terminates and so the whole command never terminates.
> 
> This is new since FreeBSD 10.
> 
> Regards
>   Estartu
> 
> 
> -- 
> --
> Gerhard Schmidt| E-Mail: schm...@ze.tum.de
> Technische Universität München | Jabber: esta...@ze.tum.de
> WWW & Online Services  |
> Tel: +49 89 289-25270  | PGP-PublicKey
> Fax: +49 89 289-25257  | on request
> ___
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: UEFI & ZFS

2016-02-14 Thread Daniel Braniss


> On 14 Feb 2016, at 11:52, Yamagi Burmeister  wrote:
> 
> Hello,
> this is a known problem with Intel Skylake CPUs. Legacy boot is dead
> slow, UEFI boot is blazing fast. Have a look at this thread, it contains
> some more information:
> 
> https://lists.freebsd.org/pipermail/freebsd-current/2015-December/059037.html
> 
> As far as I know, no one has found / analyzed the root cause of this
> until now.

when saying ‘slow’, do you see slowness when printing output to the screen?
I mention this, because in the past I saw something similar, and it was a
misconfiguration with the serial console …

danny

> 
> Regard,
> 
> 
> On Fri, 12 Feb 2016 15:36:10 -0500
> "Thomas Laus"  wrote:
> 
>>> I have a new Asus H170-Plus-D3 motherboard that will be used for a DOM0 Xen
>>> Server.  It uses an Intel i5-6300 processor and a Samsung 840 EVO SSD.  I
>>> would like to use ZFS on this new installation.  The Xen Kernel does not
>>> have UEFI support at this time, so I installed FreeBSD CURRENT r295345 in
>>> 'legacy mode'.  It takes about 7 minutes to go from the first '|' character
>>> to getting the 'beastie' menu.  I changed the BIOS to UEFI and did another
>>> installation.  The boot process goes in an instant.
>> 
>> Several others have the same problem. See here on the freebsd forums:
>> 
>> http://tinyurl.com/z9oldkc
>> 
>> That is my exact problem.  It takes 4 minutes to get a complete 'beastie' 
>> menu and 7 minutes 34 seconds to login.
>> 
>> Tom
>> 
>> -- 
>> Public Keys:
>> PGP KeyID = 0x5F22FDC1
>> GnuPG KeyID = 0x620836CF
>> 
>> ___
>> freebsd-stable@freebsd.org mailing list
>> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
>> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
> 
> 
> -- 
> Homepage:  www.yamagi.org
> XMPP:  yam...@yamagi.org
> GnuPG/GPG: 0xEFBCCBCB
> ___
> freebsd-stable@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: ix(intel) vs mlxen(mellanox) 10Gb performance

2015-08-25 Thread Daniel Braniss


 On Aug 24, 2015, at 3:25 PM, Rick Macklem rmack...@uoguelph.ca wrote:
 
 Daniel Braniss wrote:
 
 On 24 Aug 2015, at 10:22, Hans Petter Selasky h...@selasky.org wrote:
 
 On 08/24/15 01:02, Rick Macklem wrote:
 The other thing is the degradation seems to cut the rate by about half
 each time.
 300-->150-->70 I have no idea if this helps to explain it.
 
 Might be a NUMA binding issue for the processes involved.
 
 man cpuset
 
 --HPS
 
 I can’t see how this is relevant, given that the same host, using the
 mellanox/mlxen
 behave much better.
 Well, the ix driver has a bunch of tunables for things like number of 
 queues
 and although I'll admit I don't understand how these queues are used, I think
 they are related to CPUs and their caches. There is also something called 
 IXGBE_FDIR,
 which others have recommended be disabled. (The code is #ifdef IXGBE_FDIR, 
 but I don't
 know if it is defined for your kernel?) There are also tunables for interrupt 
 rate and
 something called hw.ixgbe_tx_process_limit, which appears to limit the number 
 of packets
 to send or something like that?
 (I suspect Hans would understand this stuff much better than I do, since I 
 don't understand
 it at all.;-)
 
but how does this explain the fact that, at the same time,
the throughput to the NetApp is about 70MB/s while to
a FreeBSD server it’s above 150MB/s? (window size negotiation?)
switching off TSO evens out this diff.

 At a glance, the mellanox  driver looks very different.
 
 I’m getting different results with the intel/ix depending who is the nfs
 server
 
 Who knows until you figure out what is actually going on. It could just be 
 the timing of
 handling the write RPCs or when the different servers send acks for the TCP 
 segments or ...
 that causes this for one server and not another.
 
 One of the principles used when investigating airplane accidents is to never 
 assume anything
 and just try to collect the facts until the pieces of the puzzle fall in 
 place. I think the
 same principle works for this kind of stuff.
 I once had a case where a specific read of one NFS file would fail on certain 
 machines.
 I won't bore you with the details, but after weeks we got to the point where 
 we had a lab
 of identical machines (exactly the same hardware and exactly the same 
 software loaded on them)
 and we could reproduce this problem on about half the machines and not the 
 other half. We
 (myself and the guy I worked with) finally noticed the failing machines were 
 on network ports
 for a given switch. We moved the net cables to another switch and the problem 
 went away.
 -- This particular network switch was broken in such a way that it would 
 garble one specific
packet consistently, but worked fine for everything else.
 My point here is that, if someone had suggested the network switch might be 
 broken at the
 beginning of investigating this, I would have probably dismissed it, based on 
 the network is
 working just fine, but in the end, that was the problem.
 -- I am not suggesting you have a broken network switch, just don't take 
 anything off the
table until you know what is actually going on.
 
 And to be honest, you may never know, but it is fun to try and solve these 
 puzzles.

one needs to find the clues …
at the moment:
    when things go bad, they stay bad:
        ix/nfs/tcp/tso and NetApp
    when things are ok, the numbers fluctuate, which is probably due to loads
    on the system, but they are far above the 70MB/s (100 to 200)

 Beyond what I already suggested, I'd look at the ix driver's stats and 
 tunables and
 see if any of the tunables has an effect. (And, yes, it will take time to 
 work through these.)
 



 Good luck with it, rick
 
 
 danny
 
 ___
 freebsd-stable@freebsd.org mailing list
 https://lists.freebsd.org/mailman/listinfo/freebsd-stable
 To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: ix(intel) vs mlxen(mellanox) 10Gb performance

2015-08-24 Thread Daniel Braniss


 On 24 Aug 2015, at 10:22, Hans Petter Selasky h...@selasky.org wrote:
 
 On 08/24/15 01:02, Rick Macklem wrote:
 The other thing is the degradation seems to cut the rate by about half each 
 time.
 300-->150-->70 I have no idea if this helps to explain it.
 
 Might be a NUMA binding issue for the processes involved.
 
 man cpuset
 
 --HPS

I can’t see how this is relevant, given that the same host, using the
mellanox/mlxen, behaves much better.
I’m getting different results with the intel/ix depending on who is the nfs server


danny


Re: ix(intel) vs mlxen(mellanox) 10Gb performance

2015-08-24 Thread Daniel Braniss


 On 24 Aug 2015, at 02:02, Rick Macklem rmack...@uoguelph.ca wrote:
 
 Daniel Braniss wrote:
 
 On 22 Aug 2015, at 14:59, Rick Macklem rmack...@uoguelph.ca wrote:
 
 Daniel Braniss wrote:
 
 On Aug 22, 2015, at 12:46 AM, Rick Macklem rmack...@uoguelph.ca wrote:
 
 Yonghyeon PYUN wrote:
 On Wed, Aug 19, 2015 at 09:00:35AM -0400, Rick Macklem wrote:
 Hans Petter Selasky wrote:
 On 08/19/15 09:42, Yonghyeon PYUN wrote:
 On Wed, Aug 19, 2015 at 09:00:52AM +0200, Hans Petter Selasky wrote:
 On 08/18/15 23:54, Rick Macklem wrote:
 Ouch! Yes, I now see that the code that counts the # of mbufs is
 before
 the
 code that adds the tcp/ip header mbuf.
 
 In my opinion, this should be fixed by setting if_hw_tsomaxsegcount
 to
 whatever
 the driver provides - 1. It is not the driver's responsibility to
 know if
 a tcp/ip
 header mbuf will be added and is a lot less confusing that
 expecting
 the
 driver
 author to know to subtract one. (I had mistakenly thought that
 tcp_output() had
 added the tcp/ip header mbuf before the loop that counts mbufs in
 the
 list.
 Btw,
 this tcp/ip header mbuf also has leading space for the MAC layer
 header.)
 
 
 Hi Rick,
 
 Your question is good. With the Mellanox hardware we have separate
 so-called inline data space for the TCP/IP headers, so if the TCP
 stack
 subtracts something, then we would need to add something to the
 limit,
 because then the scatter gather list is only used for the data part.
 
 
 I think all drivers in tree don't subtract 1 for
 if_hw_tsomaxsegcount.  Probably touching Mellanox driver would be
 simpler than fixing all other drivers in tree.
 
 Maybe it can be controlled by some kind of flag, if all the three
 TSO
 limits should include the TCP/IP/ethernet headers too. I'm pretty
 sure
 we want both versions.
 
 
 Hmm, I'm afraid it's already complex.  Drivers have to tell almost
 the same information to both bus_dma(9) and network stack.
 
 Don't forget that not all drivers in the tree set the TSO limits
 before
 if_attach(), so possibly the subtraction of one TSO fragment needs to
 go
 into ip_output() 
 
 Ok, I realized that some drivers may not know the answers before
 ether_ifattach(),
 due to the way they are configured/written (I saw the use of
 if_hw_tsomax_update()
 in the patch).
 
 I was not able to find an interface that configures TSO parameters
 after if_t conversion.  I'm under the impression
 if_hw_tsomax_update() is not designed to use this way.  Probably we
 need a better one?(CCed to Gleb).
 
 
 If it is subtracted as a part of the assignment to if_hw_tsomaxsegcount
 in
 tcp_output()
 at line#791 in tcp_output() like the following, I don't think it should
 matter if the
 values are set before ether_ifattach()?
 /*
  * Subtract 1 for the tcp/ip header mbuf that
  * will be prepended to the mbuf chain in this
  * function in the code below this block.
  */
 if_hw_tsomaxsegcount = tp->t_tsomaxsegcount - 1;
 
 I don't have a good solution for the case where a driver doesn't plan
 on
 using the
 tcp/ip header provided by tcp_output() except to say the driver can add
 one
 to the
 setting to compensate for that (and if they fail to do so, it still
 works,
 although
 somewhat suboptimally). When I now read the comment in sys/net/if_var.h
 it
 is clear
 what it means, but for some reason I didn't read it that way before? (I
 think it was
 the part that said the driver didn't have to subtract for the headers
 that
 confused me?)
 In any case, we need to try and come up with a clear definition of what
 they need to
 be set to.
 
 I can now think of two ways to deal with this:
 1 - Leave tcp_output() as is, but provide a macro for the device driver
 authors to use
  that sets if_hw_tsomaxsegcount with a flag for driver uses tcp/ip
  header mbuf,
  documenting that this flag should normally be true.
 OR
 2 - Change tcp_output() as above, noting that this is a workaround for
 confusion w.r.t.
  whether or not if_hw_tsomaxsegcount should include the tcp/ip header
  mbuf and
  update the comment in if_var.h to reflect this. Then drivers that
  don't
  use the
  tcp/ip header mbuf can increase their value for if_hw_tsomaxsegcount
  by
  1.
  (The comment should also mention that a value of 35 or greater is
  much
  preferred to
   32 if the hardware will support that.)
 
 
 Both works for me.  My preference is 2 just because it's very
 common for most drivers that use tcp/ip header mbuf.
 Thanks for this comment. I tend to agree, both for the reason you state
 and
 also
 because the patch is simple enough that it might qualify as an errata for
 10.2.
 
 I am hoping Daniel Braniss will be able to test the patch and let us know
 if it
 improves performance with TSO enabled?
 
 send me the patch and I’ll test it ASAP.
danny
 
 Patch is attached. The one for head will also include an update to the
 comment

Re: ix(intel) vs mlxen(mellanox) 10Gb performance

2015-08-23 Thread Daniel Braniss


 On 22 Aug 2015, at 14:59, Rick Macklem rmack...@uoguelph.ca wrote:
 
 Daniel Braniss wrote:
 
 On Aug 22, 2015, at 12:46 AM, Rick Macklem rmack...@uoguelph.ca wrote:
 
 Yonghyeon PYUN wrote:
 On Wed, Aug 19, 2015 at 09:00:35AM -0400, Rick Macklem wrote:
 Hans Petter Selasky wrote:
 On 08/19/15 09:42, Yonghyeon PYUN wrote:
 On Wed, Aug 19, 2015 at 09:00:52AM +0200, Hans Petter Selasky wrote:
 On 08/18/15 23:54, Rick Macklem wrote:
 Ouch! Yes, I now see that the code that counts the # of mbufs is before
 the code that adds the tcp/ip header mbuf.
 
 In my opinion, this should be fixed by setting if_hw_tsomaxsegcount to
 whatever the driver provides - 1. It is not the driver's responsibility
 to know if a tcp/ip header mbuf will be added, and this is a lot less
 confusing than expecting the driver author to know to subtract one. (I
 had mistakenly thought that tcp_output() had added the tcp/ip header
 mbuf before the loop that counts mbufs in the list. Btw, this tcp/ip
 header mbuf also has leading space for the MAC layer header.)
 
 
 Hi Rick,
 
 Your question is good. With the Mellanox hardware we have separate
 so-called inline data space for the TCP/IP headers, so if the TCP
 stack
 subtracts something, then we would need to add something to the limit,
 because then the scatter gather list is only used for the data part.
 
 
 I think all drivers in tree don't subtract 1 for
 if_hw_tsomaxsegcount.  Probably touching Mellanox driver would be
 simpler than fixing all other drivers in tree.
 
 Maybe it can be controlled by some kind of flag, if all the three TSO
 limits should include the TCP/IP/ethernet headers too. I'm pretty sure
 we want both versions.
 
 
 Hmm, I'm afraid it's already complex.  Drivers have to tell almost
 the same information to both bus_dma(9) and network stack.
 
 Don't forget that not all drivers in the tree set the TSO limits before
 if_attach(), so possibly the subtraction of one TSO fragment needs to go
 into ip_output() 
 
 Ok, I realized that some drivers may not know the answers before
 ether_ifattach(),
 due to the way they are configured/written (I saw the use of
 if_hw_tsomax_update()
 in the patch).
 
 I was not able to find an interface that configures TSO parameters
 after the if_t conversion.  I'm under the impression that
 if_hw_tsomax_update() is not designed to be used this way.  Probably we
 need a better one? (CCed to Gleb).
 
 
 If it is subtracted as a part of the assignment to if_hw_tsomaxsegcount
 in
 tcp_output()
 at line#791 in tcp_output() like the following, I don't think it should
 matter if the
 values are set before ether_ifattach()?
   /*
    * Subtract 1 for the tcp/ip header mbuf that
    * will be prepended to the mbuf chain in this
    * function in the code below this block.
    */
   if_hw_tsomaxsegcount = tp->t_tsomaxsegcount - 1;
 
 I don't have a good solution for the case where a driver doesn't plan on
 using the
 tcp/ip header provided by tcp_output() except to say the driver can add
 one
 to the
 setting to compensate for that (and if they fail to do so, it still
 works,
 although
 somewhat suboptimally). When I now read the comment in sys/net/if_var.h
 it
 is clear
 what it means, but for some reason I didn't read it that way before? (I
 think it was
 the part that said the driver didn't have to subtract for the headers
 that
 confused me?)
 In any case, we need to try and come up with a clear definition of what
 they need to
 be set to.
 
 I can now think of two ways to deal with this:
 1 - Leave tcp_output() as is, but provide a macro for the device driver
     authors to use that sets if_hw_tsomaxsegcount with a flag for
     "driver uses tcp/ip header mbuf", documenting that this flag should
     normally be true.
 OR
 2 - Change tcp_output() as above, noting that this is a workaround for
     confusion w.r.t. whether or not if_hw_tsomaxsegcount should include
     the tcp/ip header mbuf, and update the comment in if_var.h to reflect
     this. Then drivers that don't use the tcp/ip header mbuf can increase
     their value for if_hw_tsomaxsegcount by 1.
     (The comment should also mention that a value of 35 or greater is
     much preferred to 32 if the hardware will support that.)
 
 
 Both work for me.  My preference is 2, just because it's very
 common for drivers to use the tcp/ip header mbuf.
 Thanks for this comment. I tend to agree, both for the reason you state and
 also
 because the patch is simple enough that it might qualify as an errata for
 10.2.
 
 I am hoping Daniel Braniss will be able to test the patch and let us know
 if it
 improves performance with TSO enabled?
 
 send me the patch and I’ll test it ASAP.
  danny
 
 Patch is attached. The one for head will also include an update to the comment
 in sys/net/if_var.h, but that isn't needed for testing.


well, the plot thickens.

Yesterday, before running the new kernel, I

Re: ix(intel) vs mlxen(mellanox) 10Gb performance

2015-08-22 Thread Daniel Braniss



 On Aug 22, 2015, at 12:46 AM, Rick Macklem rmack...@uoguelph.ca wrote:
 
 Yonghyeon PYUN wrote:
 On Wed, Aug 19, 2015 at 09:00:35AM -0400, Rick Macklem wrote:
 Hans Petter Selasky wrote:
 On 08/19/15 09:42, Yonghyeon PYUN wrote:
 On Wed, Aug 19, 2015 at 09:00:52AM +0200, Hans Petter Selasky wrote:
 On 08/18/15 23:54, Rick Macklem wrote:
 Ouch! Yes, I now see that the code that counts the # of mbufs is before
 the code that adds the tcp/ip header mbuf.
 
 In my opinion, this should be fixed by setting if_hw_tsomaxsegcount to
 whatever the driver provides - 1. It is not the driver's responsibility
 to know if a tcp/ip header mbuf will be added, and this is a lot less
 confusing than expecting the driver author to know to subtract one. (I
 had mistakenly thought that tcp_output() had added the tcp/ip header
 mbuf before the loop that counts mbufs in the list. Btw, this tcp/ip
 header mbuf also has leading space for the MAC layer header.)
 
 
 Hi Rick,
 
 Your question is good. With the Mellanox hardware we have separate
 so-called inline data space for the TCP/IP headers, so if the TCP
 stack
 subtracts something, then we would need to add something to the limit,
 because then the scatter gather list is only used for the data part.
 
 
 I think all drivers in tree don't subtract 1 for
 if_hw_tsomaxsegcount.  Probably touching Mellanox driver would be
 simpler than fixing all other drivers in tree.
 
 Maybe it can be controlled by some kind of flag, if all the three TSO
 limits should include the TCP/IP/ethernet headers too. I'm pretty sure
 we want both versions.
 
 
 Hmm, I'm afraid it's already complex.  Drivers have to tell almost
 the same information to both bus_dma(9) and network stack.
 
 Don't forget that not all drivers in the tree set the TSO limits before
 if_attach(), so possibly the subtraction of one TSO fragment needs to go
 into ip_output() 
 
 Ok, I realized that some drivers may not know the answers before
 ether_ifattach(),
 due to the way they are configured/written (I saw the use of
 if_hw_tsomax_update()
 in the patch).
 
 I was not able to find an interface that configures TSO parameters
 after the if_t conversion.  I'm under the impression that
 if_hw_tsomax_update() is not designed to be used this way.  Probably we
 need a better one? (CCed to Gleb).
 
 
 If it is subtracted as a part of the assignment to if_hw_tsomaxsegcount in
 tcp_output()
 at line#791 in tcp_output() like the following, I don't think it should
 matter if the
 values are set before ether_ifattach()?
   /*
    * Subtract 1 for the tcp/ip header mbuf that
    * will be prepended to the mbuf chain in this
    * function in the code below this block.
    */
   if_hw_tsomaxsegcount = tp->t_tsomaxsegcount - 1;
 
 I don't have a good solution for the case where a driver doesn't plan on
 using the
 tcp/ip header provided by tcp_output() except to say the driver can add one
 to the
 setting to compensate for that (and if they fail to do so, it still works,
 although
 somewhat suboptimally). When I now read the comment in sys/net/if_var.h it
 is clear
 what it means, but for some reason I didn't read it that way before? (I
 think it was
 the part that said the driver didn't have to subtract for the headers that
 confused me?)
 In any case, we need to try and come up with a clear definition of what
 they need to
 be set to.
 
 I can now think of two ways to deal with this:
 1 - Leave tcp_output() as is, but provide a macro for the device driver
     authors to use that sets if_hw_tsomaxsegcount with a flag for
     "driver uses tcp/ip header mbuf", documenting that this flag should
     normally be true.
 OR
 2 - Change tcp_output() as above, noting that this is a workaround for
     confusion w.r.t. whether or not if_hw_tsomaxsegcount should include
     the tcp/ip header mbuf, and update the comment in if_var.h to reflect
     this. Then drivers that don't use the tcp/ip header mbuf can increase
     their value for if_hw_tsomaxsegcount by 1.
     (The comment should also mention that a value of 35 or greater is
     much preferred to 32 if the hardware will support that.)
 
 
 Both work for me.  My preference is 2, just because it's very
 common for drivers to use the tcp/ip header mbuf.
 Thanks for this comment. I tend to agree, both for the reason you state and 
 also
 because the patch is simple enough that it might qualify as an errata for 
 10.2.
 
 I am hoping Daniel Braniss will be able to test the patch and let us know if 
 it
 improves performance with TSO enabled?

send me the patch and I’ll test it ASAP.
danny

 
 rick
 
 ___
 freebsd-stable@freebsd.org mailing list
 https://lists.freebsd.org/mailman/listinfo/freebsd-stable
 To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org

Re: ix(intel) vs mlxen(mellanox) 10Gb performance

2015-08-19 Thread Daniel Braniss


 On 19 Aug 2015, at 16:00, Rick Macklem rmack...@uoguelph.ca wrote:
 
 Hans Petter Selasky wrote:
 On 08/19/15 09:42, Yonghyeon PYUN wrote:
 On Wed, Aug 19, 2015 at 09:00:52AM +0200, Hans Petter Selasky wrote:
 On 08/18/15 23:54, Rick Macklem wrote:
 Ouch! Yes, I now see that the code that counts the # of mbufs is before
 the code that adds the tcp/ip header mbuf.
 
 In my opinion, this should be fixed by setting if_hw_tsomaxsegcount to
 whatever the driver provides - 1. It is not the driver's responsibility
 to know if a tcp/ip header mbuf will be added, and this is a lot less
 confusing than expecting the driver author to know to subtract one. (I
 had mistakenly thought that tcp_output() had added the tcp/ip header
 mbuf before the loop that counts mbufs in the list. Btw, this tcp/ip
 header mbuf also has leading space for the MAC layer header.)
 
 
 Hi Rick,
 
 Your question is good. With the Mellanox hardware we have separate
 so-called inline data space for the TCP/IP headers, so if the TCP stack
 subtracts something, then we would need to add something to the limit,
 because then the scatter gather list is only used for the data part.
 
 
 I think all drivers in tree don't subtract 1 for
 if_hw_tsomaxsegcount.  Probably touching Mellanox driver would be
 simpler than fixing all other drivers in tree.
 
 Maybe it can be controlled by some kind of flag, if all the three TSO
 limits should include the TCP/IP/ethernet headers too. I'm pretty sure
 we want both versions.
 
 
 Hmm, I'm afraid it's already complex.  Drivers have to tell almost
 the same information to both bus_dma(9) and network stack.
 
 Don't forget that not all drivers in the tree set the TSO limits before
 if_attach(), so possibly the subtraction of one TSO fragment needs to go
 into ip_output() 
 
 Ok, I realized that some drivers may not know the answers before 
 ether_ifattach(),
 due to the way they are configured/written (I saw the use of 
 if_hw_tsomax_update()
 in the patch).
 
 If it is subtracted as a part of the assignment to if_hw_tsomaxsegcount in 
 tcp_output()
 at line#791 in tcp_output() like the following, I don't think it should 
 matter if the
 values are set before ether_ifattach()?
   /*
    * Subtract 1 for the tcp/ip header mbuf that
    * will be prepended to the mbuf chain in this
    * function in the code below this block.
    */
   if_hw_tsomaxsegcount = tp->t_tsomaxsegcount - 1;
 
 I don't have a good solution for the case where a driver doesn't plan on 
 using the
 tcp/ip header provided by tcp_output() except to say the driver can add one 
 to the
 setting to compensate for that (and if they fail to do so, it still works, 
 although
 somewhat suboptimally). When I now read the comment in sys/net/if_var.h it is 
 clear
 what it means, but for some reason I didn't read it that way before? (I think 
 it was
 the part that said the driver didn't have to subtract for the headers that 
 confused me?)
 In any case, we need to try and come up with a clear definition of what they 
 need to
 be set to.
 
 I can now think of two ways to deal with this:
 1 - Leave tcp_output() as is, but provide a macro for the device driver
     authors to use that sets if_hw_tsomaxsegcount with a flag for
     "driver uses tcp/ip header mbuf", documenting that this flag should
     normally be true.
 OR
 2 - Change tcp_output() as above, noting that this is a workaround for
     confusion w.r.t. whether or not if_hw_tsomaxsegcount should include
     the tcp/ip header mbuf, and update the comment in if_var.h to reflect
     this. Then drivers that don't use the tcp/ip header mbuf can increase
     their value for if_hw_tsomaxsegcount by 1.
     (The comment should also mention that a value of 35 or greater is
     much preferred to 32 if the hardware will support that.)
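
 For readers following along, the two options can be sketched roughly as
 below. This is only an illustration under stated assumptions:
 struct fake_ifnet, tso_set_maxsegcount() and stack_payload_seg_limit()
 are hypothetical names invented for this sketch and are not part of the
 FreeBSD kernel or of the attached patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the one struct ifnet field discussed here. */
struct fake_ifnet {
	unsigned int if_hw_tsomaxsegcount;
};

/*
 * Option 1 (hypothetical helper, not an existing kernel macro): the driver
 * passes its real scatter/gather limit and states whether it transmits the
 * tcp/ip header mbuf supplied by tcp_output(). If it does, one entry is
 * reserved for that header.
 */
static void
tso_set_maxsegcount(struct fake_ifnet *ifp, unsigned int hw_max_segs,
    bool uses_hdr_mbuf)
{
	assert(hw_max_segs > 1);
	ifp->if_hw_tsomaxsegcount =
	    uses_hdr_mbuf ? hw_max_segs - 1 : hw_max_segs;
}

/*
 * Option 2: the driver stores its limit unchanged and the stack subtracts
 * one when it reads the value, mirroring the tcp_output() snippet quoted
 * earlier in this message.
 */
static unsigned int
stack_payload_seg_limit(const struct fake_ifnet *ifp)
{
	return (ifp->if_hw_tsomaxsegcount - 1);
}
```

 With a hardware limit of 32, both variants leave 31 entries for payload
 mbufs. They differ only in who has to remember the header: the driver
 author (option 1) or the stack (option 2).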
 
 Also, I'd like to apologize for some of my emails getting a little blunt.
 I just find it frustrating that this problem is still showing up and is
 even in 10.2. This is partly my fault for not making it clearer to driver
 authors what if_hw_tsomaxsegcount should be set to, because I had it
 incorrect.
 
 Hopefully we can come up with a solution that everyone is comfortable with, 
 rick


ok guys,
when you have some code for me to try just let me know.

danny


Re: ix(intel) vs mlxen(mellanox) 10Gb performance

2015-08-18 Thread Daniel Braniss


 On Aug 18, 2015, at 12:49 AM, Rick Macklem rmack...@uoguelph.ca wrote:
 
 Daniel Braniss wrote:
 
 On Aug 17, 2015, at 3:21 PM, Rick Macklem rmack...@uoguelph.ca wrote:
 
 Daniel Braniss wrote:
 
 On Aug 17, 2015, at 1:41 PM, Christopher Forgeron csforge...@gmail.com
 wrote:
 
 FYI, I can regularly hit 9.3 Gib/s with my Intel X520-DA2's and FreeBSD
 10.1. Before 10.1 it was less.
 
 
 this is NOT iperf/3 where i do get close to wire speed,
 it’s NFS writes, i.e., almost real work :-)
 
 I used to tweak the card settings, but now it's just stock. You may want
 to
 check your settings, the Mellanox may just have better defaults for your
 switch.
 
 Have you tried disabling TSO for the Intel? With TSO enabled, it will be
 copying every transmitted mbuf chain to a new chain of mbuf clusters via
 m_defrag(). (Assuming you don't have an 82598 chip. Most seem to be the
 82599 chip these days?)
 
 
 hi Rick
 
 how can i check the chip?
 
 Haven't a clue. Does dmesg tell you? (To be honest, since disabling TSO 
 helped,
 I'll bet you don't have a 82598.)
 
 This has been fixed in the driver very recently, but those fixes won't be
 in 10.1.
 
 rick
 ps: If you could test with 10.2, it would be interesting to see how the ix
 does with
   the current driver fixes in it?
 
 I knew TSO was involved!
 ok, firstly, it’s 10.2 stable.
 with TSO enabled, ix is bad, around 64MGB/s.
 disabling TSO it’s better, around 130
 
 Hmm, could you check to see if these lines are in sys/dev/ixgbe/if_ix.c at
 around line#2500?
  /* TSO parameters */
 2572   ifp->if_hw_tsomax = 65518;
 2573   ifp->if_hw_tsomaxsegcount = IXGBE_82599_SCATTER;
 2574   ifp->if_hw_tsomaxsegsize = 2048;
 
 They are in stable/10. I didn't look at releng/10.2. (And if they're in a 
 #ifdef
 for FreeBSD11, take the #ifdef away.)
 If they are there and not ifdef'd, I can't explain why disabling TSO would 
 help.
 Once TSO is fixed so that it handles the 64K transmit segments without 
 copying all
 the mbufs, I suspect you might get better perf. with it enabled?
 

this is 10.2 :
they are on lines  2509-2511 and I don’t see any #ifdefs around it.

the plot thickens :-)

danny

 Good luck with it, rick
 
 still, mlxen0 is about 250! with and without TSO
 
 
 
 On Mon, Aug 17, 2015 at 6:41 AM, Slawa Olhovchenkov s...@zxy.spb.ru
 mailto:s...@zxy.spb.ru wrote:
 On Mon, Aug 17, 2015 at 10:27:41AM +0300, Daniel Braniss wrote:
 
 hi,
 I have a host (Dell R730) with both cards, connected to an HP8200
 switch at 10Gb.
 when writing to the same storage (netapp) this is what I get:
 ix0:~130MGB/s
 mlxen0  ~330MGB/s
 this is via nfs/tcpv3
 
 I can get similar (bad) performance with the mellanox if I increase
 the file size
 to 512MGB.
 
 Looks like Mellanox has an internal buffer for caching and does ACK
 acceleration.
 
 so at face value, it seems the mlxen does a better use of resources
 than the intel.
 Any ideas how to improve ix/intel's performance?
 
 Are you sure about netapp performance?
 ___
 freebsd-...@freebsd.org mailing list
 https://lists.freebsd.org/mailman/listinfo/freebsd-net
 To unsubscribe, send any mail to freebsd-net-unsubscr...@freebsd.org
 
 

Re: ix(intel) vs mlxen(mellanox) 10Gb performance

2015-08-18 Thread Daniel Braniss

sorry, it’s been a tough day, we had a major meltdown, caused by a faulty GBIC :-(
anyways, could you tell me what to do?
comment out, fix the off by one?

the machine is not yet production.

thanks,
danny

 On 18 Aug 2015, at 16:32, Hans Petter Selasky h...@selasky.org wrote:
 
 On 08/18/15 14:53, Rick Macklem wrote:
 2572   ifp->if_hw_tsomax = 65518;
 2573   ifp->if_hw_tsomaxsegcount = IXGBE_82599_SCATTER;
 2574   ifp->if_hw_tsomaxsegsize = 2048;
 
 Hi,
 
 If IXGBE_82599_SCATTER is the maximum scatter/gather entries the hardware can 
 do, remember to subtract one fragment for the TCP/IP-header mbuf!
 
 I think there is an off-by-one here:
 
 ifp->if_hw_tsomax = 65518;
 ifp->if_hw_tsomaxsegcount = IXGBE_82599_SCATTER - 1;
 ifp->if_hw_tsomaxsegsize = 2048;
 
 Refer to:
 
 *
 * NOTE: The TSO limits only apply to the data payload part of
 * a TCP/IP packet. That means there is no need to subtract
 * space for ethernet-, vlan-, IP- or TCP- headers from the
 * TSO limits unless the hardware driver in question requires
 * so.
 
 In sys/net/if_var.h
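
 The off-by-one can be checked with a little arithmetic. The sketch below
 is an illustration only: payload_segments() and fits_hw_limit() are
 hypothetical helpers, and the scatter/gather limit of 32 is an assumed
 value taken from the "32" mentioned elsewhere in this thread, not read
 from the driver source.

```c
#include <assert.h>

/* Number of mbufs needed to hold len payload bytes at seg_size bytes each. */
static unsigned int
payload_segments(unsigned int len, unsigned int seg_size)
{
	assert(seg_size > 0);
	return ((len + seg_size - 1) / seg_size);
}

/*
 * Returns 1 if a TSO send of len payload bytes fits in hw_max_segs
 * scatter/gather entries once hdr_segs header mbufs are prepended to the
 * payload chain, 0 otherwise.
 */
static int
fits_hw_limit(unsigned int len, unsigned int seg_size, unsigned int hdr_segs,
    unsigned int hw_max_segs)
{
	return (payload_segments(len, seg_size) + hdr_segs <= hw_max_segs);
}
```

 With if_hw_tsomax = 65518 and if_hw_tsomaxsegsize = 2048, the payload
 alone needs 32 segments, so fits_hw_limit(65518, 2048, 1, 32) is false
 once the header mbuf is counted. Limiting the payload to 31 segments
 (63488 bytes) makes the same call succeed, which is the effect of the
 "- 1" above.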
 
 Thank you!
 
 --HPS
 


Re: ix(intel) vs mlxen(mellanox) 10Gb performance

2015-08-17 Thread Daniel Braniss


 On Aug 17, 2015, at 12:41 PM, Slawa Olhovchenkov s...@zxy.spb.ru wrote:
 
 On Mon, Aug 17, 2015 at 10:27:41AM +0300, Daniel Braniss wrote:
 
 hi,
  I have a host (Dell R730) with both cards, connected to an HP8200 
 switch at 10Gb.
  when writing to the same storage (netapp) this is what I get:
  ix0:~130MGB/s
  mlxen0  ~330MGB/s
  this is via nfs/tcpv3
 
  I can get similar (bad) performance with the mellanox if I increase the 
 file size
  to 512MGB.
 
 Looks like Mellanox has an internal buffer for caching and does ACK acceleration.
whatever they are doing, it’s impressive :-)

 
  so at face value, it seems the mlxen does a better use of resources 
 than the intel.
  Any ideas how to improve ix/intel's performance?
 
 Are you sure about netapp performance?

yes, and why should it act differently if the request is coming from the same
host? in any case the numbers are quite consistent, since I have measured it
from several hosts, and at different times.

danny


Re: ix(intel) vs mlxen(mellanox) 10Gb performance

2015-08-17 Thread Daniel Braniss


 On Aug 17, 2015, at 1:41 PM, Christopher Forgeron csforge...@gmail.com 
 wrote:
 
 FYI, I can regularly hit 9.3 Gib/s with my Intel X520-DA2's and FreeBSD 10.1. 
 Before 10.1 it was less.
 

this is NOT iperf/3 where i do get close to wire speed,
it’s NFS writes, i.e., almost real work :-)

 I used to tweak the card settings, but now it's just stock. You may want to 
 check your settings, the Mellanox may just have better defaults for your 
 switch. 
 
 On Mon, Aug 17, 2015 at 6:41 AM, Slawa Olhovchenkov s...@zxy.spb.ru 
 mailto:s...@zxy.spb.ru wrote:
 On Mon, Aug 17, 2015 at 10:27:41AM +0300, Daniel Braniss wrote:
 
  hi,
I have a host (Dell R730) with both cards, connected to an HP8200 
  switch at 10Gb.
when writing to the same storage (netapp) this is what I get:
ix0:~130MGB/s
mlxen0  ~330MGB/s
this is via nfs/tcpv3
 
I can get similar (bad) performance with the mellanox if I increase 
  the file size
to 512MGB.
 
 Looks like Mellanox has an internal buffer for caching and does ACK acceleration.
 
so at face value, it seems the mlxen does a better use of resources 
  than the intel.
Any ideas how to improve ix/intel's performance?
 
 Are you sure about netapp performance?

Re: ix(intel) vs mlxen(mellanox) 10Gb performance

2015-08-17 Thread Daniel Braniss


 On Aug 17, 2015, at 3:21 PM, Rick Macklem rmack...@uoguelph.ca wrote:
 
 Daniel Braniss wrote:
 
 On Aug 17, 2015, at 1:41 PM, Christopher Forgeron csforge...@gmail.com
 wrote:
 
 FYI, I can regularly hit 9.3 Gib/s with my Intel X520-DA2's and FreeBSD
 10.1. Before 10.1 it was less.
 
 
 this is NOT iperf/3 where i do get close to wire speed,
 it’s NFS writes, i.e., almost real work :-)
 
 I used to tweak the card settings, but now it's just stock. You may want to
 check your settings, the Mellanox may just have better defaults for your
 switch.
 
 Have you tried disabling TSO for the Intel? With TSO enabled, it will be
 copying every transmitted mbuf chain to a new chain of mbuf clusters via
 m_defrag(). (Assuming you don't have an 82598 chip. Most seem to be the
 82599 chip these days?)
 

hi Rick

how can i check the chip?

 This has been fixed in the driver very recently, but those fixes won't be in 
 10.1.
 
 rick
 ps: If you could test with 10.2, it would be interesting to see how the ix 
 does with
the current driver fixes in it?

I knew TSO was involved!
ok, firstly, it’s 10.2 stable.
with TSO enabled, ix is bad, around 64MGB/s.
disabling TSO it’s better, around 130

still, mlxen0 is about 250! with and without TSO


 
 On Mon, Aug 17, 2015 at 6:41 AM, Slawa Olhovchenkov s...@zxy.spb.ru
 mailto:s...@zxy.spb.ru wrote:
 On Mon, Aug 17, 2015 at 10:27:41AM +0300, Daniel Braniss wrote:
 
 hi,
  I have a host (Dell R730) with both cards, connected to an HP8200
  switch at 10Gb.
  when writing to the same storage (netapp) this is what I get:
  ix0:~130MGB/s
  mlxen0  ~330MGB/s
  this is via nfs/tcpv3
 
  I can get similar (bad) performance with the mellanox if I increase
  the file size
  to 512MGB.
 
 Looks like Mellanox has an internal buffer for caching and does ACK acceleration.
 
  so at face value, it seems the mlxen does a better use of resources
  than the intel.
  Any ideas how to improve ix/intel's performance?
 
 Are you sure about netapp performance?

ix(intel) vs mlxen(mellanox) 10Gb performance

2015-08-17 Thread Daniel Braniss

hi,
I have a host (Dell R730) with both cards, connected to an HP8200
switch at 10Gb.
when writing to the same storage (netapp) this is what I get:
ix0:    ~130MGB/s
mlxen0: ~330MGB/s
this is via nfs/tcpv3

I can get similar (bad) performance with the mellanox if I increase the
file size to 512MGB.
so at face value, it seems the mlxen makes better use of resources
than the intel.
Any ideas how to improve ix/intel’s performance?

cheers,
danny


/usr/bin/unzip regression?

2015-08-03 Thread Daniel Braniss

hi,
the file in question is rather big, 5.3G,
older 9.3-stable works ok, while 10.x complains with
unzip: Invalid central directory signature

so is this a regression?

thanks,
danny


Dell r730 + PERC H730P problem

2015-07-22 Thread Daniel Braniss

Hi
I’m running a pretty new 10.2-BETA, on a freshly arrived box,
and trying it out.

after creating a raid5 via mfiutil, and 
trying to restore(8) to it gives:
...
Jul 22 15:48:30 store-07 kernel: mfi0: Failed to get command
Jul 22 15:49:24 store-07 kernel: mfi0: COMMAND 0xfe00013a6440 TIMEOUT AFTER 37 SECONDS
Jul 22 15:49:24 store-07 kernel: mfi0: COMMAND 0xfe00013aa158 TIMEOUT AFTER 37 SECONDS
Jul 22 15:49:24 store-07 kernel: mfi0: COMMAND 0xfe00013a6000 TIMEOUT AFTER 37 SECONDS
…

and only power cycle worked.

We have several Dell/Percs with similar configuration working fine (though the 
PERCs are older),
this is Perc H730P Mini.



Re: Dell r730 + PERC H730P problem fixed

2015-07-22 Thread Daniel Braniss

set
hw.mfi.mrsas_enable=1
in loader.conf

 On 22 Jul 2015, at 16:33, Daniel Braniss da...@cs.huji.ac.il wrote:
 
 Hi
 I’m running a pretty new 10.2-BETA, on a freshly arrived box,
 and trying it out.
 
 after creating a raid5 via mfiutil, and 
 trying to restore(8) to it gives:
 ...
 Jul 22 15:48:30 store-07 kernel: mfi0: Failed to get command
 Jul 22 15:49:24 store-07 kernel: mfi0: COMMAND 0xfe00013a6440 TIMEOUT AFTER 37 SECONDS
 Jul 22 15:49:24 store-07 kernel: mfi0: COMMAND 0xfe00013aa158 TIMEOUT AFTER 37 SECONDS
 Jul 22 15:49:24 store-07 kernel: mfi0: COMMAND 0xfe00013a6000 TIMEOUT AFTER 37 SECONDS
 …
 
 and only power cycle worked.
 
 We have several Dell/Percs with similar configuration working fine (though 
 the PERCs are older),
 this is Perc H730P Mini.
 
 

Re: dev.cpu.0.freq disapeared

2015-03-22 Thread Daniel Braniss

Hi Jeremy,
I have a similar issue with an Sun Fire X2200:
CPU: Dual-Core AMD Opteron(tm) Processor 2218 (2613.45-MHz K8-class CPU)
  Origin=AuthenticAMD  Id=0x40f13  Family=0xf  Model=0x41  Stepping=3
  
  Features=0x178bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2,HTT>
  Features2=0x2001<SSE3,CX16>
  AMD Features=0xea500800<SYSCALL,NX,MMX+,FFXSR,RDTSCP,LM,3DNow!+,3DNow!>
  AMD Features2=0x1f<LAHF,CMP,SVM,ExtAPIC,CR8>
  SVM: NAsids=64

and setting debug.cpufreq.verbose="1"
does not make a difference (ran diff with and without it)
you can get the dmesg at:
ftp://ftp.cs.huji.ac.il/users/danny/freebsd/dmesg.boot

cheers,
danny

 On Mar 22, 2015, at 7:53 AM, Peter Jeremy pe...@rulingia.com wrote:
 
 On 2015-Mar-22 00:58:55 +0300, Dmitry Sivachenko trtrmi...@gmail.com wrote:
 I have a machine with the following processor:
 
 CPU: Intel(R) Xeon(R) CPU   E5620  @ 2.40GHz (2400.14-MHz K8-class 
 CPU)
 Origin=GenuineIntel  Id=0x206c2  Family=0x6  Model=0x2c  Stepping=2
 ...
 After I upgraded to 10.1-STABLE #0 r279956, this sysctl disappeared.
 % sysctl dev.cpu.0.freq
 sysctl: unknown oid 'dev.cpu.0.freq': No such file or directory
 %
 
 What OIDs do you have?  Does dev.cpu.0 exist?  How about dev.cpu?
 
 Can you set 'debug. in /boot/loader.conf and post
 (or make available) the dmesg from a verbose boot.
 
 -- 
 Peter Jeremy


Re: Intel 10Gb network card

2013-09-11 Thread Daniel Braniss

 On 09/04/2013 08:25 AM, Daniel Braniss wrote:
  Q: does the copper (10G Based T) version work?
 
 Works fine for me.
 
Nice to know,
danke,
danny



Re: Intel 10Gb network card

2013-09-04 Thread Daniel Braniss

thanks Luigi and Jack!

I also solved the question by doing
grep -ir 82599EB /sys/dev
and it found the ixgbe driver - may the src be with you :-)

My point - not well expressed - was that the manuals had little/confusing
info.

ifconfig:
re0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=8209b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,WOL_MAGIC,LINKSTATE>
...

then
man 4 re
RE(4)  FreeBSD Kernel Interfaces Manual  RE(4)

NAME
 re - RealTek 8139C+/8169/816xS/811xS/8168/810xE/8111 PCI/PCIe Ethernet
 adapter driver
or
nfe0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
options=c219b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,TSO4,WOL_MAGIC,VLAN_HWTSO,LINKSTATE>

man 4 nfe
NFE(4) FreeBSD Kernel Interfaces Manual NFE(4)

NAME
 nfe - NVIDIA nForce MCP Ethernet driver


etc, etc, etc.

no man ix, no mention of /dev/ix%d in man ixgbe


Q: does the copper (10G Based T) version work?

cheers,
danny

 ixgb is the old PCI-X based adapter, ixgbe is for all pci express hardware.
 
 The latter is almost certainly what you want :)
 
 Jack
 
 
 
 On Tue, Sep 3, 2013 at 6:24 AM, Daniel Braniss da...@cs.huji.ac.il wrote:
 
  hi,
  I have a hard time figuring this out, the kernel says:
  ...
  ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15> port
  0xecc0-0xecdf mem 0xd9e8-0xd9ef,0xd9ff8000-0xd9ffbfff irq 40 at device
  0.0 on pci4
  ix0: Using MSIX interrupts with 9 vectors
  ix0: Ethernet address: 90:e2:ba:29:c0:54
  ix0: PCI Express Bus: Speed 5.0GT/s Width x8
  ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15> port
  0xece0-0xecff mem 0xd9f0-0xd9f7,0xd9ffc000-0xd9ff irq 44 at device
  0.1 on pci4
  ix1: Using MSIX interrupts with 9 vectors
  ix1: Ethernet address: 90:e2:ba:29:c0:55
  ix1: PCI Express Bus: Speed 5.0GT/s Width x8
  ...
 
  pciconf says:
  ix0@pci0:4:0:0: class=0x02 card=0x7a118086 chip=0x10fb8086 rev=0x01
  hdr=0x00
  vendor = 'Intel Corporation'
  device = '82599EB 10-Gigabit SFI/SFP+ Network Connection'
  class  = network
  subclass   = ethernet
 
  but both manuals ixgb and ixgbe mention a different chip, and device
  man for ixb says:
  ...
   ixgb - Intel(R) PRO/10GbE Ethernet driver for the FreeBSD operating
  sys-
   tem
  ...
  The ixgb driver provides support for PCI Gigabit Ethernet adapters
  based
   on the Intel 82597EX Ethernet controller chips.  The driver supports
 
  man for ixgbe says:
  ...
  ixgbe - Intel(R) 10Gb Ethernet driver for the FreeBSD operating system
  ...
  the Intel 82598EB
  ...
 
  to make things even more confusing, Dell says:
  DELL INTEL X520 DA2 10GBe DP+SERVER ADAPTER PCIE
 
 
  and finally, there is no man ix
 
  'will the real ix please stand up?'
  danny
 
 
 
  ___
  freebsd-stable@freebsd.org mailing list
  http://lists.freebsd.org/mailman/listinfo/freebsd-stable
  To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org
 
 

Intel 10Gb network card

2013-09-03 Thread Daniel Braniss

hi,
I have a hard time figuring this out, the kernel says:
...
ix0: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15> port 
0xecc0-0xecdf mem 0xd9e8-0xd9ef,0xd9ff8000-0xd9ffbfff irq 40 at device 
0.0 on pci4
ix0: Using MSIX interrupts with 9 vectors
ix0: Ethernet address: 90:e2:ba:29:c0:54
ix0: PCI Express Bus: Speed 5.0GT/s Width x8
ix1: <Intel(R) PRO/10GbE PCI-Express Network Driver, Version - 2.5.15> port 
0xece0-0xecff mem 0xd9f0-0xd9f7,0xd9ffc000-0xd9ff irq 44 at device 
0.1 on pci4
ix1: Using MSIX interrupts with 9 vectors
ix1: Ethernet address: 90:e2:ba:29:c0:55
ix1: PCI Express Bus: Speed 5.0GT/s Width x8
...

pciconf says:
ix0@pci0:4:0:0: class=0x02 card=0x7a118086 chip=0x10fb8086 rev=0x01 
hdr=0x00
vendor = 'Intel Corporation'
device = '82599EB 10-Gigabit SFI/SFP+ Network Connection'
class  = network
subclass   = ethernet
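
For anyone hitting the same confusion: the chip= value in the pciconf line above packs two 16-bit PCI IDs into one 32-bit number, device ID in the upper half and vendor ID in the lower half. A minimal sketch of splitting it (the function name is made up for illustration):

```python
def split_chip_id(chip: int) -> tuple[int, int]:
    """Split a pciconf 'chip=' value into (vendor_id, device_id)."""
    vendor_id = chip & 0xFFFF          # low 16 bits
    device_id = (chip >> 16) & 0xFFFF  # high 16 bits
    return vendor_id, device_id


vendor, device = split_chip_id(0x10FB8086)
print(f"vendor=0x{vendor:04x} device=0x{device:04x}")
# prints "vendor=0x8086 device=0x10fb"
```

0x8086 is Intel's PCI vendor ID. The device ID is the value a driver has to match when it probes the card, so it is another string to search the driver sources for when the man pages do not name the chip.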

but both manuals ixgb and ixgbe mention a different chip, and device
man for ixgb says:
...
 ixgb - Intel(R) PRO/10GbE Ethernet driver for the FreeBSD operating sys-
 tem
...
The ixgb driver provides support for PCI Gigabit Ethernet adapters based
 on the Intel 82597EX Ethernet controller chips.  The driver supports
 
man for ixgbe says:
...
ixgbe - Intel(R) 10Gb Ethernet driver for the FreeBSD operating system
...
the Intel 82598EB
...

to make things even more confusing, Dell says:
DELL INTEL X520 DA2 10GBe DP+SERVER ADAPTER PCIE


and finally, there is no man ix

'will the real ix please stand up?'
danny




Re: another? NFS deadlock on 9.2-PRERELEASE

2013-08-28 Thread Daniel Braniss

 Daniel Braniss wrote:
   Daniel Braniss wrote:
 Daniel Braniss wrote:
  I upgraded our web server, and only after 3 hours it hung :-(
  (as a side note, I have 2 other web servers, also running 9.2
  doing
  great :-)
  go figure.
  
  anyways, in
  ftp://ftp.cs.huji.ac.il/users/danny/freebsd/core.txt/0
  
  is the info after a forced panic.
  
 Looks like the same hang to me. Several threads are sleeping on
 pgrbwt
 and lots are waiting for an NFS vnode lock.
 
 It should be fixed in RC3 (or revert r250907). If it still
 hangs
 with
 RC3 (or r250907 reverted), email again.
 
im following stable, hence it's till calling itself
9.2-PRERELEASE,
but
I did a sync this morning - local time, after rc3 was anounced.
but after 3.45 minutes is hung, data in
ftp://ftp.cs.huji.ac.il/users/danny/freebsd/core.txt/1

I can't easely revert r250907, since i'm using mercuriall, but if
someone
can send me the pre r250907 files, i'll try.

 The pre-r250907 version of uipc_syscalls is at:
  http://people.freebsd.org/~rmacklem/uipc_syscalls.c
 in case you want to try it.
 
thanks, I think I have a kernel pre r250907.
In the meantime I did 2 things:
1- made the /(root) local - as opposed to nfs'ed
2- have a watchdog to reboot in case of hang

the host has been up for more than 14 hours (I doubt it's because of 2 -)

let's see how things develop

thanks,
danny

 rick
 
   r254947, which was committed to stable/9 a few hours ago is
   believed to
   fix the problem. Please update your stable/9 to post-r254947 and
   try it.
   
  the current kernel has that fix (sys/kern/uipc_syscalls.c)
  and if you check the core.txt/1 you will see no pgrbwt, only newnsf
  ...
  
  danny
  
   rick
   
thanks,
danny

 rick
 
  my guts say its running out of resources - mainly network
  related,
  but
  can't pinpoint it.
  
  any help will be most welcomed
  
  cheers,
  danny
  
  
  
  
  



  
  
  



Re: another? NFS deadlock on 9.2-PRERELEASE

2013-08-27 Thread Daniel Braniss

 Daniel Braniss wrote:
  I upgraded our web server, and only after 3 hours it hung :-(
  (as a side note, I have 2 other web servers, also running 9.2 doing
  great :-)
  go figure.
  
  anyways, in
  ftp://ftp.cs.huji.ac.il/users/danny/freebsd/core.txt/0
  
  is the info after a forced panic.
  
 Looks like the same hang to me. Several threads are sleeping on pgrbwt
 and lots are waiting for an NFS vnode lock.
 
 It should be fixed in RC3 (or revert r250907). If it still hangs with
 RC3 (or r250907 reverted), email again.
 
I'm following stable, hence it's still calling itself 9.2-PRERELEASE, but
I did a sync this morning - local time, after RC3 was announced.
but after 3.45 minutes it hung, data in 
ftp://ftp.cs.huji.ac.il/users/danny/freebsd/core.txt/1

I can't easily revert r250907, since I'm using Mercurial, but if someone
can send me the pre-r250907 files, I'll try.

thanks,
danny

 rick
 
  my guts say its running out of resources - mainly network related,
  but
  can't pinpoint it.
  
  any help will be most welcomed
  
  cheers,
  danny
  
  
  
  
  



Re: another? NFS deadlock on 9.2-PRERELEASE

2013-08-27 Thread Daniel Braniss

 Daniel Braniss wrote:
   Daniel Braniss wrote:
I upgraded our web server, and only after 3 hours it hung :-(
(as a side note, I have 2 other web servers, also running 9.2
doing
great :-)
go figure.

anyways, in
ftp://ftp.cs.huji.ac.il/users/danny/freebsd/core.txt/0

is the info after a forced panic.

   Looks like the same hang to me. Several threads are sleeping on
   pgrbwt
   and lots are waiting for an NFS vnode lock.
   
   It should be fixed in RC3 (or revert r250907). If it still hangs
   with
   RC3 (or r250907 reverted), email again.
   
  im following stable, hence it's till calling itself 9.2-PRERELEASE,
  but
  I did a sync this morning - local time, after rc3 was anounced.
  but after 3.45 minutes is hung, data in
  ftp://ftp.cs.huji.ac.il/users/danny/freebsd/core.txt/1
  
  I can't easely revert r250907, since i'm using mercuriall, but if
  someone
  can send me the pre r250907 files, i'll try.
  
 r254947, which was committed to stable/9 a few hours ago is believed to
 fix the problem. Please update your stable/9 to post-r254947 and try it.
 
the current kernel has that fix (sys/kern/uipc_syscalls.c)
and if you check the core.txt/1 you will see no pgrbwt, only newnfs ...

danny

 rick
 
  thanks,
  danny
  
   rick
   
my guts say its running out of resources - mainly network
related,
but
can't pinpoint it.

any help will be most welcomed

cheers,
danny





  
  
  



Re: another? NFS deadlock on 9.2-PRERELEASE

2013-08-27 Thread Daniel Braniss

 
 
 On Tue, Aug 27, 2013 at 05:00:14PM +0300, Daniel Braniss wrote:
   Daniel Braniss wrote:
 Daniel Braniss wrote:
  I upgraded our web server, and only after 3 hours it hung :-(
  (as a side note, I have 2 other web servers, also running 9.2
  doing
  great :-)
  go figure.

  anyways, in
  ftp://ftp.cs.huji.ac.il/users/danny/freebsd/core.txt/0

  is the info after a forced panic.

 Looks like the same hang to me. Several threads are sleeping on
 pgrbwt
 and lots are waiting for an NFS vnode lock.

 It should be fixed in RC3 (or revert r250907). If it still hangs
 with
 RC3 (or r250907 reverted), email again.

im following stable, hence it's till calling itself 9.2-PRERELEASE,
but
I did a sync this morning - local time, after rc3 was anounced.
but after 3.45 minutes is hung, data in
ftp://ftp.cs.huji.ac.il/users/danny/freebsd/core.txt/1

I can't easely revert r250907, since i'm using mercuriall, but if
someone
can send me the pre r250907 files, i'll try.

   r254947, which was committed to stable/9 a few hours ago is believed to
   fix the problem. Please update your stable/9 to post-r254947 and try it.

  the current kernel has that fix (sys/kern/uipc_syscalls.c)
  and if you check the core.txt/1 you will see no pgrbwt, only newnsf ...
 
 There is almost no useful information in the core.txt/1.
 Provide the known data for the deadlock.

maybe the word deadlock is too strong, the host is diskless (one of many)
and so when NFS stops working/responding, it hangs.
I will now make it dataless - the / will be from local disk, see if that
makes it easier to debug.





another? NFS deadlock on 9.2-PRERELEASE

2013-08-26 Thread Daniel Braniss

I upgraded our web server, and only after 3 hours it hung :-(
(as a side note, I have 2 other web servers, also running 9.2 doing great :-)
go figure.

anyways, in
ftp://ftp.cs.huji.ac.il/users/danny/freebsd/core.txt/0

is the info after a forced panic.

my gut says it's running out of resources - mainly network related, but
I can't pinpoint it.

any help will be most welcome

cheers,
danny





9.2-beta, [zpool] iostat fibbing?

2013-08-14 Thread Daniel Braniss

hi,
after upgrading to 9.2-beta2 (from 9.1-stable), it seems zpool iostat n is
not giving a real picture. Running iostat gives a different picture.
My suspicion got triggered by a very repetitive output from zpool iostat:
zpool iostat h 5
               capacity     operations    bandwidth
pool        alloc   free   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
h           26.7T  5.84T      9    214   722K  10.9M
h           26.7T  5.84T     12     60   514K  2.90M
h           26.7T  5.84T      9    214   721K  10.9M
h           26.7T  5.84T      9    214   722K  10.9M
h           26.7T  5.84T      9    214   722K  10.9M
h           26.7T  5.84T      9    214   722K  10.9M
h           26.7T  5.84T      9    214   722K  10.9M
h           26.7T  5.84T      9    214   722K  10.9M
h           26.7T  5.84T      9    214   722K  10.9M
h           26.7T  5.84T      9    214   722K  10.9M
h           26.7T  5.84T      9    214   723K  10.9M

while at the same time:
iostat mfid1 5
       tty           mfid1             cpu
 tin  tout  KB/t tps  MB/s  us ni sy in id
   0    18 53.17 224 11.62   0  0  7  1 92
   0    80 60.16 211 12.38   1  0 10  1 88
   0     9 44.36  97  4.20   1  0 11  1 87
   0    20 34.85 134  4.55   1  0 11  1 87
   0    20 23.67 196  4.52   1  0 10  1 88
   0     9 71.10 159 11.05   1  0 12  1 86
   0     9 51.47  89  4.46   1  0 10  2 88
   0   166 43.40 104  4.41   1  0 10  1 88
   0    12 42.47  99  4.11   1  0 10  1 88
   0    30 57.82 196 11.05   1  0 11  1 87
   0    20 30.07 176  5.18   1  0 14  2 84
   0    20 27.34 171  4.56   1  0 12  2 86
   0    20 26.20 179  4.57   1  0 12  2 86
   0    20 66.01 224 14.42   1  0 12  1 86
   0    20 30.82 145  4.35   1  0  7  1 91
   0    30 29.81 167  4.85   1  0  8  1 90
   0    20 31.50 165  5.08   1  0  9  1 89
   0    20 53.95 253 13.33   1  0 10  2 87
   0    20 29.30 148  4.23   1  0 11  1 88


the 5 sec interval seems to be more of a 'suggestion' :-)
and another thing, hitting ^T usually gives:
load: 4.18  cmd: zpool 32267 [spa_namespace_lock] 58.64r 0.00u 0.01s 0% 3080k
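
A rough way to put numbers on the mismatch is to average the MB/s column of the iostat samples and set it against the read+write bandwidth that zpool iostat keeps repeating (722K + 10.9M). The sketch below copies the figures from the two outputs above. It drops the first iostat sample, since the first line iostat prints is an average over system uptime rather than over the interval. It also assumes mfid1 carries all of pool h, which the post does not state, so the two figures need not agree even if both tools are right.

```python
from statistics import mean

# MB/s column of the 19 iostat samples for mfid1 quoted above
iostat_mb_s = [11.62, 12.38, 4.20, 4.55, 4.52, 11.05, 4.46, 4.41, 4.11,
               11.05, 5.18, 4.56, 4.57, 14.42, 4.35, 4.85, 5.08, 13.33, 4.23]

# The first iostat line is a since-boot average, so keep only interval samples.
interval_samples = iostat_mb_s[1:]

# Bandwidth zpool iostat keeps reporting: 722K read + 10.9M write
zpool_mb_s = 722 / 1024 + 10.9

avg = mean(interval_samples)
print(f"iostat mean:  {avg:.2f} MB/s over {len(interval_samples)} intervals")
print(f"zpool iostat: {zpool_mb_s:.2f} MB/s, nearly identical on every line")
```

The iostat samples swing between roughly 4 and 14 MB/s, while zpool iostat barely moves, which is the discrepancy the post is pointing at.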



9.2-BETA2 panics

2013-08-13 Thread Daniel Braniss

Hi,
this host (used as a zfs server), was working since 8.2, actually was working
nicely under 9.1, but after upgrading to the latest 9.2, it panics, 2 days
in a row. Apart from being a newer version, it's now dataless while I run it
through the loops - which could be the reason for the panics, so while I
prepare it to run off a local disk, and see if that mitigates the problem,
could someone take a look?

Fatal trap 12: page fault while in kernel mode
cpuid = 21; apic id = 25
fault virtual address   = 0xfcc0
fault code  = supervisor read data, page not present
instruction pointer = 0x20:0x80d17f66
stack pointer   = 0x28:0xff8d77b415b0
frame pointer   = 0x28:0xff8d77b415f0
code segment= base 0x0, limit 0xf, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags= interrupt enabled, resume, IOPL = 0
current process = 89 (txg_thread_enter)
trap number = 12
panic: page fault
cpuid = 21
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame 0xff8d77b41040
kdb_backtrace() at kdb_backtrace+0x37/frame 0xff8d77b41100
panic() at panic+0x1ce/frame 0xff8d77b41200
trap_fatal() at trap_fatal+0x290/frame 0xff8d77b41260
trap_pfault() at trap_pfault+0x211/frame 0xff8d77b412f0
trap() at trap+0x344/frame 0xff8d77b414f0
calltrap() at calltrap+0x8/frame 0xff8d77b414f0
--- trap 0xc, rip = 0x80d17f66, rsp = 0xff8d77b415b0, rbp = 
0xff8d77b415f0 ---
bcopy() at bcopy+0x16/frame 0xff8d77b415f0
kthread_add() at kthread_add+0xe4/frame 0xff8d77b41710
kproc_kthread_add() at kproc_kthread_add+0xe1/frame 0xff8d77b418c0
spa_sync() at spa_sync+0x8d1/frame 0xff8d77b41990
txg_sync_thread() at txg_sync_thread+0x139/frame 0xff8d77b41aa0
fork_exit() at fork_exit+0x11f/frame 0xff8d77b41af0
fork_trampoline() at fork_trampoline+0xe/frame 0xff8d77b41af0
--- trap 0, rip = 0, rsp = 0xff8d77b41bb0, rbp = 0 ---
Uptime: 21h5m46s

more info at:
ftp://ftp.cs.huji.ac.il/users/danny/freebsd/core.txt/1

thanks,
danny



Re: 9.2-BETA2 panics

2013-08-13 Thread Daniel Braniss

 
 
 - Original Message -
  Hi,
  this host (used as a zfs server), was working since 8.2, actually was 
  working
  nicely under 9.1, but after upgrading to the latest 9.2, it panics, 2 days
  in a row. Appart of being a newer version, it's now dataless while I run it
  through the loops - which could be the reason for the panics, so while I
  prepare
  it to run off a local disk, and see if that mitigates the problem,
  could someone take a look?
  
  Fatal trap 12: page fault while in kernel mode
  cpuid = 21; apic id = 25
  fault virtual address   = 0xfcc0
  fault code  = supervisor read data, page not present
  instruction pointer = 0x20:0x80d17f66
  stack pointer   = 0x28:0xff8d77b415b0
  frame pointer   = 0x28:0xff8d77b415f0
  code segment= base 0x0, limit 0xf, type 0x1b
  = DPL 0, pres 1, long 1, def32 0, gran 1
  processor eflags= interrupt enabled, resume, IOPL = 0
  current process = 89 (txg_thread_enter)
  trap number = 12
  panic: page fault
  cpuid = 21
  KDB: stack backtrace:
  db_trace_self_wrapper() at db_trace_self_wrapper+0x2a/frame
  0xff8d77b41040
  kdb_backtrace() at kdb_backtrace+0x37/frame 0xff8d77b41100
  panic() at panic+0x1ce/frame 0xff8d77b41200
  trap_fatal() at trap_fatal+0x290/frame 0xff8d77b41260
  trap_pfault() at trap_pfault+0x211/frame 0xff8d77b412f0
  trap() at trap+0x344/frame 0xff8d77b414f0
  calltrap() at calltrap+0x8/frame 0xff8d77b414f0
  --- trap 0xc, rip = 0x80d17f66, rsp = 0xff8d77b415b0, rbp =
  0xff8d77b415f0 ---
  bcopy() at bcopy+0x16/frame 0xff8d77b415f0
  kthread_add() at kthread_add+0xe4/frame 0xff8d77b41710
  kproc_kthread_add() at kproc_kthread_add+0xe1/frame 0xff8d77b418c0
  spa_sync() at spa_sync+0x8d1/frame 0xff8d77b41990
  txg_sync_thread() at txg_sync_thread+0x139/frame 0xff8d77b41aa0
  fork_exit() at fork_exit+0x11f/frame 0xff8d77b41af0
  fork_trampoline() at fork_trampoline+0xe/frame 0xff8d77b41af0
  --- trap 0, rip = 0, rsp = 0xff8d77b41bb0, rbp = 0 ---
  Uptime: 21h5m46s
  
 
 The serialization in kthread_add() is wrong. It is possible for the
 oldtd it selects to exit and be reaped before we are able to duplicate
 the copy region. I have a local patch for this, and I talked with
 julian@ and jhb@ about it a few weeks ago but haven't sent them a
 patch for review. I'll get to that later today.

If you need to have it checked out, I can try it out.

thanks,
danny

 
 
  more info at:
  ftp://ftp.cs.huji.ac.il/users/danny/freebsd/core.txt/1
  
  thanks,
  danny
  
  
  



Re: [SOLVED] Re: missing /boot/menusets.4th

2013-08-10 Thread Daniel Braniss

  On Aug 9, 2013, at 7:11 AM, Teske, Devin wrote:
 On Aug 9, 2013, at 2:55 AM, Henrik Lidström wrote:
On 08/09/13 10:28, Daniel Braniss wrote:
  as of now (sorry have no rev#) the file
sys/boot/forth/menusets.4th
  Again, apologies...
  Patched stable/9 with forgotten MFC of r242688 (see recent SVN r254146).
 --  Devin

thanks, and not only for fixing it, but for the time you invest in the project!

danny



missing /boot/menusets.4th

2013-08-09 Thread Daniel Braniss

as of now (sorry have no rev#) the file
sys/boot/forth/menusets.4th
is not being installed, so boot fails!

danny



Re: stopping amd causes a freeze

2013-07-28 Thread Daniel Braniss

 On 28/07/2013 08:24, Konstantin Belousov wrote:
  On Sat, Jul 27, 2013 at 10:33:18AM +0200, Dominic Fandrey wrote:
  On 26/07/2013 19:10, Dominic Fandrey wrote:
  On 25/07/2013 12:00, Konstantin Belousov wrote:
  On Thu, Jul 25, 2013 at 09:56:59AM +0200, Dominic Fandrey wrote:
  On 22/07/2013 12:07, Konstantin Belousov wrote:
  On Mon, Jul 22, 2013 at 11:50:24AM +0200, Dominic Fandrey wrote:
  ...
 
  I run amd through sysutils/automounter, which is a scripting solution
  that generates an amd.map file based on encountered devices and devd
  events. The SIGHUP it sends to amd to tell it the map file was updated
  does not cause problems, only a -SIGKILL- SIGTERM may cause the 
  freeze.
 
  Nothing was mounted (by amd) during the last freeze.
 
  ...
 
  Are you sure that the machine did not paniced ?  Do you have serial 
  console ?
 
  The amd(8) locks itself into memory, most likely due to the fear of
  deadlock. There are some known issues with user wirings in stable/9.
  If the problem you see is indeed due to wiring, you might try to apply
  r253187-r253191.
 
  I tried that. Applying the diff was straightforward enough. But the
  resulting kernel paniced as soon as it tried to mount the root fs.
  You did provided a useful info to diagnose the issue.
 
  Patch should keep KBI compatible, but, just in case, if you have any
  third-party module, rebuild it.
 
 
  So I'll wait for the MFC from someone who knows what he/she is doing.
 
  Patch below booted for me, and I run some sanity check tests for the
  mlockall(2), which also did not resulted in misbehaviour.
 
 
  Your patch applied cleanly and the system booted with the resulting
  kernel.
 
  Amd exhibits several very strange behaviours. ...
 
  I can verify the whole thing with a clean world and kernel.
 
  This time I'll concentrate on the first instance of amd:
 
  # tail -n3 /var/log/messages
  Jul 27 10:08:56 mobileKamikaze kernel: newnfs server 
  pid5868@mobileKamikaze:/var/run/automounter.amd.mnt: not responding
  Jul 27 10:09:41 mobileKamikaze kernel: newnfs server 
  pid5868@mobileKamikaze:/var/run/automounter.amd.mnt: not responding
  Jul 27 10:11:41 mobileKamikaze last message repeated 3 times
 
  The process, it turns out, simply doesn't exist. There is another
  process, though:
  # ps auxww | grep -F sbin/amd
  root   5869   0.0  0.1  12036   8020 ??  S    10:08am   0:00.01 
  /usr/sbin/amd -r -p -a /var/run/automounter.amd -c 4 -w 2 
  /var/run/automounter.amd.mnt /var/run/automounter.amd.map
 
  # cat /var/run/automounter.amd.pid
  5868
 
  Here is what I think happens, amd forks a subprocess and the main
  process, silently dies after it wrote its pidfile.
  Nothing dies silently.  Either process was killed by signal, or it
  exited with the explicit call to exit(2).  In the first case, default
  kernel settings of kern.logsigexit should make a record in the syslog.
  The machdep.uprintf_signal might be also useful, but not for daemons.
 
 Well, after I reverted your patch I got some things in the syslog.
 Sometimes amd works as expected, sometimes it dies right after starting:
 Jul 28 10:19:42 mobileKamikaze kernel: pid 24217 (amd), uid 0: exited on 
 signal 11 (core dumped)
 
 This is just all over confusing.

just to confuse you a bit more :-)
I gave up on mlockall(2), so I compiled amd statically linked.

my 5 cents.

danny



Re: NFS deadlock on 9.2-Beta1

2013-07-27 Thread Daniel Braniss

On Jul 24, 2013, at 5:25 PM, Rick Macklem rmack...@uoguelph.ca wrote:

Michael Tratz wrote:
Two machines (NFS Server: running ZFS / Client: disk-less), both are
running FreeBSD r253506. The NFS client starts to deadlock processes
within a few hours. It usually gets worse from there on. The
processes stay in D state. I haven't been able to reproduce it
when I want it to happen. I only have to wait a few hours until the
deadlocks occur when traffic to the client machine starts to pick
up. The only way to fix the deadlocks is to reboot the client. Even
an ls to the path which is deadlocked, will deadlock ls itself. It's
totally random what part of the file system gets deadlocked. The NFS
server itself has no problem at all to access the files/path when
something is deadlocked on the client.

Last night I decided to put an older kernel on the system r252025
(June 20th). The NFS server stayed untouched. So far 0 deadlocks on
the client machine (it should have deadlocked by now). FreeBSD is
working hard like it always does. :-) There are a few changes to the
NFS code from the revision which seems to work until Beta1. I
haven't tried to narrow it down if one of those commits are causing
the problem. Maybe someone has an idea what could be wrong and I can
test a patch or if it's something else, because I'm not a kernel
expert. :-)

Well, the only NFS client change committed between r252025 and r253506
is r253124. It fixes a file corruption problem caused by a previous
commit that delayed the vnode_pager_setsize() call until after the
nfs node mutex lock was unlocked.

If you can test with only r253124 reverted to see if that gets rid of
the hangs, it would be useful, although from the procstats, I doubt it.

I have run several procstat -kk on the processes including the ls
which deadlocked. You can see them here:

http://pastebin.com/1RPnFT6r

All the processes you show seem to be stuck waiting for a vnode lock
or in __umtx_op_wait. (I'm not sure what the latter means.)

What is missing is what processes are holding the vnode locks and
what they are stuck on.

A starting point might be ``ps axhl``, to see what all the threads
are doing (particularly the WCHAN for them all). If you can drop into
the debugger when the NFS mounts are hung and do a ``show alllocks``
that could help. See:
http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html
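
To get a quick overview of where everything is stuck, the wait-channel column of saved ps output can be tallied. This is only a sketch: it assumes long-format ps output with a header row that labels the column MWCHAN (FreeBSD) or WCHAN, and the sample rows are made up for illustration (only PID 14001 and the newnfs channel are taken from this thread).

```python
from collections import Counter


def count_wait_channels(ps_output: str) -> Counter:
    """Tally the wait-channel column of long-format ps output."""
    lines = [ln for ln in ps_output.splitlines() if ln.strip()]
    header = lines[0].split()
    # FreeBSD labels the column MWCHAN; other systems use WCHAN.
    col = next(i for i, name in enumerate(header)
               if name in ("MWCHAN", "WCHAN"))
    counts = Counter()
    for ln in lines[1:]:
        fields = ln.split()
        if fields[0] == header[0]:   # skip repeated header lines
            continue
        if len(fields) > col:
            counts[fields[col]] += 1
    return counts


# Hypothetical sample; real output would come from a saved ps listing.
SAMPLE = """\
UID   PID  PPID CPU PRI NI   VSZ  RSS MWCHAN STAT TT     TIME COMMAND
  0 14001     1   0  20  0 12345 2345 newnfs D    ??  0:00.10 httpd
  0 14002     1   0  20  0 12345 2345 newnfs D    ??  0:00.08 httpd
  0 14003     1   0  20  0 10000 1800 select S    ??  0:01.00 sshd
"""

print(count_wait_channels(SAMPLE).most_common())
# prints "[('newnfs', 2), ('select', 1)]"
```

A tally like this only shows which wait channels dominate; it does not say which thread holds the lock the others are waiting for, which is what the debugger output is needed for.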

I'll admit I'd be surprised if r253124 caused this, but who knows.

If there have been changes to your network device driver between
r252025 and r253506, I'd try reverting those. (If an RPC gets stuck
waiting for a reply while holding a vnode lock, that would do it.)

Good luck with it and maybe someone else can think of a commit
between r252025 and r253506 that could cause vnode locking or network
problems.

rick

I have tried to mount the file system with and without nolockd. It
didn't make a difference. Other than that it is mounted with:

rw,nfsv3,tcp,noatime,rsize=32768,wsize=32768

Let me know if you need me to do something else or if some other
output is required. I would have to go back to the problem kernel
and wait until the deadlock occurs to get that information.

Thanks Rick and Steven for your quick replies.

I spoke too soon regarding r252025 fixing the problem. The same issue started
to show up after about 1 day and a few hours of uptime.

ps axhl shows all those stuck processes in newnfs

I recompiled the GENERIC kernel for Beta1 with the debugging options:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

ps and debugging output:

http://pastebin.com/1v482Dfw

(I only listed processes matching newnfs, if you need the whole list, please
let me know)

The first PID showing up having that problem is 14001. Certainly the show
alllocks command shows interesting information for that PID.
I looked through the commit history for those files mentioned in the output
to see if there is something obvious to me. But I don't know. :-)
I hope that information helps you to dig deeper into the issue what might be
causing those deadlocks.

I did include the pciconf -lv, because you mentioned network device drivers.
It's Intel igb. The same hardware is running a kernel from January 19th, 2013
also as an NFS client. That machine is rock solid. No problems at all.

I also went to r251611. That's before r251641 (The NFS FHA changes). Same
problem. Here is another debugging output from that kernel:

http://pastebin.com/ryv8BYc4

If I should test something else or provide some other output, please let me
know.

Again thank you!

Michael

just a quick 'me too', It usually happens on our ftp server, and it's been
happening for a long time. It's diskless, and it happens randomly, so it's
difficult to

Re: make buildworld is now 50% slower

2013-07-19 Thread Daniel Braniss

 On Sun, Jul 07, 2013 at 11:50:29AM +0300, Daniel Braniss wrote:
   On Fri, Jul 05, 2013 at 02:39:00PM +0200, Dimitry Andric wrote:
[redirecting to the correct mailing list, freebsd-stable@ ...]

On Jul 5, 2013, at 10:53, Daniel Braniss da...@cs.huji.ac.il wrote:
 after today's update of 9.1-STABLE I noticed that make build[world|kernel] are
 taking considerably more time, is it because of the upgrade of clang?
 and if so, is the code produced any better?
 
 before:
 buildworld:    26m4.52s real 2h28m32.12s user 36m6.27s sys
 buildkernel:   7m29.42s real 23m22.22s user 4m26.26s sys
 
 today:
 buildworld:    34m29.80s real 2h38m9.37s user 37m7.61s sys
 buildkernel:   15m31.52s real 22m59.40s user 4m33.06s sys

Ehm, your user and sys times are not that much different at all, they
add up to about 5% slower for buildworld, and 1% faster for buildkernel.
Are you sure nothing else is running on that machine, eating up CPU time
while you are building? :)
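
The CPU-time comparison can be checked from the figures quoted in this message. A small sketch (the helper names are made up for illustration) that converts the time(1)-style strings to seconds and compares user+sys before and after:

```python
import re


def to_seconds(t: str) -> float:
    """Convert a string such as '2h28m32.12s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?([\d.]+)s", t)
    if m is None:
        raise ValueError(f"unrecognised time: {t!r}")
    hours, minutes, seconds = m.groups()
    return int(hours or 0) * 3600 + int(minutes or 0) * 60 + float(seconds)


def cpu_change(before, after) -> float:
    """Percent change in user+sys CPU time; positive means slower."""
    b = sum(to_seconds(t) for t in before)
    a = sum(to_seconds(t) for t in after)
    return (a - b) / b * 100


# (user, sys) pairs from the before/today figures above
world = cpu_change(("2h28m32.12s", "36m6.27s"), ("2h38m9.37s", "37m7.61s"))
kernel = cpu_change(("23m22.22s", "4m26.26s"), ("22m59.40s", "4m33.06s"))
print(f"buildworld  CPU time: {world:+.1f}%")
print(f"buildkernel CPU time: {kernel:+.1f}%")
```

Wall-clock time is a separate matter: the real figure for buildkernel more than doubled (7m29s to 15m31s) while its CPU time hardly changed, which is why the question about other load on the machine is a fair one.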

But yes, clang 3.3 is of course somewhat larger than 3.2.  You might
especially notice that, if you are using gcc, which is very slow at
compiling C++.

In any case, if you do not care about clang, just set WITHOUT_CLANG= in
your /etc/src.conf, and you can shave off some build time.
   
   I just built world/kernel (stable/9 r252769) 5 hours ago.  Results:
   
   time make -j4 buildworld  = roughly 21 minutes on my hardware
   time make -j4 buildkernel = roughly 8 minutes on my hardware
   
  
  It's been a long time since I saw such numbers, maybe it's time
  to see where time is being spent, I will run it without clang to compare 
  with
  your numbers.
  
   These numbers are about the norm for me, meaning I do not see a
   substantial increase in build times.
   
   Key point: I do not use/build/grok clang, i.e. WITHOUT_CLANG=true is in
   my src.conf.  But I am aware of the big clang change in r252723.
   
   If hardware details are wanted, ask, but I don't think it's relevant to
   what the root cause is.
   
  
  from what you are saying, I guess clang is not responsible.
  looking for my Sherlock Holmes hat.
 
 Some points to those numbers I stated above:
 
 - System is an Intel Q9550 with 8GB of RAM
 
 - Single SSD (UFS2+SU+TRIM) is used for root, /usr, /var, /tmp, and swap
 
 - /usr/src is on ZFS (raidz1 + 3 disks) -- however I got equally small
 numbers when it was on the SSD
 
 - /usr/src is using compression=lz4  (to folks from -fs: yeah, I'm
 trying it out to see how much of an impact it has on interactivity.  I
 can still tell when it kicks in, but it's way, way better than lzjb.
 Rather not get into that here)
 
 - Contents of /etc/src.conf (to give you some idea of what I disable):
 
 WITHOUT_ATM=true
 WITHOUT_BLUETOOTH=true
 WITHOUT_CLANG=true
 WITHOUT_FLOPPY=true
 WITHOUT_FREEBSD_UPDATE=true
 WITHOUT_INET6=true
 WITHOUT_IPFILTER=true
 WITHOUT_IPX=true
 WITHOUT_KERBEROS=true
 WITHOUT_LIB32=true
 WITHOUT_LPR=true
 WITHOUT_NDIS=true
 WITHOUT_NETGRAPH=true
 WITHOUT_PAM_SUPPORT=true
 WITHOUT_PPP=true
 WITHOUT_SENDMAIL=true
 WITHOUT_WIRELESS=true
 WITH_OPENSSH_NONE_CIPHER=true
 
 It's WITHOUT_CLANG that cuts down the buildworld time by a *huge* amount
 (I remember when it got introduced, my buildworld jumped up to something
 like 40 minutes); the rest probably save a minute or two at most.
 
 - /etc/make.conf doesn't contain much that's relevant, other than:
 
 CPUTYPE?=core2
 
 # For DTrace; also affects ports
 STRIP=
 CFLAGS+=  -fno-omit-frame-pointer
 
 - I do some tweaks in /etc/sysctl.conf (mainly vfs.read_min and
 vfs.read_max), but I will admit I am not completely sure what those
 do quite yet (I just saw the commit from scottl@ a while back talking
 about how an increased vfs.read_min helps them at Netflix quite a
 lot).  I also adjust kern.maxvnodes.
 
 - Some ZFS ARC settings are adjusted in /boot/loader.conf (I'm playing
 with some stuff I read in Andriy Gapon's ZFS PDF), but they definitely
 do not have a major impact on the numbers I listed off.
 
 - I do increase kern.maxdsiz, kern.dfldsiz, and kern.maxssiz in
 /boot/loader.conf to 2560M/2560M/256M respectively, but that was mainly
 from the days when I ran MySQL and needed huge userland processes.
 
 All in all my numbers are low/small because of two things: the SSD, and
 WITHOUT_CLANG.
 
 Hope this gives you somewhere to start/stuff to ponder.

indeed!

on my pretty much standard dev machine, PowerEdge R710, X5550  @ 2.67GHz, 
16384 MB,
root on ufs, the rest is via ZFS, this is what I'm getting for buildworld:
with clang  27m32.91s real 2h42m52.82s user 36m20.69s sys
without clang   13m24.19s real 1h23m26.52s user 29m5.18s sys

on a similar machine but with root via NFS, and /var on ZFS, the numbers are 
similar:
with clang  23m30.92s real 2h9m8.85s user 29m27.84s sys
without clang   12m7.53s real 1h7m54.24s user 23m9.78s sys
(this host is newer, PowerEdge

Re: make buildworld is now 50% slower

2013-07-07 Thread Daniel Braniss

 On Fri, Jul 05, 2013 at 02:39:00PM +0200, Dimitry Andric wrote:
  [redirecting to the correct mailing list, freebsd-stable@ ...]
  
  On Jul 5, 2013, at 10:53, Daniel Braniss da...@cs.huji.ac.il wrote:
   after today's update of 9.1-STABLE I noticed that make
   build[world|kernel] are taking considerably more time. Is it because of
   the clang upgrade? And if so, is the code produced any better?
   
   before:
   buildworld:   26m4.52s real 2h28m32.12s user 36m6.27s sys
   buildkernel:  7m29.42s real 23m22.22s user 4m26.26s sys
   
   today:
   buildworld:   34m29.80s real 2h38m9.37s user 37m7.61s sys
   buildkernel:  15m31.52s real 22m59.40s user 4m33.06s sys
  
  Ehm, your user and sys times are not that much different at all, they
  add up to about 5% slower for buildworld, and 1% faster for build kernel.
  Are you sure nothing else is running on that machine, eating up CPU time
  while you are building? :)
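
To make the comparison of user and sys times concrete, here is a small sketch in plain sh and awk. It adds user and sys time for each run and prints the relative change. The helper names to_seconds and cpu_change are made up for this illustration and are not part of any FreeBSD tool; the input format is assumed to be the NhNmN.NNs style quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# to_seconds: convert a value such as 2h28m32.12s or 36m6.27s to seconds.
to_seconds() {
    printf '%s\n' "$1" | awk '{
        t = $0; total = 0
        if (match(t, /[0-9]+h/)) {          # hours, if present
            total += substr(t, RSTART, RLENGTH - 1) * 3600
            t = substr(t, RSTART + RLENGTH)
        }
        if (match(t, /[0-9]+m/)) {          # minutes, if present
            total += substr(t, RSTART, RLENGTH - 1) * 60
            t = substr(t, RSTART + RLENGTH)
        }
        sub(/s$/, "", t)                    # the remainder is seconds
        printf "%.2f\n", total + t
    }'
}

# cpu_change USER_BEFORE SYS_BEFORE USER_AFTER SYS_AFTER
# Prints the percent change in (user + sys) CPU time between two runs.
cpu_change() {
    awk -v ub="$(to_seconds "$1")" -v sb="$(to_seconds "$2")" \
        -v ua="$(to_seconds "$3")" -v sa="$(to_seconds "$4")" \
        'BEGIN { printf "%+.1f%%\n", ((ua + sa) / (ub + sb) - 1) * 100 }'
}

cpu_change 2h28m32.12s 36m6.27s 2h38m9.37s 37m7.61s   # buildworld
cpu_change 23m22.22s 4m26.26s 22m59.40s 4m33.06s      # buildkernel
```

With the figures quoted in this thread it prints +5.8% for buildworld and -1.0% for buildkernel. The wall-clock ("real") times grew far more than that, which is why something other than the compiler is suspected.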
  
  But yes, clang 3.3 is of course somewhat larger than 3.2.  You might
  especially notice that, if you are using gcc, which is very slow at
  compiling C++.
  
  In any case, if you do not care about clang, just set WITHOUT_CLANG= in
  your /etc/src.conf, and you can shave off some build time.
 
 I just built world/kernel (stable/9 r252769) 5 hours ago.  Results:
 
 time make -j4 buildworld  = roughly 21 minutes on my hardware
 time make -j4 buildkernel = roughly 8 minutes on my hardware
 

It's been a long time since I saw such numbers, maybe it's time
to see where time is being spent, I will run it without clang to compare with
your numbers.

 These numbers are about the norm for me, meaning I do not see a
 substantial increase in build times.
 
 Key point: I do not use/build/grok clang, i.e. WITHOUT_CLANG=true is in
 my src.conf.  But I am aware of the big clang change in r252723.
 
 If hardware details are wanted, ask, but I don't think it's relevant to
 what the root cause is.
 

from what you are saying, I guess clang is not responsible.
looking for my Sherlock Holmes hat.
thanks,
danny

 -- 
 | Jeremy Chadwick                                   j...@koitsu.org |
 | UNIX Systems Administrator                http://jdc.koitsu.org/ |
 | Making life hard for others since 1977.             PGP 4BD6C0CB |
 


___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org

installworld dtrace problems

2013-06-14 Thread Daniel Braniss

with the latest changes to dtrace, make installworld has problems,
some directories are not created:
/usr/share/dtrace
/usr/share/dtrace/toolkit

creating them is a workaround.
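
A minimal sketch of that workaround, assuming a plain sh. The function name ensure_dtrace_dirs and the optional DESTDIR argument are illustrative assumptions, not something taken from the FreeBSD build system.

```shell
# ensure_dtrace_dirs [DESTDIR]: create the directories installworld expects.
# DESTDIR is optional; when omitted the paths are created under /.
ensure_dtrace_dirs() {
    destdir=${1:-}
    # -p also creates /usr/share/dtrace itself and does nothing if both exist
    mkdir -p "${destdir}/usr/share/dtrace/toolkit"
}
```

Call it as root before make installworld, passing the same DESTDIR that installworld is given, if any.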

danny



Re: SunFire X2200 ilo's bge1 DOWN/UP

2013-06-03 Thread Daniel Braniss

 On Fri, May 31, 2013 at 08:24:47AM +0300, Daniel Braniss wrote:
   On Thursday, May 30, 2013 2:44:35 am Daniel Braniss wrote:
 --/04w6evG8XlLl3ft
 Content-Type: text/x-diff; charset=us-ascii
 Content-Disposition: attachment; filename=bge.media_sts.diff
 
 Index: sys/dev/bge/if_bge.c
 ===
 --- sys/dev/bge/if_bge.c  (revision 251021)
 +++ sys/dev/bge/if_bge.c  (working copy)
 @@ -5583,6 +5583,10 @@ bge_ifmedia_sts(struct ifnet *ifp, struct ifmediar
  
   BGE_LOCK(sc);
  
 + if ((ifp->if_flags & IFF_UP) == 0) {
 +     BGE_UNLOCK(sc);
 +     return;
 + }
   if (sc->bge_flags & BGE_FLAG_TBI) {
       ifmr->ifm_status = IFM_AVALID;
       ifmr->ifm_active = IFM_ETHER;
 
 --/04w6evG8XlLl3ft--
after 18 hours, the logs are empty!
it seems the patch fixes the problem.

now maybe it's time to hunt for whoever is randomly calling
bge_ifmedia_sts ...
   
   It could be any number of daemons that query interface state such as an
   SNMP server, ladvd, etc.
   
   If you wanted help you could modify the patch so that it does something
   like this:
   
   #include <sys/proc.h>

   if (/* test for IFF_UP */) {
       BGE_UNLOCK(sc);
       if_printf(ifp, "state queried on down interface by pid %d (%s)",
                                                                   --|
                                                              add a \n
           curthread->td_proc->p_pid, curthread->td_proc->p_comm);
       return;
   }
   
   -- 
   John Baldwin
  snmpd calls this several times a second (difficult to measure, since
  syslog just says
   "last message repeated 22 times").
  in any case, the DOWN/UP appears once every few hours, oh well.
  I have now stopped the snmpd daemon, maybe there is someone else ...
 
 I have no idea why snmpd wants to know media status for interfaces
 that are put into down state. The media status resolved after
 bringing up the interface may be different from the one that was seen
 before.
 The patch also makes dhclient think the driver got a valid link
 regardless of link establishment. I guess that wouldn't be an
 issue though. I'll commit the patch after some more testing.
 
 Thanks for reporting and testing!

no problem!

after more than 3 days, there were no more 'reports', so snmpd was the culprit.
the snmpd we use is from ports, I'll try and see what's going on ...

thanks
danny

  
  thanks,
  danny
  
  



Re: SunFire X2200 ilo's bge1 DOWN/UP

2013-05-30 Thread Daniel Braniss

 
 --/04w6evG8XlLl3ft
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 
 On Tue, May 28, 2013 at 09:55:24AM +0300, Daniel Braniss wrote:
   On Tue, May 28, 2013 at 09:28:00AM +0300, Daniel Braniss wrote:
 On Mon, May 27, 2013 at 10:59:28AM +0300, Daniel Braniss wrote:
   On Fri, May 24, 2013 at 05:31:13PM +0300, Daniel Braniss wrote:
hi, after upgrading to 9.1-stable, this particular hardware - 
SunFire X2200,
   
   Show me dmesg(bge(4) and brgphy(4) only) and 'ifconfig bge1' 
   output.
   
  
  bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x009003> mem 0xfdff-0xfdff,0xfdfe-0xfdfe irq 17 at device 4.0 on pci6
  bge0: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
  miibus2: <MII bus> on bge0
  brgphy0: <BCM5714 1000BASE-T media interface> PHY 1 on miibus2
  brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
  bge0: Ethernet address: 00:1b:24:5d:5b:bd
  bge1: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x009003> mem 0xfdfc-0xfdfc,0xfdfb-0xfdfb irq 18 at device 4.1 on pci6
  bge1: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
  miibus3: <MII bus> on bge1
  brgphy1: <BCM5714 1000BASE-T media interface> PHY 1 on miibus3
  brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
  bge1: Ethernet address: 00:1b:24:5d:5b:be
  
  sf-10> ifconfig bge1
  bge1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
          options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
          ether 00:1b:24:5d:5b:be
          nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
          media: Ethernet autoselect (100baseTX full-duplex)
          status: active
  
 
 Because bge1 is not UP, I wonder how you get link UP/DOWN events.
 Do you have some network script run by cron?

no scripts.
this port is shared with the ILO/IPMI, and back in March you fixed a 
problem
that it was hanging soon after it was initialized by the driver,
(r248226 - but I'm not sure if it was ever MFC'ed).
   
   It was MFCed.
   
Initially I thought it could be caused by connections to it from other
hosts (either via the web, or ssh) so I killed them, but it didn't help.
without that patch the connection fails, and I don't see any DOWN/UP.
   
   Could you check how many interrupts you get from bge1?
   Ideally you shouldn't get any interrupts for bge1.
  
  it's not even mentioned :-)
  sf-04> vmstat -i
  interrupt                  total    rate
  irq3: uart1                  964       0
  irq4: uart0                    6       0
  irq14: ata0               227354       0
  irq17: bge0              1021981       2
  irq21: ohci0                  28       0
  irq22: ehci0                   2       0
  irq23: atapci1            293228       0
  cpu0:timer             383244076    1124
  cpu1:timer               2225144       6
  cpu2:timer               2056087       6
  cpu3:timer               2093943       6
  Total                  391162813    1147
  
 
 Then the only way link UP/DOWN event could be generated for DOWN
 interface would be invocation of media status query
 (i.e. ifconfig -a) triggered by an external application.  Most
 drivers I touched check IFF_UP flag before poking media status
 register. However I'm not sure you're seeing this issue because you
 do not use any network script run by cron.
 Anyway, try attached patch and let me know whether it makes any
 difference.
 
   

 
is toggling bge1 DOWN/UP every few hours, this port is being
used by the ILO.
To check, I upgraded another identical host, and the same
problem appears.
   
   What is the last known working revision?
  
  I have no idea, but I have older versions, and I'll start from the
  oldest (9.1-prerelease), but
  it will take time, since it takes hours till it happens.
  
 
 ok.


  
  
 
 --/04w6evG8XlLl3ft
 Content-Type: text/x-diff; charset=us-ascii
 Content-Disposition: attachment; filename=bge.media_sts.diff
 
 Index: sys/dev/bge/if_bge.c
 ===
 --- sys/dev/bge/if_bge.c  (revision 251021)
 +++ sys/dev/bge/if_bge.c  (working copy)
 @@ -5583,6 +5583,10 @@ bge_ifmedia_sts(struct ifnet *ifp, struct ifmediar
  
   BGE_LOCK(sc);
  
 + if ((ifp->if_flags & IFF_UP) == 0

Re: SunFire X2200 ilo's bge1 DOWN/UP

2013-05-30 Thread Daniel Braniss

 On Thursday, May 30, 2013 2:44:35 am Daniel Braniss wrote:
   --/04w6evG8XlLl3ft
   Content-Type: text/x-diff; charset=us-ascii
   Content-Disposition: attachment; filename=bge.media_sts.diff
   
   Index: sys/dev/bge/if_bge.c
   ===
   --- sys/dev/bge/if_bge.c  (revision 251021)
   +++ sys/dev/bge/if_bge.c  (working copy)
   @@ -5583,6 +5583,10 @@ bge_ifmedia_sts(struct ifnet *ifp, struct ifmediar
   
     BGE_LOCK(sc);
   
   + if ((ifp->if_flags & IFF_UP) == 0) {
   +     BGE_UNLOCK(sc);
   +     return;
   + }
     if (sc->bge_flags & BGE_FLAG_TBI) {
         ifmr->ifm_status = IFM_AVALID;
         ifmr->ifm_active = IFM_ETHER;
   
   --/04w6evG8XlLl3ft--
  after 18 hours, the logs are empty!
  it seems the patch fixes the problem.
  
  now maybe it's time to hunt for who is randomly calling for bge_ifmedia_sts
  ...
 
 It could be any number of daemons that query interface state such as an
 SNMP server, ladvd, etc.
 
 If you wanted help you could modify the patch so that it does something like
 this:
 
 #include <sys/proc.h>

 if (/* test for IFF_UP */) {
     BGE_UNLOCK(sc);
     if_printf(ifp, "state queried on down interface by pid %d (%s)",
                                                                 --|
                                                            add a \n
         curthread->td_proc->p_pid, curthread->td_proc->p_comm);
     return;
 }
 
 -- 
 John Baldwin
snmpd calls this several times a second (difficult to measure, since syslog
just says
 "last message repeated 22 times").
in any case, the DOWN/UP appears once every few hours, oh well.
I have now stopped the snmpd daemon, maybe there is someone else ...

thanks,
danny



Re: SunFire X2200 ilo's bge1 DOWN/UP

2013-05-29 Thread Daniel Braniss

 

Re: SunFire X2200 ilo's bge1 DOWN/UP

2013-05-28 Thread Daniel Braniss

 On Mon, May 27, 2013 at 10:59:28AM +0300, Daniel Braniss wrote:
   On Fri, May 24, 2013 at 05:31:13PM +0300, Daniel Braniss wrote:
hi, after upgrading to 9.1-stable, this particular hardware - SunFire 
X2200,
   
   Show me dmesg(bge(4) and brgphy(4) only) and 'ifconfig bge1' output.
   
  
  bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x009003> mem 0xfdff-0xfdff,0xfdfe-0xfdfe irq 17 at device 4.0 on pci6
  bge0: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
  miibus2: <MII bus> on bge0
  brgphy0: <BCM5714 1000BASE-T media interface> PHY 1 on miibus2
  brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
  bge0: Ethernet address: 00:1b:24:5d:5b:bd
  bge1: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x009003> mem 0xfdfc-0xfdfc,0xfdfb-0xfdfb irq 18 at device 4.1 on pci6
  bge1: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
  miibus3: <MII bus> on bge1
  brgphy1: <BCM5714 1000BASE-T media interface> PHY 1 on miibus3
  brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
  bge1: Ethernet address: 00:1b:24:5d:5b:be
  
  sf-10> ifconfig bge1
  bge1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
          options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
          ether 00:1b:24:5d:5b:be
          nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
          media: Ethernet autoselect (100baseTX full-duplex)
          status: active
  
 
 Because bge1 is not UP, I wonder how you get link UP/DOWN events.
 Do you have some network script run by cron?

no scripts.
this port is shared with the ILO/IPMI, and back in March you fixed a problem
that it was hanging soon after it was initialized by the driver,
(r248226 - but I'm not sure if it was ever MFC'ed).
Initially I thought it could be caused by connections to it from other
hosts (either via the web, or ssh) so I killed them, but it didn't help.
without that patch the connection fails, and I don't see any DOWN/UP.

 
is toggling bge1 DOWN/UP every few hours, this port is being used by
the ILO.
To check, I upgraded another identical host, and the same problem
appears.
   
   What is the last known working revision?
  
  I have no idea, but I have older versions, and I'll start from the oldest
  (9.1-prerelease), but
  it will take time, since it takes hours till it happens.
  
 
 ok.



Re: SunFire X2200 ilo's bge1 DOWN/UP

2013-05-28 Thread Daniel Braniss

...
 There are ways you can speed up the replication time. I tend to flood a
 server with TCP while I've heard of it happening under UDP flood too.
 
 Here's a nice way to flood a server with TCP (assuming you have SSH access
 to the system via keys):
 
 sh -c 'while :;do dd if=/dev/urandom of=/dev/stdout bs=1m count=1024 | ssh HOST2KILL /sbin/md5; done'
 
 Run that about 16 times in separate screen sessions from various other
 hosts on your network, taking care to replace HOST2KILL with the hostname
 or IP of the box with the SunFire X2200.
 
 Let that run for a while, and then when you think you've had a reset (if
 you weren't standing there watching for one)...
 
 grep 'bge.*DOWN' /var/log/messages
 
 On a system that has booted and stayed up-and-running, there shouldn't be
 any messages like this:
 
 bge0: link state changed to DOWN
 
 When you actually get this message (if your experience is like ours),
 you'll be down for 90 seconds while the NIC resets.
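
The grep above lists the matching lines; to see at a glance which interface is affected and how often, a small sh/awk sketch like the following could be used. The function name link_down_counts is made up, and it assumes syslog lines of the form "... kernel: bge1: link state changed to DOWN".

```shell
# link_down_counts: read syslog-style lines on stdin and print one line per
# bge interface with the number of "link state changed to DOWN" events.
link_down_counts() {
    awk '/bge[0-9]+: link state changed to DOWN/ {
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^bge[0-9]+:$/) {
                     name = $i
                     sub(/:$/, "", name)    # strip the trailing colon
                     n[name]++
                     break
                 }
         }
         END { for (ifn in n) print ifn, n[ifn] }' | sort
}

# Typical use (reading the log normally needs root):
#   link_down_counts < /var/log/messages
```

Each output line is an interface name followed by its DOWN count, sorted by interface name.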
 
 However, since you say you have some older 9.1 releases... I'd start by
 first trying to bring the replication time of the problem down by using
 TCP and/or UDP floods. That way you'll be able to test for resolution of
 the problem as you progress up to stable/9 (where the problem should be
 fixed by the aforementioned SVN revisions -- specific to your hardware).
...
 any ideas?
 
 
 Well, you say the connection is OK... so it doesn't sound like a full
 reset as it was in our case (we have a different chipset).
 
 But I agree that a log full of those would be annoying.
 
 Try getting up to stable/9 in its current state (note: stable/8 also has
 all the aforementioned revisions too).
 --
 Devin

Hi Devin,
the kernel is pretty new, actually last Friday's, and the svn says
it's r250960.

the bge1 port is not UP, it's shared with the onboard BMC/ILO/IPMI thingy.
connecting to it via ssh gets me into its ILO manager:
...
Sun(TM) Embedded Lights Out Manager

Copyright 2004-2006 Sun Microsystems, Inc. All rights reserved.

Version 3.23
...
and so typing
start AgentInfo/console
I can get to the 'serial' console.

cheers, and thanks,
danny



Re: SunFire X2200 ilo's bge1 DOWN/UP

2013-05-28 Thread Daniel Braniss

 On Tue, May 28, 2013 at 09:28:00AM +0300, Daniel Braniss wrote:
   On Mon, May 27, 2013 at 10:59:28AM +0300, Daniel Braniss wrote:
 On Fri, May 24, 2013 at 05:31:13PM +0300, Daniel Braniss wrote:
  hi, after upgrading to 9.1-stable, this particular hardware - 
  SunFire X2200,
 
 Show me dmesg(bge(4) and brgphy(4) only) and 'ifconfig bge1' output.
 

bge0: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x009003> mem 0xfdff-0xfdff,0xfdfe-0xfdfe irq 17 at device 4.0 on pci6
bge0: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
miibus2: <MII bus> on bge0
brgphy0: <BCM5714 1000BASE-T media interface> PHY 1 on miibus2
brgphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge0: Ethernet address: 00:1b:24:5d:5b:bd
bge1: <Broadcom NetXtreme Gigabit Ethernet Controller, ASIC rev. 0x009003> mem 0xfdfc-0xfdfc,0xfdfb-0xfdfb irq 18 at device 4.1 on pci6
bge1: CHIP ID 0x9003; ASIC REV 0x09; CHIP REV 0x90; PCI-X 133 MHz
miibus3: <MII bus> on bge1
brgphy1: <BCM5714 1000BASE-T media interface> PHY 1 on miibus3
brgphy1:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-master, 1000baseT-FDX, 1000baseT-FDX-master, auto, auto-flow
bge1: Ethernet address: 00:1b:24:5d:5b:be

sf-10> ifconfig bge1
bge1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
        options=8009b<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,VLAN_HWCSUM,LINKSTATE>
        ether 00:1b:24:5d:5b:be
        nd6 options=21<PERFORMNUD,AUTO_LINKLOCAL>
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active

   
   Because bge1 is not UP, I wonder how you get link UP/DOWN events.
   Do you have some network script run by cron?
  
  no scripts.
  this port is shared with the ILO/IPMI, and back in March you fixed a problem
  that it was hanging soon after it was initialized by the driver,
  (r248226 - but I'm not sure if it was ever MFC'ed).
 
 It was MFCed.
 
  Initially I thought it could be caused by connections to it from other
  hosts (either via the web, or ssh) so I killed them, but it didn't help.
  without that patch the connection fails, and I don't see any DOWN/UP.
 
 Could you check how many interrupts you get from bge1?
 Ideally you shouldn't get any interrupts for bge1.

it's not even mentioned :-)
sf-04> vmstat -i
interrupt                  total    rate
irq3: uart1                  964       0
irq4: uart0                    6       0
irq14: ata0               227354       0
irq17: bge0              1021981       2
irq21: ohci0                  28       0
irq22: ehci0                   2       0
irq23: atapci1            293228       0
cpu0:timer             383244076    1124
cpu1:timer               2225144       6
cpu2:timer               2056087       6
cpu3:timer               2093943       6
Total                  391162813    1147
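
For anyone repeating this check on several hosts, the lookup can be scripted. The sketch below assumes vmstat -i output in the three-column layout shown above ("irqN: device total rate"); the function name irq_total is made up for illustration.

```shell
# irq_total DEVICE: read `vmstat -i` output on stdin and print the total
# interrupt count for DEVICE, or 0 when the device has no line at all.
irq_total() {
    awk -v dev="$1" '$2 == dev { sum += $3 } END { print sum + 0 }'
}

# Typical use:
#   vmstat -i | irq_total bge1
```

A result of 0 for bge1 matches what is seen here: the interface has no interrupt line in the table.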

 
  
   
  is toggling bge1 DOWN/UP every few hours, this port is being used
  by the ILO.
  To check, I upgraded another identical host, and the same problem
  appears.
 
 What is the last known working revision?

I have no idea, but I have older versions, and I'll start from the
oldest (9.1-prerelease), but
it will take time, since it takes hours till it happens.

   
   ok.
  
  



Re: SunFire X2200 ilo's bge1 DOWN/UP

2013-05-28 Thread Daniel Braniss


[...]
 1. r248226 in head was MFC'd to stable/9 as r248858.  Validation:
 
 http://svnweb.freebsd.org/base/stable/9/sys/dev/bge/if_bge.c?view=log
 
 So the answer: whether or not you have that MFC in stable/9 depends on
 what SVN rev your kernel is.

I do a svnsync then I convert to mercurial so from the svn logs I see that
the highest rev number is 250960.

[...]
 <rant>
 That piggybacking crap never should have been invented.  All it has
 done is cause problems for every OS I know of (including Windows) since
 its inception, and is also exactly why today almost all vendors I've
 seen provide a dedicated NIC and RJ45 port for the iLO/IPMI interface.
 It's admission the piggybacking method doesn't work.  And may it rot
 in hell for all I care, while simultaneously feeling very sorry for
 those who have to suffer/deal with it.
 
 This is just another reason why I've always been very picky about what
 hardware I'd buy for server deployments.  Vendors never actually
 disclose this crap until you've shelled out money for the hardware, by
 which point it's too late and you're suffering.  Really great model --
 for the pocketbook.  :/
 </rant>

I couldn't agree more!

[...]

in the case of the SunFire X2200, it has 4 bge ports, the
2nd, bge1, is only used by the ilo, it's not enabled (UP'ed),
it doesn't have an interrupt assigned, it's, as far as I can tell,
just annoying to have the DOWN/UP messages - unless something more sinister
is lurking.

thanks,
danny


