Re: Proposal: Restore malloc(9) interface

2024-01-09 Thread Masanobu SAITOH
Hi.

On 2023/12/31 7:44, Taylor R Campbell wrote:
> I propose to deprecate the kmem(9) interface and go back to the
> malloc(9) interface.
> 
> 
> 1. The main difference is that the malloc(9) interface enables
>attribution of memory usage: how many bytes have been used for this
>purpose vs that purpose, partitioned by named malloc tags like
>M_MBUF or M_ACPI.  The conversion to the kmem(9) interface lost all
>this valuable diagnostic information.
> 
>I've personally spent probably dozens of hours over the last year
>or two puzzling over `vmstat -m' output to guess which subsystem
>might be leaking memory based on allocation sizes and which
>kmem-N pool looks fishy.  This is extremely frustrating and a
>huge waste of time to recover information we used to gather and
>report systematically.

I think adding malloc type again is good thing.

> 2. A secondary difference is reduced diffs from FreeBSD and OpenBSD
>drivers if we use malloc(9).
> 
> 3. A small difference is that kmem(9) distinguishes legacy allocation
>from interrupt context, kmem_intr_alloc/free, from front ends that
>forbid that, kmem_alloc/free.
> 
>I'm not sure this has provided much valuable diagnostic
>information, but it has provided a lot of frustrating crashes.  If
>we want the same frustrating crashes we could introduce an M_INTR
>flag which is mandatory when calling malloc from interrupt context.

Current kmen_(intr_)alloc()'s return address is usually aligned with
CACHE_LINE_SIZE. (Note that the smallest number is KMEM_ALIGN(== 8)
and the max is PAGE_SIZE.)
On kern_malloc(), the memory is first allocated with kmem_intr_alloc()
and put the malloc_header to the head. As a result, the return address
usually aligned with lower than CACHE_LINE_SIZE (at least on amd64).

I'm glad if the new malloc(9) API usually returns pointer aligned
with CACHE_LINE_SIZE or more.

> Note: I am NOT proposing any substantive changes to the implementation
> of the allocator -- I'm just proposing that we go back to the old
> _interface_, using the new pool-cache-based _implementation_, and to
> add lightweight per-CPU, per-tag usage counting to the malloc and free
> paths.
> 
> Nor am I suggesting changing anything about uvm_km(9), pool_cache(9),
> or anything else -- just changing kmem_alloc(N, KM_[NO]SLEEP) back to
> malloc(N, T, M_NOWAIT/WAITOK) and kmem_free(P, N) back to free(P, T),
> or possibly free(P, T, N) like OpenBSD does.
> 
> Thoughts?
> 
> 
> I asked for rationale for the kmem(9) interface last year, and none of
> the answers gave any compelling reason to have changed interfaces in
> the first place or to finish the conversion now:
> 
> https://mail-index.netbsd.org/tech-kern/2022/10/29/msg028498.html
> 
> As far as I can tell, we just spent umpteen hundreds of hours on
> engineering effort over the last decade to convert various drivers and
> subsystems from malloc(9) to kmem(9), in exchange for the loss of
> valuable diagnostic information about leaks, for increased cost to
> porting drivers, and for crashes when old subsystems newly converted
> to kmem(9) still allocate from interrupt context.

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: proposed cpuctl modification

2023-03-10 Thread Masanobu SAITOH



On 2023/03/09 19:21, Robert Elz wrote:
> Date:Thu, 9 Mar 2023 16:21:53 +0900
> From:    Masanobu SAITOH 
> Message-ID:  <38ae66bd-1b37-c0ef-5a43-52e0c0a2a...@execsw.org>
> 
>   | Alder Lake-N? 4 E-cores share one microcode image. I have i7-12700 and it
>   | has 4 E-cores. Those 4 cores share one microcode image.
> 
> Mine is an i9-12900KS which has 8 of them (2 groups of 4).
> 
> Thanks for the confirmation, that is what looked to be happening, but
> I was just guessing from what I observed.  I just use intel processors
> (and others on occasion) I don't even pretend to understand them.
> 
>   | I think your idea is the best. Thank you for your commit.
> 
> No problem.  It was not a difficult change to make!
> 
>   | Another solutions is that the kernel returns 0 instead of EEXIST if the
>   | version number is the same as the running microcode's version.
> 
> Yes, I considered that one as well, but as you indicate, doing that just
> loses information, and gains nothing - the same number of sys calls (ioctl's)
> would be performed, all that would be saved is the check to see if the
> error is EEXIST when that happens (ie: peanuts).
> 
> kre
> 
> ps: do your E-cores ever just turn themselves off?   On mine, occasionally,
> and for no reason I can fathom, the BIOS reports there are none of them.
> (and NetBSD doesn't see them either).

I've never seen such problem on my machine.

> They come back after a power cycle.
> This is probably a BIOS issue, but ?

It might be...

> 

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: Late MCU, was: proposed cpuctl modification

2023-03-09 Thread Masanobu SAITOH
Hello, all.

On 2023/03/05 11:01, Robert Elz wrote:
> Date:Fri, 3 Mar 2023 21:46:22 -0800
> From:"William 'Cryo' Coldwell" 
> Message-ID:  <7ce92f54-3746-4106-bd63-16e5e4cbc...@netbsd.org>
> 
>   | To throw some extra fun mixture into this discussion:  As of 5.19
>   | Linux will no longer allow late microcode loading:
>   |
>   | ref: https://www.phoronix.com/news/Linux-5.19-Late-ucode-Loading

We _SHOULD_ support the early microcode update function in the boot loader
or the very early stage of the kernel initialization.

One of the big reason is that a new microcode may disable a function.
If the kerne use the function, the kernel may work incorrectly.
Such type of change usually can be identified by the correcponding
CPUID bit. It may be possible to reconfigure some kernel functions by
identifying such change, but it may not possible, and it's simpler to use
the early micorocode update function than to write such complicated code.

I tried to implement the early microcode loading stuff in our boot loader
but I couldn't finish the work. NetBSD developers may read my previos mail
I wrote 2018/01/19 subjected with "x86 microcode early update"...

> I suspect this message was not really aimed at me, but at tech-kern
> my only specific knowledge about any of this is that I dislike spurious
> (seeming) error messages during the boot sequence - and so intend to
> make them go away (as there has been no objection, I will commit that
> change soon).
> 
> But:
>   | Any thoughts on the best approach to this? (boot? EFI?)
> 
> Best?  No idea.   But one approach might be to only start cpu0 in the
> kernel during bootstrap, and then have the rest started by an rc.d
> script, which could update microcode on them (if needed) first.
> 
> By moving cpuctl from /usr/sbin to /sbin and placing the firmware in
> /libdata instead of /usr/pkg that could be made to happen very early
> in the boot sequence (perhaps even before the fsck of / and rw mount).
> 
> I'm not sure how that would work wrt other things that have to happen
> (like arranging interrupt routing) - as clearly the microcode needs to
> be read from the filesystem (whether that's by the kernel, as now, or
> by cpuctl, then passing a memory image of it to the kernel doesn't seem
> as if it would make a lot of difference).

We can take two types of early loading.

 a) Update the microcode in the boot loader. Additional code is not required
in the kerne.

 b) Load the micorocode image in the boot loader and copy it to a memory
(buffer). Pass the pointer as a new boot argument. The kerne updates
all CPU's microcode as soon as possible.

Please someone(TM) try to implement..



> Longer term this could be coupled with more userland control of the
> cpu configuration - all being done at once as part of the startup
> sequence, but from userland code.
> 
> Whatever the issues really are (according to the Linux people) with
> doing the microcode update as we now do it, even assuming that is more
> or less the same as they do it, this should be safe, as code running
> on a CPU has to do it, I don't see it can make any real difference
> whether than is bios code, boot code, or early kernel code.
> 
> I don't much like the idea of extra magic blobs, or more hackery in
> the boot code however.
> 
> kre
> 

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: proposed cpuctl modification

2023-03-08 Thread Masanobu SAITOH
Hello.

I'm sorry for the late of this reply. I was busy for a few weeks.

On 2023/03/03 6:59, Robert Elz wrote:
> This message is about a proposed userland modification, but it seems
> more kernelish to me, hence I am asking here on tech-kern, rather than
> on tech-userlevel
> 
> When my system boots (intel cpu) it runs the intel-microcode (from pkgsrc)
> microcode update.
> 
> Since it is an Intel cpu, that means running the ioctl to perform the
> microcode update on every core.   The cpu has hyperthreading, and while
> I sometimes am inclined to turn that off, I haven't so far.
> 
> Naturally, as they're really the same core, while the base core ucode
> update works fine, the hyperthread companion always fails, as there the
> microcode is already up to the expected version (surprise surprise, since
> we just did that).I guess we could look and skip the update on
> the hyperthread companion cores (pseudo-cores) but that's not what I am
> proposing, partly because I'd have to work out how to do that, but also
> because that by itself would only solve half the problem.
> 
> My processor also has 8 cores, with no hyperthreading, where it looks as
> if internally, there's just 2 sets of ucode, one for the first 4, and one
> for the second 4.

Alder Lake-N? 4 E-cores share one microcode image. I have i7-12700 and it
has 4 E-cores. Those 4 cores share one microcode image.

>   Updates of the other 3 of each group find the ucode
> version already installed, and error out.
> 
> So, what I am proposing is to have cpuctl simply ignore EEXIST errors from
> the update the microcode" ioctl. unless the -v option (which already exists)
> is given.
> 
> Note that this isn't fixing any real problem - everything works as it is now,
> and everything (from what I can tell) gets updated correctly.   The issue is
> more cosmetic - the ucode update currently issues error messages for more than
> half of my (apparent) cores, which looks ugly during the boot process, and
> could lead to people wondering if there is a problem.

I think your idea is the best. Thank you for your commit.

Another solutions is that the kernel returns 0 instead of EEXIST if the
version number is the same as the running microcode's version. I think it's not
good because it hides the information that the running micocode is the latest.

> When I first installed the package from pgksrc to make this happen (a while
> ago now) I immediately disabled it (in rc.conf) so it ran no more, as the
> microcode version offered by the package was what my CPU came equipped with,
> and every core complained about the version not actually being updated.
> 
> pkgsrc was updated to a newer version, with newer microcode, I installed
> that, and enabled it again - and now I'm stuck with the ugly errors.
> 
> Perhaps someone has a better solution than this, perhaps checking which
> microcode version is installed on each core, and not attemmpting to install
> the update if it is the same version, but that's beyond my knowledge.
> It also seems likely to me that it is simpler to just install anyway, and
> ignore the "already there" error if it happens.   Probably.

I agree with you. Since we do not know what kind of sharing scheme future
CPUs will have, I think simple is the best.

 Thanks.


> My proposed patch is appended - it builds, installs, and seems to work
> just fine (not too much of a surprise, it is a trivial change).
> 
> Opinions?
> 
> kre
> 
> Index: cpuctl.8
> ===
> RCS file: /cvsroot/src/usr.sbin/cpuctl/cpuctl.8,v
> retrieving revision 1.20
> diff -u -r1.20 cpuctl.8
> --- cpuctl.8  17 May 2019 23:51:35 -  1.20
> +++ cpuctl.8  2 Mar 2023 21:36:06 -
> @@ -72,6 +72,10 @@
>  .Op Ar file
>  .Xc
>  This applies the microcode patch to CPUs.
> +Unless
> +.Fl v
> +was given, errors indicating that the microcode
> +already exists on the CPU in question are ignored.
>  If
>  .Ar cpu
>  is not specified or \-1, all CPUs are updated.
> Index: cpuctl.c
> ===
> RCS file: /cvsroot/src/usr.sbin/cpuctl/cpuctl.c,v
> retrieving revision 1.32
> diff -u -r1.32 cpuctl.c
> --- cpuctl.c  1 Feb 2022 10:45:02 -   1.32
> +++ cpuctl.c  2 Mar 2023 21:36:06 -
> @@ -247,7 +247,7 @@
>   cpuset_destroy(cpuset);
>   }
>   error = ioctl(fd, IOC_CPU_UCODE_APPLY, &uc);
> - if (error < 0) {
> + if (error < 0 && (verbose || errno != EEXIST)) {
>   if (uc.fwname[0])
>   err(EXIT_FAILURE, "%s", uc.fwname);
>   else
> 
> 

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: Lock of NetBSD-current with ifconfig down / up

2022-09-19 Thread Masanobu SAITOH
Hi.

On 2022/09/18 2:55, John Klos wrote:
> Hi,
> 
> Here's a nice issue :)
> 
> Plug in ure* USB ethernet to amd64 machine running NetBSD-current (9.99.99, 
> 22-August-2022):
> 
> [ 1791670.446266] ure0 at uhub8 port 4
> [ 1791670.446266] ure0: Realtek (0x0bda) USB 10/100/1000 LAN (0x8153), rev 
> 2.10/30.00, addr 6
> [ 1791670.446266] ure0: RTL8153 ver 5c30
> [ 1791670.566267] rgephy0 at ure0 phy 0: RTL8251 1000BASE-T media interface, 
> rev. 0
> [ 1791670.586267] rgephy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 
> 1000baseT-FDX, auto
> [ 1791670.586267] ure0: Ethernet address a0:ce:c8:e7:88:5f
> [ 1791673.256299] ugen1 at uhub8 port 5
> [ 1791673.256299] ugen1: VIA Labs, Inc. (0x2109) PD3.0 USB-C Device (0x), 
> rev 2.01/0.01, addr 7
> 
> ifconfig ure0 up
> 
> No problem:
> 
> ure0: flags=0x8943 mtu 1500
> capabilities=0x3ff00
> capabilities=0x3ff00
> capabilities=0x3ff00
> enabled=0
> ec_capabilities=0x1
> ec_enabled=0
> address: a0:ce:c8:e7:88:5f
> media: Ethernet autoselect (100baseTX full-duplex)
> status: active
> inet6 fe80::a2ce:c8ff:fee7:885f%ure0/64 flags 0 scopeid 0x9
> 
> ifconfig ure0 down
> 
> Locks the machine. I couldn't get more information because it's 3000 miles 
> away. There's nothing in dmesg because the machine was power cycled.

One of my environment:

# ifconfig ure0 down
[  63.2217114] panic: kernel diagnostic assertion "mii_locked(mii)" failed: 
file "../../../../dev/mii/mii.c", line 381
[  63.2332664] cpu0: Begin traceback...
[  63.2332664] vpanic() at netbsd:vpanic+0x183
[  63.2418447] kern_assert() at netbsd:kern_assert+0x4b
[  63.2418447] mii_down() at netbsd:mii_down+0x9a
[  63.2521575] usbnet_if_stop() at netbsd:usbnet_if_stop+0x134
[  63.2521575] ether_ioctl_reinit() at netbsd:ether_ioctl_reinit+0x78
[  63.2623342] usbnet_if_ioctl() at netbsd:usbnet_if_ioctl+0x80
[  63.2717104] doifioctl() at netbsd:doifioctl+0x322
[  63.2717104] sys_ioctl() at netbsd:sys_ioctl+0x56d
[  63.2817903] syscall() at netbsd:syscall+0x196
[  63.2817903] --- syscall (number 54) ---
[  63.2817903] netbsd:syscall+0x196:
[  63.2930753] cpu0: End traceback...








> Initially I imagined it might be due to the ure* driver, but then it happened 
> locally.
> 
> On an amd64 system running 9.99.98 from 16-July-2022, I ran "ifconfig re1 
> down" and the machine locked - no ICMP, nothing for SIGINFO, no response to 
> keyboard cnmagic.
> 
> It doesn't appear to be hardware, but here's this just because.
> 
> [ 1.044097] re1 at pci2 dev 0 function 0: RealTek 8168/8111 PCIe Gigabit 
> Ethernet (rev. 0x0c)
> [ 1.044097] re1: interrupting at msix2 vec 0
> [ 1.044097] re1: RTL8168G/8111G (0x4c00)
> [ 1.044097] re1: Ethernet address 4c:cc:6a:01:a5:e0
> [ 1.044097] re1: using 256 tx descriptors
> [ 1.044097] rgephy1 at re1 phy 7: RTL8251 1000BASE-T media interface, 
> rev. 0
> 
> I've ordered some PS/2 keyboards, because I take it that's the only way to 
> reliably get in to the kernel debugger on amd64, unless someone knows a trick 
> to make USB keyboards usable.
> 
> send-pr?
> 
> Thanks,
> John

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: panic in sysmon_envsys_unregister

2022-09-14 Thread Masanobu SAITOH


On 2022/09/14 19:01, Edgar Fuß wrote:
>> I need to build a new install image (since I have no discs).
> I applied your fix to -8 and the panic disappeared.
> Thanks for the quick fix. Maybe it should be pulled up?

I'll send pullup requests.

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: panic in sysmon_envsys_unregister

2022-09-13 Thread Masanobu SAITOH
This should be fixed by mfii.c rev. 1.26. Please update it and retry.

On 2022/09/14 6:14, Edgar Fuß wrote:
> I get a panic on shutdown:
> 
> netbsd:sysmon_envsys_unregister+0x128:cmpq0(%rdx),%r12
> sysmon_envsys_unregister
> mfii_detach
> config_detach
> config_detach_all
> cpu_reboot
> kern_reboot
> sys_reboot
> syscall
> ds4da0
> es0
> fs1
> gsc632
> rdi   818f0510sme_global_mtx
> rsi   
> rbp   9008514e4da0
> rbx   90003d04c000
> rdx   0
> rcx   d9e26f07b700
> rax   0
> r80
> r90
> r10   0
> r11   0
> r12   d9e26d5b1c40
> r13   d9e26d7a5a00
> r14   1
> r15   81802c60mfii_ca
> rip   80a8c41esysmon_envsys_unregister+0x128
> cs8
> rflags10246
> rsp   9008514e4d90
> ss10
> 
> This is -current from around yesterday.
> I guess the problem is related to
>   mfii0: autoconfiguration error: unable to register with sysmon (rv = 86)
>   mfii0: autoconfiguration error: unable to create sensors
> So probably someone is trying to un-resgister something not registered.

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: mfii(4) and Dell PERC

2022-08-08 Thread Masanobu SAITOH
Hi.

On 2022/08/08 19:41, Edgar Fuß wrote:
> Thanks for your answers.
> 
>> Some people reported that kern/56669 (and perhaps kern/55192) still exist 
>> on some systems :-(
> Hm.
> 
>> bioctl mfi(i)X show
> Ah, thanks.
> What do I do in case a drive fails? Will adding a hot spare automagically 
> start a reconstruction?

It will do.
(I've never done "bioctl mfii0 hotspare add". I usually add a hotspare
via BIOS setup menu.)

>> If your system has other number, please let me know.
> I don't have such a system yet. I wanted to find out about NetBSD 
> compatibility 
> before buying one.

For the BBU monitoring, it's easy to add a new device support.

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: mfii(4) and Dell PERC

2022-08-08 Thread Masanobu SAITOH



On 2022/08/08 18:37, Edgar Fuß wrote:
> I'm unsure whether this is the right list, is port-amd64 more appropriate?
> 
> Does anyone use a Dell PERC H730P or similar RAID controller in RAID mode?

It seems PERC H730P's device ID is 0x005d.

https://pci-ids.ucw.cz/read/PC/1000/005d

It's the Invader series. I have Fujitsu PRAID EP400i. It's also Invader.
It works with one of my machines, but it doesn't guarantee that
your machine with PERC H730P works. Some people reported that kern/56669
(and perhaps kern/55192) still exist on some systems :-(


> mfii(4) says all configuration is done via the controller's BIOS.
> Does that mean I need to shut down in case a drive fails an I need to rebuild?
> 
> Can I monitor the RAID state?

bioctl mfi(i)X show

> Can I monitor the BBU Battery health?

Maybe. If your machine's battery_type is 1, 2, 5 or 6, you can do by envstat(8).

#define MFI_BBU_TYPE_IBBU   1
#define MFI_BBU_TYPE_BBU2
#define MFI_BBU_TYPE_IBBU09 5
#define MFI_BBU_TYPE_CVPM02 6

If your system has other number, please let me know.

> Thanks in advance.

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: mfii hanging on boot

2022-06-27 Thread Masanobu SAITOH


On 2022/06/23 18:33, Edgar Fuß wrote:
>> I committed the change yesterday.
> I don't get what the #if defined(__LP64__) && 0 is for.

It's just to make the same line as of bus_space_write_8().
I don't know why it's "&& 0"ed.

---
static void
mfii_start(struct mfii_softc *sc, struct mfii_ccb *ccb)
{
#if defined(__LP64__) && 0
u_long *r = (u_long *)&ccb->ccb_req;
#else
uint32_t *r = (uint32_t *)&ccb->ccb_req;
#endif

bus_dmamap_sync(sc->sc_dmat, MFII_DMA_MAP(sc->sc_requests),
ccb->ccb_request_offset, MFII_REQUEST_SIZE,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);

#if defined(__LP64__) && 0
bus_space_write_8(sc->sc_iot, sc->sc_ioh, MFI_IQPL, *r);
#else
mutex_enter(&sc->sc_post_mtx);
bus_space_write_4(sc->sc_iot, sc->sc_ioh, MFI_IQPL, r[0]);
bus_space_barrier(sc->sc_iot, sc->sc_ioh,
MFI_IQPL, 8, BUS_SPACE_BARRIER_WRITE);

bus_space_write_4(sc->sc_iot, sc->sc_ioh, MFI_IQPH, r[1]);
bus_space_barrier(sc->sc_iot, sc->sc_ioh,
MFI_IQPH, 8, BUS_SPACE_BARRIER_WRITE);
mutex_exit(&sc->sc_post_mtx);
#endif
}
---

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: wm workqueue and agr device

2022-06-22 Thread Masanobu SAITOH
Hi.

On 2022/06/02 19:08, BERTRAND Joël wrote:
>   Hello,
> 
>   In a recent mail (if_wm.c workqueue), I have seen that it is possible
> to activate workqueue on all ethernet devices contained in my server:
> 
> [ 1.002997] wm0 at pci2 dev 0 function 0: I350 Gigabit Network
> Connection (rev. 0x01)
> [ 1.002997] wm0: for TX and RX interrupting at msix0 vec 0 affinity to 0
> [ 1.002997] wm0: for TX and RX interrupting at msix0 vec 1 affinity to 1
> [ 1.002997] wm0: for TX and RX interrupting at msix0 vec 2 affinity to 2
> [ 1.002997] wm0: for TX and RX interrupting at msix0 vec 3 affinity to 3
> [ 1.002997] wm0: for TX and RX interrupting at msix0 vec 4 affinity to 4
> [ 1.002997] wm0: for TX and RX interrupting at msix0 vec 5 affinity to 5
> [ 1.002997] wm0: for TX and RX interrupting at msix0 vec 6 affinity to 6
> [ 1.002997] wm0: for TX and RX interrupting at msix0 vec 7 affinity to 7
> [ 1.002997] wm0: for LINK interrupting at msix0 vec 8
> [ 1.002997] wm0: PCI-Express bus
> [ 1.002997] wm0: 8192 words (16 address bits) SPI EEPROM, version
> 1.63, Image Unique ID 0ffd
> [ 1.002997] wm0: Ethernet address b4:96:91:92:77:6e
> [ 1.002997] wm0: Copper
> [ 1.002997] wm0:
> 0x40674040
> [ 1.002997] ihphy0 at wm0 phy 1: I350 10/100/1000 media interface,
> rev. 1
> [ 1.002997] wm1 at pci2 dev 0 function 1: I350 Gigabit Network
> Connection (rev. 0x01)
> [ 1.002997] wm1: for TX and RX interrupting at msix1 vec 0 affinity to 0
> [ 1.002997] wm1: for TX and RX interrupting at msix1 vec 1 affinity to 1
> [ 1.002997] wm1: for TX and RX interrupting at msix1 vec 2 affinity to 2
> [ 1.002997] wm1: for TX and RX interrupting at msix1 vec 3 affinity to 3
> [ 1.002997] wm1: for TX and RX interrupting at msix1 vec 4 affinity to 4
> [ 1.002997] wm1: for TX and RX interrupting at msix1 vec 5 affinity to 5
> [ 1.002997] wm1: for TX and RX interrupting at msix1 vec 6 affinity to 6
> [ 1.002997] wm1: for TX and RX interrupting at msix1 vec 7 affinity to 7
> [ 1.002997] wm1: for LINK interrupting at msix1 vec 8
> [ 1.002997] wm1: PCI-Express bus
> [ 1.002997] wm1: 8192 words (16 address bits) SPI EEPROM, version
> 1.63, Image Unique ID 0ffd
> [ 1.002997] wm1: Ethernet address b4:96:91:92:77:6f
> [ 1.002997] wm1: Copper
> [ 1.002997] wm1:
> 0x40474040
> [ 1.002997] ihphy1 at wm1 phy 1: I350 10/100/1000 media interface,
> rev. 1
> [ 1.002997] wm2 at pci0 dev 25 function 0: I218 V Ethernet
> Connection (rev. 0x00)
> [ 1.002997] wm2: interrupting at msi4 vec 0
> [ 1.002997] wm2: PCI-Express bus
> [ 1.002997] wm2: 2048 words FLASH, version 0.1.4
> [ 1.002997] wm2: Ethernet address 08:62:66:47:63:99
> [ 1.002997] wm2: 0x6a4080
> [ 1.002997] ihphy2 at wm2 phy 2: i217 10/100/1000 media interface,
> rev. 5
> [ 1.002997] wm3 at pci6 dev 0 function 0: Intel i82574L (rev. 0x00)
> [ 1.002997] wm3: for TX and RX interrupting at msix6 vec 0 affinity to 1
> [ 1.002997] wm3: for TX and RX interrupting at msix6 vec 1 affinity to 2
> [ 1.002997] wm3: for LINK interrupting at msix6 vec 2
> [ 1.002997] wm3: PCI-Express bus
> [ 1.002997] wm3: 2048 words FLASH, version 1.8.0, Image Unique ID
> 
> [ 1.002997] wm3: ASPM L0s and L1 are disabled to workaround the errata.
> [ 1.002997] wm3: Ethernet address 68:05:ca:02:b2:59
> [ 1.002997] wm3: 0x224080
> [ 1.002997] makphy0 at wm3 phy 1: Marvell 88E1149 Gigabit PHY, rev. 1
> [ 1.002997] wm4 at pci7 dev 0 function 0: Intel i82574L (rev. 0x00)
> [ 1.002997] wm4: for TX and RX interrupting at msix7 vec 0 affinity to 1
> [ 1.002997] wm4: for TX and RX interrupting at msix7 vec 1 affinity to 2
> [ 1.002997] wm4: for LINK interrupting at msix7 vec 2
> [ 1.002997] wm4: PCI-Express bus
> [ 1.002997] wm4: 2048 words FLASH, version 1.8.0, Image Unique ID
> 
> [ 1.002997] wm4: ASPM L0s and L1 are disabled to workaround the errata.
> [ 1.002997] wm4: Ethernet address 68:05:ca:44:cf:30
> [ 1.002997] wm4: 0x224080
> [ 1.002997] makphy1 at wm4 phy 1: Marvell 88E1149 Gigabit PHY, rev. 1
> 
> wm0 and wm1 are bridged (bridge0) and connected to two NAS.
> wm2 is connected to WAN
> wm3 and wm4 are aggregated (agr0) and connected to LAN.
> 
>   I have tried to set in /etc/sysctl.conf:
> 
> hw.wm0.txrx_workqueue=1
> hw.wm1.txrx_workqueue=1
> hw.wm2.txrx_workqueue=1
> hw.wm3.txrx_workqueue=1
> hw.wm4.txrx_workqueue=1
> 
> and system quickly hangs (no panic, no internal debugger, no serial
> console available, only a hard lock). Before lock, agr0 was very very slow.

agr(4) uses softint. Scheduling priority of softint is higher
than workqueue. It might cause the problem.

>   If I set:
> 
> hw.wm0.txrx_workqueue=1
> hw.wm1.txrx_workqueue=1
> hw.wm2.txrx_workqueue=1
> hw.wm3.txrx_workqueue=0
> hw.wm4.txrx_workqueue=0
> 
> system seems to run as expected. Thus 

Re: mfii hanging on boot

2022-06-22 Thread Masanobu SAITOH


On 2022/06/22 4:55, c...@sdf.org wrote:
> Hello Masa -

Hi, Stephen.

> I was able to apply your patch to the 9.2 tree and built a kernel and
> pxebooted it.  Unfortunately the status now looks like this:
> 
> mfii0 at pci5 dev 0 function 0mfii0: firmware fault
> 
> https://sdf.org/eu   (full boot sequence)
> 
> The machine is 8412km away in a data center and I don't have direct
> access to it so I cannot unplug the battery, move the board to another
> slot, et cetera.

I saw almost the same situation. To recover from the error, I had to power-down
the machine, unplug the battery, keep a few minutes, plug the battery and
power-up again.

I committed the change yesterday. I guess that it fixes kern/55192 and 
kern/56669.

 Thanks.


> I can, however, power cycle/reset the machine and access the MegaCLI to
> look at the two raids.  Both appear to be intact and the controller
> reports no errors, no bad disks or any other issues.
> 
> I'm hessitant to make any changes physically to the machine until I'm
> absolutely sure that this is a recovery situation.
> 
> Thank you for your help on this, I appreciate it!
> 
> Stephen
> 
>>  I have 9272-8i and it have been working well on netbsd-9 since several
>> weeks ago.
>>
>>> mfii0 at pci1 dev 0 function 0: "LSI MegaRAID SAS 9272-8i", firmware
>>> 23.34.0-0019, 512MB cache
>>> mfii0: interrupting at ioapic0 pin 16
>>
>> Could you all who have problem with mfii(4) test with the following diff?
>>
>> Index: mfii.c
>> ===
>> RCS file: /cvsroot/src/sys/dev/pci/mfii.c,v
>> retrieving revision 1.15
>> diff -u -p -r1.15 mfii.c
>> --- mfii.c   13 May 2022 10:44:38 -  1.15
>> +++ mfii.c   21 Jun 2022 07:46:51 -
>> @@ -1840,7 +1840,11 @@ mfii_load_mfa(struct mfii_softc *sc, str
>>  static void
>>  mfii_start(struct mfii_softc *sc, struct mfii_ccb *ccb)
>>  {
>> +#if defined(__LP64__) && 0
>>  u_long *r = (u_long *)&ccb->ccb_req;
>> +#else
>> +uint32_t *r = (uint32_t *)&ccb->ccb_req;
>> +#endif
>>
>>  bus_dmamap_sync(sc->sc_dmat, MFII_DMA_MAP(sc->sc_requests),
>>  ccb->ccb_request_offset, MFII_REQUEST_SIZE,
>>
>>
>>
>>
>> --
>> ---
>> SAITOH Masanobu (msai...@execsw.org
>>  msai...@netbsd.org)
>>
> 
> 

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: mfii hanging on boot

2022-06-21 Thread Masanobu SAITOH
 I have 9272-8i and it have been working well on netbsd-9 since several
weeks ago.

> mfii0 at pci1 dev 0 function 0: "LSI MegaRAID SAS 9272-8i", firmware 
> 23.34.0-0019, 512MB cache
> mfii0: interrupting at ioapic0 pin 16

Could you all who have problem with mfii(4) test with the following diff?

Index: mfii.c
===
RCS file: /cvsroot/src/sys/dev/pci/mfii.c,v
retrieving revision 1.15
diff -u -p -r1.15 mfii.c
--- mfii.c  13 May 2022 10:44:38 -  1.15
+++ mfii.c  21 Jun 2022 07:46:51 -
@@ -1840,7 +1840,11 @@ mfii_load_mfa(struct mfii_softc *sc, str
 static void
 mfii_start(struct mfii_softc *sc, struct mfii_ccb *ccb)
 {
+#if defined(__LP64__) && 0
u_long *r = (u_long *)&ccb->ccb_req;
+#else
+   uint32_t *r = (uint32_t *)&ccb->ccb_req;
+#endif

bus_dmamap_sync(sc->sc_dmat, MFII_DMA_MAP(sc->sc_requests),
ccb->ccb_request_offset, MFII_REQUEST_SIZE,




-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: TSC improvement

2020-06-10 Thread Masanobu SAITOH
Hi.

On 2020/06/10 4:37, Joerg Sonnenberger wrote:
> On Tue, Jun 09, 2020 at 05:16:27PM +, Taylor R Campbell wrote:
>> It's great to see improvements to our calibration of the TSC (and I
>> tend to agree that cpu_counter should be serializing, so that, e.g.,
>> cpu_counter(); ...; cpu_counter() reliably measures time taken in the
>> ellipsis).
> 
> I'm pretty sure we want to have both variants.
> 
> Joerg

You mean we should have both cpu_counter() and cpu_counter_serializing()
(or cpu_counter() and cpu_counter_unserializing())? If so, why?

I have no strong opinion about this. When I wrote that change, I added
cpu_counter_serializing(). We already have tsc_get_timecount(), cpu_counter()
and cpu_counter32() as MI API. In addition, x86 also has rdtsc(and will
have some serializing functions). There are so many similar functions,
so I thought it would be good to avoid code duplication and to be simple.

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: [PATCH] Re: EFI memory map

2020-01-06 Thread Masanobu SAITOH
On 2020/01/06 17:06, Emmanuel Dreyfus wrote:
> On Thu, Jan 02, 2020 at 03:55:31PM +, Emmanuel Dreyfus wrote:
>> And indeed, studying the crash in ddb shows it happens when
>> accessing a physical address that is excluded by x86_fake_clusters() 
>> but included by EFI memory map.
> 
> I think the problem lies in sys/arch/x86/x86/x86_machdep.c:init_x86_vm()
> The function loops on the memory segments provided by EFI/BIOS and 
> call x86_load_region() on the ones that can be used.
> 
> The function considers the case of a segment that spans the
> addresses used by the kernel and splits the segments in that case,
> but it does not consider the cases where the segment is included in
> kernel, or if there is a leading or trailing part that overlaps the
> kernel. 
> 
> The bug happens because a segment overlapping the kernel is used
> by x86_load_region(). 
> 
> Attached patch adds code for that three extra cases, and it makes
> the bug vanish. Is it a reasonable fix? Or did I miss something?
> 

The output:
[   1.000] MEMMAP: UEFI MEMORY MAP (17 ENTRIES):
[   1.000] MEMMAP: 0x-0x0003efff
[   1.000]  size=0x0003f000, type=1(Memory)
[   1.000] MEMMAP: 0x0003f000-0x0003
[   1.000]  size=0x1000, type=2(Reserved)
[   1.000] MEMMAP: 0x0004-0x0009
[   1.000]  size=0x0006, type=1(Memory)
[   1.000] MEMMAP: 0x000a-0x000f
[   1.000]  size=0x0006, type=2(Reserved)
[   1.000] MEMMAP: 0x0010-0x7769efff
[   1.000]  size=0x7759f000, type=1(Memory)
[   1.000] MEMMAP: 0x7769f000-0x776aefff
[   1.000]  size=0x0001, type=2(Reserved)
[   1.000] MEMMAP: 0x776af000-0x7d619fff
[   1.000]  size=0x05f6b000, type=1(Memory)
[   1.000] MEMMAP: 0x7d61a000-0x7e0f6fff
[   1.000]  size=0x00add000, type=2(Reserved)
[   1.000] MEMMAP: 0x7e0f7000-0x7e12dfff
[   1.000]  size=0x00037000, type=1(Memory)
[   1.000] MEMMAP: 0x7e12e000-0x7e1e2fff
[   1.000]  size=0x000b5000, type=4(NVS)
[   1.000] MEMMAP: 0x7e1e3000-0x7f23
[   1.000]  size=0x0105d000, type=2(Reserved)
[   1.000] MEMMAP: 0x7f24-0x7f7f
[   1.000]  size=0x005c, type=1(Memory)
[   1.000] MEMMAP: 0x7f80-0x7fff
[   1.000]  size=0x0080, type=2(Reserved)
[   1.000] MEMMAP: 0xe000-0xefff
[   1.000]  size=0x1000, type=2(Reserved)
[   1.000] MEMMAP: 0xfd00-0xfe7f
[   1.000]  size=0x0180, type=2(Reserved)
[   1.000] MEMMAP: 0xff00-0x
[   1.000]  size=0x0100, type=2(Reserved)
[   1.000] MEMMAP: 0x0001-0x00027fff
[   1.000]  size=0x00018000, type=1(Memory)
[   1.000] segment 0 - 3f000
[   1.000] loading freelist 5 0x8000-0x3f000 (0x8-0x3f)
[   1.000] segment 4 - a
[   1.000] loading freelist 5 0x4-0xa (0x40-0xa0)
[   1.000] segment 10 - 7769f000
[   1.000] split kernel overlapping to 10 - 20 and 1d4e000 - 
7769f000
[   1.000] loading freelist 5 0x10-0x20 (0x100-0x200)
[   1.000] loading freelist 4 0x1d4e000-0x4000 (0x1d4e-0x4)
[   1.000] loading freelist 3 0x4000-0x7769f000 (0x4-0x7769f)
[   1.000] segment 776af000 - 7d61a000
[   1.000] loading freelist 3 0x776af000-0x7d61a000 (0x776af-0x7d61a)
[   1.000] segment 7e0f7000 - 7e12e000
[   1.000] loading freelist 3 0x7e0f7000-0x7e12e000 (0x7e0f7-0x7e12e)
[   1.000] segment 7f24 - 7f80
[   1.000] loading freelist 3 0x7f24-0x7f80 (0x7f240-0x7f800)
[   1.000] segment 1 - 28000
[   1.000] loading default 0x1-0x28000 (0x10-0x28)
[   1.000] pool redzone disabled for 'pdppl'
[   1.000] pool redzone disabled for 'kmem-4096'
[   1.000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 
2004, 2005,
[   1.000] 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 
2016, 2017,
[   1.000] 2018, 2019, 2020 The NetBSD Foundation, Inc.  All rights 
reserved.
[   1.000] Copyright (c) 1982, 1986, 1989, 1991, 1993
[   1.000] The Regents of the University of California.  All rights 
reserved.

[   1.000] NetBSD 9.99.34 (GENERIC) #105: Tue Jan  7 15:21:44 JST 2020
[   1.000]  
msai...@foo.execsw.org:/disk/sources/NetBSD-current/src/sys/arch/amd64/compile/GENERIC
[   1.000] total memory = 8155 MB
[   1.000] avail memory = 7890 MB
[   1.000] pool redzone disabled for 'buf4k'
[   1.000] pool redzone disabled for 'buf64k'
[   1.000] cpu_rng: RDSEED
[   1.000] timec

Re: Why NetBSD x86's bus_space_barrier does not use [sml]fence?

2019-12-01 Thread Masanobu SAITOH
On 2019/12/02 15:48, s ymgch wrote:
> I'm re-sending the last mail because I forgot to set ML addresses to
> destination e-mail address.
> 
> 2019年12月2日(月) 12:53 s ymgch :
>>
>> 2019年12月1日(日) 2:08 Taylor R Campbell :
> Are you using BUS_SPACE_MAP_PREFETCHABLE?

 No, the driver does not use it.
>>>
>>> And it doesn't use mtrrs or BUS_SPACE_MAP_CACHEABLE either, right?
>>
>> I had a misunderstanding. The driver uses BUS_SPACE_MAP_PREFETCHABLE
>> specified in PCI configuration space.

This is well known pitfall:

http://mail-index.netbsd.org/tech-kern/2017/03/22/msg021678.html

I think we should change pci_mapreg_map().


>>>
>>> Just to be sure, can you run
>>>  to confirm there are no
>>> mtrrs on the system affecting the bus space region in question?
>>
>> # ./mtrr
>> MTRR (96)
>> 
>>
>> 00:0 -  type wb flags valid, fixed
>> 01:1 -1 type wb flags valid, fixed
>> 02:2 -2 type wb flags valid, fixed
>> 03:3 -3 type wb flags valid, fixed
>> 04:4 -4 type wb flags valid, fixed
>> 05:5 -5 type wb flags valid, fixed
>> 06:6 -6 type wb flags valid, fixed
>> 07:7 -7 type wb flags valid, fixed
>> 08:8 -83fff type wb flags valid, fixed
>> 09:84000 -87fff type wb flags valid, fixed
>> 10:88000 -8bfff type wb flags valid, fixed
>> 11:8c000 -8 type wb flags valid, fixed
>> 12:9 -93fff type wb flags valid, fixed
>> 13:94000 -97fff type wb flags valid, fixed
>> 14:98000 -9bfff type wb flags valid, fixed
>> 15:9c000 -9 type wb flags valid, fixed
>> 16:a -a3fff type uc flags valid, fixed
>> 17:a4000 -a7fff type uc flags valid, fixed
>> 18:a8000 -abfff type uc flags valid, fixed
>> 19:ac000 -a type uc flags valid, fixed
>> 20:b -b3fff type uc flags valid, fixed
>> 21:b4000 -b7fff type uc flags valid, fixed
>> 22:b8000 -bbfff type uc flags valid, fixed
>> 23:bc000 -b type uc flags valid, fixed
>> 24:c -c0fff type wp flags valid, fixed
>> 25:c1000 -c1fff type wp flags valid, fixed
>> 26:c2000 -c2fff type wp flags valid, fixed
>> 27:c3000 -c3fff type wp flags valid, fixed
>> 28:c4000 -c4fff type wp flags valid, fixed
>> 29:c5000 -c5fff type wp flags valid, fixed
>> 30:c6000 -c6fff type wp flags valid, fixed
>> 31:c7000 -c7fff type wp flags valid, fixed
>> 32:c8000 -c8fff type wp flags valid, fixed
>> 33:c9000 -c9fff type wp flags valid, fixed
>> 34:ca000 -cafff type wp flags valid, fixed
>> 35:cb000 -cbfff type wp flags valid, fixed
>> 36:cc000 -ccfff type wp flags valid, fixed
>> 37:cd000 -cdfff type wp flags valid, fixed
>> 38:ce000 -cefff type wp flags valid, fixed
>> 39:cf000 -c type wp flags valid, fixed
>> 40:d -d0fff type wp flags valid, fixed
>> 41:d1000 -d1fff type wp flags valid, fixed
>> 42:d2000 -d2fff type wp flags valid, fixed
>> 43:d3000 -d3fff type wp flags valid, fixed
>> 44:d4000 -d4fff type wp flags valid, fixed
>> 45:d5000 -d5fff type wp flags valid, fixed
>> 46:d6000 -d6fff type wp flags valid, fixed
>> 47:d7000 -d7fff type wp flags valid, fixed
>> 48:d8000 -d8fff type wp flags valid, fixed
>> 49:d9000 -d9fff type wp flags valid, fixed
>> 50:da000 -dafff type wp flags valid, fixed
>> 51:db000 -dbfff type wp flags valid, fixed
>> 52:dc000 -dcfff type wp flags valid, fixed
>> 53:dd000 -ddfff type wp flags valid, fixed
>> 54:de000 -defff type wp flags valid, fixed
>> 55:df000 -d type wp flags valid, fixed
>> 56:e -e0fff type wp flags valid, fixed
>> 57:e1000 -e1fff type wp flags valid, fixed
>> 58:e2000 -e2fff type wp flags valid, fixed
>> 5

Re: ppb(4) related changes

2019-03-01 Thread Masanobu SAITOH

On 2019/01/28 20:57, Jonathan A. Kollasch wrote:

On Mon, Jan 28, 2019 at 07:34:29PM +0900, Masanobu SAITOH wrote:

  Hi.

  I'm now working to modify PCI bridge stuff. I'd like to do the following
two modifications:

  0) ppbreg.h definitions

 All of PCI configuration space's register definitions are in pcireg.h
 The ppbreg.h file also defines the same config registers. I think it's
 good to remove ppbreg.hs' definitions an use pcireg.h's definitions.
 Is it OK?


Seems reasonable, although be sure to fix everything that includes it.


 Done in the following commit:

http://mail-index.netbsd.org/source-changes/2019/03/01/msg103859.html

BTW, while modifying rbus_ppb.c, I might find a bug:

@@ -278,8 +278,8 @@ rbus_pci_addr_fixup(struct ppb_cardbus_s
}

rct.bussize_ioreqs[minbus] =
- rbus_round_up(rct.bussize_ioreqs[minbus], 4096);
-   rct.bussize_memreqs[minbus] =
+ rbus_round_up(rct.bussize_ioreqs[minbus], PCI_BRIDGE_IO_MIN);
+   rct.bussize_memreqs[minbus] =  /* XXX Not 8 but PCI_BRIDGE_MEM_MIN ? */
  rbus_round_up(rct.bussize_memreqs[minbus], 8);

printf("%s: total needs IO %08zx and MEM %08zx\nT",

I suspect the above "8" is wrong. It might be PCI_BRIDGE_MEM_MIN (0x0010UL).
I did not change it and left the comment. If there is anyone who can check
and fix, please feel free to change.

 Thanks.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: To get net ioctl number

2019-02-28 Thread Masanobu SAITOH

On 2019/02/28 16:54, matthew green wrote:

Iain Hibbert writes:

On Thu, 28 Feb 2019, Masanobu SAITOH wrote:


  I'd like to get new number for new ioctl. How should I find unused number
for it? I'm going to add new SIOCXXX. It may not enough to grep sys/net/*.h,
so I made usr.bin/kdump-ioctl.c and did

grep \'i kdump-ioctl.c | sort -n -k 5,5 | uniq | column -t

I think it might not enough because kdump-ioctl.c has no compat-related
code.

  What should I do?


In most cases, an ioctl value is only valid when performed on a handle
leading to a specific subsystem, meaning that it does not really need to
be globally distinct.


Yes, you're correct.


There is a list of the letters used in ioctl(9)

true that that don't _need_ to be globally unique, but it sure
is nice for kdump(1).


Yes, it's also correct. It's better to unique than overlap.


.mrg.


 Currently, kdump-ioctl.c doesn't include compat ioctl. If I mistakenly
choose new value which is used in the compat code, it would cause trouble.
If we can include compat stuff into kdump, it would be good for user
to know about the old syscall's behavior and it also good to check
system-wide ioctl number usage...

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


To get net ioctl number

2019-02-27 Thread Masanobu SAITOH

 Hi.

 This might be FAQ...

 I'd like to get new number for new ioctl. How should I find unused number
for it? I'm going to add new SIOCXXX. It may not enough to grep sys/net/*.h,
so I made usr.bin/kdump-ioctl.c and did

grep \'i kdump-ioctl.c | sort -n -k 5,5 | uniq | column -t

I think it might not enough because kdump-ioctl.c has no compat-related
code.

 What should I do?

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: kernel frameworks for jumbo frame

2019-01-31 Thread Masanobu SAITOH

Hi, all.

 I'm sorry for being little late.

On 2019/01/31 14:21, Rin Okuyama wrote:

On 2019/01/30 23:47, Jason Thorpe wrote:

On Jan 30, 2019, at 3:29 PM, Robert Elz  wrote:

| Yes, I think something like:
  |
  | size_t mbuf_cluster_size_for_size(size_t desired_size);

whatever the name, is unlikely to be needed, that would be
used in the "must specify a supportyed size" variant of the
former, rather than the "pick a supported size that will handle
the request" version that you thought was better (as do I.)


I suppose you're right ... I was thinking that there might be some utility to 
exposing the information, in the event that it could be useful for some 
drivers.  I could certainly be left out for now, and exposed at a later time if 
there was a concrete use for it.


OK. I will write a draft, possibly in this weekend.

On 2019/01/30 22:29, Robert Elz wrote:

 Date:    Wed, 30 Jan 2019 13:07:02 +0200
 From:    Jason Thorpe 
 Message-ID:  
...

   | The idea would be that instead of adding additional property fields
   | to struct ifnet, you could add either a prop_dictionary_t or nvlist

Yes, we know that you like property lists ...

They have the advantage that the structs don't need to
keep changing nearly as often when new data is needed,
with the consequential kernel version bumps (and so, it
becomes possible to pull up any changes which require a
new property with new date) but the disadantage that it
is easy for every developer, of every driver (or whatever),
their dog, and each of the dog's fleas, to all add new
properties, ignoring what was done by each of the others,
fail to document them properly (UTSL!) and we end up with
a completely incomprehensible mess, which no-one can
use or understand.


 I'm also one of the man who don't want to use property carelessly.
The reason is almost the same as kre's.

 A few months ago, I surveyed the diversity of "struct ifreq",
"struct if_media", if_flags, and ifcaps among {Net,Open,Free,DragonFly}BSD
and MacOS(xnu). It was caos. One of big problem and the reason of the
caos is that almost all OSes don't do correct layering.

 I thinks an interface's MTU and the Layer 2's frame size should be
distinct.

 For Ethernet, the frame format is shared with some other protocols,
so such type of structures should be shared (e.g. our ec_cap*).
(XXX Last month, I added ETHERCAP_EEE into ec_cap{abilities,enable}.
I think it's not good because 802.11 doesn't have it. If we have
a good framework under the layer, it would be good to move EEE stuff
into it, but I think moving EEE stuff into if_media is not good.)

 if_media is used for more than 20 years. In this two decades, a lot
of media and a lot of optional features have been added. It's too
difficult to add optional features depend on a media, so I think
if_media should be simple and move some feature options to some other
location (new data structure).

 If someone design a good frame work I'll help to implement it.

 Thanks.


Well, I also feel it little bit too much... Anyway, I will write a
patch to see what it looks like.

On 2019/01/30 21:59, Christos Zoulas wrote:

In article <20190130110317.gc28...@mail.duskware.de>,
Martin Husemann   wrote:

On Wed, Jan 30, 2019 at 07:59:13PM +0900, Rin Okuyama wrote:

I think we agree to add API something like

On 2019/01/29 17:32, Maxime Villard wrote:

  int
  m_addjcl(struct mbuf *m, int how, int size)


Just a minor nit: avoid "JCL" in any names - it causes great pain for some
readers.


It is the function that adds support for the Job Control Language to mbufs.


Oops...

Thanks,
rin



--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


ppb(4) related changes

2019-01-28 Thread Masanobu SAITOH

 Hi.

 I'm now working to modify PCI bridge stuff. I'd like to do the following
two modifications:

 0) ppbreg.h definitions

All of PCI configuration space's register definitions are in pcireg.h
The ppbreg.h file also defines the same config registers. I think it's
good to remove ppbreg.hs' definitions an use pcireg.h's definitions.
Is it OK?

 1) Mis-configuration of prefetchable memory in pciconf.c::configure_bridge()
(only for PCI_NETBSD_CONFIGURE)

In configure_bridge():


mem = pci_conf_read(pb->pc, pd->tag, PCI_BRIDGE_PREFETCHMEM_REG);


Read register


#if ULONG_MAX > 0x
if (!PCI_BRIDGE_PREFETCHMEM_64BITS(mem) && mem_limit > 0xULL) {


Test if it's marked as 64bit (the maring is in lower 4bit)


printf("Bus %d bridge does not support 64-bit PMEM.  ",
pb->busno);
printf("Disabling prefetchable-MEM accesses\n");
mem_base  = 0x10;   /* 1M */
mem_limit = 0x00;
}
#endif
mem = (((mem_base >> 20) & PCI_BRIDGE_PREFETCHMEM_BASE_MASK)
<< PCI_BRIDGE_PREFETCHMEM_BASE_SHIFT);
mem |= (((mem_limit >> 20) & PCI_BRIDGE_PREFETCHMEM_LIMIT_MASK)
<< PCI_BRIDGE_PREFETCHMEM_LIMIT_SHIFT);
pci_conf_write(pb->pc, pd->tag, PCI_BRIDGE_PREFETCHMEM_REG, mem);


Regenerate mem value from scratch _WITHOUT_SETTING_LOWER_4BITS_
(It's OK because the lower 4bit is read only).


/*
 * XXX -- 64-bit systems need a lot more than just this...
 */
if (PCI_BRIDGE_PREFETCHMEM_64BITS(mem)) {


Test if it's marked as 64bit or not. It must be wrong because "mem"
is not the original value, newly generated and low 4bits are 0.
It should always be false.


mem_base  = (uint64_t) mem_base  >> 32;
mem_limit = (uint64_t) mem_limit >> 32;
pci_conf_write(pb->pc, pd->tag, PCI_BRIDGE_PREFETCHBASE32_REG,
mem_base & 0x);
pci_conf_write(pb->pc, pd->tag, PCI_BRIDGE_PREFETCHLIMIT32_REG,
mem_limit & 0x);
}


So, proposed patch:
--
Index: pciconf.c
===
RCS file: /cvsroot/src/sys/dev/pci/pciconf.c,v
retrieving revision 1.37
diff -u -p -r1.37 pciconf.c
--- pciconf.c   5 Sep 2014 05:29:16 -   1.37
+++ pciconf.c   28 Jan 2019 10:30:13 -
@@ -855,6 +855,7 @@ configure_bridge(pciconf_dev_t *pd)
pciconf_bus_t   *pb;
pcireg_tio, iohigh, mem, cmd;
int rv;
+   boolisprefetchmem64;
 
 	pb = pd->ppb;

/* Configure I/O base & limit*/
@@ -919,8 +920,9 @@ configure_bridge(pciconf_dev_t *pd)
mem_limit = 0x00;
}
mem = pci_conf_read(pb->pc, pd->tag, PCI_BRIDGE_PREFETCHMEM_REG);
+   isprefetchmem64 = PCI_BRIDGE_PREFETCHMEM_64BITS(mem);
 #if ULONG_MAX > 0x
-   if (!PCI_BRIDGE_PREFETCHMEM_64BITS(mem) && mem_limit > 0xULL) {
+   if (!isprefetchmem64 && mem_limit > 0xULL) {
printf("Bus %d bridge does not support 64-bit PMEM.  ",
pb->busno);
printf("Disabling prefetchable-MEM accesses\n");
@@ -936,7 +938,7 @@ configure_bridge(pciconf_dev_t *pd)
/*
 * XXX -- 64-bit systems need a lot more than just this...
 */
-   if (PCI_BRIDGE_PREFETCHMEM_64BITS(mem)) {
+   if (isprefetchmem64) {
mem_base  = (uint64_t) mem_base  >> 32;
mem_limit = (uint64_t) mem_limit >> 32;
pci_conf_write(pb->pc, pd->tag, PCI_BRIDGE_PREFETCHBASE32_REG,
--

I can't test this diff because I have no any machine which can test
PCI_NETBSD_CONFIGURE now. Is it OK to commit without test?

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: Tidy up / simplify "com" register initialization

2018-12-05 Thread Masanobu SAITOH

On 2018/12/04 13:53, Jason Thorpe wrote:

I'm working on an application where I need some additional run-time flexibility


PLEASE!
sys/dev/ic/com.c has some compile time #ifdef 's and it cause bad
effect to usb serial and/or puc com devices.



in how the "com" driver register map is setup.  COM_REGMAP cam do the job, but 
there is some untidiness around how com_regs is initialized.  This patch is a small step 
in cleaning that up, with some other minor changes to follow later.

This has been compiled-tested everywhere, and I will spot-check on a few 
different devices that I have handy.  Most of the changes are purely mechanical.

I'll commit in a couple of days unless there are serious objections.

Thx.

https://www.dropbox.com/s/xl4y3jv2u68bnr2/com_regs-patch.txt

-- thorpej




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: pci_intr_alloc() vs pci_intr_establish() - retry type?

2018-12-04 Thread Masanobu SAITOH

On 2018/12/05 4:53, Jaromír Doleček wrote:

I've now disabled MSI-X for ahcisata(4), need to figure what needs to
be setup there.

Le lun. 3 déc. 2018 à 22:41, Jared McNeill  a écrit :

IIUC we don't actually have confirmation that ThunderX AHCI works yet..
Nick is having issues with his board.


Okay - I misunderstood, thought it was confirmed working.

The BAR used for Cavium is slightly suspicious. AHCI spec mentions
(v1.3.1 section 2.4 Serial ATA Capability) 0x10 BAR as one of 'legacy'
access methods. It specifically only talks about 0x24 as the proper
BAR for AHCI, it doesn't mention option to have it elsewhere.

Jaromir


 I suspect Serial ATA AHCI 1.2.1 specification page 111 has the hint.

 Figure 23: Port/CCC and MSI Message Mapping, Example 1
 Figure 24: Port and MSI Message Mapping, Example 2

I suspect MSI-X also assume this layout. pci_msix_alloc_map() might
be used.

 My SuperMicro A2SDi-H-TP4F's disk is connected to not channel 0
but channel 6. I also verified that GHC.MRSM is 0.

ahcisata1 at pci0 dev 20 function 0: vendor 8086 product 19c2 (rev. 0x11)
ahcisata1: interrupting at msix2 vec 0
ahcisata1: 64-bit DMA
ahcisata1: AHCI revision 1.31, 4 ports, 32 slots, CAP 
0xc3349f43
X GHC = 8002
atabus8 at ahcisata1 channel 3
atabus9 at ahcisata1 channel 4
atabus10 at ahcisata1 channel 5
atabus11 at ahcisata1 channel 6



--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: pci_intr_alloc() vs pci_intr_establish() - retry type?

2018-12-04 Thread Masanobu SAITOH

On 2018/12/04 16:44, Masanobu SAITOH wrote:

On 2018/12/04 4:55, Jaromír Doleček wrote:

Le lun. 3 déc. 2018 à 12:09, Masanobu SAITOH  a écrit :

C3000's AHCI has multi-vector MSI-X table and it doesn't work since
Nobember 20th...


Can you try if by chance this code adapted nvme(4) changes anything on
your system?

http://www.netbsd.org/~jdolecek/ahcisata_msixoff.diff


It panics:

[   1.0427481] panic: extent_alloc_region: bad size
[   1.0427481] cpu0: Begin traceback...
[   1.0427481] vpanic() at netbsd:vpanic+0x16f
[   1.0427481] snprintf() at netbsd:snprintf
[   1.0427481] extent_alloc_region() at netbsd:extent_alloc_region+0x29e
[   1.0427481] bus_space_reserve() at netbsd:bus_space_reserve+0x95
[   1.0427481] bus_space_map() at netbsd:bus_space_map+0x85
[   1.0427481] ahci_pci_attach() at netbsd:ahci_pci_attach+0xde
[   1.0427481] config_attach_loc() at netbsd:config_attach_loc+0x1a8
[   1.0427481] config_found_sm_loc() at netbsd:config_found_sm_loc+0x48
[   1.0427481] pci_probe_device() at netbsd:pci_probe_device+0x582
[   1.0427481] pci_enumerate_bus() at netbsd:pci_enumerate_bus+0x197
[   1.0427481] pcirescan() at netbsd:pcirescan+0x47
[   1.0427481] pciattach() at netbsd:pciattach+0x193
[   1.0427481] config_attach_loc() at netbsd:config_attach_loc+0x1a8
[   1.0427481] config_found_sm_loc() at netbsd:config_found_sm_loc+0x48
[   1.0427481] mp_pci_scan() at netbsd:mp_pci_scan+0xa3
[   1.0427481] mainbus_attach() at netbsd:mainbus_attach+0x332
[   1.0427481] config_attach_loc() at netbsd:config_attach_loc+0x1a8
[   1.0427481] cpu_configure() at netbsd:cpu_configure+0x2b
[   1.0427481] main() at netbsd:main+0x331
[   1.0427481] cpu0: End traceback...
[   1.0427481] fatal breakpoint trap in supervisor mode
[   1.0427481] trap type 1 code 0 rip 0x8021de45 cs 0x8 rflags 0x202 
cr2 0 ilevel 0x8 rsp 0x81d027f0
[   1.0427481] curlwp 0x81858900 pid 0.1 lowest kstack 
0x81cfe2c0
Stopped in pid 0.1 (system) at  netbsd:breakpoint+0x5:  leave


pcictl dump shows:

PCI configuration registers:
  Common header:
    0x00: 0x19c28086 0x02b00147 0x01060110 0x

    Vendor Name: Intel (0x8086)
    Device Name: C3000 SATA Controller 1 (0x19c2)
    Command register: 0x0147
  I/O space accesses: on
  Memory space accesses: on
  Bus mastering: on
  Special cycles: off
  MWI transactions: off
  Palette snooping: off
  Parity error checking: on
  Address/data stepping: off
  System error (SERR): on
  Fast back-to-back transactions: off
  Interrupt disable: off
    Status register: 0x02b0
  Immediate Readiness: off
  Interrupt status: inactive
  Capability List support: on
  66 MHz capable: on
  User Definable Features (UDF) support: off
  Fast back-to-back capable: on
  Data parity error detected: off
  DEVSEL timing: medium (0x1)
  Slave signaled Target Abort: off
  Master received Target Abort: off
  Master received Master Abort: off
  Asserted System Error (SERR): off
  Parity error detected: off
    Class Name: mass storage (0x01)
    Subclass Name: SATA (0x06)
    Interface Name: AHCI 1.0 (0x01)
    Revision ID: 0x10
    BIST: 0x00
    Header Type: 0x00 (0x00)
    Latency Timer: 0x00
    Cache Line Size: 0bytes (0x00)

  Type 0 ("normal" device) header:
    0x10: 0xdffb4000 0xdffbd000 0xe071 0xe061
    0x20: 0xe021 0xdffbc000 0x 0x72708086
    0x30: 0x 0x0080 0x 0x010e

    Base address register at 0x10
  type: 32-bit nonprefetchable memory
  base: 0xdffb4000
    Base address register at 0x14
  type: 32-bit nonprefetchable memory
  base: 0xdffbd000
    Base address register at 0x18
  type: I/O
  base: 0xe070
    Base address register at 0x1c
  type: I/O
  base: 0xe060
    Base address register at 0x20
  type: I/O
  base: 0xe020
    Base address register at 0x24
  type: 32-bit nonprefetchable memory
  base: 0xdffbc000
    Cardbus CIS Pointer: 0x
    Subsystem vendor ID: 0x8086
    Subsystem ID: 0x7270
    Expansion ROM Base Address Register: 0x
  base: 0x
  Expansion ROM Enable: off
  Validation Status: Validation not supported
  Validation Details: 0x0
    Capability list pointer: 0x80
    Reserved @ 0x38: 0x
    Maximum Latency: 0x00
    Minimum Grant: 0x00
    Interrupt pin: 0x01 (pin A)
    Interrupt line: 0x0e

  Capability register at 0x80
    type: 0x05 (MSI)
  Capability register at 0x70
    type: 0x01 (Power Management)
  Capability register at 0xa8
    type: 0x12 (SATA)
  Capability register at 0xd0
    type: 0x11 (MSI-X)

  PCI Power Management Capabilities Register
    Capabilities register: 0x4003
  Version: 1.2
  PME# clock: off
  Device specific initialization: off
  3.3V auxiliary current: self-powered
  D1 power management state support: off
  D2 power management state su

Re: pci_intr_alloc() vs pci_intr_establish() - retry type?

2018-12-03 Thread Masanobu SAITOH

On 2018/12/04 4:55, Jaromír Doleček wrote:

Le lun. 3 déc. 2018 à 12:09, Masanobu SAITOH  a écrit :

C3000's AHCI has multi-vector MSI-X table and it doesn't work since
Nobember 20th...


Can you try if by chance this code adapted nvme(4) changes anything on
your system?

http://www.netbsd.org/~jdolecek/ahcisata_msixoff.diff


It panics:

[   1.0427481] panic: extent_alloc_region: bad size
[   1.0427481] cpu0: Begin traceback...
[   1.0427481] vpanic() at netbsd:vpanic+0x16f
[   1.0427481] snprintf() at netbsd:snprintf
[   1.0427481] extent_alloc_region() at netbsd:extent_alloc_region+0x29e
[   1.0427481] bus_space_reserve() at netbsd:bus_space_reserve+0x95
[   1.0427481] bus_space_map() at netbsd:bus_space_map+0x85
[   1.0427481] ahci_pci_attach() at netbsd:ahci_pci_attach+0xde
[   1.0427481] config_attach_loc() at netbsd:config_attach_loc+0x1a8
[   1.0427481] config_found_sm_loc() at netbsd:config_found_sm_loc+0x48
[   1.0427481] pci_probe_device() at netbsd:pci_probe_device+0x582
[   1.0427481] pci_enumerate_bus() at netbsd:pci_enumerate_bus+0x197
[   1.0427481] pcirescan() at netbsd:pcirescan+0x47
[   1.0427481] pciattach() at netbsd:pciattach+0x193
[   1.0427481] config_attach_loc() at netbsd:config_attach_loc+0x1a8
[   1.0427481] config_found_sm_loc() at netbsd:config_found_sm_loc+0x48
[   1.0427481] mp_pci_scan() at netbsd:mp_pci_scan+0xa3
[   1.0427481] mainbus_attach() at netbsd:mainbus_attach+0x332
[   1.0427481] config_attach_loc() at netbsd:config_attach_loc+0x1a8
[   1.0427481] cpu_configure() at netbsd:cpu_configure+0x2b
[   1.0427481] main() at netbsd:main+0x331
[   1.0427481] cpu0: End traceback...
[   1.0427481] fatal breakpoint trap in supervisor mode
[   1.0427481] trap type 1 code 0 rip 0x8021de45 cs 0x8 rflags 0x202 
cr2 0 ilevel 0x8 rsp 0x81d027f0
[   1.0427481] curlwp 0x81858900 pid 0.1 lowest kstack 
0x81cfe2c0
Stopped in pid 0.1 (system) at  netbsd:breakpoint+0x5:  leave


pcictl dump shows:

PCI configuration registers:
  Common header:
0x00: 0x19c28086 0x02b00147 0x01060110 0x

Vendor Name: Intel (0x8086)
Device Name: C3000 SATA Controller 1 (0x19c2)
Command register: 0x0147
  I/O space accesses: on
  Memory space accesses: on
  Bus mastering: on
  Special cycles: off
  MWI transactions: off
  Palette snooping: off
  Parity error checking: on
  Address/data stepping: off
  System error (SERR): on
  Fast back-to-back transactions: off
  Interrupt disable: off
Status register: 0x02b0
  Immediate Readiness: off
  Interrupt status: inactive
  Capability List support: on
  66 MHz capable: on
  User Definable Features (UDF) support: off
  Fast back-to-back capable: on
  Data parity error detected: off
  DEVSEL timing: medium (0x1)
  Slave signaled Target Abort: off
  Master received Target Abort: off
  Master received Master Abort: off
  Asserted System Error (SERR): off
  Parity error detected: off
Class Name: mass storage (0x01)
Subclass Name: SATA (0x06)
Interface Name: AHCI 1.0 (0x01)
Revision ID: 0x10
BIST: 0x00
Header Type: 0x00 (0x00)
Latency Timer: 0x00
Cache Line Size: 0bytes (0x00)

  Type 0 ("normal" device) header:
0x10: 0xdffb4000 0xdffbd000 0xe071 0xe061
0x20: 0xe021 0xdffbc000 0x 0x72708086
0x30: 0x 0x0080 0x 0x010e

Base address register at 0x10
  type: 32-bit nonprefetchable memory
  base: 0xdffb4000
Base address register at 0x14
  type: 32-bit nonprefetchable memory
  base: 0xdffbd000
Base address register at 0x18
  type: I/O
  base: 0xe070
Base address register at 0x1c
  type: I/O
  base: 0xe060
Base address register at 0x20
  type: I/O
  base: 0xe020
Base address register at 0x24
  type: 32-bit nonprefetchable memory
  base: 0xdffbc000
Cardbus CIS Pointer: 0x
Subsystem vendor ID: 0x8086
Subsystem ID: 0x7270
Expansion ROM Base Address Register: 0x
  base: 0x
  Expansion ROM Enable: off
  Validation Status: Validation not supported
  Validation Details: 0x0
Capability list pointer: 0x80
Reserved @ 0x38: 0x
Maximum Latency: 0x00
Minimum Grant: 0x00
Interrupt pin: 0x01 (pin A)
Interrupt line: 0x0e

  Capability register at 0x80
type: 0x05 (MSI)
  Capability register at 0x70
type: 0x01 (Power Management)
  Capability register at 0xa8
type: 0x12 (SATA)
  Capability register at 0xd0
type: 0x11 (MSI-X)

  PCI Power Management Capabilities Register
Capabilities register: 0x4003
  Version: 1.2
  PME# clock: off
  Device specific initialization: off
  3.3V auxiliary current: self-powered
  D1 power management state support: off
  D2 power management state support: off
  PME# support D0: off
  

Re: pci_intr_alloc() vs pci_intr_establish() - retry type?

2018-12-03 Thread Masanobu SAITOH

On 2018/12/01 5:27, Jared McNeill wrote:

On Fri, 30 Nov 2018, Martin Husemann wrote:


On Fri, Nov 30, 2018 at 02:22:37PM -0400, Jared McNeill wrote:

The driver can call pci_intr_type on the handle returned by pci_intr_alloc
to see what kind of interrupt (INTx/MSI/MSI-X) was negotiated.


Yeah, but the fear was that some hardware could either do a single INT
or (lots of) MSI{X}, and now you end up with (lots of) INT returned
from pci_intr_alloc() and can just use the first.

Does this not happen for real world hardware, or is the driver supposed to
check with pci_*_count() upfront?


I don't think pci_intr_alloc will ever return more than one INTx..


C3000's AHCI has multi-vector MSI-X table and it doesn't work since
Nobember 20th...



[   1.0426355] ahcisata0 at pci0 dev 19 function 0: vendor 8086 product 19b2 
(rev. 0x10)
[   1.0426355] allocated pic msix5 type edge pin 0 level 6 to cpu0 slot 21 idt 
entry 110
[   1.0426355] ahcisata0: interrupting at msix5 vec 0
[   1.0426355] ahcisata0: 64-bit DMA
[   1.0426355] ahcisata0: AHCI revision 1.31, 1 port, 32 slots, CAP 
0xc3369f40
[   1.0426355] atabus0 at ahcisata0 channel 3
[   1.0426355] ahcisata1 at pci0 dev 20 function 0: vendor 8086 product 19c2 
(rev. 0x10)
[   1.0426355] allocated pic msix6 type edge pin 0 level 6 to cpu0 slot 22 idt 
entry 111
[   1.0426355] ahcisata1: interrupting at msix6 vec 0
[   1.0426355] ahcisata1: 64-bit DMA
[   1.0426355] ahcisata1: AHCI revision 1.31, 2 ports, 32 slots, CAP 
0xc3369f41
[   1.0426355] atabus1 at ahcisata1 channel 4
[   1.0426355] atabus2 at ahcisata1 channel 5
(snip)
[   3.4536958] wd0 at atabus2 drive 0
[   6.4561682] wd0: IDENTIFY failed
[   6.4561682] wd0: fixing 0 sector size
[   6.4561682] wd0: secperunit and ncylinders are zero
[   9.4686496] wd0(ahcisata1:5:0): using PIO mode 0
(snip)
[   9.6688146] boot device: 



--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


DVF_ATTACH_INPROGRESS

2018-11-29 Thread Masanobu SAITOH

 Hi.

 While I've been working for suspend/resume, I've noticed that some devices
are reported "WARNING: power management not supported" in config_attach_loc()
even though the driver has pmf_device_register() in it. The reason is that
pmf_device_register is deferred by config_interrupts(). It's not a real bug,
but the above message is important for suspend/resume work. I wrote the
following patch:

Index: sys/sys/device.h
===
RCS file: /cvsroot/src/sys/sys/device.h,v
retrieving revision 1.156
diff -u -p -r1.156 device.h
--- sys/sys/device.h18 Sep 2018 01:25:09 -  1.156
+++ sys/sys/device.h29 Nov 2018 08:33:29 -
@@ -202,6 +202,7 @@ struct device {
 #defineDVF_CLASS_SUSPENDED 0x0008  /* device class suspend was 
called */
 #defineDVF_DRIVER_SUSPENDED0x0010  /* device driver suspend was 
called */
 #defineDVF_BUS_SUSPENDED   0x0020  /* device bus suspend was 
called */
+#defineDVF_ATTACH_INPROGRESS   0x0040  /* device attach is in progress 
*/
 #defineDVF_DETACH_SHUTDOWN 0x0080  /* device detaches safely at 
shutdown */
 
 #ifdef _KERNEL

Index: sys/kern/subr_autoconf.c
===
RCS file: /cvsroot/src/sys/kern/subr_autoconf.c,v
retrieving revision 1.263
diff -u -p -r1.263 subr_autoconf.c
--- sys/kern/subr_autoconf.c18 Sep 2018 01:25:09 -  1.263
+++ sys/kern/subr_autoconf.c29 Nov 2018 08:33:30 -
@@ -446,6 +446,10 @@ config_interrupts_thread(void *cookie)
while ((dc = TAILQ_FIRST(&interrupt_config_queue)) != NULL) {
TAILQ_REMOVE(&interrupt_config_queue, dc, dc_queue);
(*dc->dc_func)(dc->dc_dev);
+   dc->dc_dev->dv_flags &= ~DVF_ATTACH_INPROGRESS;
+   if (!device_pmf_is_registered(dc->dc_dev))
+   aprint_debug_dev(dc->dc_dev,
+   "WARNING: power management not supported\n");
config_pending_decr(dc->dc_dev);
kmem_free(dc, sizeof(*dc));
}
@@ -1597,9 +1601,10 @@ config_attach_loc(device_t parent, cfdat
 
 	(*dev->dv_cfattach->ca_attach)(parent, dev, aux);
 
-	if (!device_pmf_is_registered(dev))

-   aprint_debug_dev(dev, "WARNING: power management not "
-   "supported\n");
+   if (((dev->dv_flags & DVF_ATTACH_INPROGRESS) == 0)
+   && !device_pmf_is_registered(dev))
+   aprint_debug_dev(dev,
+   "WARNING: power management not supported\n");
 
 	config_process_deferred(&deferred_config_queue, dev);
 
@@ -1996,6 +2001,7 @@ config_interrupts(device_t dev, void (*f

dc->dc_func = func;
TAILQ_INSERT_TAIL(&interrupt_config_queue, dc, dc_queue);
config_pending_incr(dev);
+   dev->dv_flags |= DVF_ATTACH_INPROGRESS;
 }
 
 /*

Index: sys/external/bsd/drm2/i915drm/intelfb.c
===
RCS file: /cvsroot/src/sys/external/bsd/drm2/i915drm/intelfb.c,v
retrieving revision 1.15
diff -u -p -r1.15 intelfb.c
--- sys/external/bsd/drm2/i915drm/intelfb.c 27 Aug 2018 15:09:35 -  
1.15
+++ sys/external/bsd/drm2/i915drm/intelfb.c 29 Nov 2018 08:33:30 -
@@ -119,6 +119,7 @@ intelfb_attach(device_t parent, device_t
error);
goto fail1;
}
+   self->dv_flags |= DVF_ATTACH_INPROGRESS;
sc->sc_scheduled = true;
 
 	/* Success!  */

@@ -177,6 +178,7 @@ intelfb_attach_task(struct i915drmkms_ta
};
int error;
 
+	sc->sc_dev->dv_flags &= ~DVF_ATTACH_INPROGRESS;

error = drmfb_attach(&sc->sc_drmfb, &da);
if (error) {
aprint_error_dev(sc->sc_dev, "failed to attach drmfb: %d\n",


(intelfb.c doesn't use config_interrupts)

OK?

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: NetBSD-8 kernel too big?

2018-09-02 Thread Masanobu SAITOH

On 2018/08/30 20:54, D'Arcy Cain wrote:

My latest build of NetBSD 8 results in a kernel size of 24,964,488
bytes.  Not sure if "kernel" is really the right term any more.  :-)

I have installed it on all of my servers except for one.  It is a HP
ProLiant DL160se G6.  The failure is something like this:

Read header failed: input/output error
Boot: ad0a:netbsd: input/output error

I did find something on the net about FreeBSD that suggests that this
can be fixed by disabling a memory hole in the bios but this system does
not seem to have that option.

The 7.1 kernel that boots is 17,812,998 bytes making it just over 16MB.
I am not sure how much of the file actually loads into memory but this
sounds close to a boundary.

I tried removing a bunch of stuff but I run the same kernel on multiple
servers and figuring out what to drop isn't that easy.  Removing a bunch
of stuff that seemed safe didn't make much difference.

I saw these two warnings in the dmesg output:

ACPI BIOS Warning (bug): 32/64X length mismatch in FADT/Gpe0Block:
128/64 (20131218/tbfadt-634)
ACPI BIOS Warning (bug): Incorrect checksum in table [OEMB] - 0x29,
should be 0x28 (20131218/tbprint-237)

The first is also in anther server's dmesg which boots the new kernel
fine and research on the net suggests that the second doesn't matter.

Anyone know what my next steps are?

Cheers.


On my machines:

netbsd-7:
 > % ls -l /netbsd
 > -rwxr-xr-x  1 root  wheel  17968892 Aug 29 23:04 /netbsd

netbsd-8:
 > % ls -l /netbsd
 > -rwxr-xr-x  1 root  wheel  24925744 Aug 25 00:19 /netbsd

almost the same size as yours.

netbsd-7:

% objdump -h /netbsd

/netbsd: file format elf64-x86-64

Sections:
Idx Name  Size  VMA   LMA   File off  Algn
  0 .text 008f1e76  8010  0010  0010  2**6
  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata   00358f53  809f1e80  009f1e80  009f1e80  2**5
  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .eh_frame 00164688  80d4add8  00d4add8  00d4add8  2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
  3 link_set_modules 0430  80eaf460  00eaf460  00eaf460  
2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 link_set_sysctl_funcs 01e0  80eaf890  00eaf890  
00eaf890  2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
  5 link_set_ah_rfs 0038  80eafa70  00eafa70  00eafa70  2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
  6 link_set_ah_chips 0038  80eafaa8  00eafaa8  00eafaa8  
2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
  7 link_set_domains 0058  80eafae0  00eafae0  00eafae0  
2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
  8 link_set_bufq_strats 0020  80eafb38  00eafb38  00eafb38 
 2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
  9 link_set_evcnts 00f8  80eafb58  00eafb58  00eafb58  2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
 10 link_set_dkwedge_methods 0010  80eafc50  00eafc50  
00eafc50  2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
 11 link_set_linux_module_param_desc 0190  80eafc60  
00eafc60  00eafc60  2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
 12 link_set_linux_module_param_info 01a0  80eafdf0  
00eafdf0  00eafdf0  2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
 13 link_set_ieee80211_funcs 0020  80eaff90  00eaff90  
00eaff90  2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
 14 link_set_prop_linkpools 0040  80eaffb0  00eaffb0  
00eaffb0  2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA
 15 .data 0009eae0  80fb  00fb  00eb  2**12
  CONTENTS, ALLOC, LOAD, DATA
 16 .data.cacheline_aligned 3004  8104eb00  0104eb00  
00f4eb00  2**6
  CONTENTS, ALLOC, LOAD, DATA
 17 .data.read_mostly 03b8  81051b40  01051b40  00f51b40  
2**5
  CONTENTS, ALLOC, LOAD, DATA
 18 .bss  00092000  81052000  01052000  00f51ef8  2**12
  ALLOC
 19 .note.netbsd.ident 0018  810e4000  810e4000  00f51ef8  
2**2
  CONTENTS, READONLY
 20 .comment  001a      00f51f10  2**0
  CONTENTS, READONLY
 21 .ident00019968      00f51f2a  2**0
  CONTENTS, READONLY
 22 .debug_info   4683      00f6b892  2**0
  CONTENTS, READONLY, DEBUGGING
 23 .debug_abbrev 0238      00f6ff15  2**0
  CONTENTS, READONLY, DEBUGGING
 

Re: if_wm.c: is this a bug, or am I missing something?

2018-04-22 Thread Masanobu SAITOH

On 2018/04/21 3:55, Mouse wrote:

I'm trying to write some bare-metal code (no OS) for a chip that
appears to be supported by if_wm.c, but for which the manufacturer's
documentation PDF is cripplingly broken, leading to my reading over
that file, trying to use it as hardware documentation of a sort...and I
see something odd which looks to me like a bug.  I'm wondering whether
it is or whether I'm just missing something.

In wm_get_swfw_semaphore, I find (lines 12961-12966 of 1.562)

 if (sc->sc_type == WM_T_80003)
 timeout = 50;
 else
 timeout = 200;

 for (timeout = 0; timeout < 200; timeout++) {

Given the first part of the for loop ("timeout = 0"), it looks to me as
though the assignments to timeout above it, and the test controlling
which assignment happens, are completely useless.  If I had to guess,
I'd guess the test and assignments were designed for a loop more like
for (;timeout>0;timeout--).

Is the bug in the code?  Or in my brain?


 You're correct!

 It have been fixed now (in if_wm.c rev 1.575).

Thanks!


/~\ The ASCII Mouse
\ / Ribbon Campaign
  X  Against HTML   mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: amd64: svs

2018-01-15 Thread Masanobu SAITOH

On 2018/01/15 18:12, Emmanuel Dreyfus wrote:

On Mon, Jan 15, 2018 at 05:34:18PM +0900, Masanobu SAITOH wrote:

cpu0: MSR_IA32_ARCH_CAPABILITIES=0x1

I've not written this bit by myself.


Where can this be readen without your patch?



In cpu_ucode_intel.c::cpu_ucode_intel_apply().

-
cvs diff: Diffing .
Index: cpu_ucode_intel.c
===
RCS file: /cvsroot/src/sys/arch/x86/x86/cpu_ucode_intel.c,v
retrieving revision 1.12
diff -u -p -r1.12 cpu_ucode_intel.c
--- cpu_ucode_intel.c   1 Jun 2017 02:45:08 -   1.12
+++ cpu_ucode_intel.c   15 Jan 2018 09:31:34 -
@@ -110,8 +110,13 @@ cpu_ucode_intel_apply(struct cpu_ucode_s
uint32_t ucodetarget, oucodeversion, nucodeversion;
int platformid, cpuid;
struct intel1_ucode_header *uh;
+   struct cpu_info *ci;
+   struct cpu_info oldci;
+   int i;
void *uha;
size_t newbufsize = 0;
+   uint64_t msr = 0;
+   u_int descs[4];
int rv = 0;

if (sc->loader_version != CPU_UCODE_LOADER_INTEL1
@@ -144,15 +149,37 @@ cpu_ucode_intel_apply(struct cpu_ucode_s
intel_getcurrentucode(&nucodeversion, &platformid);
cpuid = curcpu()->ci_index;

-   kpreempt_enable();
-
if (nucodeversion != ucodetarget) {
+   kpreempt_enable();
rv = EIO;
goto out;
}

+   ci = curcpu();
+   oldci = *ci;
+#if 0
+   /* Update cpu_info */
+   cpu_probe(curcpu());
+#endif
+   if (ci->ci_max_cpuid >= 7) {
+   x86_cpuid(7, descs);
+   if (descs[3] & CPUID_SEF_ARCH_CAP)
+   msr = rdmsr(MSR_IA32_ARCH_CAPABILITIES);
+   }
+   kpreempt_enable();
+
+   if ((ci->ci_max_cpuid >= 7) && (descs[3] & CPUID_SEF_ARCH_CAP))
+   printf("cpu%d: MSR_IA32_ARCH_CAPABILITIES=0x%lx\n", cpuid,
+   msr);
printf("cpu %d: ucode 0x%x->0x%x\n", cpuid,
   oucodeversion, nucodeversion);
+   for (i = 0; i < __arraycount(ci->ci_feat_val); i++) {
+   if (oldci.ci_feat_val[i] != ci->ci_feat_val[i])
+   printf("cpu%d: cpu_feature[%d] changed from "
+   "%08x to %08x\n", cpuid, i, oldci.ci_feat_val[i],
+   ci->ci_feat_val[i]);
+   }
+
 out:
if (newbufsize != 0)
kmem_free(uha, newbufsize);
-

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: amd64: svs

2018-01-15 Thread Masanobu SAITOH

On 2018/01/15 17:42, Masanobu SAITOH wrote:

On 2018/01/15 17:34, Masanobu SAITOH wrote:

On 2018/01/15 3:42, Dave Huang wrote:

On Jan 13, 2018, at 22:20, Emmanuel Dreyfus mailto:m...@netbsd.org>> wrote:


John Klos mailto:j...@ziaspace.com>> wrote:


FWIW, Intel updated their microcode on 8-Jan-2018:

https://downloadcenter.intel.com/download/27431/

I can't find any documentation which shows what exactly is fixed for each
family of processors.


It does not seems to fix much.

I tried the tests from git repository
http://github.com/IAIK/meltdown.git

This is Linux-centric, but at least, the "secret" case seems to apply on
NetBSD. It reports vulnerability before (cpu0: microcode version 0x58,
platform ID 1) and after (cpu0: microcode version 0x80, platform ID 1)
microcode upgrade.

And I do not have a spectre test case. Anyone has that?


My understanding is that the microcode update doesn't actually *fix*
anything by itself. What it does is enable an OS to more easily fix
things.

The kernel would need to check for the new SPEC_CTRL CPUID bit,
and if present, twiddle with the new speculation control-related MSRs
at the appropriate times.
--
Name: Dave Huang         |  Mammal, mammal / their names are called /
INet: k...@azeotrope.org <mailto:k...@azeotrope.org> |  they raise a paw / the 
bat, the cat /
Telegram: @DahanC        |  dolphin and dog / koala bear and hog -- TMBG
Dahan: Hani G Y+C 42 Y++ L+++ W- C++ T++ A+ E+ S++ V++ F- Q+++ P+ B+ PA+ PL++



FYI:

Diffs of "cpuctl -v identify 0" before microcode update and after on
some my machines.

Haswell:

% diff -up o1.log o2.log
--- o1.log  2018-01-11 18:11:36.897983031 +0900
+++ o2.log  2018-01-11 18:11:44.782255154 +0900
@@ -6,7 +6,7 @@ cpu0: 0003:   00
 cpu0: 0004: 1c004121 01c0003f 003f 
 cpu0: 0005: 0040 0040 0003 00042120
 cpu0: 0006: 0077 0002 0009 
-cpu0: 0007:  2fbb  
+cpu0: 0007:  27ab  0c00
 cpu0: 0008:    
 cpu0: 0009:    
 cpu0: 000a: 07300403   0603
@@ -35,8 +35,9 @@ cpu0: features1 0x7ffafbff
 cpu0: features2 0x2c100800
 cpu0: features3 0x21
-cpu0: features5 0x2fbb
-cpu0: features5 0x2fbb
+cpu0: features5 0x27ab
+cpu0: features5 0x27ab
+cpu0: SEF edx 0xc00
 cpu0: xsave features 0x7
 cpu0: xsave instructions 0x1
 cpu0: xsave area size: current 832, maximum 832, xgetbv enabled
@@ -55,4 +56,4 @@ cpu0: SMT ID 0
 cpu0: DSPM-eax 0x77
 cpu0: DSPM-ecx 0x9
 cpu0: SEF highest subleaf 
-cpu0: microcode version 0x17, platform ID 1
+cpu0: microcode version 0x23, platform ID 1


  Only IBRS and STIBP. No IA32_ARCH_CAPABILITIES.

Kaby Lake (XXX on xen dom0):

% diff -u xen-0.log xen-1.log
--- xen-0.log   2018-01-15 15:39:37.350775248 +0900
+++ xen-1.log   2018-01-15 15:39:46.135931681 +0900
@@ -6,7 +6,7 @@
 cpu0: 0004: 1c004121 01c0003f 003f 
 cpu0: 0005: 0040 0040 0003 00142120
 cpu0: 0006: 27f7 0002 0009 
-cpu0: 0007:  029c6fbf  
+cpu0: 0007:  029c6fbf  0c00
 cpu0: 0008:    
 cpu0: 0009:    
 cpu0: 000a: 07300804   0603


oops, the test binary was old.


@ -47,6 +47,7 @@
 cpu0: features5 0x29c6fbf
 cpu0: features5 0x29c6fbf
 cpu0: features5 0x29c6fbf
+cpu0: SEF edx 0xc00
 cpu0: xsave features 0x1f
 cpu0: xsave instructions 0xf
 cpu0: xsave area size: current 576, maximum 1088, xgetbv enabled



@@ -67,4 +67,4 @@
 cpu0: DSPM-eax 0x27f7
 cpu0: DSPM-ecx 0x9
 cpu0: SEF highest subleaf 
-cpu0: microcode version 0x48, platform ID 1
+cpu0: microcode version 0x80, platform ID 1


  Only IBRS and STIBP. No IA32_ARCH_CAPABILITIES.

Denverton:

% diff -u dnv-0.log dnv-1.log
--- dnv-0.log   2018-01-15 17:25:58.655815447 +0900
+++ dnv-1.log   2018-01-15 17:26:12.429468878 +0900
@@ -6,7 +6,7 @@
 cpu0: 0004: 3c000121 0140003f 003f 0001
 cpu0: 0005: 0040 0040 0003 2020
 cpu0: 0006: 0055 0002 0009 
-cpu0: 0007:  2294e283  
+cpu0: 0007:  2294e283  2c00
 cpu0: 0008:    
 cpu0: 0009:    
 cpu0: 000a: 07300404   0603
@@ -44,6 +44,7 @@
 cpu0: features3 0x101
 cpu0: features5 0x2294e283
 cpu0: features5 0x2294e283
+cpu0: SEF edx 0x2c00
 cpu0: xsave features 0x1b
 cpu0: xsave instructions 0xf
 cpu0: xsave area size: current 576, maximum 1088, xgetbv enabled
@@ -59,4 +60,4 @@
 cpu0: DSPM-eax 0x55
 cpu0: DSPM-ecx 0x9
 cpu0: SEF highest subleaf 
-cpu0: microcode version 0x16, platform ID 0
+cpu0: microcode version 0x20, platform

Re: amd64: svs

2018-01-15 Thread Masanobu SAITOH

On 2018/01/15 17:34, Masanobu SAITOH wrote:

On 2018/01/15 3:42, Dave Huang wrote:

On Jan 13, 2018, at 22:20, Emmanuel Dreyfus mailto:m...@netbsd.org>> wrote:


John Klos mailto:j...@ziaspace.com>> wrote:


FWIW, Intel updated their microcode on 8-Jan-2018:

https://downloadcenter.intel.com/download/27431/

I can't find any documentation which shows what exactly is fixed for each
family of processors.


It does not seems to fix much.

I tried the tests from git repository
http://github.com/IAIK/meltdown.git

This is Linux-centric, but at least, the "secret" case seems to apply on
NetBSD. It reports vulnerability before (cpu0: microcode version 0x58,
platform ID 1) and after (cpu0: microcode version 0x80, platform ID 1)
microcode upgrade.

And I do not have a spectre test case. Anyone has that?


My understanding is that the microcode update doesn't actually *fix*
anything by itself. What it does is enable an OS to more easily fix
things.

The kernel would need to check for the new SPEC_CTRL CPUID bit,
and if present, twiddle with the new speculation control-related MSRs
at the appropriate times.
--
Name: Dave Huang         |  Mammal, mammal / their names are called /
INet: k...@azeotrope.org <mailto:k...@azeotrope.org> |  they raise a paw / the 
bat, the cat /
Telegram: @DahanC        |  dolphin and dog / koala bear and hog -- TMBG
Dahan: Hani G Y+C 42 Y++ L+++ W- C++ T++ A+ E+ S++ V++ F- Q+++ P+ B+ PA+ PL++



FYI:

Diffs of "cpuctl -v identify 0" before microcode update and after on
some my machines.

Haswell:

% diff -up o1.log o2.log
--- o1.log  2018-01-11 18:11:36.897983031 +0900
+++ o2.log  2018-01-11 18:11:44.782255154 +0900
@@ -6,7 +6,7 @@ cpu0: 0003:   00
 cpu0: 0004: 1c004121 01c0003f 003f 
 cpu0: 0005: 0040 0040 0003 00042120
 cpu0: 0006: 0077 0002 0009 
-cpu0: 0007:  2fbb  
+cpu0: 0007:  27ab  0c00
 cpu0: 0008:    
 cpu0: 0009:    
 cpu0: 000a: 07300403   0603
@@ -35,8 +35,9 @@ cpu0: features1 0x7ffafbff
 cpu0: features2 0x2c100800
 cpu0: features3 0x21
-cpu0: features5 0x2fbb
-cpu0: features5 0x2fbb
+cpu0: features5 0x27ab
+cpu0: features5 0x27ab
+cpu0: SEF edx 0xc00
 cpu0: xsave features 0x7
 cpu0: xsave instructions 0x1
 cpu0: xsave area size: current 832, maximum 832, xgetbv enabled
@@ -55,4 +56,4 @@ cpu0: SMT ID 0
 cpu0: DSPM-eax 0x77
 cpu0: DSPM-ecx 0x9
 cpu0: SEF highest subleaf 
-cpu0: microcode version 0x17, platform ID 1
+cpu0: microcode version 0x23, platform ID 1


  Only IBRS and STIBP. No IA32_ARCH_CAPABILITIES.

Kaby Lake (XXX on xen dom0):

% diff -u xen-0.log xen-1.log
--- xen-0.log   2018-01-15 15:39:37.350775248 +0900
+++ xen-1.log   2018-01-15 15:39:46.135931681 +0900
@@ -6,7 +6,7 @@
 cpu0: 0004: 1c004121 01c0003f 003f 
 cpu0: 0005: 0040 0040 0003 00142120
 cpu0: 0006: 27f7 0002 0009 
-cpu0: 0007:  029c6fbf  
+cpu0: 0007:  029c6fbf  0c00
 cpu0: 0008:    
 cpu0: 0009:    
 cpu0: 000a: 07300804   0603


oops, the test binary was old.


@ -47,6 +47,7 @@
 cpu0: features5 0x29c6fbf
 cpu0: features5 0x29c6fbf
 cpu0: features5 0x29c6fbf
+cpu0: SEF edx 0xc00
 cpu0: xsave features 0x1f
 cpu0: xsave instructions 0xf
 cpu0: xsave area size: current 576, maximum 1088, xgetbv enabled



@@ -67,4 +67,4 @@
 cpu0: DSPM-eax 0x27f7
 cpu0: DSPM-ecx 0x9
 cpu0: SEF highest subleaf 
-cpu0: microcode version 0x48, platform ID 1
+cpu0: microcode version 0x80, platform ID 1


  Only IBRS and STIBP. No IA32_ARCH_CAPABILITIES.

Denverton:

% diff -u dnv-0.log dnv-1.log
--- dnv-0.log   2018-01-15 17:25:58.655815447 +0900
+++ dnv-1.log   2018-01-15 17:26:12.429468878 +0900
@@ -6,7 +6,7 @@
 cpu0: 0004: 3c000121 0140003f 003f 0001
 cpu0: 0005: 0040 0040 0003 2020
 cpu0: 0006: 0055 0002 0009 
-cpu0: 0007:  2294e283  
+cpu0: 0007:  2294e283  2c00
 cpu0: 0008:    
 cpu0: 0009:    
 cpu0: 000a: 07300404   0603
@@ -44,6 +44,7 @@
 cpu0: features3 0x101
 cpu0: features5 0x2294e283
 cpu0: features5 0x2294e283
+cpu0: SEF edx 0x2c00
 cpu0: xsave features 0x1b
 cpu0: xsave instructions 0xf
 cpu0: xsave area size: current 576, maximum 1088, xgetbv enabled
@@ -59,4 +60,4 @@
 cpu0: DSPM-eax 0x55
 cpu0: DSPM-ecx 0x9
 cpu0: SEF highest subleaf 
-cpu0: microcode version 0x16, platform ID 0
+cpu0: microcode version 0x20, platform ID 0


and with my local debug pritntf():

cpu0: MSR_IA32_ARCH_CAPAB

Re: amd64: svs

2018-01-15 Thread Masanobu SAITOH

On 2018/01/15 3:42, Dave Huang wrote:

On Jan 13, 2018, at 22:20, Emmanuel Dreyfus mailto:m...@netbsd.org>> wrote:


John Klos mailto:j...@ziaspace.com>> wrote:


FWIW, Intel updated their microcode on 8-Jan-2018:

https://downloadcenter.intel.com/download/27431/

I can't find any documentation which shows what exactly is fixed for each
family of processors.


It does not seems to fix much.

I tried the tests from git repository
http://github.com/IAIK/meltdown.git

This is Linux-centric, but at least, the "secret" case seems to apply on
NetBSD. It reports vulnerability before (cpu0: microcode version 0x58,
platform ID 1) and after (cpu0: microcode version 0x80, platform ID 1)
microcode upgrade.

And I do not have a spectre test case. Anyone has that?


My understanding is that the microcode update doesn't actually *fix*
anything by itself. What it does is enable an OS to more easily fix
things.

The kernel would need to check for the new SPEC_CTRL CPUID bit,
and if present, twiddle with the new speculation control-related MSRs
at the appropriate times.
--
Name: Dave Huang         |  Mammal, mammal / their names are called /
INet: k...@azeotrope.org  |  they raise a paw / the 
bat, the cat /
Telegram: @DahanC        |  dolphin and dog / koala bear and hog -- TMBG
Dahan: Hani G Y+C 42 Y++ L+++ W- C++ T++ A+ E+ S++ V++ F- Q+++ P+ B+ PA+ PL++



FYI:

Diffs of "cpuctl -v identify 0" before microcode update and after on
some my machines.

Haswell:

% diff -up o1.log o2.log
--- o1.log  2018-01-11 18:11:36.897983031 +0900
+++ o2.log  2018-01-11 18:11:44.782255154 +0900
@@ -6,7 +6,7 @@ cpu0: 0003:   00
 cpu0: 0004: 1c004121 01c0003f 003f 
 cpu0: 0005: 0040 0040 0003 00042120
 cpu0: 0006: 0077 0002 0009 
-cpu0: 0007:  2fbb  
+cpu0: 0007:  27ab  0c00
 cpu0: 0008:    
 cpu0: 0009:    
 cpu0: 000a: 07300403   0603
@@ -35,8 +35,9 @@ cpu0: features1 0x7ffafbff
 cpu0: features2 0x2c100800
 cpu0: features3 0x21
-cpu0: features5 0x2fbb
-cpu0: features5 0x2fbb
+cpu0: features5 0x27ab
+cpu0: features5 0x27ab
+cpu0: SEF edx 0xc00
 cpu0: xsave features 0x7
 cpu0: xsave instructions 0x1
 cpu0: xsave area size: current 832, maximum 832, xgetbv enabled
@@ -55,4 +56,4 @@ cpu0: SMT ID 0
 cpu0: DSPM-eax 0x77
 cpu0: DSPM-ecx 0x9
 cpu0: SEF highest subleaf 
-cpu0: microcode version 0x17, platform ID 1
+cpu0: microcode version 0x23, platform ID 1


 Only IBRS and STIBP. No IA32_ARCH_CAPABILITIES.

Kaby Lake (XXX on xen dom0):

% diff -u xen-0.log xen-1.log
--- xen-0.log   2018-01-15 15:39:37.350775248 +0900
+++ xen-1.log   2018-01-15 15:39:46.135931681 +0900
@@ -6,7 +6,7 @@
 cpu0: 0004: 1c004121 01c0003f 003f 
 cpu0: 0005: 0040 0040 0003 00142120
 cpu0: 0006: 27f7 0002 0009 
-cpu0: 0007:  029c6fbf  
+cpu0: 0007:  029c6fbf  0c00
 cpu0: 0008:    
 cpu0: 0009:    
 cpu0: 000a: 07300804   0603
@@ -67,4 +67,4 @@
 cpu0: DSPM-eax 0x27f7
 cpu0: DSPM-ecx 0x9
 cpu0: SEF highest subleaf 
-cpu0: microcode version 0x48, platform ID 1
+cpu0: microcode version 0x80, platform ID 1


 Only IBRS and STIBP. No IA32_ARCH_CAPABILITIES.

Denverton:

% diff -u dnv-0.log dnv-1.log
--- dnv-0.log   2018-01-15 17:25:58.655815447 +0900
+++ dnv-1.log   2018-01-15 17:26:12.429468878 +0900
@@ -6,7 +6,7 @@
 cpu0: 0004: 3c000121 0140003f 003f 0001
 cpu0: 0005: 0040 0040 0003 2020
 cpu0: 0006: 0055 0002 0009 
-cpu0: 0007:  2294e283  
+cpu0: 0007:  2294e283  2c00
 cpu0: 0008:    
 cpu0: 0009:    
 cpu0: 000a: 07300404   0603
@@ -44,6 +44,7 @@
 cpu0: features3 0x101
 cpu0: features5 0x2294e283
 cpu0: features5 0x2294e283
+cpu0: SEF edx 0x2c00
 cpu0: xsave features 0x1b
 cpu0: xsave instructions 0xf
 cpu0: xsave area size: current 576, maximum 1088, xgetbv enabled
@@ -59,4 +60,4 @@
 cpu0: DSPM-eax 0x55
 cpu0: DSPM-ecx 0x9
 cpu0: SEF highest subleaf 
-cpu0: microcode version 0x16, platform ID 0
+cpu0: microcode version 0x20, platform ID 0


and with my local debug pritntf():

cpu0: MSR_IA32_ARCH_CAPABILITIES=0x1

I've not written this bit by myself.

The bit 0 is RDCL_NO. 336996-Speculative-Execution-Side-Channel-Mitigations.pdf
says the processor is not susceptible to Rogue Data Cache Load if this
bit is set. So just updating microcode on some newer CPUs can
mitigate Rogue Data Cache Load, right?

--
---
SAITOH Mas

Re: xcall while cold == 1

2017-12-27 Thread Masanobu SAITOH

On 2017/12/27 1:37, Martin Husemann wrote:

On Tue, Dec 26, 2017 at 07:05:38PM +0900, Masanobu SAITOH wrote:

so checking mp_online is the best, right?


I wonder if we should additionaly check for ncpu >= 2 (or ncpuonline >=
2), but that is probably overoptimization.


I think doing optimization for single CPU machine with options MULTIPROCESSOR
kernel is not so important. It would rather do optimize for non-MULTIPROCESSOR
kernel.


Martin



--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: xcall while cold == 1

2017-12-26 Thread Masanobu SAITOH

On 2017/12/26 11:59, Masanobu SAITOH wrote:

On 2017/12/25 20:26, Martin Husemann wrote:

On Mon, Dec 25, 2017 at 03:42:06PM +0900, Masanobu SAITOH wrote:

  Is this intended behavior? Is it possible to use xcall while
cold==1?


Cold is (as you noted) not the right condition (but pretty close).

Xcalls don't really make any sense before cpus have been spun up.
In your case it might be good to do the loop checking for SPCF_RUNNING
and if <= 1 is found, use the code path for single cpu systems (the
current else statatement).


In init_main.c::configure2():


    cold = 0;   /* clocks are running, we're warm now! */
    s = splsched();
    curcpu()->ci_schedstate.spc_flags |= SPCF_RUNNING;
    splx(s);

    /* Boot the secondary processors. */
    for (CPU_INFO_FOREACH(cii, ci)) {
    uvm_cpu_attach(ci);
    }
    mp_online = true;


so checking mp_online is the best, right?


Martin



Updated diff:
-
Index: kern_softint.c
===
RCS file: /cvsroot/src/sys/kern/kern_softint.c,v
retrieving revision 1.44
diff -u -p -r1.44 kern_softint.c
--- kern_softint.c  22 Nov 2017 02:20:21 -  1.44
+++ kern_softint.c  26 Dec 2017 07:47:57 -
@@ -177,6 +177,7 @@ __KERNEL_RCSID(0, "$NetBSD: kern_softint
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -430,8 +431,10 @@ softint_disestablish(void *arg)
 * it again.  So, we are only looking for handler records with
 * SOFTINT_ACTIVE already set.
 */
-   where = xc_broadcast(0, (xcfunc_t)nullop, NULL, NULL);
-   xc_wait(where);
+   if (__predict_true(mp_online)) {
+   where = xc_broadcast(0, (xcfunc_t)nullop, NULL, NULL);
+   xc_wait(where);
+   }
 
 	for (;;) {

/* Collect flag values from each CPU. */
Index: subr_pserialize.c
===
RCS file: /cvsroot/src/sys/kern/subr_pserialize.c,v
retrieving revision 1.9
diff -u -p -r1.9 subr_pserialize.c
--- subr_pserialize.c   21 Nov 2017 08:49:14 -  1.9
+++ subr_pserialize.c   26 Dec 2017 07:47:57 -
@@ -157,6 +157,11 @@ pserialize_perform(pserialize_t psz)
KASSERT(psz->psz_owner == NULL);
KASSERT(ncpu > 0);
 
+	if (__predict_false(mp_online == false)) {

+   psz_ev_excl.ev_count++;
+   return;
+   }
+
/*
 * Set up the object and put it onto the queue.  The lock
 * activity here provides the necessary memory barrier to
Index: subr_psref.c
===
RCS file: /cvsroot/src/sys/kern/subr_psref.c,v
retrieving revision 1.9
diff -u -p -r1.9 subr_psref.c
--- subr_psref.c14 Dec 2017 05:45:55 -  1.9
+++ subr_psref.c26 Dec 2017 07:47:57 -
@@ -429,8 +429,14 @@ psreffed_p(struct psref_target *target,
.ret = false,
};
 
-	/* Ask all CPUs to say whether they hold a psref to the target.  */

-   xc_wait(xc_broadcast(0, &psreffed_p_xc, &P, NULL));
+   if (__predict_true(mp_online)) {
+   /*
+* Ask all CPUs to say whether they hold a psref to the
+* target.
+*/
+   xc_wait(xc_broadcast(0, &psreffed_p_xc, &P, NULL));
+   } else
+   psreffed_p_xc(&P, NULL);
 
 	return P.ret;

 }
-

 It might not be required to increment psz_ev_excl evcnt in 
softint_disestablish()
when mp_online == false.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: xcall while cold == 1

2017-12-25 Thread Masanobu SAITOH

On 2017/12/25 20:26, Martin Husemann wrote:

On Mon, Dec 25, 2017 at 03:42:06PM +0900, Masanobu SAITOH wrote:

  Is this intended behavior? Is it possible to use xcall while
cold==1?


Cold is (as you noted) not the right condition (but pretty close).

Xcalls don't really make any sense before cpus have been spun up.
In your case it might be good to do the loop checking for SPCF_RUNNING
and if <= 1 is found, use the code path for single cpu systems (the
current else statatement).


In init_main.c::configure2():


cold = 0;   /* clocks are running, we're warm now! */
s = splsched();
curcpu()->ci_schedstate.spc_flags |= SPCF_RUNNING;
splx(s);

/* Boot the secondary processors. */
for (CPU_INFO_FOREACH(cii, ci)) {
uvm_cpu_attach(ci);
}
mp_online = true;


so checking mp_online is the best, right?


Martin




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


xcall while cold == 1

2017-12-24 Thread Masanobu SAITOH

 Hi.

 While debugging PR#52820 ("boot -1" panics on systems with ixgX interfaces),
I've noticed that xcall doesn't work while cold == 1.

 When I added softint_disestablish() near the end of the ixgbe_attach().
The following panic occured:


panic: kernel diagnostic assertion "xc->xc_donep < xc->xc_headp" failed: file 
"../../../../kern/subr_xcall.c", line 278


This KASSERT is:


static inline uint64_t
xc_lowpri(xcfunc_t func, void *arg1, void *arg2, struct cpu_info *ci)
{
xc_state_t *xc = &xc_low_pri;
CPU_INFO_ITERATOR cii;
uint64_t where;

mutex_enter(&xc->xc_lock);
while (xc->xc_headp != xc->xc_donep) {
cv_wait(&xc->xc_busy, &xc->xc_lock);
}
xc->xc_arg1 = arg1;
xc->xc_arg2 = arg2;
xc->xc_func = func;
if (ci == NULL) {
xc_broadcast_ev.ev_count++;
for (CPU_INFO_FOREACH(cii, ci)) {
if ((ci->ci_schedstate.spc_flags & SPCF_RUNNING) == 0)
continue;
xc->xc_headp += 1;
ci->ci_data.cpu_xcall_pending = true;
cv_signal(&ci->ci_data.cpu_xcall);
}
} else {
xc_unicast_ev.ev_count++;
xc->xc_headp += 1;
ci->ci_data.cpu_xcall_pending = true;
cv_signal(&ci->ci_data.cpu_xcall);
}
KASSERT(xc->xc_donep < xc->xc_headp);<= Here!
where = xc->xc_headp;
mutex_exit(&xc->xc_lock);

/* Return a low priority ticket. */
KASSERT((where & XC_PRI_BIT) == 0);
return where;
}


So I added the following debug printf:
-
@@ -252,6 +253,7 @@ xc_lowpri(xcfunc_t func, void *arg1, voi
xc_state_t *xc = &xc_low_pri;
CPU_INFO_ITERATOR cii;
uint64_t where;
+   int i = -1;

mutex_enter(&xc->xc_lock);
while (xc->xc_headp != xc->xc_donep) {
@@ -263,8 +265,11 @@ xc_lowpri(xcfunc_t func, void *arg1, voi
if (ci == NULL) {
xc_broadcast_ev.ev_count++;
for (CPU_INFO_FOREACH(cii, ci)) {
-   if ((ci->ci_schedstate.spc_flags & SPCF_RUNNING) == 0)
+   i++;
+   if ((ci->ci_schedstate.spc_flags & SPCF_RUNNING) == 0) {
+   printf("cpu %d: XXX not running\n", i);
continue;
+   }
xc->xc_headp += 1;
ci->ci_data.cpu_xcall_pending = true;
cv_signal(&ci->ci_data.cpu_xcall);
@@ -275,6 +280,9 @@ xc_lowpri(xcfunc_t func, void *arg1, voi
ci->ci_data.cpu_xcall_pending = true;
cv_signal(&ci->ci_data.cpu_xcall);
}
+   if (xc->xc_donep >= xc->xc_headp)
+   printf("XXX donep = %" PRIu64 ", headp = %" PRIu64
+   ", ci = %p\n", xc->xc_donep, xc->xc_headp, ci);
KASSERT(xc->xc_donep < xc->xc_headp);
where = xc->xc_headp;
mutex_exit(&xc->xc_lock);
-

The output says

cpu 0: XXX not running
cpu 1: XXX not running
cpu 2: XXX not running
cpu 3: XXX not running
XXX donep = 0, headp = 0, ci = 0x0
panic: kernel diagnostic assertion "xc->xc_donep < xc->xc_headp" failed: file 
"../../../../kern/subr_xcall.c", line 286


(Yes, the exact reason is not cold==1 but all CPU's SPCF_RUNNING is not set)


 Is this intended behavior? Is it possible to use xcall while
cold==1?

For softint_establish(), the following diff avoid panic:

Index: kern_softint.c
===
RCS file: /cvsroot/src/sys/kern/kern_softint.c,v
retrieving revision 1.44
diff -u -p -r1.44 kern_softint.c
--- kern_softint.c  22 Nov 2017 02:20:21 -  1.44
+++ kern_softint.c  25 Dec 2017 06:31:57 -
@@ -177,6 +177,7 @@ __KERNEL_RCSID(0, "$NetBSD: kern_softint
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -430,8 +431,10 @@ softint_disestablish(void *arg)
 * it again.  So, we are only looking for handler records with
 * SOFTINT_ACTIVE already set.
 */
-   where = xc_broadcast(0, (xcfunc_t)nullop, NULL, NULL);
-   xc_wait(where);
+   if (__predict_true(cold == 0)) {
+   where = xc_broadcast(0, (xcfunc_t)nullop, NULL, NULL);
+   xc_wait(where);
+   }

for (;;) {
/* Collect flag values from each CPU. */


 I don't know whether this is good fix or not.

 One of the workaround is not call softint_disesablish() in
xxx_attach() and defer it using with config_interrupts(), but
I think it's dirty.

 I'm now writing a code for ixg(4) to fallback from MS

Re: increase softint_bytes

2017-11-22 Thread Masanobu SAITOH

On 2017/11/21 15:35, Masanobu SAITOH wrote:

On 2017/11/20 17:28, Masanobu SAITOH wrote:

On 2017/11/17 18:42, 6b...@6bone.informatik.uni-leipzig.de wrote:

On Thu, 16 Nov 2017, Masanobu SAITOH wrote:


Hi, all.

Some device drivers now allocate a lot of softints.
See:

http://mail-index.netbsd.org/current-users/2017/11/09/msg032581.html

To avoid this panic, I wrote the following patch:

http://www.netbsd.org/~msaitoh/softint-20171116-0.dif



I tested the patch. Now the dump comes in another place.

https://suse.uni-leipzig.de/crash/crash-with-patch.jpg

Regards
Uwe


Could you test the following patch?

 http://www.netbsd.org/~msaitoh/vlan-20171120-0.dif


Updated patch

 http://www.netbsd.org/~msaitoh/vlan-20171121-0.dif

 Fix compile error (sorry)

 Revert if_wmreg.h 1.104 and if_wm.c 1.542


 Committed in -current.


--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: increase softint_bytes

2017-11-21 Thread Masanobu SAITOH

On 2017/11/21 17:03, matthew green wrote:

   It'll take a little time to write this change. And, it's low level
and important code, so it will take a time to test the stability
before sending pullup request. So,

   0) Apply the following change to -current.


Index: kern_softint.c
===
RCS file: /cvsroot/src/sys/kern/kern_softint.c,v
retrieving revision 1.43
diff -u -p -r1.43 kern_softint.c
--- kern_softint.c  4 Jul 2016 04:20:14 -   1.43
+++ kern_softint.c  21 Nov 2017 06:41:35 -
@@ -217,7 +217,7 @@ typedef struct softcpu {
   
   static void	softint_thread(void *);
   
-u_int		softint_bytes = 8192;

+u_int  softint_bytes = 16384;
   u_intsoftint_timing;
   static u_int softint_max;
   static kmutex_t  softint_lock;


   1) Sent the pullup request to netbsd-8

   2) Write auto-resize code and commit.

   3) If it's stable, send the pullup request to netbsd-8.


sounds good to me.  i would even be ok with a switch
to 16k or even 32k (platform basis?)) for netbsd-8 or


 Committed with 32K.

 Thanks!



earlier right now and allow the resize code to live in
-current until -9.

thanks!


.mrg.




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: increase softint_bytes

2017-11-20 Thread Masanobu SAITOH

On 2017/11/20 17:28, Masanobu SAITOH wrote:

On 2017/11/17 18:42, 6b...@6bone.informatik.uni-leipzig.de wrote:

On Thu, 16 Nov 2017, Masanobu SAITOH wrote:


Hi, all.

Some device drivers now allocate a lot of softints.
See:

http://mail-index.netbsd.org/current-users/2017/11/09/msg032581.html

To avoid this panic, I wrote the following patch:

http://www.netbsd.org/~msaitoh/softint-20171116-0.dif



I tested the patch. Now the dump comes in another place.

https://suse.uni-leipzig.de/crash/crash-with-patch.jpg

Regards
Uwe


Could you test the following patch?

 http://www.netbsd.org/~msaitoh/vlan-20171120-0.dif


Updated patch

http://www.netbsd.org/~msaitoh/vlan-20171121-0.dif

Fix compile error (sorry)

Revert if_wmreg.h 1.104 and if_wm.c 1.542

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: increase softint_bytes

2017-11-20 Thread Masanobu SAITOH



can't this be fixed by making it dynamic?


  It's not easy because the return value of softint_establish() is
made from this area's address. As you know, the value is keep by each driver.


  I'm sorry. I misread kern_softint.c. The return value is not directly point to
the area, but it's offset of the area, so it would be easy to resize it.

  I'll try to modify the code to do auto-resize.


 It'll take a little time to write this change. And, it's low level
and important code, so it will take a time to test the stability
before sending pullup request. So,

 0) Apply the following change to -current.


Index: kern_softint.c
===
RCS file: /cvsroot/src/sys/kern/kern_softint.c,v
retrieving revision 1.43
diff -u -p -r1.43 kern_softint.c
--- kern_softint.c  4 Jul 2016 04:20:14 -   1.43
+++ kern_softint.c  21 Nov 2017 06:41:35 -
@@ -217,7 +217,7 @@ typedef struct softcpu {
 
 static void	softint_thread(void *);
 
-u_int		softint_bytes = 8192;

+u_int  softint_bytes = 16384;
 u_int  softint_timing;
 static u_int   softint_max;
 static kmutex_tsoftint_lock;


 1) Sent the pullup request to netbsd-8

 2) Write auto-resize code and commit.

 3) If it's stable, send the pullup request to netbsd-8.


 OK?

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: increase softint_bytes

2017-11-20 Thread Masanobu SAITOH

On 2017/11/17 18:42, 6b...@6bone.informatik.uni-leipzig.de wrote:

On Thu, 16 Nov 2017, Masanobu SAITOH wrote:


Hi, all.

Some device drivers now allocate a lot of softints.
See:

http://mail-index.netbsd.org/current-users/2017/11/09/msg032581.html

To avoid this panic, I wrote the following patch:

http://www.netbsd.org/~msaitoh/softint-20171116-0.dif



I tested the patch. Now the dump comes in another place.

https://suse.uni-leipzig.de/crash/crash-with-patch.jpg

Regards
Uwe


Could you test the following patch?

http://www.netbsd.org/~msaitoh/vlan-20171120-0.dif


--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: increase softint_bytes

2017-11-17 Thread Masanobu SAITOH

Hi, all.

On 2017/11/17 15:35, Masanobu SAITOH wrote:

Hi, mrg.

On 2017/11/17 5:05, matthew green wrote:

Masanobu SAITOH writes:

   Hi, all.

   Some device drivers now allocate a lot of softints.
See:

http://mail-index.netbsd.org/current-users/2017/11/09/msg032581.html

To avoid this panic, I wrote the following patch:

http://www.netbsd.org/~msaitoh/softint-20171116-0.dif

Summary:

- Increase the default size from 8192 bytes to 32768 bytes.
- Add new option SOFTINT_BYTES to change the value.
- Add two new read-only sysctls kern.softint.{max,count}

Any comment?


can't this be fixed by making it dynamic?


  It's not easy because the return value of softint_establish() is
made from this area's address. As you know, the value is keep by each driver.


 I'm sorry. I misread kern_softint.c. The return value is not directly point to
the area, but it's offset of the area, so it would be easy to resize it.

 I'll try to modify the code to do auto-resize.






 ie, fix it properly so
that it never has this problem again -- if in 10 years we're adding
5x as many softints as today, we'll hit it again.

i also don't see why the crash can't be fixed to become an error.


The code is:

ip_init(void)
{
    const struct protosw *pr;

    in_init();
    sysctl_net_inet_ip_setup(NULL);

    pr = pffindproto(PF_INET, IPPROTO_RAW, SOCK_RAW);
    KASSERT(pr != NULL);

    ip_pktq = pktq_create(IFQ_MAXLEN, ipintr, NULL);
    KASSERT(ip_pktq != NULL); <== This.






the code tries to avoid adding more than it can, but soemthing must
be wrong for the crash to occur anyway.


  I think this KASSERT() is reasonable.


  ie, at worst we'd fix it
to return an error instead of crashing.

if that is done, the sysctl exports seem unneccesary as well.


bridge, agr, strip, sl, stf, ppp, l2tp, gre and maybe some other
pseudo interfaces which is created by ifconfig command allocate
softint. Some of them may be created tens, hundreds or more.
So I think it's worth to check the current number.




thanks.


.mrg.







--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: increase softint_bytes

2017-11-16 Thread Masanobu SAITOH

On 2017/11/17 6:25, Jaromír Doleček wrote:

softint_establish() already fails if there is no space, and prints a WARNING in 
this case. The caller however needs to check for the failure, which ixgbe(4) 
mostly doesn't.


No. ixgbe() checks the return value of if_initialize() and return
correctly. That panic is not in ixgbe(4) driver but in ip_input.c::ip_init().


 Not sure if that is really the root cause, the panic seems to be actually in 
different area, and I didn't actually see the warning in the dmesg.


Making softint size fully dynamic is unfortunately not straighforward, it would 
need to reallocate the softint memory on all CPUs in a way that any serviced 
softints would continue working during this time.


 Each driver which call softint_establish() keep the return value, so
it's very hard without big modification.


It would be simple however to adjust it based on % of physical memory, instead 
of fixed value.


It might be good, but it's only small area, so fixed value might be OK.
And not so many people create a lot of pseudo interface, so I think it's
OK to make the max value as a kernel option to increase.



If I count correctly, with current 8192 bytes the system supports some 100 
softints, which seems to be on quite low side - more advanced hardware and 
drivers usually use queue and softint per cpu, so it can quickly run out on 
system with e.g. 32 cores.

I agree that the sysctl seems somewhat unnecessary, since the limit is only 
relevant during boot anyway,


As I wrote, it's incorrect because ifconfig xxxN create allocates softint.



so it's not very helpful.
Jaromir

2017-11-16 21:05 GMT+01:00 matthew green mailto:m...@eterna.com.au>>:

Masanobu SAITOH writes:
>   Hi, all.
>
>   Some device drivers now allocate a lot of softints.
> See:
>
>   http://mail-index.netbsd.org/current-users/2017/11/09/msg032581.html 
<http://mail-index.netbsd.org/current-users/2017/11/09/msg032581.html>
>
> To avoid this panic, I wrote the following patch:
>
>   http://www.netbsd.org/~msaitoh/softint-20171116-0.dif 
<http://www.netbsd.org/~msaitoh/softint-20171116-0.dif>
>
> Summary:
>
>       - Increase the default size from 8192 bytes to 32768 bytes.
>       - Add new option SOFTINT_BYTES to change the value.
>       - Add two new read-only sysctls kern.softint.{max,count}
>
> Any comment?

can't this be fixed by making it dynamic?  ie, fix it properly so
that it never has this problem again -- if in 10 years we're adding
5x as many softints as today, we'll hit it again.

i also don't see why the crash can't be fixed to become an error.
the code tries to avoid adding more than it can, but soemthing must
be wrong for the crash to occur anyway.  ie, at worst we'd fix it
to return an error instead of crashing.

if that is done, the sysctl exports seem unneccesary as well.

thanks.


.mrg.





--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: increase softint_bytes

2017-11-16 Thread Masanobu SAITOH

Hi, mrg.

On 2017/11/17 5:05, matthew green wrote:

Masanobu SAITOH writes:

   Hi, all.

   Some device drivers now allocate a lot of softints.
See:

http://mail-index.netbsd.org/current-users/2017/11/09/msg032581.html

To avoid this panic, I wrote the following patch:

http://www.netbsd.org/~msaitoh/softint-20171116-0.dif

Summary:

- Increase the default size from 8192 bytes to 32768 bytes.
- Add new option SOFTINT_BYTES to change the value.
- Add two new read-only sysctls kern.softint.{max,count}

Any comment?


can't this be fixed by making it dynamic?


 It's not easy because the return value of softint_establish() is
made from this area's address. As you know, the value is keep by each driver.


 ie, fix it properly so
that it never has this problem again -- if in 10 years we're adding
5x as many softints as today, we'll hit it again.

i also don't see why the crash can't be fixed to become an error.


The code is:

ip_init(void)
{
const struct protosw *pr;

in_init();
sysctl_net_inet_ip_setup(NULL);

pr = pffindproto(PF_INET, IPPROTO_RAW, SOCK_RAW);
KASSERT(pr != NULL);

ip_pktq = pktq_create(IFQ_MAXLEN, ipintr, NULL);
KASSERT(ip_pktq != NULL); <== This.






the code tries to avoid adding more than it can, but soemthing must
be wrong for the crash to occur anyway.


 I think this KASSERT() is reasonable.


  ie, at worst we'd fix it
to return an error instead of crashing.

if that is done, the sysctl exports seem unneccesary as well.


bridge, agr, strip, sl, stf, ppp, l2tp, gre and maybe some other
pseudo interfaces which is created by ifconfig command allocate
softint. Some of them may be created tens, hundreds or more.
So I think it's worth to check the current number.




thanks.


.mrg.




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


increase softint_bytes

2017-11-16 Thread Masanobu SAITOH

 Hi, all.

 Some device drivers now allocate a lot of softints.
See:

http://mail-index.netbsd.org/current-users/2017/11/09/msg032581.html

To avoid this panic, I wrote the following patch:

http://www.netbsd.org/~msaitoh/softint-20171116-0.dif

Summary:

- Increase the default size from 8192 bytes to 32768 bytes.
- Add new option SOFTINT_BYTES to change the value.
- Add two new read-only sysctls kern.softint.{max,count}

Any comment?

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: Kernel code documentation and pci data structures (was: Problem with Intel WiFi card)

2017-05-22 Thread Masanobu SAITOH

Hi, Rocky.

On 2017/05/18 18:21, Rocky Hotas wrote:

Hi Masanobu and hi all!
After the discussion on this thread,

http://mail-index.netbsd.org/netbsd-users/2017/02/01/msg019280.html

I'm trying to apply this patch

http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/dev/pci/ppb.c.diff?r1=1.35&r2=1.36

to NetBSD. The current version of OpenBSD kernel ppb.c file (which
works in my laptop, where I tried to temporarily install OpenBSD)
has some more lines and is based on the rev. 1.40.
I write to you because you made much of the last commits on the
NetBSD ppb.c sourcefile;


 It's one of the thing that I want to have it in our ppb.c.
I'd like to add PCI(e) hot-plug stuff because of nvme(4)
hot-plug. Reasons why I've not doing the work yet are that
I'm now working another job(fixing bugs in ixg(4) and wm(4))
and I'm not familiar with resource allocation near here :-(


I  write also to the list, to let the
community know these updates.
OpenBSD code is based upon the following members of the
`struct pci_attach_args':

struct extent   *pa_ioex;
struct extent   *pa_memex;

They must be added to the NetBSD code, and then correctly handled.
The extent(9) manpages both in NetBSD and in OpenBSDare both too
much concise. These pages

https://www.netbsd.org/docs/internals/en/
https://www.netbsd.org/docs/kernel/

don't mention nor pci neither extents.
So, a couple of questions:

1) What are the extent(9) used for? They were born in NetBSD (as it
is even stated in the OpenBSD manpage). After a quick discussion on
#netbsd-code with coypu, they seem to be related to address spaces.
If there is some guide like this one

http://www.unixag-kl.fh-kl.de/~jkunz/projekte/NetBSD-driver_writing-1.0.1e.pdf.gz

but for extents(9) and memory, please let me know.

2) About these lines (that are both in NetBSD and in OpenBSD):

static int
ppbmatch(device_t parent, cfdata_t match, void *aux)
{
struct pci_attach_args *pa = aux;


driver(9) explicitly states that "The match function would
type-cast the aux argument to its appropriate attachment structure and
use its contents to determine whether it supports the device". But who
initially fills the `aux' data? Whoever he is, he should then take
care about some new members, like `pa_ioex' and `pa_memex'.

Thank you for having read,

Rocky


 To configure ppb's address decoding, some of the code in
sys/dev/pci/pciconf.c can be used. Basically, to use pciconf.c,
"options PCI_NETBSD_CONFIGURE" is required and it's used for
machines that IPL or boot loader of embedded machine doesn't
initialize PCI bus stuff. Perhaps PCI_NETBSD_CONFIGURE is not
intended to use on x86, so some function might be moved outside
of #ifdef PCI_NETBSD_CONFIGURE. I think it's worth for you to
read sys/dev/pci/pciconf.c

 BTW,

http://mail-index.netbsd.org/netbsd-users/2017/02/01/msg019280.html


pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled, rd/line, wr/inv ok
iwn0 at pci2 dev 0 function 0: vendor 0x8086 product 0x4232 (rev. 0x00)
: can't map mem space
ppb2 at pci0 dev 28 function 1: vendor 0x8086 product 0x2942 (rev. 0x03)
ppb2: PCI Express capability version 1  x1 @ 
2.5GT/s



Did you check the PCI config area of ppb"1" on your machine?
iwn0 is under ppb1.

Could you show me the output of "pcictl pciN dump" of your ppb1?

And, If you are OK, could you me the full output of pcictl
using with the following script?

http://www.netbsd.org/~msaitoh/pcidump

(Note that the output is huge, so please don't send the output to
the ML. Could you put it somewhere and give us the link?)

 Regards.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: PCI BAR's prefetchable bit and pci_mapreg_map()

2017-03-28 Thread Masanobu SAITOH

On 2017/03/24 8:13, Michael wrote:

Hello,

On Wed, 22 Mar 2017 16:35:21 +0900
Masanobu SAITOH  wrote:


  A) modify pci_mapreg_map(). Stop setting BUS_SPACE_MAP_PREFETCHABLE
 by defalut when prefetchable bit is set. If a driver really know
 the whole area of the BAR is prefetchable, set BUS_SPACE_MAP_PREFETCHABLE
 in the 4th argument(busflags) of pci_mapreg_map(). pci_mapreg_map()
 check for both the 4th argument and the prefetchable bit, it sets
 BUS_SPACE_MAP_PREFETCHABLE only when both bits are set.


On 2nd thought, I think we should ignore the prefetchable bit in the
BAR here


 in pci_mapreg_map()?


and just go with whatever the driver wants,


 You mean:

For pci_mapreg_map():

Don't check prefetchable bit and don't set BUS_SPACE_MAP_PREFETCHABLE
in pci_mapreg_map() like OpenBSD.

For each driver side:

Don't use pci_mapreg_map() and use pci_mapreg_info() and
bus_space_map() to use an area with BUS_SPACE_MAP_PREFETCHABLE
(Plan D in my original mail)

Right? Let call this plan E.

The plan E is OK, but it makes code duplication and the come won't
be simple. To make code simple,


 A) modify pci_mapreg_map(). Stop setting BUS_SPACE_MAP_PREFETCHABLE
by defalut when prefetchable bit is set. If a driver really know
the whole area of the BAR is prefetchable, set BUS_SPACE_MAP_PREFETCHABLE
in the 4th argument(busflags) of pci_mapreg_map(). pci_mapreg_map()
check for both the 4th argument and the prefetchable bit, it sets
BUS_SPACE_MAP_PREFETCHABLE only when both bits are set.

 B) Don't modify pci_mapreg_ma(). Add new API like FreeBSD's pmap_change_attr()
and use it when a driver want.

 C) make new function pci_mapreg_map_wc()

 D) Keep. If a driver know a BAR has prefetchable bit and not the
whole area is prefetchable, do pci_mem_find() -> drop 
BUS_SPACE_MAP_PREFETCHABLE
and do bus_space_map().


   E) Don't check prefetchable bit and don't set BUS_SPACE_MAP_PREFETCHABLE
  in any case in pci_mapreg_map() like OpenBSD.
  Don't use pci_mapreg_map() and use pci_mapreg_info() and
  bus_space_map() if a driver want to use an area with
  BUS_SPACE_MAP_PREFETCHABLE.
 
   F) Any other solution(s).


Plan A, B and C are useful than D and E to make code simple.


since you ran into
devices which have the bit set and shouldn't, and I have seen graphics
chips which don't set it for their video memory apertures.


Even if we choose A or E, we have to check all driver
which already use pci_mapreg_map() and modify some of
them.

For A:
- BAR has no prefetchable bit:
not required to modify
- BAR has prefetchable bit and BUS_SPACE_MAP_PREFETCHABLE can't be used:
not required to modify
- really prefetchable:
set BUS_SPACE_MAP_PREFETCHABLE in the 4th argument

For D:
- BAR has no prefetchable bit:
not required to modify
- BAR has prefetchable bit and BUS_SPACE_MAP_PREFETCHABLE can't be used:
not required to modify
- really prefetchable:
Extract it with pci_mapreg_info() and bus_space_map()



have fun
Michael




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


PCI BAR's prefetchable bit and pci_mapreg_map()

2017-03-22 Thread Masanobu SAITOH

 Hi.

 As some people know, our pci(9) had a problem with devices that
a BAR has prefetchable bit. It makes unexpected behavior on read/
write. To avoid this problem some device drivers have common hack.
Instead of using pci_mapreg_map(), it uses pci_mapreg_info() and
pass the result to bus_space_map() without BUS_SPACE_MAPPREFETCHABLE.
You can see it in:
pci/if_fxp_pci.c
pci/if_bge.c
pci/ixgbe.c
pci/ixv.c
x86/pci/msipic.c

Example in if_bge.c:

http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dev/pci/if_bge.c.diff?r1=1.272&r2=1.275&f=h

 I found the cause in PCIe's specification. It's required to pay
money to get the specification, but it's not required for ECN and
Errata. You can see the information about prefetchable bit in
"Errata for the PCI Express(R) Base Specification Revision 2.1"
(November 18, 2010):


https://pcisig.com/specifications/pciexpress/specifications?field_technology_value%5B%5D=express&field_document_type_value%5B%5D=errata&speclib=

See

1.3.2.2 PCI Express Endpoint Rules

7.5.2.1 Base Address Register (offset 20h - 24h)

IMPLEMENTATION NOTE in 7.5.2.1 "Additional Guidance on the Prefetchable
Bit in Memory Space BARs" (< very important)

It says that the prefetchable bit can be set even if the range
include read side effect or effect of write combine...

 Our pci_map.c::pci_mapreg_map() first does pci_mem_find() and
the result is used for does bus_space_map() with BUS_SPACE_MAP_PREFETCHABLE
if a BAR has prefetchable bit. For x86, it result in adding
PMAP_WRITE_COMBINE in the map. cardbus_mem_find() does the same thing.

On other OSes:

 Linux:
It has ioremap() and ioremap_wc(). ioremap_wc() is used
in a driver only if the driver know the whole area
specified in a BAR is really prefetchable. ioremap() doesn't
set the area prefetchable.

 FreeBSD:
 After mapping a area with bus_alloc_resources_any(), if
the driver know the whole area is really prefetchable,
pmap_change_attr() is called with PAT_WRITE_COMBINE.

 OpenBSD:
 It ignores prefetchable bit in pci_mapreg_map()

http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/dev/pci/pci_map.c.diff?r1=1.28&r2=1.29&f=h


 So what should we do?

 A) modify pci_mapreg_map(). Stop setting BUS_SPACE_MAP_PREFETCHABLE
by defalut when prefetchable bit is set. If a driver really know
the whole area of the BAR is prefetchable, set BUS_SPACE_MAP_PREFETCHABLE
in the 4th argument(busflags) of pci_mapreg_map(). pci_mapreg_map()
check for both the 4th argument and the prefetchable bit, it sets
BUS_SPACE_MAP_PREFETCHABLE only when both bits are set.

 B) Don't modify pci_mapreg_ma(). Add new API like FreeBSD's pmap_change_attr()
and use it when a driver want.

 C) make new function pci_mapreg_map_wc()

 D) Keep. If a driver know a BAR has prefetchable bit and not the
whole area is prefetchable, do pci_mem_find() -> drop 
BUS_SPACE_MAP_PREFETCHABLE
and do bus_space_map().

 E) Any other solution(s).

 Usually, framebuffer of video really uses prefetchable access.
Not so many driver use pci_mapreg_map() and use pci_mem_find() and
bus_space_map(), so it's not difficult to change them.

I prefer A.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Changing the return value of xxx_attach() from void to int. (take 2)

2016-07-20 Thread Masanobu SAITOH

 Hello, all.

 Thanks all people who replied to my proposal. One of my fault is that
I didn't show any example code. I made patches. As jnemeth@ said, my
change is straightforward and the change should not cause any catastrophic
problem.



 In my last mail,


c) We can add the code to fallback to lower priority device driver
  like ukphy(4), ugen(4) or others.


 This change is not easy (though it also not difficult.), so in this change
doesn't include it. The fallback code will be a future plan.


Current patches:

0) Caller side (sys/device.h, kern/subr_autoconf.c and config(1))

http://www.netbsd.org/~msaitoh/attach-caller-20160720.dif

1) Device drivers

http://www.netbsd.org/~msaitoh/attach-ddrv-20160720.dif

2) detach related changes of wm(4) for an example

http://www.netbsd.org/~msaitoh/attach-wm-20160720.dif

 "./build.sh release" finishes without any error on x86 and the kernel works.

Details:


===
RCS file: /cvsroot/src/sys/sys/device.h,v
retrieving revision 1.149
diff -u -p -r1.149 device.h
--- sys/sys/device.h19 Jun 2016 09:35:06 -  1.149
+++ sys/sys/device.h20 Jul 2016 05:13:49 -
@@ -313,7 +313,7 @@ struct cfattach {
size_tca_devsize;   /* size of dev data (for alloc) */
int   ca_flags; /* flags for driver allocation etc */
int (*ca_match)(device_t, cfdata_t, void *);
-   void(*ca_attach)(device_t, device_t, void *);
+   int (*ca_attach)(device_t, device_t, void *);
int (*ca_detach)(device_t, int);
int (*ca_activate)(device_t, devact_t);
/* technically, the next 2 belong into "struct cfdriver" */
@@ -401,7 +401,7 @@ typedef int (*cfprint_t)(void *, const c
  * Pseudo-device attach information (function + number of pseudo-devs).
  */
 struct pdevinit {
-   void(*pdev_attach)(int);
+   int (*pdev_attach)(int);
int pdev_count;
 };



 It changes the return value from void to int for both nomal and pseudo
device driver.


Index: sys/kern/subr_autoconf.c
===
RCS file: /cvsroot/src/sys/kern/subr_autoconf.c,v
retrieving revision 1.247
diff -u -p -r1.247 subr_autoconf.c
--- sys/kern/subr_autoconf.c19 Jul 2016 07:44:03 -  1.247
+++ sys/kern/subr_autoconf.c20 Jul 2016 05:13:48 -
@@ -1545,6 +1545,7 @@ config_attach_loc(device_t parent, cfdat
device_t dev;
struct cftable *ct;
const char *drvname;
+   int rv;

dev = config_devalloc(parent, cf, locs);
if (!dev)
@@ -1595,10 +1596,15 @@ config_attach_loc(device_t parent, cfdat
}
device_register(dev, aux);

-   /* Let userland know */
-   devmon_report_device(dev, true);
-
-   (*dev->dv_cfattach->ca_attach)(parent, dev, aux);
+   rv = (*dev->dv_cfattach->ca_attach)(parent, dev, aux);
+   if (rv == 0) {
+   /* Let userland know */
+   devmon_report_device(dev, true);
+   } else {
+   aprint_error("%s: %s: ERROR: failed in attach (rv = %d)\n",
+   __func__, device_xname(dev), rv);
+   dev->dv_flags &= ~DVF_ACTIVE;
+   }

if (!device_pmf_is_registered(dev))
aprint_debug_dev(dev, "WARNING: power management not "


 This part is the main change. We can get three benefit from this change.

1) We can call devmon_report_device() only when an ca_attach() call
   succeeded.

2) We can show a common error message when ca_attach() failed.
   This change has some benefits.

Some drivers don't call aprint_error*() and just return.
We couldn't know fails from such type of drivers. With
this change, we can know it for sure.

Some drivers' error messages aren't clear. It's difficult
for us to know that a ca_attach() call failed in
(the middle of) the attach function. With this change,
we can know it for sure.

mailaprint_error*() increments aprint_error_count. It's
the only difference between aprint_normal*() and
aprint_error*(). We couldn't know what driver's attach
function failed from the caller side. And, some drivers'
attach function call aprint_error*() more than one time,
so the value of aprint_error_count is not the same as
the number of ca_attach()'s fail. With this change,
we can know what driver's ca_attach() failed. And,
if we add a new counter like aprint_error_count, we
can know the exact number of ca_attach()'s fail. It
would reduce the existence value of aprint_error*().
Note that small number of device driver call
 

Re: printing aprint_error_count

2016-07-19 Thread Masanobu SAITOH

On 2016/07/15 18:52, Martin Husemann wrote:

On Fri, Jul 15, 2016 at 06:30:23PM +0900, Masanobu SAITOH wrote:

 Or enclose with #ifdef DIAGNOSTIC?


Assuming we do not use aprint_error when there is no "error",


Some drivers use aprint_error() to warn a problem and continue
attaching.


hiding it for
non-DIAGNOSTIC kernels does not make sense to me (at least in the !silent
or !quiet case).


Me neither though some aprint_error() might make user nervous.


I wonder if we should make the error count available via a read-only sysctl,
so userland could act on it later (especially in quiet or silent boots).


 It would be good, but if we change the return value of the
attach function from void to int(or others), it will become
less required.

 I'll commit without #ifdef DIAGNOSTIC.


Martin




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: printing aprint_error_count

2016-07-15 Thread Masanobu SAITOH

On 2016/07/15 16:45, Masanobu SAITOH wrote:

Hi.

 aprint_error*() increments aprint_error_count.
At the end of config_finalize():


KERNEL_UNLOCK_ONE(NULL);

errcnt = aprint_get_error_count();
if ((boothowto & (AB_QUIET|AB_SILENT)) != 0 &&
(boothowto & AB_VERBOSE) == 0) {
mutex_enter(&config_misc_lock);
if (config_do_twiddle) {
config_do_twiddle = 0;
printf_nolog(" done.\n");
}
mutex_exit(&config_misc_lock);
if (errcnt != 0) {
printf("WARNING: %d error%s while detecting hardware; "
"check system log.\n", errcnt,
errcnt == 1 ? "" : "s");
}
}
}


I think it's worth to print this message even if AB_QUIET
and AB_SILENT are not set.

Index: subr_autoconf.c
===
RCS file: /cvsroot/src/sys/kern/subr_autoconf.c,v
retrieving revision 1.243
diff -u -p -r1.243 subr_autoconf.c
--- subr_autoconf.c11 Jul 2016 07:42:13 -1.243
+++ subr_autoconf.c15 Jul 2016 06:43:39 -
@@ -2182,11 +2182,11 @@ config_finalize(void)
 printf_nolog(" done.\n");
 }
 mutex_exit(&config_misc_lock);
-if (errcnt != 0) {
-printf("WARNING: %d error%s while detecting hardware; "
-"check system log.\n", errcnt,
-errcnt == 1 ? "" : "s");
-}
+}
+if (errcnt != 0) {
+printf("WARNING: %d error%s while detecting hardware; "
+"check system log.\n", errcnt,
+errcnt == 1 ? "" : "s");
 }
 }

I think it's not required to check AB_VERBOSE


 Or enclose with #ifdef DIAGNOSTIC?


because
not so many people set this flag and it makes difficult
to notice problem(s). Is it OK to commit?

 On my machines ichpcib's TCO timer and acpiec count error.


acpiec0 at acpi0 (H_EC, PNP0C09-1)
acpiec0: unable to evaluate _GPE: AE_NOT_FOUND

(snip)

tco0 at ichlpcib0: TCO (watchdog) timer configured.
tco0: TCO timer reboot disabled by hardware; hope SMBIOS properly handles it.






--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



printing aprint_error_count

2016-07-15 Thread Masanobu SAITOH

Hi.

 aprint_error*() increments aprint_error_count.
At the end of config_finalize():


KERNEL_UNLOCK_ONE(NULL);

errcnt = aprint_get_error_count();
if ((boothowto & (AB_QUIET|AB_SILENT)) != 0 &&
(boothowto & AB_VERBOSE) == 0) {
mutex_enter(&config_misc_lock);
if (config_do_twiddle) {
config_do_twiddle = 0;
printf_nolog(" done.\n");
}
mutex_exit(&config_misc_lock);
if (errcnt != 0) {
printf("WARNING: %d error%s while detecting hardware; "
"check system log.\n", errcnt,
errcnt == 1 ? "" : "s");
}
}
}


I think it's worth to print this message even if AB_QUIET
and AB_SILENT are not set.

Index: subr_autoconf.c
===
RCS file: /cvsroot/src/sys/kern/subr_autoconf.c,v
retrieving revision 1.243
diff -u -p -r1.243 subr_autoconf.c
--- subr_autoconf.c 11 Jul 2016 07:42:13 -  1.243
+++ subr_autoconf.c 15 Jul 2016 06:43:39 -
@@ -2182,11 +2182,11 @@ config_finalize(void)
printf_nolog(" done.\n");
}
mutex_exit(&config_misc_lock);
-   if (errcnt != 0) {
-   printf("WARNING: %d error%s while detecting hardware; "
-   "check system log.\n", errcnt,
-   errcnt == 1 ? "" : "s");
-   }
+   }
+   if (errcnt != 0) {
+   printf("WARNING: %d error%s while detecting hardware; "
+   "check system log.\n", errcnt,
+   errcnt == 1 ? "" : "s");
}
 }

I think it's not required to check AB_VERBOSE because
not so many people set this flag and it makes difficult
to notice problem(s). Is it OK to commit?

 On my machines ichpcib's TCO timer and acpiec count error.


acpiec0 at acpi0 (H_EC, PNP0C09-1)
acpiec0: unable to evaluate _GPE: AE_NOT_FOUND

(snip)

tco0 at ichlpcib0: TCO (watchdog) timer configured.
tco0: TCO timer reboot disabled by hardware; hope SMBIOS properly handles it.



--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: wm0 stopped receiving after mpt0 timeout recovery

2016-07-04 Thread Masanobu SAITOH

On 2016/07/05 1:40, Edgar Fuß wrote:

wm0 seem to fail permanently whenever it's interrupts aren't serviced
for some time.

For which value of ``permanently''? Do I need to buy a new NIC [Fe]?
Does something like ifconfig up/down help?


The mpt recovery code (like e.g. the wd recovery code) can keep the machine
on a high level interrupt level for some time, locking out wm interrupts.

Looking into how to fix it, I don't see where it does that. mpt_timeout()
calls mpt_intr(), which is just the same routine as passed to
pci_intr_establish(). So I don't seee the recovery (I'm talking of the ``see
if something happened with no interrupt'' recovery, not my more elaborate
recovery soft-resetting the IOC (a variant of which buhrow@ committed so it's
in 7) locking out interrupts any longer than a normal interrupt service will.



Please show the full dmesg.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Changing the return value of xxx_attach() from void to int.

2016-06-23 Thread Masanobu SAITOH

 Hi, all.

 As you know, the return value of device driver's attach function is void.
I've thought that we should change it to int for many years. I believe I'm
not the only person.

 xxx_attach() may fail the following cases:

got unexpected behavior.

resource allocation error.

driver is really broken.

some others.

 xxx_attach() is void, so the caller can't know the fail. It makes some
problems:

a) The OS may touch the broken device while the OS is running.
   It may causes a panic.

b) The shutdown sequence calls xxx_shutdown() even if
   it's not really attached. It may causes panic. We have met
   this bugs many times.

c) To prevent b), we have added extra code into xxx_shutdown().
   It wastes our time and the code become big and complex.

d) Resource leak. For example, A device_t structure is allocated
   by caller side.

 If we change the return value to int or any other type's value, we can
do the following things.

a) Don't register failed device to avoid problem.

b) If we don't do a), we can add code to not to call xxx_shutdown()
   for broken devices instead.

c) We can add the code to fallback to lower priority device driver
  like ukphy(4), ugen(4) or others.

d) (add some other good features?)

 What do you think about this change? If you're OK, I'll change _ALL_ device
drivers' attach function first by the following way:

0) Change return value to int.

1) Change "return;" to "return 0;" or "return -1;"
   (or bool value or Exxx. See below)

2) I won't modify the caller side for the time being.

In this way, it won't break anything because the caller doesn't check the
value. Even if I mistakenly modified the return value and we add code to
check the return value, it won't be a big problem because almost all drivers
don't go into the failure paths. So, I think it's not required to use new
CFATTACH_() because it makes both the driver and the caller side be
complex.

 Now I'm wondering if I should return "-1" or Exxx. FreeBSD returns Exxx
(e.g. ENOMEN, EIO, ENXIO, etc.). I don't know which one is better.

Changing to -1 is easier than Exxx because it's not required
to wonder what Exxx I should choose.

Changing to Exxx may be good if we check the error code and
do something depend on the value though I can't imagine anything now.

-1 and Exxx can be mixed because both are int. We can detect
error with "if (rv != 0)" even if it's mixed.

 Any objection, advice or idea?

 Thanks.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: PCI extended configuration support

2015-10-01 Thread Masanobu SAITOH
On 2015/10/01 22:59, Christos Zoulas wrote:
> On Oct 1,  7:21pm, msai...@execsw.org (Masanobu SAITOH) wrote:
> -- Subject: Re: PCI extended configuration support
> 
> |  pci_conf_read() have to call chip dependent function. I think
> | it's not good idea to call chip dependent function in pci_conf_size()
> | and ((c)->pci_conf_read)() twice. And, NOT all archs have pc_conf_read().
> 
> Ok, I agree.
> 
> |  Final diff:
> |
> |  
> http://ftp.netbsd.org/pub/NetBSD/misc/nonaka/tmp/20150929-nbsd-pci-extconf-support.diff
> |
> 
> LGTM!
> 
> christos

 Committed.

 Thanks!!!

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: PCI extended configuration support

2015-10-01 Thread Masanobu SAITOH
On 2015/09/28 22:41, Christos Zoulas wrote:
> On Sep 28,  7:51pm, msai...@execsw.org (Masanobu SAITOH) wrote:
> -- Subject: Re: PCI extended configuration support
> 
> | On 2015/09/27 11:03, Christos Zoulas wrote:
> | > In article <5604d1be.3080...@execsw.org>,
> | > Masanobu SAITOH   wrote:
> | >> On 2015/09/11 19:44, Masanobu SAITOH wrote:
> | >>
> | >> More machines support the extended configuration area than before.
> | >>
> | >>   Is it OK to commit?
> | >
> | > LGTM, but since this is not performance critical can't we centralize the
> 
> | > range test and do it in a wrapper?
> |
> |  We have no good idea... :-(
> 
> I meant instead of editing each of the drivers edit the pci_machdep.h file
> of each arch from:
> 
> #define pci_conf_read(c, t, r)  \
>  (*(c)->pc_conf_read)((c)->pc_conf_v, (t), (r))
> #define pci_conf_write(c, t, r, v)  \
>  (*(c)->pc_conf_write)((c)->pc_conf_v, (t), (r), (v))
> 
> to:
> 
> static __inline pcireg_t pci_conf_read(void *_cpv, pcitag_t _tag, int 
> _offset) {
>   if ((unsigned int)_offset >= PCI_CONF_SIZE)
>   return (pcireg_t)-1;
>   return *((c)->pc_conf_read)(
>   ((struct _pci_chipset *)_cpv)->pc_conf_v, _tag, _offset);
> }
> 
> static __inline void pci_conf_write(void *_cpv, pcitag_t _tag, int _offset) {
>   if ((unsigned int)_offset >= PCI_CONF_SIZE)
>   return;
>   return *((c)->pc_conf_write)(
>   ((struct _pci_chipset *)_cpv)->pc_conf_v, _tag, _offset);
> }
> 
> Or you could even make a macro of it in pci_var.h like:
> 
> #define PCI_ARCH_CONF_READ(c, t, r) \
>   do { \
>   if ((unsigned int)(r) >= PCI_CONF_SIZE) \
>   return (pcireg_t) -1; \
>   return *((c)->pc_conf_read)((c), (t), (r)); \
>   while (/*CONSTCOND*/0)
> 
> And then change the macro in arch/include/pci_machdep.h to:
> 
> #define pci_conf_read(c, t, r) PCI_ARCH_CONF_READ(c, t, r)
> 
> Or you could just centralize the machdep macro to pci_var.h, and put it
> around ifdefs so that the machdep code can override it:
> 
> #ifndef pci_conf_read
> #define pci_conf_read(c, t, r) \
>   do { \
>   if ((unsigned int)(r) >= PCI_CONF_SIZE) \
>   return (pcireg_t) -1; \
>   return *((c)->pc_conf_read)((c), (t), (r)); \
>   while (/*CONSTCOND*/0)
> #endif

 The size of accessible area is not fixed and it's machine
(SoC or implement of PCI bus) dependent, so it should be done
in ((c)->pc_conf_read)(). Or use function like:

if ((unsigned int)(r) >= pci_conf_size()) \
return (pcireg_t) -1; \

 pci_conf_read() have to call chip dependent function. I think
it's not good idea to call chip dependent function in pci_conf_size()
and ((c)->pci_conf_read)() twice. And, NOT all archs have pc_conf_read().

 I think writing pci's commom API function with a macro is not good for
kernel module.

> | > Plus this can't be negative so perhaps
> | > it should be unsigned or checked for < 0 too??
> | >
> | > + if (reg >= PCI_CONF_SIZE)
> | > + return (pcireg_t) -1;
> | > +
> |
> |  Done.
> 
> I think you missed some...

 Fixed. Thanks.

> | > Finally a few more constants could be defined instead
> | > of hard-coded values
> | >
> | > + switch (pcie_devtype) {
> | > + case 0x4:   /* Root Port of PCI Express Root Complex */
> | > + case 0xa:   /* Root Complex Event Collector */
> |
> |  Done.
> 
> Thanks!
> 
> christos

 Final diff:

 
http://ftp.netbsd.org/pub/NetBSD/misc/nonaka/tmp/20150929-nbsd-pci-extconf-support.diff


-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: PCI extended configuration support

2015-09-28 Thread Masanobu SAITOH
On 2015/09/27 11:03, Christos Zoulas wrote:
> In article <5604d1be.3080...@execsw.org>,
> Masanobu SAITOH   wrote:
>> On 2015/09/11 19:44, Masanobu SAITOH wrote:
>>
>> More machines support the extended configuration area than before.
>>
>>   Is it OK to commit?
> 
> LGTM, but since this is not performance critical can't we centralize the
> range test and do it in a wrapper?

 We have no good idea... :-(

> Plus this can't be negative so perhaps
> it should be unsigned or checked for < 0 too??
> 
> + if (reg >= PCI_CONF_SIZE)
> + return (pcireg_t) -1;
> +

 Done.

> Finally a few more constants could be defined instead
> of hard-coded values
> 
> + switch (pcie_devtype) {
> + case 0x4:   /* Root Port of PCI Express Root Complex */
> + case 0xa:   /* Root Complex Event Collector */

 Done.

> christos

 Updated patch:


http://ftp.netbsd.org/pub/NetBSD/misc/nonaka/tmp/20150928-nbsd-pci-extconf-support.diff

OK?

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: PCI extended configuration support

2015-09-24 Thread Masanobu SAITOH

On 2015/09/11 19:44, Masanobu SAITOH wrote:

  Hi, Jonathan.

On 2015/09/09 23:02, Jonathan A. Kollasch wrote:

On Tue, Sep 08, 2015 at 12:58:31PM +0900, MAEKAWA Masahide wrote:

On 2015/09/08 0:59, Jonathan A. Kollasch wrote:

   - abuse of the pcitag_t  (I have patches that fix this.)


Your patches? Where? your local?


Attached.  This has been successfully tested on a machine that uses
Mode 2 configuration access.  Mode 2 is very rare nowadays, and is being
misdetected more often now that firmware is placing IO ports up in the
0xC000ish range.  The Mode 2 tag is incapable of representing a complete
PCI Bus,Device,Function tuple, and thus has no place being a generic
representation of a BDF for use by any configuration access method.
This patch makes the x86 pcitag_t a simple BDF.


  Thanks. Why you haven't committed that change? I think it's OK to commit.


   - pci_conf_size() is unnecessary


Why?


Because there's no need to care.  We already support extended
configuration space on some evbmips/arm, and haven't needed it yet.
Just read back 0x and ignore writes if extended configuration is
inaccessible.  No need to bloat the API and kernel.


 You mean the range check should be modified in all MD part?
And also, MI driver should check the ability of accessing extended
configuration area with:

 a) the return value of pci_get_ext_capability()
or
 b) write code in each driver

Only small number of driver will be required to use b), so it might
be OK to provide MI function to do it now.

And,


 - too much ACPI dependence, there's nothing that says you have to use
   ACPI MCFG to get this information


  I think updated patch solved the prolbem (see below)


 - possibility of too much KVA usage on 32-bit kernels

 - on-demand bus mapping may happen in inappropriate (interrupt) context


  In old patch, all bus area which was accessed were mapped and never unmapped.
In updated patch, configuration areas are mapped when a pci bus are attached 
(pci).
And then, if there is no any devices in the bus, it's unmapped. Is it OK?


 - x86 bus_space_read/write functions do not use the %eax/ax/al register,
 as required by the AMD documentation for Family 10h's (and others) extended
 configuration space access.


  We have not known it. Thanks. It's fixed in the new patch.

 
http://ftp.netbsd.org/pub/NetBSD/misc/nonaka/tmp/20150911-nbsd-pci-extconf-support.diff

Could you check again?

  And, I suspect that you have other diffs locally, could we share your diff?


Jonathan Kollasch



Updated patch:

http://ftp.netbsd.org/pub/NetBSD/misc/nonaka/tmp/20150925-nbsd-pci-extconf-support.diff

More machines support the extended configuration area than before.

 Is it OK to commit?

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: PCI extended configuration support

2015-09-11 Thread Masanobu SAITOH

 Hi, Jonathan.

On 2015/09/09 23:02, Jonathan A. Kollasch wrote:

On Tue, Sep 08, 2015 at 12:58:31PM +0900, MAEKAWA Masahide wrote:

On 2015/09/08 0:59, Jonathan A. Kollasch wrote:

   - abuse of the pcitag_t  (I have patches that fix this.)


Your patches? Where? your local?


Attached.  This has been successfully tested on a machine that uses
Mode 2 configuration access.  Mode 2 is very rare nowadays, and is being
misdetected more often now that firmware is placing IO ports up in the
0xC000ish range.  The Mode 2 tag is incapable of representing a complete
PCI Bus,Device,Function tuple, and thus has no place being a generic
representation of a BDF for use by any configuration access method.
This patch makes the x86 pcitag_t a simple BDF.


 Thanks. Why you haven't committed that change? I think it's OK to commit.


   - pci_conf_size() is unnecessary


Why?


Because there's no need to care.  We already support extended
configuration space on some evbmips/arm, and haven't needed it yet.
Just read back 0x and ignore writes if extended configuration is
inaccessible.  No need to bloat the API and kernel.


 You mean the range check should be modified in all MD part?
And also, MI driver should check the ability of accessing extended
configuration area with:

a) the return value of pci_get_ext_capability()
or
b) write code in each driver

Only small number of driver will be required to use b), so it might
be OK to provide MI function to do it now.

And,


 - too much ACPI dependence, there's nothing that says you have to use
   ACPI MCFG to get this information


 I think updated patch solved the prolbem (see below)


 - possibility of too much KVA usage on 32-bit kernels

 - on-demand bus mapping may happen in inappropriate (interrupt) context


 In old patch, all bus area which was accessed were mapped and never unmapped.
In updated patch, configuration areas are mapped when a pci bus are attached 
(pci).
And then, if there is no any devices in the bus, it's unmapped. Is it OK?


 - x86 bus_space_read/write functions do not use the %eax/ax/al register,
 as required by the AMD documentation for Family 10h's (and others) extended
 configuration space access.


 We have not known it. Thanks. It's fixed in the new patch.


http://ftp.netbsd.org/pub/NetBSD/misc/nonaka/tmp/20150911-nbsd-pci-extconf-support.diff

Could you check again?

 And, I suspect that you have other diffs locally, could we share your diff?


Jonathan Kollasch




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: Wake-on-LAN no longer working in netbsd-7

2015-09-10 Thread Masanobu SAITOH

On 2015/09/10 17:24, Martin Husemann wrote:

On Thu, Sep 10, 2015 at 10:21:01AM +0200, Marc Balmer wrote:

I used to start a netbsd-6 machine by sending Wake-on-LAN frames wo it.

After upgrading that machine to netbsd-7, it ignores these WoL packages and
does not start.

Its an i386 with bge(4) interface (a HP micro server).


 Which generation? G8 or others?
Could you show me the dmesg?



Something must have changed in how netbsd shutsdown -p a machine.


Check for acpi errors in dmesg and connect a console to check for errors
printed during shutdown. It should end in something like "entering ACPI sleep
state 5".

Martin




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


PCI extended configuration support

2015-09-07 Thread Masanobu SAITOH
 Hi, all.

 nonaka@ wrote code to access PCI extended configuration area.
Currently, the diff supoorts only on x86.


http://ftp.netbsd.org/pub/NetBSD/misc/nonaka/tmp/nbsd-pci-extconf-support.diff

Is it OK to commit?

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: Tester(s) needed: ixv(4)

2015-08-18 Thread Masanobu SAITOH

Hi, Bert.

On 2015/08/18 4:09, Bert Kiers wrote:

On Mon, Aug 17, 2015 at 05:38:09PM +0200, Bert Kiers wrote:


# uname -a
NetBSD  7.99.21 NetBSD 7.99.21 (GENERIC) #0: Mon Aug 17 17:13:23 CEST 2015  
ki...@shell.boppelans.net:/tmp/obj25032/sys/arch/amd64/compile/GENERIC amd64

# dmesg|grep ixv
ixv0 at pci0 dev 3 function 0: Intel(R) PRO/10GbE Virtual Function Network 
Driver, Version - 1.1.4
ixv0: clearing prefetchable bit
ixv0: Using MSIX interrupts with 3 vectors
ixv0: for TX/RX, interrupting at msix0 vec 0, bound queue 0 to cpu 0
ixv0: for link, interrupting at msix0 vec 1, affinity to cpu 1

# ifconfig ixv0
ixv0: flags=8802 mtu 1500
capabilities=bff80
enabled=0
address: 00:00:00:00:00:00
media: Ethernet autoselect
status: no carrier


no carrier is correct at the moment; but no MAC-address :(


After connecting the port to a switch, ifconfig still shows no carrier.
"ifconfig ixv0 up" does not help.  Doing "ifconfig eth2 up" on the host
does not help.  It is up on the switch.

Btw, this is with 6.1.5 userland.  I don't think that matters.


 Both MAC address and link status use communication between PF and
VF. I suspect that communication is broken.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: Tester(s) needed: ixv(4)

2015-08-13 Thread Masanobu SAITOH

On 2015/08/13 18:14, Bert Kiers wrote:

On Thu, Aug 13, 2015 at 02:16:27PM +0900, Masanobu SAITOH wrote:

  Hi, all.

  I've commited change to support ixv(4), Intel 10G Ethernet's virtual
function. The reason why this driver had not been compilable is that
it supports MSI-X only.

  Currently, I have not environment to test this driver. Could some
people who are familiar with it test with the latest -current
(and fix bugs)?


Sounds interesting, but what is it?  Some advanced function of Intel
X540?


 82599 and newer chips has SR-IOV (Single Root I/O Virtualization) function.
It can generate vritual function(VF)s up to 63 VFs. Each VF can pass
through to an guest OS. Link status, statistics and some others are
managed in the physical function(PF) of the device.


I have read
http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/dev/pci/ixgbe/ixv.c?only_with_tag=MAIN

but

http://netbsd.gw.com/cgi-bin/man-cgi?ixv++NetBSD-current is empty.


 Because the original FreeBSD's ixgbe/ixv drivers has ixgbe.4 only :-|


I have X540 (and X710), but don't know how to test.


It's required to use MSI/MSI-X function and SR-IOV. Currently,
x86 except XEN support MSI/MSI-X function. And, to control SR-IOV,
it's required to access the PCIe's extended configuration area.
It's not supported in NetBSD yet (though nonaka@ is working on it).
So, we can't use ixv(4) via NetBSD/xen DOM0 :-|
Amazon EC2's C3 and C4 instance support ixv(ixgbevf in linux), but
it can't be used because NetBSD/xen doesn't support MSI/MSI-X.

 NetBSD's ixv(4) can be test with Linux KVM or VMware vSphere.
Some people other than me must be familar with them :)

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Tester(s) needed: ixv(4)

2015-08-12 Thread Masanobu SAITOH
 Hi, all.

 I've commited change to support ixv(4), Intel 10G Ethernet's virtual
function. The reason why this driver had not been compilable is that
it supports MSI-X only.

 Currently, I have not environment to test this driver. Could some
people who are familiar with it test with the latest -current
(and fix bugs)?

 Regards.

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: argument of pci_msi[x]_count()

2015-08-10 Thread Masanobu SAITOH
Hi, Christos.

On 2015/08/07 18:53, Christos Zoulas wrote:
> In article <55c349bf.2000...@execsw.org>,
> Masanobu SAITOH   wrote:
>> Hi, all.
>>
>> Currently, pci_msi_count() and pci_msix_count() take one
>> pci_attach_args argument.
>> These functions may be used in other than attach function. So, it might be
>> better to use pci_chipset_tag_t and pcitag_t.
>>
>> Is the following diff better than current specification?
> 
> Ok, but this makes them different than their alloc counterparts. Should
> we change those too?

Almost all pci related functions don't take pci_attach_args as
an argument. Some functions are required to take it because
some elements(e.g. pa_iot and pa_memt) in the structure are required.

 Before introducing MSI/MSI-X API, the following functions take
pci_attach_args as an argument.

pci_find_device()
pci_mapreg_map()
pci_mapreg_submap()
pci_intr_map()
pci_aprint_devinfo()
pci_attach_hook()

Some above functions refers pa_iot and pa_memt in them.

And then, the following functions which take pci_attach_args
as an argument are added:

pci_intx_alloc()
pci_intr_alloc()
pci_msi_count()
pci_msi_alloc()
pci_msi_alloc_exact()
pci_msix_count()
pci_msix_alloc()
pci_msix_alloc_exact()
pci_msix_alloc_map()

 pci_intr_map() takes pci_attach_args, so it's consistent.
In reality, it's not required for x86 to take pci_attach_args.
The MSI/MSI-X related "alloc" function might not required
to take pci_attach_args, but I think it's ok to keep current
API for the consistency and possibility of using pci_attach_args
on other archs.

 So, is it ok to change pci_msi[x]_count() only?


> christos
> 


-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


argument of pci_msi[x]_count()

2015-08-06 Thread Masanobu SAITOH
 Hi, all.

 Currently, pci_msi_count() and pci_msix_count() take one pci_attach_args 
argument.
These functions may be used in other than attach function. So, it might be
better to use pci_chipset_tag_t and pcitag_t.

 Is the following diff better than current specification?


Index: share/man/man9/pci_msi.9
===
RCS file: /cvsroot/src/share/man/man9/pci_msi.9,v
retrieving revision 1.6
diff -u -p -r1.6 pci_msi.9
--- share/man/man9/pci_msi.924 Jul 2015 07:40:58 -  1.6
+++ share/man/man9/pci_msi.96 Aug 2015 11:24:57 -
@@ -44,7 +44,8 @@
 .Nd PCI MSI{,-X} manipulation functions
 .Sh SYNOPSIS
 .Ft int
-.Fn pci_msi_count "struct pci_attach_args *pa"
+.Fn pci_msi_count "pci_chipset_tag_t pc" \
+"pcitag_t tag"
 .Ft int
 .Fn pci_msi_alloc  "struct pci_attach_args *pa" \
 "pci_intr_handle_t **ihps" "int *count"
@@ -52,7 +53,8 @@
 .Fn pci_msi_alloc_exect "struct pci_attach_args *pa" \
 "pci_intr_handle_t **ihps" "int count"
 .Ft int
-.Fn pci_msix_count "struct pci_attach_args *pa"
+.Fn pci_msix_count "pci_chipset_tag_t pc" \
+"pcitag_t tag"
 .Ft int
 .Fn pci_msix_alloc  "struct pci_attach_args *pa" \
 "pci_intr_handle_t **ihps" "int *count"
Index: sys/arch/x86/include/pci_machdep_common.h
===
RCS file: /cvsroot/src/sys/arch/x86/include/pci_machdep_common.h,v
retrieving revision 1.19
diff -u -p -r1.19 pci_machdep_common.h
--- sys/arch/x86/include/pci_machdep_common.h   21 Jul 2015 03:10:42 -  
1.19
+++ sys/arch/x86/include/pci_machdep_common.h   6 Aug 2015 11:24:57 -
@@ -145,14 +145,14 @@ int   pci_intr_alloc(const struct pci_att
pci_intr_handle_t **, int *, pci_intr_type_t);
 
 /* experimental MSI support */
-intpci_msi_count(const struct pci_attach_args *);
+intpci_msi_count(pci_chipset_tag_t, pcitag_t);
 intpci_msi_alloc(const struct pci_attach_args *,
pci_intr_handle_t **, int *);
 intpci_msi_alloc_exact(const struct pci_attach_args *,
pci_intr_handle_t **, int);
 
 /* experimental MSI-X support */
-intpci_msix_count(const struct pci_attach_args *);
+intpci_msix_count(pci_chipset_tag_t, pcitag_t);
 intpci_msix_alloc(const struct pci_attach_args *,
pci_intr_handle_t **, int *);
 intpci_msix_alloc_exact(const struct pci_attach_args *,
Index: sys/arch/x86/pci/msipic.c
===
RCS file: /cvsroot/src/sys/arch/x86/pci/msipic.c,v
retrieving revision 1.4
diff -u -p -r1.4 msipic.c
--- sys/arch/x86/pci/msipic.c   8 May 2015 04:27:48 -   1.4
+++ sys/arch/x86/pci/msipic.c   6 Aug 2015 11:24:58 -
@@ -620,7 +620,7 @@ msipic_construct_msix_pic(const struct p
int bir, bar, err, off, table_nentry;
char pic_name_buf[MSIPICNAMEBUF];
 
-   table_nentry = pci_msix_count(pa);
+   table_nentry = pci_msix_count(pa->pa_pc, pa->pa_tag);
if (table_nentry == 0) {
DPRINTF(("MSI-X table entry is 0.\n"));
return NULL;
Index: sys/arch/x86/pci/pci_intr_machdep.c
===
RCS file: /cvsroot/src/sys/arch/x86/pci/pci_intr_machdep.c,v
retrieving revision 1.35
diff -u -p -r1.35 pci_intr_machdep.c
--- sys/arch/x86/pci/pci_intr_machdep.c 24 Jul 2015 06:49:58 -  1.35
+++ sys/arch/x86/pci/pci_intr_machdep.c 6 Aug 2015 11:24:58 -
@@ -490,7 +490,7 @@ pci_intr_alloc(const struct pci_attach_a
 
/* try MSI-X */
if (msix_count == -1) /* use hardware max */
-   msix_count = pci_msix_count(pa);
+   msix_count = pci_msix_count(pa->pa_pc, pa->pa_tag);
if (msix_count > 0) {
error = pci_msix_alloc_exact(pa, ihps, msix_count);
if (error == 0) {
@@ -503,7 +503,7 @@ pci_intr_alloc(const struct pci_attach_a
 
/* try MSI */
if (msi_count == -1) /* use hardware max */
-   msi_count = pci_msi_count(pa);
+   msi_count = pci_msi_count(pa->pa_pc, pa->pa_tag);
if (msi_count > 0) {
error = pci_msi_alloc_exact(pa, ihps, msi_count);
if (error == 0) {
Index: sys/arch/x86/pci/pci_msi_machdep.c
===
RCS file: /cvsroot/src/sys/arch/x86/pci/pci_msi_machdep.c,v
retrieving revision 1.6
diff -u -p -r1.6 pci_msi_machdep.c
--- sys/arch/x86/pci/pci_msi_machdep.c  22 Jun 2015 03:57:01 -  1.6
+++ sys/arch/x86/pci/pci_msi_machdep.c  6 Aug 2015 11:24:58 -
@@ -474,16 +474,12 @@ x86_pci_msix_disestablish(pci_chipset_ta
  * return 0 if the device does not support MSI
  */
 int
-pci_msi_count(const struct pci_attach_args *pa)
+pci_msi_count(pci_chipset_tag_t pc, pcitag_t tag)
 {
-   pci_chipset_tag_t pc;
-   pcitag_t 

Re: PR/49328 CVS commit: src/sys/dev/pci/ixgbe

2015-02-17 Thread Masanobu SAITOH

On 2015/02/17 19:06, 6b...@6bone.informatik.uni-leipzig.de wrote:

On Wed, 4 Feb 2015, Christos Zoulas wrote:
...

christos


I have tested NetBSD 7.99.x

The problem is now solved. The PR can be closed.


Regards
Uwe


 Thanks. I'll send the pullup requet to netbsd-7.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: ixg(4) mutex problem (PR#49328)

2015-02-05 Thread Masanobu SAITOH

On 2015/01/29 16:31, Masanobu SAITOH wrote:
(snip)

  Some of Ethernet drivers avoid using splnet()/splx() and use mutexes.
wm, bnx, ixg, powerpc/booke/dev/pq3etsec.c and others. I think it's better
to know the common solution and puse the way.


 I wrote the following page:


https://wiki.netbsd.org/users/msaitoh/Comparison_of_implementations_of_Ethernet_drivers/

Please add, modify and fix this page :-)

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: if_wm between netbsd-6 and netbsd-7 issue

2015-02-03 Thread Masanobu SAITOH

On 2015/01/29 20:39, Edgar Fuß wrote:

EF> I even get a message "Please update the Bootagent", whatever that means,
EF> exactly.
MS> That code is a workaround code for old boot rom which doesn't release
MS> semaphore.
Ah, thanks. Do you happen to know how to perform that update using NetBSD?


bootutil.exe? though I've not used.

https://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=19186

 Andother solution is to make that message DPRINTF()-based and
forget about it :-)
That message was taken from FreeBSD and it uses DEBUGOUT():


http://svnweb.freebsd.org/base/head/sys/dev/e1000/e1000_82571.c?annotate=267935#l431


MS> I have commited the change now. Thank you all.
Thank you for fixing it in the first place!


 You're welcome.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: PR/49328 CVS commit: src/sys/dev/pci/ixgbe

2015-02-03 Thread Masanobu SAITOH
On 2015/02/04 14:08, Christos Zoulas wrote:
> On Feb 4,  1:49pm, msai...@execsw.org (Masanobu SAITOH) wrote:
> -- Subject: Re: PR/49328 CVS commit: src/sys/dev/pci/ixgbe
> 
> |  Hi, all.
> |
> |  I fixed and committed some smallproblem to -current.
> |
> |  The rest is:
> |
> | - Revert ixgbe_netbsd.c rev. 1.2
> | - make CORE_LOCK adaptive
> | - Release spin lock while reinitializing the jumb buffer structure.
> |
> |  Is it OK to commit?
> 
> Does it work with LOCKDEBUG?

 Yes, it works with LOCKDEBUG.

ifconfig up works
ifconfig down works
ifconfig up <-> down repeatedly works while forwarding works
reboot works
reboot while forwarding works

>If yes, go for it.
> 
> christos
> 


-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: PR/49328 CVS commit: src/sys/dev/pci/ixgbe

2015-02-03 Thread Masanobu SAITOH
 Hi, all.

 I fixed and committed some smallproblem to -current.

 The rest is:

- Revert ixgbe_netbsd.c rev. 1.2
- make CORE_LOCK adaptive
- Release spin lock while reinitializing the jumb buffer structure.

 Is it OK to commit?


Index: ixgbe.c
===
RCS file: /cvsroot/src/sys/dev/pci/ixgbe/ixgbe.c,v
retrieving revision 1.17
diff -u -r1.17 ixgbe.c
--- ixgbe.c 4 Feb 2015 04:37:13 -   1.17
+++ ixgbe.c 4 Feb 2015 04:38:10 -
@@ -3933,12 +3933,16 @@
/* Free current RX buffer structs and their mbufs */
ixgbe_free_receive_ring(rxr);
 
+   IXGBE_RX_UNLOCK(rxr);
+
/* Now reinitialize our supply of jumbo mbufs.  The number
 * or size of jumbo mbufs may have changed.
 */
ixgbe_jcl_reinit(&adapter->jcl_head, rxr->ptag->dt_dmat,
2 * adapter->num_rx_desc, adapter->rx_mbuf_sz);
 
+   IXGBE_RX_LOCK(rxr);
+
/* Configure header split? */
if (ixgbe_header_split)
rxr->hdr_split = TRUE;
Index: ixgbe.h
===
RCS file: /cvsroot/src/sys/dev/pci/ixgbe/ixgbe.h,v
retrieving revision 1.1
diff -u -r1.1 ixgbe.h
--- ixgbe.h 12 Aug 2011 21:55:29 -  1.1
+++ ixgbe.h 4 Feb 2015 04:38:10 -
@@ -476,7 +476,7 @@
 
 
 #define IXGBE_CORE_LOCK_INIT(_sc, _name) \
-mutex_init(&(_sc)->core_mtx, MUTEX_DEFAULT, IPL_NET)
+mutex_init(&(_sc)->core_mtx, MUTEX_DEFAULT, IPL_SOFTNET)
 #define IXGBE_CORE_LOCK_DESTROY(_sc)  mutex_destroy(&(_sc)->core_mtx)
 #define IXGBE_TX_LOCK_DESTROY(_sc)mutex_destroy(&(_sc)->tx_mtx)
 #define IXGBE_RX_LOCK_DESTROY(_sc)mutex_destroy(&(_sc)->rx_mtx)
Index: ixgbe_netbsd.c
===
RCS file: /cvsroot/src/sys/dev/pci/ixgbe/ixgbe_netbsd.c,v
retrieving revision 1.2
diff -u -r1.2 ixgbe_netbsd.c
--- ixgbe_netbsd.c  28 Jan 2015 00:30:25 -  1.2
+++ ixgbe_netbsd.c  4 Feb 2015 04:38:10 -
@@ -56,7 +56,7 @@
 
*dtp = NULL;
 
-   if ((dt = kmem_zalloc(sizeof(*dt), KM_NOSLEEP)) == NULL)
+   if ((dt = kmem_zalloc(sizeof(*dt), KM_SLEEP)) == NULL)
return ENOMEM;
 
dt->dt_dmat = dmat;
@@ -136,19 +136,19 @@
ixgbe_extmem_t *em;
int nseg, rc;
 
-   em = kmem_zalloc(sizeof(*em), KM_NOSLEEP);
+   em = kmem_zalloc(sizeof(*em), KM_SLEEP);
 
if (em == NULL)
return NULL;
 
rc = bus_dmamem_alloc(dmat, size, PAGE_SIZE, 0, &em->em_seg, 1, &nseg,
-   BUS_DMA_NOWAIT);
+   BUS_DMA_WAITOK);
 
if (rc != 0)
goto post_zalloc_err;
 
rc = bus_dmamem_map(dmat, &em->em_seg, 1, size, &em->em_vaddr,
-   BUS_DMA_NOWAIT);
+   BUS_DMA_WAITOK);
 
if (rc != 0)
goto post_dmamem_err;


-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


ixg(4) mutex problem (PR#49328)

2015-01-29 Thread Masanobu SAITOH

 Hi.

 Uwe Toenjes found a bug in ixg(4). See:

http://mail-index.netbsd.org/current-users/2014/10/11/msg025932.html

Then, PR#49328 was submitted by him.

http://gnats.netbsd.org/49328

This problem is can be reproduced when

LOCKDEBUG is defined
ifconfig ixg0 up

ixg(4) take a spin mutex before accessing MAC core:

-- ixgbe.h ---
#define IXGBE_CORE_LOCK_INIT(_sc, _name) \
mutex_init(&(_sc)->core_mtx, MUTEX_DEFAULT, IPL_NET)

--- ixgbe.c::ixbge_ioctl() --
case SIOCADDMULTI:
case SIOCDELMULTI:
case SIOCSIFFLAGS:
case SIOCSIFMTU:
default:
if ((error = ether_ioctl(ifp, command, data)) != ENETRESET)
return error;
if ((ifp->if_flags & IFF_RUNNING) == 0)
;
else if (command == SIOCSIFCAP || command == SIOCSIFMTU) {
IXGBE_CORE_LOCK(adapter);
ixgbe_init_locked(adapter);
IXGBE_CORE_UNLOCK(adapter);

In this case, IXGBE_CORE_LOCK() is used to prevent interrupt.

I borrowed an old Intel 10G card(82598 based) from my colleague and tested.
I got a panic like as follows:


Mutex error: lockdebug_barrier: spin lock held

lock address : 0x80001d0eaa08 type :   spin
initialized  : 0x805cb716
shared holds :  0 exclusive:  1
shares wanted:  0 exclusive:  0
current cpu  :  7 last held:  7
current lwp  : 0xfe840d2049c0 last held: 0xfe840d2049c0
last locked* : 0x805ca63e unlocked : 00
owner field  : 0x00010600 wait/spin:0/1

panic: LOCKDEBUG: Mutex error: lockdebug_barrier: spin lock held
cpu7: Begin traceback...
vpanic() at netbsd:vpanic+0x13c
snprintf() at netbsd:snprintf
lockdebug_more() at netbsd:lockdebug_more
mutex_enter() at netbsd:mutex_enter+0x45e
pmap_growkernel() at netbsd:pmap_growkernel+0x2d
uvm_map_prepare() at netbsd:uvm_map_prepare+0x24d
uvm_map() at netbsd:uvm_map+0x97
uvm_km_alloc() at netbsd:uvm_km_alloc+0xec
_bus_dmamem_map.isra.7() at netbsd:_bus_dmamem_map.isra.7+0x69
ixgbe_jcl_reinit() at netbsd:ixgbe_jcl_reinit+0x180
ixgbe_init_locked() at netbsd:ixgbe_init_locked+0x510
ixgbe_init() at netbsd:ixgbe_init+0x22
ether_ioctl() at netbsd:ether_ioctl+0xc8
ixgbe_ioctl() at netbsd:ixgbe_ioctl+0x41
in6_update_ifa1() at netbsd:in6_update_ifa1+0x70e
in6_update_ifa() at netbsd:in6_update_ifa+0x36
in6_ifattach_linklocal() at netbsd:in6_ifattach_linklocal+0x2eb
in6_ifattach() at netbsd:in6_ifattach+0x15d
in6_if_up() at netbsd:in6_if_up+0xf
ifioctl_common() at netbsd:ifioctl_common+0x343
ether_ioctl() at netbsd:ether_ioctl+0x162
ixgbe_ioctl() at netbsd:ixgbe_ioctl+0x41
doifioctl() at netbsd:doifioctl+0x493
soo_ioctl() at netbsd:soo_ioctl+0x2af
sys_ioctl() at netbsd:sys_ioctl+0x17e
syscall() at netbsd:syscall+0x9a
--- syscall (number 54) ---
7f7ff6cced0a:
cpu7: End traceback...
--

 ifconfig ixg0 up
 -> ioctl()
   -> take spin mutex
   -> alloc a lot of dmamem and dmamap.
   -> it got a shortage of the kernel address space and called pmap_growkernel()
 -> tried to get adaptive mutex.
   -> panic

 My stack trace is different from Uew's traces, but the root of the
problem is the same.

 How should we fix this problem?

 Some of Ethernet drivers avoid using splnet()/splx() and use mutexes.
wm, bnx, ixg, powerpc/booke/dev/pq3etsec.c and others. I think it's better
to know the common solution and puse the way.

 I'm not so familiar with mutex and I have no experience using mutex...
Any patches are welcome. I'll try them if you send me :-)

 Thanks.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)



Re: if_wm between netbsd-6 and netbsd-7 issue

2015-01-28 Thread Masanobu SAITOH

On 2015/01/29 3:16, Edgar Fuß wrote:

Could you try the following patch?

Ah, looks much better, thanks!
I haven't tried to actually use the interface, though.

I even get a message "Please update the Bootagent", whatever that means, 
exactly.


 That code is a workaround code for old boot rom which doesn't release 
semaphore.

 I have commited the change now. Thank you all.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: if_wm between netbsd-6 and netbsd-7 issue

2015-01-27 Thread Masanobu SAITOH

Hi, all


On 2015/01/12 22:38, Edgar Fuß wrote:

I have a similar problem with a "IBM PRO/1000 PT Dual Port Server Adapter Low 
Profile - 39Y6128" (in a HP MicroServer) that was at least recognized with 6, but on 
7, I get:

wm0 at pci2 dev 0 function 0: Intel PRO/1000 PT (82571EB) (rev. 0x06)
wm0: interrupting at ioapic0 pin 18
wm0: PCI-Express bus
wm0: could not acquire SWSM SMBI
wm0: wm_nvm_acquire: failed to get semaphore
wm0: could not acquire SWSM SMBI
wm0: wm_nvm_acquire: failed to get semaphore
wm0: No EEPROM
wm0: unable to read Ethernet address
wm1 at pci2 dev 0 function 1: Intel PRO/1000 PT (82571EB) (rev. 0x06)
wm1: interrupting at ioapic0 pin 19
wm1: PCI-Express bus
wm1: could not acquire SWSM SMBI
wm1: wm_nvm_acquire: failed to get semaphore
wm1: could not acquire SWSM SMBI
wm1: wm_nvm_acquire: failed to get semaphore
wm1: No EEPROM
wm1: unable to read Ethernet address


Could you try the following patch?

Index: if_wm.c
===
RCS file: /cvsroot/src/sys/dev/pci/if_wm.c,v
retrieving revision 1.309
diff -u -r1.309 if_wm.c
--- if_wm.c 16 Jan 2015 10:36:14 -  1.309
+++ if_wm.c 27 Jan 2015 09:11:03 -
@@ -1844,7 +1844,7 @@
case WM_T_82571:
case WM_T_82572:
reg = CSR_READ(sc, WMREG_SWSM2);
-   if ((reg & SWSM2_LOCK) != 0) {
+   if ((reg & SWSM2_LOCK) == 0) {
CSR_WRITE(sc, WMREG_SWSM2, reg | SWSM2_LOCK);
force_clear_smbi = true;
} else



--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: ixg(4) performances

2014-09-03 Thread Masanobu SAITOH

On 2014/09/04 11:24, Masanobu SAITOH wrote:

On 2014/09/04 0:40, Emmanuel Dreyfus wrote:

On Wed, Sep 03, 2014 at 04:11:29PM +0200, Bert Kiers wrote:

NetBSD 6.1 says:
vendor 0x8086 product 0x1528 (ethernet network, revision 0x01) at pci1 dev 0 
function 0 not configured


In src/sys/dev/pci/ixgbe/ we know about producct Id 0x1529 and 0x152A but
not 0x1528. But this can probably be easily borrowed from FreeBSD:
http://svnweb.freebsd.org/base/head/sys/dev/ixgbe/

They call it IXGBE_DEV_ID_X540T, You can try to add in ixgbe_type.h:
#define IXGBE_DEV_ID_82599_X540T 0x1528

Then in ixgbe.c add a IXGBE_DEV_ID_82599_X540T line in
ixgbe_vendor_info_array[]

In ixgbe_82599.c you need a case IXGBE_DEV_ID_82599_X540T
next to case IXGBE_DEV_ID_82599_BACKPLANE_FCOE if media
is indeed backplane. Otherwise add it at the appropriate place
in the switch statement.

And finally you need to add a case IXGBE_DEV_ID_82599_X540T
next to case IXGBE_DEV_ID_82599_BACKPLANE_FCOE in ixgbe_api.c


Our ixg(4) driver doesn't support X520. At least there is no


s/X520/X540/


file ixgbe/ixgbe_x540.c of FreeBSD.


 BusFreeBSD   NetBSD
  82597PCI-Xixgbdge
  82598PCIeixgbeixg
  82599(X520)PCIeixgbeixg
  X540PCIeixgbe(ixg)(not yet)





--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: ixg(4) performances

2014-09-03 Thread Masanobu SAITOH

On 2014/09/04 0:40, Emmanuel Dreyfus wrote:

On Wed, Sep 03, 2014 at 04:11:29PM +0200, Bert Kiers wrote:

NetBSD 6.1 says:
vendor 0x8086 product 0x1528 (ethernet network, revision 0x01) at pci1 dev 0 
function 0 not configured


In src/sys/dev/pci/ixgbe/ we know about producct Id 0x1529 and 0x152A but
not 0x1528. But this can probably be easily borrowed from FreeBSD:
http://svnweb.freebsd.org/base/head/sys/dev/ixgbe/

They call it IXGBE_DEV_ID_X540T, You can try to add in ixgbe_type.h:
#define IXGBE_DEV_ID_82599_X540T 0x1528

Then in ixgbe.c add a IXGBE_DEV_ID_82599_X540T line in
ixgbe_vendor_info_array[]

In ixgbe_82599.c you need a case IXGBE_DEV_ID_82599_X540T
next to case IXGBE_DEV_ID_82599_BACKPLANE_FCOE if media
is indeed backplane. Otherwise add it at the appropriate place
in the switch statement.

And finally you need to add a case IXGBE_DEV_ID_82599_X540T
next to case IXGBE_DEV_ID_82599_BACKPLANE_FCOE in ixgbe_api.c


Our ixg(4) driver doesn't support X520. At least there is no
file ixgbe/ixgbe_x540.c of FreeBSD.


Bus FreeBSD NetBSD
 82597  PCI-X   ixgbdge
 82598  PCIeixgbe   ixg
 82599(X520)PCIeixgbe   ixg
 X540   PCIeixgbe   (ixg)(not yet)


--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: FW: ixg(4) performances

2014-08-31 Thread Masanobu SAITOH
Hi, Emmanuel.

On 2014/09/01 11:10, Emmanuel Dreyfus wrote:
> Terry Moore  wrote:
> 
>> Since you did a dword read, the extra 0x9 is the device status register.
>> This makes me suspicious as the device status register is claiming that you
>> have "unsupported request detected)" [bit 3] and "correctable error
>> detected" [bit 0].  Further, this register is RW1C for all these bits -- so
>> when you write 94810, it should have cleared the 9 (so a subsequent read
>> should have returned 4810).
>>
>> Please check.
> 
> You are right;
> # pcictl /dev/pci5 read -d 0 -f 1 0xa8
> 00092810
> # pcictl /dev/pci5 write -d 0 -f 1 0xa8 0x00094810
> # pcictl /dev/pci5 read -d 0 -f 1 0xa8
> 4810
> 
>> Might be good to post a "pcictl dump" of your device, just to expose all the
>> details.
> 
> It explicitely says 2.5 Gb/s x 8 lanes
> 
> # pcictl /dev/pci5 dump -d0 -f 1
> PCI configuration registers:
>Common header:
>  0x00: 0x10fb8086 0x00100107 0x0201 0x00800010
> 
>  Vendor Name: Intel (0x8086)
>  Device Name: 82599 (SFI/SFP+) 10 GbE Controller (0x10fb)
>  Command register: 0x0107
>I/O space accesses: on
>Memory space accesses: on
>Bus mastering: on
>Special cycles: off
>MWI transactions: off
>Palette snooping: off
>Parity error checking: off
>Address/data stepping: off
>System error (SERR): on
>Fast back-to-back transactions: off
>Interrupt disable: off
>  Status register: 0x0010
>Interrupt status: inactive
>Capability List support: on
>66 MHz capable: off
>User Definable Features (UDF) support: off
>Fast back-to-back capable: off
>Data parity error detected: off
>DEVSEL timing: fast (0x0)
>Slave signaled Target Abort: off
>Master received Target Abort: off
>Master received Master Abort: off
>Asserted System Error (SERR): off
>Parity error detected: off
>  Class Name: network (0x02)
>  Subclass Name: ethernet (0x00)
>  Interface: 0x00
>  Revision ID: 0x01
>  BIST: 0x00
>  Header Type: 0x00+multifunction (0x80)
>  Latency Timer: 0x00
>  Cache Line Size: 0x10
> 
>Type 0 ("normal" device) header:
>  0x10: 0xdfe8000c 0x 0xbc01 0x
>  0x20: 0xdfe7c00c 0x 0x 0x00038086
>  0x30: 0x 0x0040 0x 0x0209
> 
>  Base address register at 0x10
>type: 64-bit prefetchable memory
>base: 0xdfe8, not sized
>  Base address register at 0x18
>type: i/o
>base: 0xbc00, not sized
>  Base address register at 0x1c
>not implemented(?)
>  Base address register at 0x20
>type: 64-bit prefetchable memory
>base: 0xdfe7c000, not sized
>  Cardbus CIS Pointer: 0x
>  Subsystem vendor ID: 0x8086
>  Subsystem ID: 0x0003
>  Expansion ROM Base Address: 0x
>  Capability list pointer: 0x40
>  Reserved @ 0x38: 0x
>  Maximum Latency: 0x00
>  Minimum Grant: 0x00
>  Interrupt pin: 0x02 (pin B)
>  Interrupt line: 0x09
> 
>Capability register at 0x40
>  type: 0x01 (Power Management, rev. 1.0)
>Capability register at 0x50
>  type: 0x05 (MSI)
>Capability register at 0x70
>  type: 0x11 (MSI-X)
>Capability register at 0xa0
>  type: 0x10 (PCI Express)
> 
>PCI Message Signaled Interrupt
>  Message Control register: 0x0180
>MSI Enabled: no
>Multiple Message Capable: no (1 vector)
>Multiple Message Enabled: off (1 vector)
>64 Bit Address Capable: yes
>Per-Vector Masking Capable: yes
>  Message Address (lower) register: 0x
>  Message Address (upper) register: 0x
>  Message Data register: 0x
>  Vector Mask register: 0x
>  Vector Pending register: 0x
> 
>PCI Power Management Capabilities Register
>  Capabilities register: 0x4823
>Version: 1.2
>PME# clock: off
>Device specific initialization: on
>3.3V auxiliary current: self-powered
>D1 power management state support: off
>D2 power management state support: off
>PME# support: 0x09
>  Control/status register: 0x2000
>Power state: D0
>PCI Express reserved: off
>No soft reset: off
>PME# assertion disabled
>PME# status: off
> 
>PCI Express Capabilities Register
>  Capability version: 2
>  Device type: PCI Express Endpoint device
>  Interrupt Message Number: 0
>  Link Capabilities Register: 0x00027482
>Maximum Link Speed: unknown 2 value
>Maximum Link Width: x8 lanes
>Port Number: 0
>  Link Status Register: 0x1081
>Negotiated Link Speed: 2.5Gb/s
*

Which Version of NetBSD are you using?

 I committed some changes fixing Gb/s

Re: Marvell 88SE9230 AHCI?

2014-07-23 Thread Masanobu SAITOH

On 2014/07/24 14:17, Masanobu SAITOH wrote:

On 2014/07/24 14:16, Masanobu SAITOH wrote:

On 2014/07/24 1:31, Manuel Bouyer wrote:

On Wed, Jul 23, 2014 at 08:46:34AM -0400, Thor Lancelot Simon wrote:

I just ordered a Supermicro low-power rackmount x86 machine which -- I
discovered after ordering it -- uses a Marvell 88SE9230 as a SATA controller.

>From FreeBSD's ahci.c, I see this is approximately an AHCI compatible
controller but that it has some quirks.\


I noticed that quirk, too. I checked their change one month ago.
Perhaps the following changes might help:


http://svnweb.freebsd.org/base/head/sys/dev/ahci/ahci.c?r1=85&r2=224498


 Is anyone using NetBSD with this
controller or the similar 88SE9128, 9128, or 9130?


There is an entry 88SE9128 in netbsd-6's ahcisata_pci.c, so at last this
one works I guess.


  I have SuperMicro X9SBAA-F which has 88SE9128.


  s/88SE9128/88SE9230/


I tested with two different disks. One works fine
but another didn't work with "IDENTIFY faied" message.
I've not investigated the detail yet.







--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: Marvell 88SE9230 AHCI?

2014-07-23 Thread Masanobu SAITOH

On 2014/07/24 14:16, Masanobu SAITOH wrote:

On 2014/07/24 1:31, Manuel Bouyer wrote:

On Wed, Jul 23, 2014 at 08:46:34AM -0400, Thor Lancelot Simon wrote:

I just ordered a Supermicro low-power rackmount x86 machine which -- I
discovered after ordering it -- uses a Marvell 88SE9230 as a SATA controller.

>From FreeBSD's ahci.c, I see this is approximately an AHCI compatible
controller but that it has some quirks.  Is anyone using NetBSD with this
controller or the similar 88SE9128, 9128, or 9130?


There is an entry 88SE9128 in netbsd-6's ahcisata_pci.c, so at last this
one works I guess.


  I have SuperMicro X9SBAA-F which has 88SE9128.


 s/88SE9128/88SE9230/


I tested with two different disks. One works fine
but another didn't work with "IDENTIFY faied" message.
I've not investigated the detail yet.




--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: Marvell 88SE9230 AHCI?

2014-07-23 Thread Masanobu SAITOH

On 2014/07/24 1:31, Manuel Bouyer wrote:

On Wed, Jul 23, 2014 at 08:46:34AM -0400, Thor Lancelot Simon wrote:

I just ordered a Supermicro low-power rackmount x86 machine which -- I
discovered after ordering it -- uses a Marvell 88SE9230 as a SATA controller.

>From FreeBSD's ahci.c, I see this is approximately an AHCI compatible
controller but that it has some quirks.  Is anyone using NetBSD with this
controller or the similar 88SE9128, 9128, or 9130?


There is an entry 88SE9128 in netbsd-6's ahcisata_pci.c, so at last this
one works I guess.


 I have SuperMicro X9SBAA-F which has 88SE9128.
I tested with two different disks. One works fine
but another didn't work with "IDENTIFY faied" message.
I've not investigated the detail yet.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


AsiaBSDCon 2014 WIP session: Improving MII PHY

2014-03-18 Thread Masanobu SAITOH
 And,

 I put my slide of AsiaBSDCon 2014 Working In Progress session at:

http://www.netbsd.org/~msaitoh/Improving_MII_PHY.pdf

Thanks.

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: 4byte aligned com(4) and PCI_MAPREG_TYPE_MEM

2014-02-16 Thread Masanobu SAITOH

Hi.

(2014/02/15 1:58), Izumi Tsutsui wrote:

I'd suggest to clarify what problems you are trying to solve
and consider how it should be solved, before updating your patch.

The problems you mentioned are:
(1) merge initialization of sparse register mappings (with 4 byte stride)


Right.


(2) defer consinit() for puc com to use uvm_km_alloc() in it


Right though the real goal is not to defer consinit but to support
memory mapped I/O in com(4).

And,

  (3) Almost all drivers don't clear "struct com_regs regs". It's not
 a real bug but should be cleard for further change.


Your patch is trying to solve them by:
(1) change COM_INIT_REGS() (and comprobe1()) APIs to pass stride byte
(2) add an MI "extent_initted" global variable and check it in MD consinit()

My vote is:
(1) leave existing APIs and handle the quirk in MD attachment
(2) add an x86 specific MD variable to defer consinit() till cpu_startup()


 I think "static int iniited" variable is used to do that, but there
is no way to know whether extent_init() (or uvm_init())  is called
or not.


Because:
(1)
  - it's really pain to change the MI APIs
(so many attachments and most of them will rarely be tested unfortunately)
  - only three or four attachments can share the new API
while such embedded devices often might have more other quirks


 Before writing that patch, I wrote another patch which did't change
the arguments of COM_INIT_REGS(). We have two choices

a) Use COM_INIT_REGS() for all drivers.
b) Copy COM_INIT_REGS() and modify to support 4 byte stride.

I couldn't decice which one was better or not. Now I think not change
the argument and make a new macro is better than adding new argument,
because considering the byte order is not required for 1 byte slide
devices.


  - even if stride handling is really necessary in MI part,
it's much better to prepare new wrap functions,
like wdcprobe1() and wdcprobe() in wdc.c
(i.e. prepare a new COM_INIT_REGS_STRIDE() macro with a new arg
 and make exiting COM_INIT_REGS() macro use it)

(2)
  - it's unclear what functions actually require the extent_init()
(I guess uvm_init() is enough to call uvm_km_alloc())


 Me neither... I had thought that uvm_init() was enough, but someone
advived me that exten_init() should be called.


  - in general MI code assumes that console devices are properly
mapped by MD bootstrap or its firmware


 Yes. But, it's little hard to make such mapping as PCI bus_space_tag
and handle in eary boot stage.


  - some ports already has MD flags to switch malloc(9) or static
memory in early device mappings and initialization


 It can be used, but IMHO, checking whether bus_space_map() can be
called is more generic than it.



Just my two cents,

---
Izumi Tsutsui



 Please give me a little time to write another patch.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: RFC: stop having a single global page size

2014-01-31 Thread Masanobu SAITOH

(2014/01/31 18:14), Martin Husemann wrote:

On Fri, Jan 31, 2014 at 12:59:15AM -0800, Matt Thomas wrote:

Why would anyone want this?  Say you have a system in which the MMU can have
per translation table page sizes.  a 16KB page size might be desirable for the
kernel and for LP64 processes.  If you are running a ILP32 process, possibly
of an older architecture, you might want to use a 4KB page size.


I guess another usage is a (huge) framebuffer/aperture mmap'd by the
userland driver, where you could easily save a lot TLB entries by using
1 MB pages (or bigger) instead of 4/8k ones. I think Solaris does this.

Martin


 and so called the super pages that FreeBSD does.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: ~5 percent kernel performance loss in the last 3 weeks

2013-11-19 Thread Masanobu SAITOH

(2013/11/20 0:02), Christoph Badura wrote:

After updating my -current kernel from 6.99.24 to 6.99.27 so I could
commit my ubsec(4) changes I noticed that under 6.99.27 I get between
3 and 8 percent less throughput on accelerated crypto ops.

Note that I am using the exact same ubsec(4) code[1] with both kernels, so
I think it is unlikely a problem with ubsec(4).
I did not change userland.

The old kernel is 6.99.24 from Oct, 27th and the 6.99.27 is from Nov, 17th.
The machine is an ML110 G6 w/32G RAM and Intel Xeon X3430 4 core @2.4 GHz
running amd64.

There's nothing obvious to me in the dmesg diff that would give me a clue:

--- dmesg.6.99.24   2013-11-19 15:44:24.0 +0100
+++ dmesg.6.99.27   2013-11-19 15:39:29.0 +0100
@@ -4,7 +4,7 @@
  Copyright (c) 1982, 1986, 1989, 1991, 1993
  The Regents of the University of California.  All rights reserved.

-NetBSD 6.99.24 (GENERIC) #4: Mon Oct 28 18:58:32 CET 2013
+NetBSD 6.99.27 (GENERIC) #9: Sun Nov 17 17:47:24 CET 2013

bad@flexible-demeanour:/home/bad/work/nb/src/sys/arch/amd64/compile/GENERIC
  total memory = 32759 MB
  avail memory = 31792 MB
@@ -133,11 +133,14 @@
  acpicpu0: T5: FFH, lat   1 us, pow   285 mW,  38 %
  acpicpu0: T6: FFH, lat   1 us, pow   190 mW,  25 %
  acpicpu0: T7: FFH, lat   1 us, pow95 mW,  13 %
+coretemp0 at cpu0: thermal sensor, 1 C resolution
  acpicpu1 at cpu1: ACPI CPU
+coretemp1 at cpu1: thermal sensor, 1 C resolution
  acpicpu2 at cpu2: ACPI CPU
+coretemp2 at cpu2: thermal sensor, 1 C resolution
  acpicpu3 at cpu3: ACPI CPU
+coretemp3 at cpu3: thermal sensor, 1 C resolution
  timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
-timecounter: Timecounter "TSC" frequency 2394079440 Hz quality 3000


 It's strange.

Could you revert the following change and test again?

and, show me the output of "cpuctl identify 0"

http://mail-index.netbsd.org/source-changes/2013/11/15/msg049232.html

Module Name:src
Committed By:   msaitoh
Date:   Fri Nov 15 08:47:55 UTC 2013

Modified Files:
src/sys/arch/x86/acpi: acpi_cpu_md.c
src/sys/arch/x86/include: specialreg.h
src/sys/arch/x86/pci: amdtemp.c
src/sys/arch/x86/x86: coretemp.c cpu.c cpu_topology.c cpu_ucode_amd.c
cpu_ucode_intel.c est.c identcpu.c intel_busclock.c lapic.c odcm.c
patch.c powernow.c tprof_amdpmi.c tprof_pmi.c tsc.c viac7temp.c
src/usr.sbin/cpuctl/arch: i386.c

Log Message:
 Modify some macros and add some new macros for CPU family and model
to reduce code duplication and to avoid bug.

CPUID_TO_STEPPING(cpuid)(not changed)

CPUID_TO_FAMILY(cpuid)  (new)
CPUID_TO_MODEL(cpuid)   (new)

Return the display family and the display model.
The macro names are the same as FreeBSD.

CPUID_TO_BASEFAMILY(cpuid)  (The old name was CPUID2FAMILY)
CPUID_TO_BASEMODEL(cpuid)   (The old name was CPUID2MODEL)

Only for the base field.

CPUID_TO_EXTFAMILY(cpuid)   (The old name was CPUID2EXTFAMILY)
CPUID_TO_EXTMODEL(cpuid)(The old name was CPUID2EXTMODEL)

Only for the extended field.

See http://mail-index.netbsd.org/port-amd64/2013/11/12/msg001978.html


To generate a diff of this commit:
cvs rdiff -u -r1.72 -r1.73 src/sys/arch/x86/acpi/acpi_cpu_md.c
cvs rdiff -u -r1.71 -r1.72 src/sys/arch/x86/include/specialreg.h
cvs rdiff -u -r1.17 -r1.18 src/sys/arch/x86/pci/amdtemp.c
cvs rdiff -u -r1.30 -r1.31 src/sys/arch/x86/x86/coretemp.c
cvs rdiff -u -r1.105 -r1.106 src/sys/arch/x86/x86/cpu.c
cvs rdiff -u -r1.7 -r1.8 src/sys/arch/x86/x86/cpu_topology.c \
src/sys/arch/x86/x86/powernow.c
cvs rdiff -u -r1.6 -r1.7 src/sys/arch/x86/x86/cpu_ucode_amd.c \
src/sys/arch/x86/x86/viac7temp.c
cvs rdiff -u -r1.3 -r1.4 src/sys/arch/x86/x86/cpu_ucode_intel.c \
src/sys/arch/x86/x86/tprof_amdpmi.c
cvs rdiff -u -r1.27 -r1.28 src/sys/arch/x86/x86/est.c
cvs rdiff -u -r1.37 -r1.38 src/sys/arch/x86/x86/identcpu.c
cvs rdiff -u -r1.16 -r1.17 src/sys/arch/x86/x86/intel_busclock.c
cvs rdiff -u -r1.46 -r1.47 src/sys/arch/x86/x86/lapic.c
cvs rdiff -u -r1.2 -r1.3 src/sys/arch/x86/x86/odcm.c
cvs rdiff -u -r1.21 -r1.22 src/sys/arch/x86/x86/patch.c
cvs rdiff -u -r1.12 -r1.13 src/sys/arch/x86/x86/tprof_pmi.c
cvs rdiff -u -r1.32 -r1.33 src/sys/arch/x86/x86/tsc.c
cvs rdiff -u -r1.49 -r1.50 src/usr.sbin/cpuctl/arch/i386.c







  uhub0 at usb0: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
  uhub0: 2 ports with 2 removable, self powered
  uhub1 at usb1: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
@@ -175,7 +178,9 @@
  audio0 at pad0: half duplex, playback, capture
  boot device: wd0
  root on wd0a dumps on wd0b
+/: replaying log to memory
  root file system type: ffs
+/: replaying log to disk
  ipmi0: version 2.0 interface KCS iobase 0xca2/2 spacing 1
  wsdisplay0: screen 1 added (80x25, vt100 emulation)
  wsdisplay0: screen 2 added (80x25, vt100 emulation)

Results from "openssl speed -e

Re: broadcom 57766 and jumbo frames

2013-07-04 Thread Masanobu SAITOH
Hello, Emmanuel.

(2013/07/04 12:35), Emmanuel Dreyfus wrote:
> SAITOH Masanobu  wrote:
> 
>> It's time for me to buy iMac... ;-)
> 
> Did you have a look at the 57766 commit in Linux? There are a few 57766
> and  57665 specific things:
> 
> http://code.google.com/p/linux-picosam9g45/source/diff?spec=svn55086ad95
> d740577def0b4e6ecc2c0ae9b0d6dec&r=55086ad95d740577def0b4e6ecc2c0ae9b0d6d
> ec&format=side&path=/drivers/net/ethernet/broadcom/tg3.c
> 
> I see the 2kB MMRR (see TG3_TX_BD_DMA_MAX_2K; and
> RDMAC_MODE_JMB_2K_MMRR) which is used only by 57766
> 
> There are also 57765/57766 specific things I do not find in our drivers,
> such as the numbers of rings: 4 Rx and 2 Tx

Thanks.

 Colud you test the following patch?

Index: if_bgereg.h
===
RCS file: /cvsroot/src/sys/dev/pci/if_bgereg.h,v
retrieving revision 1.81
diff -u -r1.81 if_bgereg.h
--- if_bgereg.h 31 May 2013 17:15:19 -  1.81
+++ if_bgereg.h 4 Jul 2013 07:59:12 -
@@ -607,7 +607,9 @@
 #define BGE_MBX_TX_NIC_PROD15_HI   0x03F8
 #define BGE_MBX_TX_NIC_PROD15_LO   0x03FC
 
-#define BGE_TX_RINGS_MAX   4
+#define BGE_TX_RINGS_MAX   1
+#define BGE_TX_RINGS_57765_MAX 2
+#define BGE_TX_RINGS_5717_MAX  4
 #define BGE_TX_RINGS_EXTSSRAM_MAX  16
 #define BGE_RX_RINGS_MAX   16
 
@@ -1510,6 +1512,7 @@
 #define BGE_RDMAMODE_MBUF_SBD_CRPT_ATTN0x2000
 #define BGE_RDMAMODE_FIFO_SIZE_128 0x0002
 #define BGE_RDMAMODE_FIFO_LONG_BURST   0x0003
+#define BGE_RDMAMODE_JMB_2K_MMRR   0x0080
 #defineBGE_RDMAMODE_MULT_DMA_RD_DIS0x0100
 #defineBGE_RDMAMODE_TSO4_ENABLE0x0800
 #defineBGE_RDMAMODE_TSO6_ENABLE0x1000
@@ -2631,7 +2634,8 @@
 #define BGE_5714_FAMILY0x0100
 #define BGE_5700_FAMILY0x0200
 #defineBGE_5717_PLUS   0x0400
-#defineBGE_57765_PLUS  0x0800
-#defineBGE_APE 0x1000
-#defineBGE_CPMU_PRESENT0x2000
+#defineBGE_57765_FAMILY0x0800
+#defineBGE_57765_PLUS  0x1000
+#defineBGE_APE 0x2000
+#defineBGE_CPMU_PRESENT0x4000
 #define BGE_TSO0x8000
Index: if_bge.c
===
RCS file: /cvsroot/src/sys/dev/pci/if_bge.c,v
retrieving revision 1.255
diff -u -r1.255 if_bge.c
--- if_bge.c3 Jul 2013 15:21:35 -   1.255
+++ if_bge.c4 Jul 2013 07:59:13 -
@@ -677,8 +677,9 @@
 #define BGE_IS_5714_FAMILY(sc) ((sc)->bge_flags & BGE_5714_FAMILY)
 #define BGE_IS_575X_PLUS(sc)   ((sc)->bge_flags & BGE_575X_PLUS)
 #define BGE_IS_5755_PLUS(sc)   ((sc)->bge_flags & BGE_5755_PLUS)
-#define BGE_IS_5717_PLUS(sc)   ((sc)->bge_flags & BGE_5717_PLUS)
+#define BGE_IS_57765_FAMILY(sc)((sc)->bge_flags & 
BGE_57765_FAMILY)
 #define BGE_IS_57765_PLUS(sc)  ((sc)->bge_flags & BGE_57765_PLUS)
+#define BGE_IS_5717_PLUS(sc)   ((sc)->bge_flags & BGE_5717_PLUS)
 
 static const struct bge_revision {
uint32_tbr_chipid;
@@ -1941,8 +1942,10 @@
 static int
 bge_init_tx_ring(struct bge_softc *sc)
 {
+   struct ifnet *ifp = &sc->ethercom.ec_if;
int i;
bus_dmamap_t dmamap;
+   bus_size_t maxsegsz;
struct txdmamap_pool_entry *dma;
 
if (sc->bge_flags & BGE_TXRING_VALID)
@@ -1964,10 +1967,18 @@
if (BGE_CHIPREV(sc->bge_chipid) == BGE_CHIPREV_5700_BX)
bge_writembx(sc, BGE_MBX_TX_NIC_PROD0_LO, 0);
 
+   /* Limit DMA segment size for some chips */
+   if ((BGE_ASICREV(sc->bge_chipid) == BGE_ASICREV_BCM57766) &&
+   (ifp->if_mtu <= ETHERMTU))
+   maxsegsz = 2048;
+   else if (BGE_ASICREV(sc->bge_chipid) == BGE_ASICREV_BCM5719)
+   maxsegsz = 4096;
+   else
+   maxsegsz = ETHER_MAX_LEN_JUMBO;
SLIST_INIT(&sc->txdma_list);
for (i = 0; i < BGE_TX_RING_CNT; i++) {
if (bus_dmamap_create(sc->bge_dmatag, BGE_TXDMA_MAX,
-   BGE_NTXSEG, ETHER_MAX_LEN_JUMBO, 0, BUS_DMA_NOWAIT,
+   BGE_NTXSEG, maxsegsz, 0, BUS_DMA_NOWAIT,
&dmamap))
return ENOBUFS;
if (dmamap == NULL)
@@ -2275,8 +2286,7 @@
CSR_WRITE_4(sc, BGE_MODE_CTL, mode_ctl);
}

-   /* XXX Should we use 57765_FAMILY? */
-   if (BGE_IS_57765_PLUS(sc)) {
+   if (BGE_IS_57765_FAMILY(sc)) {
if (sc->bge_chipid == BGE_CHIPID_BCM57765_A0) {
/* Save */
mode_ctl = CSR_READ_4(sc, BGE_MODE_CTL);
@@ -2393,7 +2403,7 @@
BGE_ASICREV(sc->bge_chipid) == BGE_ASICREV_BCM5704)
dma_rw_ctl &= ~BGE_PCIDMARWCTL_MI

Re: broadcom 57766 and jumbo frames

2013-07-03 Thread Masanobu SAITOH

(2013/07/03 15:51), Emmanuel Dreyfus wrote:

On Wed, Jul 03, 2013 at 03:03:11PM +0900, Masanobu SAITOH wrote:

  I tested some cards and the jumbo frame function works if the
BGE_JUMBO_CAPABLE flags is set by the driver.

  Please test if_bge.c rev. 1.254. It added the flag to some chips.
It includes BCM57766, too.


It works with mtu at 1500. If I set mtu to 9000 and send a small ping
(such as 32 bytes), it still works. But when I attempt to transmit a
jumbo frame, by a ping -s 4000, the card drops into a state where no
packet manages to get out anymore. Even a ping -s 32 does not work
anymore.

Sometimes it says:
bge0: watchdog timeout -- resetting
bge0: block failed to stop: reg 0xc00, bit 0x0002


 Thank you for your quick reply.

 Before I committed rev. 1.254, I found a serious bug that all *BSD's
bge have. The follwing patch fixes the behavior of BCM5776[56]. I have
niether 57765 nor 57766.

 Could you try this patch? I'll check the entire code in bge.c again.

Index: if_bge.c
===
RCS file: /cvsroot/src/sys/dev/pci/if_bge.c,v
retrieving revision 1.254
diff -u -r1.254 if_bge.c
--- if_bge.c3 Jul 2013 05:49:36 -   1.254
+++ if_bge.c3 Jul 2013 07:04:57 -
@@ -3388,14 +3388,14 @@
 
 	/* Save chipset family. */

switch (BGE_ASICREV(sc->bge_chipid)) {
-   case BGE_ASICREV_BCM57765:
-   case BGE_ASICREV_BCM57766:
-   sc->bge_flags |= BGE_57765_PLUS;
-   /* FALLTHROUGH */
case BGE_ASICREV_BCM5717:
case BGE_ASICREV_BCM5719:
case BGE_ASICREV_BCM5720:
-   sc->bge_flags |= BGE_5717_PLUS | BGE_5755_PLUS |
+   sc->bge_flags |= BGE_5717_PLUS;
+   /* FALLTHROUGH */
+   case BGE_ASICREV_BCM57765:
+   case BGE_ASICREV_BCM57766:
+   sc->bge_flags |= BGE_57765_PLUS | BGE_5755_PLUS |
BGE_575X_PLUS | BGE_5705_PLUS | BGE_JUMBO_CAPABLE;
/* Jumbo frame on BCM5719 A0 does not work. */
if ((BGE_ASICREV(sc->bge_chipid) == BGE_ASICREV_BCM5719) &&



--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: broadcom 57766 and jumbo frames

2013-07-02 Thread Masanobu SAITOH

Hi, Emmanuel.

(2013/07/02 16:39), Emmanuel Dreyfus wrote:

On Mon, Jul 01, 2013 at 05:33:21PM +0900, Masanobu SAITOH wrote:

  Could you compile a kernel with BGE_DEBUG and show me the dmesg?


Here it is:


Thanks.

 I tested some cards and the jumbo frame function works if the
BGE_JUMBO_CAPABLE flags is set by the driver.

 Please test if_bge.c rev. 1.254. It added the flag to some chips.
It includes BCM57766, too.


Module Name:src
Committed By:   msaitoh
Date:   Wed Jul  3 05:49:36 UTC 2013

Modified Files:
src/sys/dev/pci: if_bge.c

Log Message:
 Add BGE_JUMBO_CAPABLE flag to some chips. Before this commit,
the following chips support the jumbo frame function:

5700
5701
5702
5703
5704

With this commit, the following chip support it, too:

5714
5780
5717
5718
5719 (exclude rev. A0)
5720
57765
57766


To generate a diff of this commit:
cvs rdiff -u -r1.253 -r1.254 src/sys/dev/pci/if_bge.c


 One of the problem is that we have no direct way to know the MTU's
hard limit of each interface. To know the value, we can use SIOCSIFMTU
and check the the result if it is accepted. OpenBSD has SIOCGIFHARDMTU
to know the hard limit directly.

 


Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013
 The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
 The Regents of the University of California.  All rights reserved.

NetBSD 6.99.22 (MACMDD_DEBUG) #0: Tue Jul  2 03:36:01 CEST 2013

manu@enfer:/home6/tmp/macmdd-manu/netbsd-obj/sys/arch/amd64/compile/MACMDD_DEBUG
total memory = 8130 MB
avail memory = 7868 MB
timecounter: Timecounters tick every 10.000 msec
RTC BIOS diagnostic error 
0xff
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
Apple Inc. iMac13,1 (1.0)
mainbus0 (root)
cpu0 at mainbus0 apid 0: Intel(R) Core(TM) i5-3330S CPU @ 2.70GHz, id 0x306a9
cpu1 at mainbus0 apid 2: Intel(R) Core(TM) i5-3330S CPU @ 2.70GHz, id 0x306a9
cpu2 at mainbus0 apid 4: Intel(R) Core(TM) i5-3330S CPU @ 2.70GHz, id 0x306a9
cpu3 at mainbus0 apid 6: Intel(R) Core(TM) i5-3330S CPU @ 2.70GHz, id 0x306a9
ioapic0 at mainbus0 apid 2: pa 0xfec0, version 20, 24 pins
acpi0 at mainbus0: Intel ACPICA 20110623
acpi0: X/RSDT: OemId , AslId <,0113>
acpiecdt0 at acpi0: ACPI Embedded Controller via ECDT
acpi0: SCI interrupting at int 9
timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
hpet0 at acpi0: high precision event timer (mem 0xfed0-0xfed00400)
timecounter: Timecounter "hpet0" frequency 14318180 Hz quality 2000
acpiec0 at acpi0 (EC, PNP0C09-0): using acpiecdt0
acpivga0 at acpi0 (GFX0): ACPI Display Adapter
acpiout0 at acpivga0 (LCD, 0x80002400): ACPI Display Output Device
acpiout0: brightness levels: 1 4 6 9 14 19 26 33 40 46 52 58 65 71 78 85
acpivga0: connected output devices:
acpivga0:   0x2400 (acpiout0): Int. Digital Flat Panel, index 0, port 0, head 0
FWHD (INT0800) at acpi0 not configured
LDRC (PNP0C02) at acpi0 not configured
attimer1 at acpi0 (TIMR, PNP0100): io 0x40-0x43,0x50-0x53
SMC (APP0001) at acpi0 not configured
DPPT (APP000C) at acpi0 not configured
ALS0 (ACPI0008) at acpi0 not configured
PDRC (PNP0C02) at acpi0 not configured
acpibut0 at acpi0 (PWRB, PNP0C0C): ACPI Power Button
PNLF (APP0002) at acpi0 not configured
acpibut1 at acpi0 (SLPB, PNP0C0E): ACPI Sleep Button
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0: vendor 0x8086 product 0x0150 (rev. 0x09)
ppb0 at pci0 dev 1 function 0: vendor 0x8086 product 0x0151 (rev. 0x09)
ppb0: PCI Express 2.0  x16 @ 8.0Gb/s
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
vga0 at pci1 dev 0 function 0: vendor 0x10de product 0x0fd8 (rev. 0xa1)
wsdisplay0 at vga0 kbdmux 1: console (80x25, vt100 emulation)
wsmux1: connecting to wsdisplay0
drm at vga0 not configured
hdaudio0 at pci1 dev 0 function 1: HD Audio Controller
hdaudio0: interrupting at ioapic0 pin 17
hdafg0 at hdaudio0: NVIDIA product 0042
hdafg0: DP00 8ch: Digital Out [Jack]
hdafg0: DP01 8ch: Digital Out [Jack]
hdafg0: 8ch/0ch 48000Hz PCM16*
vendor 0x8086 product 0x1e31 (USB serial bus, interface 0x30, revision 0x04) at 
pci0 dev 20 function 0 not configured
vendor 0x8086 product 0x1e3a (miscellaneous communications, revision 0x04) at 
pci0 dev 22 function 0 not configured
ehci0 at pci0 dev 26 function 0: vendor 0x8086 product 0x1e2d (rev. 0x04)
ehci0: interrupting at ioapic0 pin 23
ehci0: EHCI version 1.0
usb0 at ehci0: USB revision 2.0
hdaudio1 at pci0 dev 27 function 0: HD Audio Controller
hdaudio1: interrupting at ioapic0 pin 22
hdafg1 at hdaudio1: vendor 1013 product 4206
hdafg1: DAC00 2ch: Speaker [Built-In]
hdafg1: DAC01 2ch: HP Out [Jack]
hdafg1: DI

Re: broadcom 57766 and jumbo frames

2013-07-01 Thread Masanobu SAITOH

Hello, Emmanuel.

(2013/06/28 19:31), Emmanuel Dreyfus wrote:

Hi

New iMac have a broadcom 57766, which is handled out of the box by the
bge(4) driver in -current. The driver almosts treats it as a 57765,
which seems almost correct when looking at Linux driver contributed
by Broadcom:
http://code.google.com/p/linux-picosam9g45/source/diff?spec=svn55086ad95d740577def0b4e6ecc2c0ae9b0d6dec&r=55086ad95d740577def0b4e6ecc2c0ae9b0d6dec&format=side&path=/drivers/net/ethernet/broadcom/tg3.c

Unfortunately, jumbo frames do not work either on 57766 nor on 57765
on NetBSD. I tried telling it was jumbo capable without much success.
I also tried to incoporate some tweaks from the Linux commit above,
but it does not help.

It seems impossible to send or receive anything beyond around 1500
bytes with mtu set to 9000. I note that ifconfig still reports ec_enabled=0
I wonder if it is a problem with 57766 specific adjustements, or if it is the
bge(4) driver itelf that is jumbo frame incapable for now.

Anyone had jumbo frames actually working for another bge(4) board?


 As I wrote about wm(4) a few weeks ago, I don't use jumbo frame
function, so I don't know whether other bge's jumb frame function
work or not. Sorry :-|


 I have no any 577xx card, but I have something come to mind and it's
 easy to test, so I'll test with my cards.


 Could you compile a kernel with BGE_DEBUG and show me the dmesg?

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Another MII PHY fix.

2013-06-05 Thread Masanobu SAITOH

 Hi, all.

 Without following patch. Some drivers which don't use link
interrput check the link status only every sc->mii_anegticks.
It's late. The patch is a part of FreeBSD's mii_physubr.c r158649:

 
http://svnweb.freebsd.org/base/head/sys/dev/mii/mii_physubr.c?r1=150756&r2=158649

 Any objection to commit?

Index: mii_physubr.c
===
RCS file: /cvsroot/src/sys/dev/mii/mii_physubr.c,v
retrieving revision 1.76
diff -u -r1.76 mii_physubr.c
--- mii_physubr.c   6 Jun 2013 03:10:48 -   1.76
+++ mii_physubr.c   6 Jun 2013 06:07:47 -
@@ -324,23 +324,43 @@
/*
 * If we're not doing autonegotiation, we don't need to do
 * any extra work here.  However, we need to check the link
-* status so we can generate an announcement if the status
-* changes.
+* status so we can generate an announcement by returning
+* with 0 if the status changes.
 */
if ((IFM_SUBTYPE(ife->ifm_media) != IFM_AUTO) &&
-   (IFM_SUBTYPE(ife->ifm_media) != IFM_1000_T))
+   (IFM_SUBTYPE(ife->ifm_media) != IFM_1000_T)) {
+   /*
+* Reset autonegotiation timer to 0 to make sure
+* the future autonegotiation start with 0.
+*/
+   sc->mii_tick = 0;
return (0);
+   }
 
/* Read the status register twice; BMSR_LINK is latch-low. */
reg = PHY_READ(sc, MII_BMSR) | PHY_READ(sc, MII_BMSR);
if (reg & BMSR_LINK) {
/*
-* See above.
+* Reset autonegotiation timer to 0 in case the link is down
+* in the next tick.
 */
+   sc->mii_tick = 0;
+   /* See above. */
return (0);
}
 
/*
+* mii_tick == 0 means it's the first tick after changing the media or
+* the link became down since the last tick (see above), so return with
+* 0 to update the status.
+*/
+   if (sc->mii_ticks == 0)
+   return (0);
+
+   /* Now increment the tick */
+   sc->mii_ticks++;
+
+   /*
 * Only retry autonegotiation every N seconds.
 */
KASSERT(sc->mii_anegticks != 0);


-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: mii_ticks = 0 problem

2013-06-05 Thread Masanobu SAITOH

(2013/06/03 11:41), Masanobu SAITOH wrote:

  Hi.

  The following patch fix a problem that a lot of phys misunderstand
the auto negotiation timer. Usually the autonegotiaton timer is set
to 10(MII_ANEGTICKS_GIGE), but a lot of dirver don't work as expected.

e.g.:

 (mii_ticks == 0)
 ifconfig bge0 media auto up
 (after 2 seconds, the link was up)
 ifconfig bge0 down
 ifconfig bge0 up
 (timer start from 2!!!)

  This problem is not serious but we should fix to work as we expected.

  Currently, some drivers do "sc->mii_ticks = 0;" BEFORE calling
*mii_phy_auto(). Some drivers don't do it. To not to forget it and to
reduce the duplication, I think it's better to move "sc->mii_ticks = 0;"
into *mii_phy_auto().

  Any objection?


Index: atphy.c
===
RCS file: /cvsroot/src/sys/dev/mii/atphy.c,v
retrieving revision 1.12
diff -u -p -r1.12 atphy.c
--- atphy.c 23 Jul 2012 00:09:49 -  1.12
+++ atphy.c 3 Jun 2013 02:34:03 -
@@ -289,7 +289,6 @@ done:
if (sc->mii_ticks <= sc->mii_anegticks)
break;

-   sc->mii_ticks = 0;
atphy_mii_phy_auto(sc);
break;
}
@@ -405,6 +404,7 @@ atphy_mii_phy_auto(struct mii_softc *sc)
  {
uint16_t anar;

+   sc->mii_ticks = 0;
anar = BMSR_MEDIA_TO_ANAR(sc->mii_capabilities) | ANAR_CSMA;
if (sc->mii_flags & MIIF_DOPAUSE)
anar |= ANAR_X_PAUSE_TOWARDS;
Index: brgphy.c
===
RCS file: /cvsroot/src/sys/dev/mii/brgphy.c,v
retrieving revision 1.63
diff -u -p -r1.63 brgphy.c
--- brgphy.c1 Apr 2013 13:41:37 -   1.63
+++ brgphy.c3 Jun 2013 02:34:04 -
@@ -581,6 +581,7 @@ brgphy_mii_phy_auto(struct mii_softc *sc
  {
int anar, ktcr = 0;

+   sc->mii_ticks = 0;
brgphy_loop(sc);
PHY_RESET(sc);

Index: ciphy.c
===
RCS file: /cvsroot/src/sys/dev/mii/ciphy.c,v
retrieving revision 1.19
diff -u -p -r1.19 ciphy.c
--- ciphy.c 12 May 2009 14:31:27 -  1.19
+++ ciphy.c 3 Jun 2013 02:34:04 -
@@ -275,7 +275,6 @@ setit:
if (++sc->mii_ticks <= MII_ANEGTICKS)
break;

-   sc->mii_ticks = 0;
mii_phy_auto(sc, 0);
return (0);
}
Index: mii_physubr.c
===
RCS file: /cvsroot/src/sys/dev/mii/mii_physubr.c,v
retrieving revision 1.75
diff -u -p -r1.75 mii_physubr.c
--- mii_physubr.c   3 Oct 2012 07:08:58 -   1.75
+++ mii_physubr.c   3 Jun 2013 02:34:04 -
@@ -202,6 +202,7 @@ mii_phy_auto(struct mii_softc *sc, int w
struct mii_data *mii = sc->mii_pdata;
struct ifmedia_entry *ife = mii->mii_media.ifm_cur;

+   sc->mii_ticks = 0;
if ((sc->mii_flags & MIIF_DOINGAUTO) == 0) {
/*
 * Check for 1000BASE-X.  Autonegotiation is a bit
@@ -346,7 +347,6 @@ mii_phy_tick(struct mii_softc *sc)
if (++sc->mii_ticks <= sc->mii_anegticks)
return (EJUSTRETURN);

-   sc->mii_ticks = 0;
PHY_RESET(sc);

if (mii_phy_auto(sc, 0) == EJUSTRETURN)
Index: rgephy.c
===
RCS file: /cvsroot/src/sys/dev/mii/rgephy.c,v
retrieving revision 1.29
diff -u -p -r1.29 rgephy.c
--- rgephy.c18 Jul 2010 03:00:39 -  1.29
+++ rgephy.c3 Jun 2013 02:34:04 -
@@ -319,7 +319,6 @@ rgephy_service(struct mii_softc *sc, str
if (sc->mii_ticks <= sc->mii_anegticks)
return 0;

-   sc->mii_ticks = 0;
rgephy_mii_phy_auto(sc);
break;
}
@@ -428,6 +427,7 @@ rgephy_mii_phy_auto(struct mii_softc *mi
  {
int anar;

+   mii->mii_ticks = 0;
rgephy_loop(mii);
rgephy_reset(mii);

Index: urlphy.c
===
RCS file: /cvsroot/src/sys/dev/mii/urlphy.c,v
retrieving revision 1.25
diff -u -p -r1.25 urlphy.c
--- urlphy.c31 Jan 2009 05:44:05 -  1.25
+++ urlphy.c3 Jun 2013 02:34:04 -
@@ -196,7 +196,6 @@ urlphy_service(struct mii_softc *sc, str
if (++sc->mii_ticks <= sc->mii_anegticks)
return (0);

-   sc->mii_ticks = 0;
PHY_RESET(sc);

if (mii_phy_auto(sc, 0) == EJUSTRETURN)




 Done.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: NFS vs jumbograms?

2013-06-03 Thread Masanobu SAITOH

(2013/06/04 5:50), David Holland wrote:

On Mon, Jun 03, 2013 at 11:37:29AM -0400, Mouse wrote:
  > At $JOB, we have two i386 machines with wm interfaces connected
  > back-to-back with a short patch cable and a /30 subnet of 192.168.  One
  > is NFS-serving some disk space to the other over this link.  They are
  > running 4.0.1 (with a few tweaks of mine, but nothing touching
  > sys/nfs); while it's fallen off the end of official support, I thought
  > someone might happen to recall enough to be useful

nfs hasn't changed much, so if the problem is in nfs (rather than in
netinet) it's probably still in -current.

not that this actually helps.

Also note that there are a fair number of cases in nfs where lost
packets result in timeouts that don't get handled properly causing
things to wedge. If you haven't yet, try using tcp mounts as that
tends to avoid exercising some of those cases.


 I sometimes heard that wm had some problems with jumbo MTU, but I've
never checked whether it really has a bug or not. I suspect the bug
is not in NFS stack but in wm.

 I don't use jumbo MTU, so usually don't test the function when I add
the support of a new chip. Sorry :(
If someone show me a reproduceable configuraton, I'll try to fix it.

 Regards.

--
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


mii_ticks = 0 problem

2013-06-02 Thread Masanobu SAITOH
 Hi.

 The following patch fix a problem that a lot of phys misunderstand
the auto negotiation timer. Usually the autonegotiaton timer is set
to 10(MII_ANEGTICKS_GIGE), but a lot of dirver don't work as expected.

e.g.:

(mii_ticks == 0)
ifconfig bge0 media auto up
(after 2 seconds, the link was up)
ifconfig bge0 down
ifconfig bge0 up
(timer start from 2!!!)

 This problem is not serious but we should fix to work as we expected.

 Currently, some drivers do "sc->mii_ticks = 0;" BEFORE calling
*mii_phy_auto(). Some drivers don't do it. To not to forget it and to
reduce the duplication, I think it's better to move "sc->mii_ticks = 0;"
into *mii_phy_auto().

 Any objection?


Index: atphy.c
===
RCS file: /cvsroot/src/sys/dev/mii/atphy.c,v
retrieving revision 1.12
diff -u -p -r1.12 atphy.c
--- atphy.c 23 Jul 2012 00:09:49 -  1.12
+++ atphy.c 3 Jun 2013 02:34:03 -
@@ -289,7 +289,6 @@ done:
if (sc->mii_ticks <= sc->mii_anegticks)
break;
 
-   sc->mii_ticks = 0;
atphy_mii_phy_auto(sc);
break;
}
@@ -405,6 +404,7 @@ atphy_mii_phy_auto(struct mii_softc *sc)
 {
uint16_t anar;
 
+   sc->mii_ticks = 0;
anar = BMSR_MEDIA_TO_ANAR(sc->mii_capabilities) | ANAR_CSMA;
if (sc->mii_flags & MIIF_DOPAUSE)
anar |= ANAR_X_PAUSE_TOWARDS;
Index: brgphy.c
===
RCS file: /cvsroot/src/sys/dev/mii/brgphy.c,v
retrieving revision 1.63
diff -u -p -r1.63 brgphy.c
--- brgphy.c1 Apr 2013 13:41:37 -   1.63
+++ brgphy.c3 Jun 2013 02:34:04 -
@@ -581,6 +581,7 @@ brgphy_mii_phy_auto(struct mii_softc *sc
 {
int anar, ktcr = 0;
 
+   sc->mii_ticks = 0;
brgphy_loop(sc);
PHY_RESET(sc);
 
Index: ciphy.c
===
RCS file: /cvsroot/src/sys/dev/mii/ciphy.c,v
retrieving revision 1.19
diff -u -p -r1.19 ciphy.c
--- ciphy.c 12 May 2009 14:31:27 -  1.19
+++ ciphy.c 3 Jun 2013 02:34:04 -
@@ -275,7 +275,6 @@ setit:
if (++sc->mii_ticks <= MII_ANEGTICKS)
break;
 
-   sc->mii_ticks = 0;
mii_phy_auto(sc, 0);
return (0);
}
Index: mii_physubr.c
===
RCS file: /cvsroot/src/sys/dev/mii/mii_physubr.c,v
retrieving revision 1.75
diff -u -p -r1.75 mii_physubr.c
--- mii_physubr.c   3 Oct 2012 07:08:58 -   1.75
+++ mii_physubr.c   3 Jun 2013 02:34:04 -
@@ -202,6 +202,7 @@ mii_phy_auto(struct mii_softc *sc, int w
struct mii_data *mii = sc->mii_pdata;
struct ifmedia_entry *ife = mii->mii_media.ifm_cur;
 
+   sc->mii_ticks = 0;
if ((sc->mii_flags & MIIF_DOINGAUTO) == 0) {
/*
 * Check for 1000BASE-X.  Autonegotiation is a bit
@@ -346,7 +347,6 @@ mii_phy_tick(struct mii_softc *sc)
if (++sc->mii_ticks <= sc->mii_anegticks)
return (EJUSTRETURN);
 
-   sc->mii_ticks = 0;
PHY_RESET(sc);
 
if (mii_phy_auto(sc, 0) == EJUSTRETURN)
Index: rgephy.c
===
RCS file: /cvsroot/src/sys/dev/mii/rgephy.c,v
retrieving revision 1.29
diff -u -p -r1.29 rgephy.c
--- rgephy.c18 Jul 2010 03:00:39 -  1.29
+++ rgephy.c3 Jun 2013 02:34:04 -
@@ -319,7 +319,6 @@ rgephy_service(struct mii_softc *sc, str
if (sc->mii_ticks <= sc->mii_anegticks)
return 0;
 
-   sc->mii_ticks = 0;
rgephy_mii_phy_auto(sc);
break;
}
@@ -428,6 +427,7 @@ rgephy_mii_phy_auto(struct mii_softc *mi
 {
int anar;
 
+   mii->mii_ticks = 0;
rgephy_loop(mii);
rgephy_reset(mii);
 
Index: urlphy.c
===
RCS file: /cvsroot/src/sys/dev/mii/urlphy.c,v
retrieving revision 1.25
diff -u -p -r1.25 urlphy.c
--- urlphy.c31 Jan 2009 05:44:05 -  1.25
+++ urlphy.c3 Jun 2013 02:34:04 -
@@ -196,7 +196,6 @@ urlphy_service(struct mii_softc *sc, str
if (++sc->mii_ticks <= sc->mii_anegticks)
return (0);
 
-   sc->mii_ticks = 0;
PHY_RESET(sc);
 
if (mii_phy_auto(sc, 0) == EJUSTRETURN)


-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)


Re: ETHERCAP_* & ioctl()

2012-10-31 Thread Masanobu SAITOH
 Hi, all.

 I sent the followin mail more than two years ago.

> http://mail-index.netbsd.org/tech-kern/2010/07/28/msg008613.html

 As the starting point to solve this problem, I committed the change to
add SIOCGETHERCAP stuff.

 Example:
> msk0: flags=8802 mtu 1500
> ec_capabilities=5
> ec_enabled=0
> address: 00:50:43:00:4b:c5
> media: Ethernet autoselect
> status: no carrier
> wm0: flags=8843 mtu 1500
> 
> capabilities=7ff80
> 
> enabled=7ff80
> ec_capabilities=7
> ec_enabled=0
> address: 00:1b:21:58:68:34
> media: Ethernet autoselect (1000baseT 
> full-duplex,flowcontrol,rxpause,txpause)
> status: active
> inet 192.168.1.5 netmask 0xff00 broadcast 192.168.1.255
> inet6 fe80::21b:21ff:fe58:6834%wm0 prefixlen 64 scopeid 0x2
> inet6 2001:240:694:1:21b:21ff:fe58:6834 prefixlen 64


 What do you think about this output?

-- 
---
SAITOH Masanobu (msai...@execsw.org
 msai...@netbsd.org)