Re: MCLADDREFERENCE() incrementing the wrong ext_refcnt?

2024-03-31 Thread Edgar Fuß
> So I think > atomic_inc_uint(&(o)->m_ext.ext_refcnt);\ > should really be > atomic_inc_uint(&(o)->m_ext_ref->m_ext.ext_refcnt); \ > which, of course, is the same thing if MEXT_ISEMBEDDED(o) is true. > Am I getting something wrong? Self-answer:

MCLADDREFERENCE() incrementing the wrong ext_refcnt?

2024-03-22 Thread Edgar Fuß
Hello. I'm under the impression that MCLADDREFERENCE() may increment the wrong ext_refcnt. In case it's permitted (I can't find anything to the contrary) to call MCLADDREFERENCE(m1, m2) and then MCLADDREFERENCE(m2, m3), then the second call will increment m2's ext_refcnt where it should be
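
The scenario in this question can be sketched with a toy model. Everything below (struct toy_mbuf and the toy_* functions) is hypothetical and heavily simplified; it is not NetBSD's struct mbuf or the real macro. It only shows why the two ways of incrementing the counter differ once references are chained. Whether the real macro behaves like the first variant depends on how m_ext is defined, which this sketch does not settle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an mbuf that refers to external storage. */
struct toy_mbuf {
	unsigned ext_refcnt;       /* only meaningful in the storage owner */
	struct toy_mbuf *ext_ref;  /* points at the owner (itself if embedded) */
};

/* Make m the owner of a fresh external buffer with one reference. */
static void toy_init_owner(struct toy_mbuf *m)
{
	m->ext_refcnt = 1;
	m->ext_ref = m;
}

/* Variant 1: increment the counter stored in o itself. */
static void toy_addref_direct(struct toy_mbuf *o, struct toy_mbuf *n)
{
	assert(o->ext_ref != NULL);
	o->ext_refcnt++;
	n->ext_refcnt = 0;
	n->ext_ref = o->ext_ref;
}

/* Variant 2: increment the counter stored in the owner that o refers to. */
static void toy_addref_via_owner(struct toy_mbuf *o, struct toy_mbuf *n)
{
	assert(o->ext_ref != NULL);
	o->ext_ref->ext_refcnt++;
	n->ext_refcnt = 0;
	n->ext_ref = o->ext_ref;
}

/* Chain two references (m1 -> m2, then m2 -> m3); return the owner's count. */
unsigned toy_owner_count_after_chain(int via_owner)
{
	struct toy_mbuf m1, m2, m3;

	toy_init_owner(&m1);
	if (via_owner) {
		toy_addref_via_owner(&m1, &m2);
		toy_addref_via_owner(&m2, &m3);
	} else {
		toy_addref_direct(&m1, &m2);
		toy_addref_direct(&m2, &m3);
	}
	return m1.ext_refcnt;
}
```

In this toy model, three mbufs share the storage, so the owner's count should end up at 3. Only the second variant gets there; the first leaves the owner at 2 and bumps m2's unused counter instead.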

_KERNEL_OPT and 0x6e074def

2023-12-19 Thread Edgar Fuß
What's the point of #include'ing opt_foobar.h only if _KERNEL_OPT is defined and what's magic about 0x6e074def?

Re: Notes on kern/57133

2023-10-06 Thread Edgar Fuß
> One change I can try is to put the diagnostic printf higher in the > scsipi_request function As the problem is so simple to reproduce, I'd put it just below xs = arg. Or set a breakpoint on scsipi_get_opcodeinfo(), then, when hitting it, one on mpii_scsipi_request() (provided you find it's

Re: Notes on kern/57133

2023-10-04 Thread Edgar Fuß
> provide details on what this command is? A3h/0Ch is REPORT SUPPORTED OPERATION CODES The call is most probably from dev/scsipi:scsipi_get_opcodeinfo(). I'm still unsure how resid can be 0 at that point. scsipi_enqueue_xs() sets resid to datalen (which is undocumented). Apart from the path

Re: dumping on RAIDframe

2023-09-25 Thread Edgar Fuß
> you dump a memory block that isn't a multiple of a disk sector > (according to disklabel) You mean this one (from disklabel raid0): bytes/sector: 512 ?

Re: dumping on RAIDframe

2023-09-25 Thread Edgar Fuß
EF> dumping to dev 18,1 (offset=1090767, size=8252262): GO>Dumping to a RAID 1 set is supported in -8. But yes, none of those GO>values seem to align with each other. 18,1 is 'raid0b' though, so that GO>part seems correct. MvE> offset and size relate to the dump data (dumplo and dumpsize),

boot.cfg location (was: GPT attributes in dkwedge [PATCH])

2023-09-25 Thread Edgar Fuß
> boot[].cfg is > searched in EFI par[tit]ion /EFI/NetBSD/boot.cfg > and root partition /boot.cfg. But how can EFI locate it on the root partition if it tells where the root partition lives?

Locating boot.cfg on ESP (was: GPT attributes in dkwedge [PATCH])

2023-09-25 Thread Edgar Fuß
> | It's not obvious where efiboot finds boot.cfg, since that's in > | esp:/EFI/NetBSD/boot.cfg or, > > And we correctly interpret that, always? It works for me on four servers I recently set up if I put it into /EFI/NetBSD on the ESP. It also, for reasons unknown to me, works on one other

Re: panic on mfii(4) vd removal

2023-09-22 Thread Edgar Fuß
> I get a panic if I remove a virtual disk from an mfii(4) device. That's another blunder in mfii(4). Patch (including the last) attached. Index: sys/dev/pci/mfii.c === RCS file: /cvsroot/src/sys/dev/pci/mfii.c,v retrieving revision

panic on mfii(4) vd removal

2023-09-21 Thread Edgar Fuß
I get a panic if I remove a virtual disk from an mfii(4) device. What I found out is that mfii_aen_ld_update() calls sysmon_envsys_sensor_detach(), which (near the end of the routine) calls TAILQ_REMOVE(). In that, the last statement (minus QUEUEDEBUG_TAILQ_POSTREMOVE()), which is

Adding a virtual disk to mpii(4)

2023-09-21 Thread Edgar Fuß
After adding a virtual disk to an mfii(4) device (racadm createvirtualdisk, in my case), you get a nice mfii0: logical drive 2 added (target 2) message, but scsictl scsibus0 scan 2 0 doesn't find any drive. That's because sc_ld[i].ld_present is still unset from mfii_attach() and

dumping on RAIDframe

2023-09-20 Thread Edgar Fuß
Didn't RAIDframe recently (for certain values of "recently") gain the function to dump on a level 1 set? Should this work in -8? swapctl -z says "dump device is raid0b" (and raid0 is a level 1 RAID), but reboot 0x100 in DDB says dumping to dev 18,1 (offset=1090767, size=8252262): dump device

typo in raidN.conf leading to allegedly failed component

2023-09-12 Thread Edgar Fuß
I set up a server with a RAIDframe level 1 RAID and forgot raidctl -A softroot. So I booted an installation kernel via PXE, typed in a /tmp/raid0.conf and did raidctl -c /tmp/raid0.conf raid0, only I mistyped the name of the first component. That led to "hosed component", but worse, failed that

raidctl -A softroot and a failed component

2023-09-12 Thread Edgar Fuß
I had a RAIDframe level 1 RAID with the first component marked as failed, e.g. component0: failed /dev/dkN: optimal and although the set was configured -A softroot, the kernel didn't configure raid0a as the root file system, presumably because the dk numbers didn't match. I was

Re: Hard link creation without write access

2023-09-07 Thread Edgar Fuß
> a likely source of security issues. Why, exactly? I hope you need search permission to the original file (you certainly need search and write permission to the destination directory), so what can you do after the link you couldn't have done before? What about rename instead of link, should

Re: Maxphys on -current?

2023-08-04 Thread Edgar Fuß
Hasn't there been a tls-maxphys branch?

unable to create xfer table DMA map for drive 0, error=12

2023-08-03 Thread Edgar Fuß
I attached a 2.5" SSD to a machine, did a drvctl -r ata_hl atabus1 and got svwsata0:1: unable to create xfer table DMA map for drive 0, error=12 wd2(svwsata0:1:0): using PIO mode 4 Is this a problem with the -6 that machine runs or what does it mean?

Re: compare kernel config

2023-05-31 Thread Edgar Fuß
> Do we have a reliable way to compare kernel configurations? config -x and diff?

Re: USB-related panic in 8.2_STABLE

2023-04-28 Thread Edgar Fuß
> The same patch should apply just as well on netbsd-8. OK, I just did that. But we still don't know what led to the disconnect. Does the ohci0: 1 scheduling overruns give any clue?

Re: USB-related panic in 8.2_STABLE

2023-04-27 Thread Edgar Fuß
> list *(ugen_get_cdesc+0xb1) 0x802f8f2e is in ugen_get_cdesc (/usr/src-8/sys/dev/usb/ugen.c:1376). 1371 usb_config_descriptor_t *cdesc, *tdesc, cdescr; 1372 int len; 1373 usbd_status err; 1374 1375 if (index == USB_CURRENT_CONFIG_INDEX) {

Re: USB-related panic in 8.2_STABLE

2023-04-27 Thread Edgar Fuß
> You didn't give timing. Unfortunately, we don't know the timing. We don't know when and why the UPS disconnected. > normally the UPS doesn't disconnect It doesn't. Why should it?

SEGV in mmap() when building lang/gcc8 with devel/binutils

2023-03-10 Thread Edgar Fuß
Sorry for the cross-post, but the problem is so weird that I'm confused what nature it is. For complicated reasons (see below for details), I'm trying to build lang/gcc8 so that it uses gas/gld from devel/binutils instead of /usr/bin/{as,ld}. I put DEPENDS+=

Re: ATA TRIM?

2022-12-25 Thread Edgar Fuß
> According to that PDF, dholland is wrong. I fail to see a behaviour that would be allowed due to dholland@'s definition, but not according to the one you cited, nor the other way round.

acpiwmibus at acpiwmi0 not configured

2022-12-19 Thread Edgar Fuß
I notice a line acpiwmibus at acpiwmi0 not configured in the autoconf messages. Indeed, my kernel config has acpiwmi* at acpi? and wmidell* at acpiwmibus? but no attachment for any acpiwmibus, nor does any other kernel config. Is there something magic about acpiwmibus or

mpii_start() vs. mfii_start(): bus_space_write_raw_8(), bus_space_barrier()

2022-10-11 Thread Edgar Fuß
I'm investigating timeout problems with my mpii(4) device (after the driver has been converted to MSI(-X)). I'm trying to understand both sys/dev/pci/mpii.c and mfii.c since they address the same hardware with different firmware. Comparing mpii_start() with mfii_start(), I'm stumbling over a

mfii0: cmd timeout

2022-09-19 Thread Edgar Fuß
This is NOT kern/55192. So I thought I had mastered my PERC H330; set up two virtual volumes containing a one-disc RAID 0, set up GPTs, EFI boot volumes, built a RAIDframe RAID 1, disklabeled that, newfs'd the partitions, only remaining step being unpacking the sets (and a few config files).

Re: Dell PERC H330: no disks, no volumes

2022-09-15 Thread Edgar Fuß
> There is a PERC H330 and a PERC HBA330 and the Dell PERC9 user manual > (includes the H330) says you can boot it in HBA mode. Not sure if > that means that you can choose the firmware. When I set the H330 to HBA mode, it still attaches as mfii0, the only difference to RAID mode being that the

Re: Dell PERC H330: no disks, no volumes

2022-09-14 Thread Edgar Fuß
> Yes, in the controller setup you can create "Non-RAID Disks" (aka > JBOD) or "Virtual Disks" (aka RAID volumes) Where exactly are those Non-RAID Disks hidden? > In theory you could use bioctl to create and manage volumes, but the > driver doesn't implement it. Ah, interesting. That was the way

Re: Dell PERC H330: no disks, no volumes

2022-09-14 Thread Edgar Fuß
> I don't remember the details (and it depends on the controller version), > but you need to have physical disks assigned to one (or more) RAID volume, > and then the RAID volume has to be exported as one (or more) virtual disks. But what if I want to pass the bare discs to NetBSD for a RAIDframe

Re: panic in sysmon_envsys_unregister

2022-09-14 Thread Edgar Fuß
> I need to build a new install image (since I have no discs). I applied your fix to -8 and the panic disappeared. Thanks for the quick fix. Maybe it should be pulled up?

Re: Dell PERC H330: no disks, no volumes

2022-09-14 Thread Edgar Fuß
Oh, I wasn't aware the H330 and HBA330 are different devices! > There is a PERC H330 and a PERC HBA330 and the Dell PERC9 user manual > (includes the H330) says you can boot it in HBA mode. Not sure if > that means that you can choose the firmware. Oh well. So the HBA330 is a PowerEdge RAID

Re: panic in sysmon_envsys_unregister

2022-09-14 Thread Edgar Fuß
> This should be fixed by mfii.c rev. 1.26. Please update it and retry. Thanks. I need to build a new install image (since I have no discs). The other question is why the register call fails. According to the BIOS setup, the controller has no sensors. Could that be the problem?

panic in sysmon_envsys_unregister

2022-09-13 Thread Edgar Fuß
I get a panic on shutdown: netbsd:sysmon_envsys_unregister+0x128: cmpq 0(%rdx),%r12 sysmon_envsys_unregister mfii_detach config_detach config_detach_all cpu_reboot kern_reboot sys_reboot syscall ds 4da0 es 0 fs 1 gs c632 rdi 818f0510 sme_global_mtx rsi

Re: Dell PERC H330: no disks, no volumes

2022-09-13 Thread Edgar Fuß
> These controller chips can run two different kinds of firmware. > The mfii driver is for talking to the RAID firmware ("IR mode") > while the mpii driver is for talking to the vanilla SAS firmware > ("IT mode"). Ah, and how do I know which mode my card runs? mpii(4) explicitly mentions the Dell

Re: Dell PERC H330: no disks, no volumes

2022-09-13 Thread Edgar Fuß
It appears to me we have two drivers for the SAS3008: mfii(4) and mpii(4). Why?

Dell PERC H330: no disks, no volumes

2022-09-13 Thread Edgar Fuß
So after I managed to boot my new PowerEdge R6515, the next challenge is that I have no discs. The machine is equipped with a PERC H330 mini, a SCSI backplane and two SATA SSDs. I do see the discs in the BIOS's RAID controller configuration menu. Autoconfiguration says: mfii0 at pci1

Re: debugging a kernel that doesn't start

2022-09-13 Thread Edgar Fuß
> I'm trying to run NetBSD on a Dell PowerEdge R6515, and the kernel is being > loaded (PXE or USB) but then the machine hangs hard. I've made a giant step forward: booting the -current install image from a USB key /via UEFI/ works. Maybe it's a bug in the server's CSM. Thanks for all the

Re: debugging a kernel that doesn't start

2022-09-12 Thread Edgar Fuß
> then you can bypass all the worries of using BIOS routines or whatnot > and just poke the hardware directly. Probably stupid question: I can switch the machine to UEFI. Is it easier to debug things from there than from a BIOS boot?

Re: debugging a kernel that doesn't start

2022-09-12 Thread Edgar Fuß
> That could be a strong clue or it could be unrelated. OK, just in case that might be another clue: If I want to interrupt the boot countdown, the first keystroke gets lost, I need to press a second time.

Re: debugging a kernel that doesn't start

2022-09-12 Thread Edgar Fuß
> If you can setup a serial console, it may make things much easier. I do have a serial port on the machine. > I almost always use serial consoles on dev machines; I don't remember the > details but doing the equivalent of a putchar very early was possible. Is the BIOS still available or how does

Re: debugging a kernel that doesn't start

2022-09-12 Thread Edgar Fuß
> Have you tried booting a custom kernel with some drivers removed? No. I wouldn't know which drivers to remove. The problem is the kernel utters absolutely nothing, so it must hang very, very early. > have you tried an uncompressed one? No, but I guess the official install image (on a USB key)

debugging a kernel that doesn't start

2022-09-12 Thread Edgar Fuß
I'm trying to run NetBSD on a Dell PowerEdge R6515, and the kernel is being loaded (PXE or USB) but then the machine hangs hard. What's the way to debug a kernel that hangs so early that you can't printf or drop into ddb? I guess that's a phenomenon quite common for a new port or changes to

Re: mfii(4) and Dell PERC

2022-08-08 Thread Edgar Fuß
Thanks for your answers. > Some people reported that kern/56669 (and perhaps kern/55192) still exist > on some systems :-( Hm. > bioctl mfi(i)X show Ah, thanks. What do I do in case a drive fails? Will adding a hot spare automagically start a reconstruction? > If your system has other number,

mfii(4) and Dell PERC

2022-08-08 Thread Edgar Fuß
I'm unsure whether this is the right list, is port-amd64 more appropriate? Does anyone use a Dell PERC H730P or similar RAID controller in RAID mode? mfii(4) says all configuration is done via the controller's BIOS. Does that mean I need to shut down in case a drive fails and I need to rebuild?

Re: Periodic messages on NetBSD-9 and -current: xennet0: rx no cluster

2022-06-24 Thread Edgar Fuß
> the request count on the mclpl line is incrementing at a pretty fast rate Maybe you're running into the same problem as me (see the "mbuf cluster leak?" thread on tech-net). Try a kernel with MBUFTRACE. If that shows you (via netstat -mss) a large number of tx bufs on a particular vlan

Re: mfii hanging on boot

2022-06-23 Thread Edgar Fuß
> I committed the change yesterday. I don't get what the #if defined(__LP64__) && 0 is for.
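
As a general C preprocessor point (this does not explain why the committer chose it in mfii.c, which the thread excerpt does not say): appending "&& 0" to an #if condition makes it always false, so the guarded code is kept in the source but never compiled. A minimal sketch, with a made-up function name:

```c
/* Returns 1 if the guarded branch was compiled in, 0 otherwise. */
int guarded_branch_compiled(void)
{
#if defined(__LP64__) && 0
	return 1;	/* never compiled: "&& 0" forces the condition to false */
#else
	return 0;	/* always taken, on LP64 and non-LP64 platforms alike */
#endif
}
```

Removing the "&& 0" would re-enable the first branch on LP64 platforms, which makes this a cheap way to park code without deleting it.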

Re: killed: out of swap

2022-06-15 Thread Edgar Fuß
> Perhaps my understanding is wrong No.

Re: killed: out of swap

2022-06-14 Thread Edgar Fuß
> I assume my impression is completely wrong (today). OK, thanks for all the explanations and insights.

Re: killed: out of swap

2022-06-14 Thread Edgar Fuß
> So what should the kernel do? I don't know how things work under the hood today (I might have partially known in the times of sbrk()), but I would suppose that malloc() will ultimately result in some system call enlarging the heap/data segment/whatever. That system call could simply fail. I

killed: out of swap

2022-06-14 Thread Edgar Fuß
I have a program that keeps malloc()ing (and scribbling a bit into the allocated memory) until malloc() fails. The intention is to put pressure on the VM system to find out how much pool cache memory it can reclaim. When I run that program (with swap space unconfigured), it doesn't terminate
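
A test program of the kind described might look roughly like the sketch below. It is not the author's program: the function name, the chunk list, and the max_chunks cap (added so the sketch can be run safely) are assumptions, and it scribbles over each whole chunk rather than "a bit" of it.

```c
#include <stdlib.h>
#include <string.h>

/* Chunks are kept on a singly linked list so they can be freed afterwards. */
struct chunk {
	struct chunk *next;
};

/*
 * Allocate chunks of chunk_size bytes until malloc() fails or max_chunks
 * chunks have been obtained. Each chunk is written to so that its pages
 * really get backed by memory. Returns the number of chunks allocated;
 * everything is freed again before returning.
 */
size_t alloc_until_failure(size_t chunk_size, size_t max_chunks)
{
	struct chunk *head = NULL;
	size_t n = 0;

	if (chunk_size < sizeof(struct chunk))
		chunk_size = sizeof(struct chunk);

	while (n < max_chunks) {
		struct chunk *c = malloc(chunk_size);

		if (c == NULL)
			break;			/* the failure the program waits for */
		memset(c, 0xA5, chunk_size);	/* touch the memory */
		c->next = head;
		head = c;
		n++;
	}

	while (head != NULL) {
		struct chunk *next = head->next;

		free(head);
		head = next;
	}
	return n;
}
```

Calling alloc_until_failure(1 << 20, (size_t)-1) would approximate the unbounded behaviour from the post; as the thread subject says, on a system without swap the process may be killed before malloc() ever returns NULL.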

Re: membar_enter semantics

2022-02-15 Thread Edgar Fuß
I know close to nothing about the subject in question, but maybe thoughts from a non-expert may be useful: If there's a widely adopted terminology, one should probably stick to it even if the wording is counter-intuitive or misleading (but note that fact in the documentation). After all,

findroot: double match for boot device (was: Autoconfigured RAIDframe raid* numbering)

2021-09-02 Thread Edgar Fuß
I do know that, but the warning seems to be new. It didn't appear before, but I had -A root (which now is force) before.

Re: Autoconfigured RAIDframe raid* numbering

2021-09-02 Thread Edgar Fuß
> > Additionally, I got > > WARNING: findroot: double match for boot device (sd4, sd5) > > (where sd4a/5a are raid2's components) before > > boot device: raid2 > > root on raid2a dumps on raid2b > > What does that mean? > > Is this with -current newer than > >

Autoconfigured RAIDframe raid* numbering

2021-09-02 Thread Edgar Fuß
If I have a number of autoconfigured RAIDframe sets on one machine, is there any guarantee which raid* number a set gets assigned? Is that numbering stable even if I remove one set (in the sense of physically un-plugging the drives) so the components will get different sd* numbers? I had raid0

RAIDframe: reconstructing a temporarily lost drive (was: SATA rescan)

2021-06-16 Thread Edgar Fuß
> drvctl -r -a ata_hl atabusX OK, that (after moving to a different slot) brought the drive back again. However, the raid had failed the missing drive (whether upon booting with the missing drive or shortly before the crash I can't tell). I had /dev/wd0a optimal plus component1 failed. I guess

SATA rescan?

2021-06-15 Thread Edgar Fuß
Is there a way (short of re-booting) to re-scan a SATA port for a disc absent (or dysfunctional) during the boot? I.e., something like scsictl rescan?

Re: panic in iic_search()

2021-06-15 Thread Edgar Fuß
This is another place where I have local patches in my tree that haven't been integrated (see kern/55745). This is a regression in all "supported" versions of NetBSD (until -11 is released) rendering I2C inoperable on popular hardware.

8.x pmap fixes (was: boot -d)

2021-06-15 Thread Edgar Fuß
> Here they are (for netbsd-8). I can boot -d with them [...] I just noticed that I still have these patches locally. Any chances to get them into -8? Should I file a PR?

Re: timeouts connecting to pgsql database

2021-02-20 Thread Edgar Fuß
> What filesystem options are you using for wherever the database files > are located ? Back in the day I experienced that LFS was incredibly fast for a (MySQL) database. There were problems with the cleaner crashing, though.

Re: partial failures in write(2) (and read(2))

2021-02-11 Thread Edgar Fuß
> I suppose libc could set a default handler for the new signal, and do some > extra work to set errno. Then the libc routine could better use a new syscall, no?

Re: X vs serial console?

2021-02-09 Thread Edgar Fuß
> Is there any way I can test for it? Connect something to the HDMI outputs?

Re: X vs serial console?

2021-02-09 Thread Edgar Fuß
Could it be the case that the X server expects some aspects of the video hardware to be initialized by the video console driver that are uninitialized in the serial console case? E.g., as you say outputs are shared between HDMI and VGA, the X video simply goes to the HDMI output?

Re: USB lockup (probably solved)

2020-12-01 Thread Edgar Fuß
Looks like I'm making progress after all. > The change [nick] referred to was > > Revision 1.254.2.76 / (download) - annotate - [select for diffs], Mon > May 30 06:46:50 2016 UTC (4 years, 5 months ago) by skrll > Branch: nick-nhusb [...] > > Restructure the abort code for TD based transfers

Re: USB lockup

2020-11-28 Thread Edgar Fuß
I looked into the usbhist now. > is something being aborted? Yes. > I guess the E20 TD got written out with incorrect next_td, or some other > error condition caused the mixup. I think the only sane explanation absent a controller bug is that, at the time the HC finished E20, HcDoneHead was FA0

Re: USB lockup

2020-11-27 Thread Edgar Fuß
> Really hard to help without seeing the full ohcidebug usbhist log. I replaced the panic with a break out of the done loop. Find attached my diff plus the usbhist from where I first started the offending command (which locks up the second time called). I didn't look into the log myself yet.

Re: USB lockup

2020-11-26 Thread Edgar Fuß
Thanks a lot for looking into this! > Really hard to help without seeing the full ohcidebug usbhist log. The problem is that file system (or block I/O) seems to lock up so the usbhist is hard to get out of the machine other than by camera. I guess dump-ing will take ages to complete (16G RAM).

Re: USB lockup

2020-11-26 Thread Edgar Fuß
> Add a check to ohci_softintr to see if the list goes circular and enter > ddb / dump usbhist when it does... I already did add a panic and it fired. I'm still trying to find out how that happens. What I'm seeing (dumped by device_ctrl_start()) is a chain of four TDs (named here after their

Re: USB lockup

2020-11-24 Thread Edgar Fuß
I guess there's something different going on. Unless I'm mistaken, the list is circular in the td_nexttd sense, but not in the nexttd sense.

Re: USB lockup

2020-11-24 Thread Edgar Fuß
> so the td list must have gone circular, no? It's indeed circular (in the td_nexttd sense), as additionally inserted debugging output revealed. It also happens in uniprocessor (boot -1) mode.
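
A check for a list that has gone circular can be sketched with Floyd's tortoise-and-hare walk. The struct toy_td type and its single next field are hypothetical; the real OHCI TD structures have more than one link field (the thread distinguishes td_nexttd from nexttd), so this only illustrates the technique, not the driver code.

```c
#include <stddef.h>

/* Hypothetical transfer descriptor with a single software link. */
struct toy_td {
	struct toy_td *next;
};

/*
 * Floyd's cycle detection: advance one pointer by one node and another by
 * two nodes per step. They can only meet again if the list is circular.
 * Returns 1 for a circular list, 0 for a NULL-terminated one.
 */
int td_list_is_circular(const struct toy_td *head)
{
	const struct toy_td *slow = head;
	const struct toy_td *fast = head;

	while (fast != NULL && fast->next != NULL) {
		slow = slow->next;
		fast = fast->next->next;
		if (slow == fast)
			return 1;
	}
	return 0;
}

/* Build a chain of four TDs, optionally linking the last back to the second. */
int td_demo(int make_cycle)
{
	struct toy_td td[4];
	int i;

	for (i = 0; i < 3; i++)
		td[i].next = &td[i + 1];
	td[3].next = make_cycle ? &td[1] : NULL;

	return td_list_is_circular(&td[0]);
}
```

The walk needs no extra memory and terminates on both kinds of list, which matters in a soft interrupt handler where an endless loop is exactly the failure being diagnosed.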

Re: USB lockup

2020-11-23 Thread Edgar Fuß
> So, during the partial lockup, I see > ohci_softintr#63@0: add TD 0x80013ec2de20 > ohci_softintr#63@0: add TD 0x80013ec2dea0 that's ohci_softintr#63@0: add TD 0x80013ec2dfa0 > ohci_softintr#63@0: add TD 0x80013ec2dee0 So I think it's endlessly looping in the

Re: USB lockup

2020-11-23 Thread Edgar Fuß
> The ddb backtrace usually is > bus_space_read_4() > bintime() > ohci_softintr() > usb_soft_intr() > softint_dispatch() > > The system call causing the lock-up is a USB_DEVICEINFO ioctl on /dev/usb0 > with udi_addr=2, which corresponds to ugen0. I tried a -current kernel from nyftp today, and

USB lockup (was: ktrace-ing a command that locks up the machine)

2020-11-20 Thread Edgar Fuß
> Hmmm, this was usb, right? Yes. > Maybe turn on options USBHIST (and/or EHCIHIST, OHCIHIST, UHCIHIST, > XHCIHIST). None of these seem to be described in options(4) man > page, but you can dump the debug data using ``vmstat -u histname''. > And get a list of the actual histname's with ``vmstat

USB debugging (was: ktrace-ing a command that locks up the machine)

2020-11-18 Thread Edgar Fuß
On Wed, Nov 18, 2020 at 09:05:47AM -0500, Greg Troxel wrote: > another suggestion is to enable USB debugging in the kernel and use a serial > console (or even just framebuffer) to see the last message before crash. I set options {USB,OHCI,EHCI}_DEBUG and sysctl -w hw.{usb,ohci,ehci}.debug=20 and

Re: ktrace-ing a command that locks up the machine

2020-11-18 Thread Edgar Fuß
> ktrace over NFS. That would be -- eh -- somewhat involved. I doubt it will work given that writing to an FS mounted -o sync gives an empty file.

Re: ktrace-ing a command that locks up the machine

2020-11-18 Thread Edgar Fuß
> Suggestion: put the ktrace file on a filesystem mounted -o sync. That (with ktrace -s) gave me an empty file.

ktrace-ing a command that locks up the machine

2020-11-18 Thread Edgar Fuß
So after fixing kern/53311 and kern/55745 on -8, I'm back to one nesting level down my original task. I have a command that (when run the second time and with certain USB devices connected) will irreversibly (to me) partly (no console switching) lock up the machine. I need to enter DDB and

USB lock-ups

2020-11-16 Thread Edgar Fuß
Hello again. So after backporting the -current pmap fixes to -8 in order to be able to boot -d in order to be able to examine I2C panics and after fixing them I have an operational -8 machine again only to find that the USB problems that made me update are still there. The simplest

Re: boot -d

2020-11-16 Thread Edgar Fuß
> So there seems to be something seriously amiss with I2C on -8 (and -9). After fixing that, it boots again (with the adopted pmap changes). Nevertheless, someone should review them, of course.

Re: boot -d

2020-11-16 Thread Edgar Fuß
> Why not take spdmem out of your kernel config for now and test the > pmap patches ? It then panics in dbcool_chip_ident(). So there seems to be something seriously amiss with I2C on -8 (and -9).

Re: boot -d

2020-11-13 Thread Edgar Fuß
> Why not take spdmem out of your kernel config for now and test the > pmap patches ? Yes, could do that next week (ENOTIME currently). Anything special to test? I've no idea what the code does resp. when it gets used.

Re: boot -d

2020-11-13 Thread Edgar Fuß
> I've backported the fixes, will post them later. Here they are (for netbsd-8). I can boot -d with them, but because of the spdmem panics, I can't tell whether the machine would run with them. Someone(TM) should review them and request a pullup, please. Not sure what to do with the

Re: boot -d

2020-11-13 Thread Edgar Fuß
> On 12.11.2020 at 20:41, Andreas Gustafsson wrote: > > It's probably easier to revert src/sys/arch/x86/x86/db_memrw.c 1.6 I've backported the fixes, will post them later.

Re: boot -d

2020-11-12 Thread Edgar Fuß
> It's probably easier to revert src/sys/arch/x86/x86/db_memrw.c 1.6. As far as I understood (which may well be wrong) the fixes fixed a real problem that only surfaced on that change by chance and might have other consequences?

Re: boot -d

2020-11-12 Thread Edgar Fuß
> This looks like PR 53311. Ah, thanks! > The commit where that problem started (src/sys/arch/x86/x86/db_memrw.c 1.6) > was pulled up to the -8 branch, and apparently the commits that fixed it > were not. I currently seem to attract pull-ups that mess up things. I had a look at the relevant

boot -d

2020-11-12 Thread Edgar Fuß
Hello again. In about the third nesting level of what I wanted to do in the first place, I tried "boot netbsd -d" in the secondary boot. It loads the kernel, then complains about the ffs module missing (I don't use modules and don't have an 8.2 directory on that machine), clears the screen,

panic in iic_search()

2020-11-11 Thread Edgar Fuß
I have an AMD64 server running 8/amd64, which ran happily (other than USB issues, which is another story) with 8.1_STABLE from September 2019. I updated to netbsd-8 from yesterday (so that's 8.2_STABLE) and a newly compiled kernel crashes in iic_search(). The last line printed before that is:

Re: RAIDframe: what if a disc fails during copyback

2020-10-30 Thread Edgar Fuß
> it locks out all other non-copyback IO in order to finish the job! Oops! > Locking out all other IO is very poor... but if it's a small enough RAID set > you might be able to get away with the downtime for the copyback... Certainly not. > You shouldn't need to reboot for this... the 'failing

Re: RAIDframe: what if a disc fails during copyback

2020-10-30 Thread Edgar Fuß
Thanks for the detailed answer. > it's still there, and it does work, That's reassuring to know. > but it's not at all performant or system-friendly. Just how bad is it? > If you want the components labelled nicely, give the system a reboot Re-booting our file server is something I like to

Re: RAIDframe: what if a disc fails during copyback

2020-10-29 Thread Edgar Fuß
There still seems to be confusion on what I did. Let A and B be the two original components, C a spare (in the cupboard) and B' be B with the new firmware. I start with A and B as the two components of a RAID-1. Now B fails. I have a degraded RAID with A alone. I plug in C, scsictl scsibus0

Re: RAIDframe: what if a disc fails during copyback

2020-10-29 Thread Edgar Fuß
> So you have drives A, B, and C. A and B were live. Let's say B is the > one that failed. You reconstructed onto C and have been running with A > and C. Yes. > Now you have a new B (which in this case is the same hardware with new > firmware) and want to put it back into service. I'm not

RAIDframe: what if a disc fails during copyback

2020-10-29 Thread Edgar Fuß
(I could probably direct this question to oster@ instead of tech-kern@) In a RAIDframe RAID-1, a disc failed and I reconstructed on a spare. Now I want to replace the failed component (actually by the same disc, which needed a firmware update) and want to copyback to it. How will RAIDframe

Re: fsck updating but not fixing filesystem

2020-08-24 Thread Edgar Fuß
> I have a reasonably large ffs filesystem (7.4GB, 35,459,874 files) I guess you mean 7.4TB? I remember (shudder) something similar, where the file server would panic (bad dir), fsck would fix some dirs (missing . or ..), the file server would panic ... rinse and repeat. Slightly short of me

Re: SIGCHLD and sigaction()

2020-08-16 Thread Edgar Fuß
> I don't understand what problem queued SIGCHLD was invented to address. My impression is that it allows you to get notified of state changes of your child processes. If one signal could announce several state changes, how would you know what these state changes are?

SIGCHLD and sigaction()

2020-08-15 Thread Edgar Fuß
Another question in the context of SIGCHLD: When I install a SIGCHLD handler via sigaction() using SA_SIGINFO, is it guaranteed that my handler is called (at least) once per death-of-a-child? There is a sentence in SUS If SA_SIGINFO is set in sa_flags, then subsequent occurrences of sig

Re: wait(2) and SIGCHLD

2020-08-14 Thread Edgar Fuß
1. Sample program attached. Change SIG_IGN to SIG_DFL to see the difference. 2. macOS seems to behave the same way, as does Linux. 3. I don't see where POSIX defines or allows this, but given 2., I'm surely missing something. 4. The wording in wait(2) could be improved to clarify this is

Re: wait(2) and SIGCHLD

2020-08-14 Thread Edgar Fuß
> I'm not sure I've completely understood your question Probably not. Or I don't get what you are trying to say. What I observe is that a process that explicitly ignores SIGCHLD (SIG_IGN), then forks a child which exits, when wait()ing for the child, gets ECHILD (i.e., wait returns -1 and errno
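
The observation can be reproduced with a sketch along these lines. This is not the sample program attached to the thread; the function name and its return convention are assumptions made for illustration.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Set the SIGCHLD disposition (SIG_IGN or SIG_DFL), fork a child that exits
 * at once, and wait() for it. Returns 0 if wait() returned the child's pid,
 * the errno value if wait() failed, or -1 if the setup itself failed.
 */
int wait_errno_with(void (*disposition)(int))
{
	void (*old)(int);
	pid_t pid, w;
	int result;

	old = signal(SIGCHLD, disposition);
	if (old == SIG_ERR)
		return -1;

	pid = fork();
	if (pid == 0)
		_exit(0);		/* child: terminate immediately */

	if (pid == -1) {
		result = -1;		/* fork failed */
	} else {
		w = wait(NULL);
		result = (w == pid) ? 0 : errno;
	}

	signal(SIGCHLD, old);		/* restore the previous disposition */
	return result;
}
```

On the systems named in the thread (NetBSD, macOS, Linux), wait_errno_with(SIG_IGN) is expected to yield ECHILD while wait_errno_with(SIG_DFL) yields 0, assuming the calling process has no other children.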

Re: wait(2) and SIGCHLD

2020-08-14 Thread Edgar Fuß
The second question (that I forgot in the original mail) is whether wait(2) returning ECHILD for whatever handling of SIGCHLD is covered by POSIX.

wait(2) and SIGCHLD

2020-08-14 Thread Edgar Fuß
I'm confused regarding the behaviour of wait(2) wrt. SIGCHLD handling. The wait(2) manpage says: wait() will fail and return immediately if: [ECHILD] The calling process has no existing unwaited-for child processes; or no status from the terminated

Re: Horrendous RAIDframe reconstruction performance

2020-06-28 Thread Edgar Fuß
> That's the reconstruction algorithm. It reads each stripe and if it > has a bad parity, the parity data gets rewritten. That's the way parity re-write works. I thought reconstruction worked differently. oster@?
