I have been running DragonFly on linux-kvm with virtio-blk for years, and I've only rarely seen issues (maybe one or two crashes a year). I will give it a try once I update the machine.

Regards,

  Michael

On 05/28/16 02:23, Matthew Dillon wrote:
Ok, Zachary noted that ivadasz had a patch.  Imre and I went over it on
IRC and he committed the patch to master.  I also committed some
additional changes so it would be great if anyone using master + virtio
in a virtual-hosted environment can [re]test the changes.

There are likely going to be numerous other issues with virtual hosting
not yet addressed.

-Matt

On Fri, May 27, 2016 at 10:08 AM, Matthew Dillon <[email protected]> wrote:

    Virtio (for block storage devices) could be the cause.  There are
    known bugs in the DragonFly driver for virtio which haven't been
    tracked down yet (not enough of the devs are using virtual hosting
    to be able to reproduce the problem in a debuggable way).

    -Matt

    On Fri, May 27, 2016 at 7:39 AM, Steve Petrie, P.Eng.
    <[email protected]> wrote:

        Greetings To DragonFlyBSD List,

        The subject of random server crashes with DragonFly running
        on a virtualized host machine is of great interest (and
        concern) to me. Caveat: I am an (almost) complete DragonFly newbie.

        Please see my comments inline below.

        Steve

        ----- Original Message ----- From: "Stefan Unterweger"
        <[email protected]>
        To: "Matthew Dillon" <[email protected]>
        Cc: <[email protected]>
        Sent: Friday, May 27, 2016 3:38 AM
        Subject: Re: Random server crashes every few weeks (smp_invltlb:
        endless loop […] retrysmp_invltlb: ipi sent)



            * Matthew Dillon on Thu, May 26, 2016 at 11:00:18AM -0700:

                It's really hard to say from something which is
                virtually hosted.  It kinda
                sounds like the virtual host isn't assigning enough of
                its own cpus to the
                virtual host.  The fact that DragonFly is complaining
                about smp_invltlb()
                implies that the host's virtualized cpu threads are not
                getting scheduled
                properly.

                One thing to note is that we do not do any instruction
                escapes to hint to
                virtual hosts when a cpu is in a tight loop waiting for
                synchronization.
                It would be nice if we had some support for that, it
                would probably make
                DFly play better on virtualized systems.
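One common form of such a hint on x86 is the PAUSE instruction placed inside the spin loop; hypervisors can be configured to notice long runs of PAUSE and deschedule the spinning virtual cpu. The sketch below illustrates the general idea only. It is not DragonFly kernel code, the names `cpu_spin_hint` and `spin_wait` are hypothetical, and it is an assumption that this is the kind of hint meant above.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical helper: emit the architecture's spin-wait hint, if any. */
static inline void cpu_spin_hint(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();   /* x86 PAUSE instruction (GCC/Clang builtin) */
#endif
    /* On other architectures this sketch simply emits no hint. */
}

/*
 * Spin until *flag becomes true or max_spins iterations have passed.
 * Returns the number of completed iterations before the flag was seen,
 * or -1 if the flag was never observed (bounded wait, no endless loop).
 */
static long spin_wait(atomic_bool *flag, long max_spins)
{
    for (long i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(flag, memory_order_acquire))
            return i;
        cpu_spin_hint();      /* tell the cpu (and host) we are only waiting */
    }
    return -1;
}
```

The bounded loop is a choice made for this example so that a waiter can report a timeout instead of spinning forever; a real kernel path would pick its own policy.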


            This is an interesting suggestion, which would
            explain at least
            some of the cases where I’ve experienced the crashes (the
            daily HAMMER
            cronjob, heavy paging under stress, I/O bursts and so on).

            So in effect, could it be that the crashes are more likely
            as either my
            own server comes under load or some -other- server which
            happens to run in
            the same hypervisor?

            Would this warrant opening a ticket with Profitbricks, or is
            it just as
            likely that I’m wasting my time and will only get a response
            along the
            lines of ‘Use Linux; Dragonfly BSD most certainly is not
            supported’?

                I suggest setting the number of cores to 1.  That will
                get rid of all SMP
                interplay and hopefully remove the issues the virtual
                host is choking on.


            Interestingly enough, I have seen the opposite so far.  At
            first, I have
            run the server on only one core, to save money and because
            it doesn’t
            really yet need any more.  When on one core, it still
            freezes, along
            approximately the same pattern, but I never got a trace there.

            My guess then was that perhaps there would have been some
            odd race
            condition between paging, HAMMER and dm_crypt—adding another
            core
            temporarily seemed more stable and then regressed back to
            the mean.

            I will try to set up another VM to see whether I can
            reliably reproduce
            such a crash.


        After a great deal of research, I chose DragonFly as the OS for
        a new website (not yet online). Three main attributes drew me to
        DragonFly: 1. reputation for reliability and speed, 2. hammer
        file system, 3. responsiveness of DragonFly open source community.

        However, my business plan for this new website (not yet online),
        requires starting out hosting it on a VM under QEMU / KVM
        virtualization, because I cannot justify the much higher cost of
        dedicated server hosting hardware. And I like the brutally
        competitive quasi-commoditized hosting services market for QEMU
        / KVM virtualization offerings.

        I do have an experimental working DragonFly installation on a
        QEMU / KVM VM hosted at Elastic Hosts (www.elastichosts.com).
        I access this VM through TightVNC
        and I get a DragonFly console using PuTTY.

        But I had to suspend work on testing this DragonFly VM
        installation, due to other business priorities. I hope to get
        back to it later in 2016.

        However, I can highly recommend Elastic Hosts for their solid
        cloud infrastructure and their strong customer support.

        So if Stefan wants to expand his testing of the DragonFly crash
        to a VM with a (probably) different underlying architecture than
        at ProfitBricks, I would recommend giving an Elastic Hosts QEMU
        / KVM VM a try.

        Alternatively, if Stefan has (or develops) some simple limited
        self-contained testing setup, for reproducing the DragonFly
        crash he is experiencing with the ProfitBricks VM he's presently
        using, I would be interested to try to set up the same testing
        scenario on my current DragonFly (and later on an upgraded
        version of DragonFly) on the Elastic Hosts VM where I presently
        have (an outdated version of) DragonFly operational.

        Steve



            Thanks for your answer,
             Stefan



            PS: Just in case, as I’ve forgotten it previously: here’s
            the dmesg from
               the server in question.

            | Copyright (c) 2003-2015 The DragonFly Project.
            | Copyright (c) 1992-2003 The FreeBSD Project.
            | Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991,
            1992, 1993, 1994
            | The Regents of the University of California. All rights
            reserved.
            | DragonFly v4.4.1-RELEASE #2: Sun Dec  6 19:10:59 EST 2015
            | [email protected]:/usr/obj/home/justin/release/4_4/sys/X86_64_GENERIC
            | TSC clock: 2600054420 Hz, i8254 clock: 1193169 Hz
            | CPU: AMD Opteron 62xx class CPU (2600.11-MHz K8-class CPU)
            |   Origin = "AuthenticAMD"  Id = 0x600f12  Stepping = 2
            |   Features=0x783fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE,SSE2>
            |   Features2=0x96982203<SSE3,PCLMULQDQ,SSSE3,CX16,SSE4.1,SSE4.2,POPCNT,AESNI,XSAVE,AVX,VMM>
            |   AMD Features=0x24500800<SYSCALL,NX,MMX+,Page1GB,LM>
            |   AMD Features2=0x10be7<LAHF,CMP,SVM,ABM,SSE4A,MAS,Prefetch,OSVW,XOP,FMA4>
            |   MONITOR/MWAIT Features=0x2<INTBRK>
            | real memory  = 3219762176 (3070 MB)
            | avail memory = 2990727168 (2852 MB)
            | lapic: divisor index 0, frequency 500005713 Hz
            | SMI Frequency (worst case): 28571 Hz (35 us)
            | Initialize MI interrupts
            | wdog: In-kernel automatic watchdog reset enabled
            | kbd1 at kbdmux0
            | md0: Preloaded image <initrd.img> 15728640 bytes at
            0xffffffff82739ac0
            | md1: Malloc disk
            | ACPI: RSDP 0x00000000000FC980 000014 (v00 BOCHS )
            | ACPI: RSDT 0x00000000BFFFBCA0 000040 (v01 BOCHS  BXPCRSDT
            00000001 BXPC 00000001)
            | ACPI: FACP 0x00000000BFFFFF80 000074 (v01 BOCHS  BXPCFACP
            00000001 BXPC 00000001)
            | ACPI: DSDT 0x00000000BFFFBCE0 00151D (v01 BXPC   BXDSDT
             00000001 INTL 20100528)
            | ACPI: FACS 0x00000000BFFFFF40 000040
            | ACPI: APIC 0x00000000BFFFFC60 000270 (v01 BOCHS  BXPCAPIC
            00000001 BXPC 00000001)
            | ACPI: HPET 0x00000000BFFFFC20 000038 (v01 BOCHS  BXPCHPET
            00000001 BXPC 00000001)
            | ACPI: SRAT 0x00000000BFFFF770 0004A8 (v01 BOCHS  BXPCSRAT
            00000001 BXPC 00000001)
            | ACPI: SSDT 0x00000000BFFFD8E0 001E8E (v01 BOCHS  BXPCSSDT
            00000001 BXPC 00000001)
            | ACPI: SSDT 0x00000000BFFFD870 00003D (v01 BOCHS  BXPCSSDT
            00000001 BXPC 00000001)
            | ACPI: SSDT 0x00000000BFFFD200 00066E (v01 BXPC   BXSSDTPC
            00000001 INTL 20100528)
            | cryptosoft0: <software crypto> on motherboard
            | aesni0: <AES-CBC,AES-XTS> on motherboard
            | padlock0: No ACE support.
            | rdrand0: No RdRand support.
            | acpi0: <BOCHS BXPCRSDT> on motherboard
            | ACPI: 4 ACPI AML tables successfully acquired and loaded
            | ACPI FADT: SCI testing interrupt mode ...
            | ACPI FADT: SCI select level/low
            | objcache_reclaimlist
            | objcache_reclaimlist
            | objcache_reclaimlist
            | objcache_reclaimlist
            | acpi0: Power Button (fixed)
            | acpi_timer0 on acpi0
            | acpi_hpet0: <High Precision Event Timer> iomem
            0xfed00000-0xfed003ff on acpi0
            | acpi_hpet0: frequency 100000000
            | pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
            | pci0: <ACPI PCI bus> on pcib0
            | pci_link4: Unable to route IRQs: AE_NOT_FOUND
            | isab0: <PCI-ISA bridge> at device 1.0 on pci0
            | isa0: <ISA bus> on isab0
            | atapci0: <Intel PIIX3 WDMA2 controller> port
            0xc120-0xc12f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device
            1.1 on pci0
            | ata0: <ATA channel 0> on atapci0
            | ata1: <ATA channel 1> on atapci0
            | acd0: DVDROM <QEMU DVD-ROM/1.0> at ata1-master WDMA2
            | uhci0: <Intel 82371SB (PIIX3) USB controller> port
            0xc0c0-0xc0df irq 11 at device 1.2 on pci0
            | usbus0: controller did not stop
            | usbus0 on uhci0
            | pci0: <bridge> (vendor 0x8086, dev 0x7113) at device 1.3 irq 9
            | vgapci0: <VGA-compatible display> mem
            0xfd000000-0xfdffffff at device 2.0 on pci0
            | vgapci0: Boot video device
            | virtio_pci0: <VirtIO PCI Balloon adapter> port
            0xc0e0-0xc0ff irq 11 at device 3.0 on pci0
            | virtio_pci1: <VirtIO PCI Block adapter> port 0xc000-0xc03f
            mem 0xfebf0000-0xfebf0fff irq 10 at device 5.0 on pci0
            | vtblk0: <VirtIO Block Adapter> on virtio_pci1
            | virtio_pci1: host features: 0x710006d4
            <EventIdx,RingIndirect,NotifyOnEmpty,Topology,WriteCache,SCSICmds,BlockSize,DiskGeometry,MaxNumSegs>
            | virtio_pci1: negotiated features: 0x254
            <WriteCache,BlockSize,DiskGeometry,MaxNumSegs>
            | virtio_pci2: <VirtIO PCI Block adapter> port 0xc040-0xc07f
            mem 0xfebf1000-0xfebf1fff irq 10 at device 6.0 on pci0
            | vtblk1: <VirtIO Block Adapter> on virtio_pci2
            | virtio_pci2: host features: 0x710006d4
            <EventIdx,RingIndirect,NotifyOnEmpty,Topology,WriteCache,SCSICmds,BlockSize,DiskGeometry,MaxNumSegs>
            | virtio_pci2: negotiated features: 0x254
            <WriteCache,BlockSize,DiskGeometry,MaxNumSegs>
            | virtio_pci3: <VirtIO PCI Network adapter> port
            0xc100-0xc11f mem 0xfebf2000-0xfebf2fff irq 11 at device 7.0
            on pci0
            | vtnet0: <VirtIO Networking Adapter> on virtio_pci3
            | virtio_pci3: host features: 0x711f8060
            <EventIdx,RingIndirect,NotifyOnEmpty,RxModeExtra,VLanFilter,RxMode,ControlVq,Status,MrgRxBuf,TxAllGSO,MacAddress>
            | virtio_pci3: negotiated features: 0x110f8020
            <RingIndirect,NotifyOnEmpty,VLanFilter,RxMode,ControlVq,Status,MrgRxBuf,MacAddress>
            | usbus0: 12Mbps Full Speed USB v1.0
            | vtnet0: MAC address: 02:01:06:f6:1b:63
            | add dynamic link state
            | virtio_pci4: <VirtIO PCI Block adapter> port 0xc080-0xc0bf
            mem 0xfebf3000-0xfebf3fff irq 11 at device 8.0 on pci0
            | vtblk2: <VirtIO Block Adapter> on virtio_pci4
            | virtio_pci4: host features: 0x710006d4
            <EventIdx,RingIndirect,NotifyOnEmpty,Topology,WriteCache,SCSICmds,BlockSize,DiskGeometry,MaxNumSegs>
            | virtio_pci4: negotiated features: 0x254
            <WriteCache,BlockSize,DiskGeometry,MaxNumSegs>
            | atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq
            1 on acpi0
            | atkbd0: <AT Keyboard> irq 1 on atkbdc0
            | kbd0 at atkbd0
            | ugen0.1: <Intel> at usbus0
            | uhub0: <Intel UHCI root HUB, class 9/0, rev 1.00/1.00,
            addr 1> on usbus0
            | psm0: <PS/2 Mouse> irq 12 on atkbdc0
            | psm0: model IntelliMouse Explorer, device ID 4
            | cpu0: <ACPI CPU> on acpi0
            | cpu_cst0: <ACPI CPU C-State> on cpu0
            | cpu1: <ACPI CPU> on acpi0
            | cpu_cst1: <ACPI CPU C-State> on cpu1
            | ACPI: Enabled 16 GPEs in block 00 to 0F
            | orm0: <ISA Option ROM> at iomem 0xe9800-0xeffff on isa0
            | pmtimer0 on isa0
            | vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem
            0xa0000-0xbffff on isa0
            | sc0: <System console> at flags 0x100 on isa0
            | sc0: VGA <16 virtual consoles, flags=0x300>
            | sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
            | sio0: type 16550A
            | sio1: can't drain, serial port might not exist, disabling
            | hpt27xx: no controller detected.
            | CAM: Configuring 2 busses
            | CAM: finished configuring all busses
            | cd0 at ata1 bus 0 target 0 lun 0
            | cd0: <QEMU QEMU DVD-ROM 1.0> Removable CD-ROM SCSI-0 device
            | cd0: 16.000MB/s transfers
            | cd0: cd present [329728 x 2048 byte records]
            | uhub0: 2 ports with 2 removable, self powered
            | ugen0.2: <QEMU> at usbus0
            | uhid0: <QEMU QEMU USB Tablet, class 0/0, rev 1.00/0.00,
            addr 2> on usbus0
            | no B_DEVMAGIC (bootdev=0)
            | Device Mapper version 4.16.0 loaded
            | dm_target_zero: Successfully initialized
            | dm_target_crypt: Successfully initialized
            | dm_target_error: Successfully initialized
            | Mounting root from ufs:md0s0
            | DMA space used: 1236k, remaining available: 131072k
            | Mounting devfs
            | dm_target_crypt: Setting min/max mpipe buffers: 2/30
            | dm_target_crypt: Setting min/max mpipe buffers: 2/30
            | HAMMER(Rhaal) recovery check seqno=055a4f51
            | HAMMER(Rhaal) recovery range 300000000cc2da60-300000000cc2da60
            | HAMMER(Rhaal) recovery nexto 300000000cc2da60
            endseqno=055a4f52
            | HAMMER(Rhaal) mounted clean, no recovery needed
            | chroot_kernel: set new rootnch/rootvnode to /new_root
            | dm_target_crypt: Setting min/max mpipe buffers: 2/30
            | dm_target_crypt: Setting min/max mpipe buffers: 2/30
            | dm_target_crypt: Setting min/max mpipe buffers: 2/30
            | HAMMER: read-only -> read-write
            | HAMMER(Rhaal-Daten) recovery check seqno=352f5bc4
            | HAMMER(Rhaal-Daten) recovery range
            3000000000d5c108-3000000000d77bc8
            | HAMMER(Rhaal-Daten) recovery nexto 3000000000d77bc8
            endseqno=352f5cc1
            | HAMMER(Rhaal-Daten) recovery undo
            3000000000d5c108-3000000000d77bc8 (113344 bytes)(RW)
            | HAMMER(Rhaal-Daten) Found REDO_SYNC 3000000000cb87a0
            | HAMMER(Rhaal-Daten) recovery complete
            | HAMMER(Rhaal-Daten) recovery redo
            3000000000d5c108-3000000000d77bc8 (113344 bytes)(RW)
            | HAMMER(Rhaal-Daten) Find extended redo  3000000000cb87a0,
            670056 extbytes
            | HAMMER(Rhaal-Daten) End redo recovery
            | dm_target_crypt: Setting min/max mpipe buffers: 2/30
            | swap low/high-water marks set to 83874/125811



