On Thursday, April 22, 2021 at 11:42:54 AM UTC-4 d787...@gmail.com wrote:

> Further to my previous reply, I've now been able to reproduce the original 
> bsd: initializing Assert failure on my Fedora 33 system.
> It did require some specific actions, which seem to invoke the failure 
> relatively quickly. These actions more closely align the firecracker 
> instance launch method with our test system, where we launch a few 
> different OSv VMs under firecracker (on Alpine) in quick succession. On 
> our test system we see a much higher failure rate, as high as 1 in 20 
> launches.
>
> Using the latest OSv source (pulled today), with no modifications (i.e. 
> none of the new instrumentation changes you've provided in relation to 
> this issue applied):
>
> build -j4 fs=rofs image=native-example
> firecracker.py -V (just to prove that it runs and to obtain the converted 
> raw ROFS image file needed by firecracker).
>
> I then duplicated the kernel.elf and usr.raw files to:
> kernel1.elf
> kernel2.elf
> kernel3.elf
> usr1.raw
> usr2.raw
> usr3.raw
>
> This probably isn't necessary, but for the avoidance of doubt, I used this 
> approach rather than sharing the same files across multiple firecracker 
> instances.
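>
> (For the record, the duplication step was just a few copies along these 
> lines - assuming you run them from the directory holding the build 
> outputs:
>
> cp kernel.elf kernel1.elf; cp kernel.elf kernel2.elf; cp kernel.elf kernel3.elf
> cp usr.raw usr1.raw; cp usr.raw usr2.raw; cp usr.raw usr3.raw
> )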
>
> I then extracted the firecracker json config generated by the 
> firecracker.py -V command into 3 separate json config files. In each file, 
> I modified the paths for the kernel.elf and usr.raw to reference one of the 
> numbered versions of the same file. Here's an example of the resulting 
> content:
>
> {
>    "machine-config": {
>       "vcpu_count": 1,
>       "mem_size_mib": 128,
>       "ht_enabled": false
>    },
>    "drives": [
>       {
>          "drive_id": "rootfs",
>          "path_on_host": "/home/test/usr1.raw",
>          "is_root_device": false,
>          "is_read_only": false
>       }
>    ],
>    "boot-source": {
>       "kernel_image_path": "/home/test/kernel1.elf",
>       "boot_args": "--verbose --nopci --rootfs=rofs /hello"
>    }
> }
>
> Finally, in three different terminal windows, I ran a script which 
> performs the usual forever loop launching the firecracker instance, using 
> a different numbered json config and logging to a different log file in 
> each terminal window. Here's the script content:
>
> *Script 1*
> #!/bin/bash
> rm test1.log; while true; do firecracker --no-api --config-file firecracker1.json >> test1.log; done
>
> *Script 2*
> #!/bin/bash
> rm test2.log; while true; do firecracker --no-api --config-file firecracker2.json >> test2.log; done
>
> *Script 3*
> #!/bin/bash
> rm test3.log; while true; do firecracker --no-api --config-file firecracker3.json >> test3.log; done
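>
> (If it helps when sifting the logs afterwards: each boot prints an "OSv 
> v..." banner and each crash prints a "[backtrace]" line, so something 
> like the following should give a rough boot count and failure count per 
> terminal:
>
> grep -c 'OSv v' test1.log
> grep -c 'backtrace' test1.log
> )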
>
>
> On my system, running these three scripts concurrently triggers the Assert 
> condition within 30 seconds of the start of the 3rd script.
> Interestingly, when one fails, all 3 seem to fail, though I can't be sure 
> this happens at exactly the same time. 
>
> Hopefully this provides a mechanism by which you can reproduce the issue.
>

I have run the exact same test - 3 terminals, each firing the command as 
you have it above - and I could NOT reproduce it after 50K or so runs in 
each terminal.

Now the difference is that my setup does NOT involve nested virtualization. 
Both of my MBPros have Ubuntu and Fedora installed natively (dual or triple 
boot mode). 

I have never run firecracker, nor OSv on firecracker, in such a setup. 
Until this point, I was not even aware that such a possibility exists on a 
Mac (this page - https://kb.vmware.com/s/article/80785 - seems to point 
out that there are some limitations). Given all that, it could be that 
this is a bug in OSv or in some other layer involved in this nested 
virtualization setup.

I know that Firecracker is used in production on the so-called 'bare-metal' 
EC2 instances (i3, etc.) that in reality involve nested virtualization 
(the Nitro hypervisor), but I am not sure how well firecracker is 
supported in a nested virtualization setup in general beyond that.

Do your experiments target nested virtualization by design, or is it just 
a matter of convenience?

Have you tried to reproduce this problem in a non-nested virtualization setup?
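
(As a side note, and purely as a generic check rather than anything 
specific to your VMware Fusion setup: on a Linux host with the kvm_intel 
module loaded, you can see whether KVM itself would allow nested guests 
with

cat /sys/module/kvm_intel/parameters/nested

which prints Y, or 1 on older kernels, when nesting is enabled.)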

Waldek

PS. I do not have VMware Fusion handy on any of my MBPro machines, but I 
guess I could install it and see if I can reproduce your issue. Is VMware 
Fusion free?

 

> On Wednesday, April 21, 2021 at 3:15:08 PM UTC+1 David Smith wrote:
>
>> Some info. for my host (Intel CPU based) Mac system (on which I'm running 
>> MacOS 10.15.7):
>>
>> Darwin 19.6.0 Darwin Kernel Version 19.6.0: Tue Jan 12 22:13:05 PST 2021; 
>> root:xnu-6153.141.16~1/RELEASE_X86_64 x86_64
>>
>> machdep.cpu.max_basic: 27
>> machdep.cpu.max_ext: 2147483656
>> machdep.cpu.vendor: GenuineIntel
>> machdep.cpu.brand_string: Intel(R) Core(TM) i5-1038NG7 CPU @ 2.00GHz
>> machdep.cpu.family: 6
>> machdep.cpu.model: 126
>> machdep.cpu.extmodel: 7
>> machdep.cpu.extfamily: 0
>> machdep.cpu.stepping: 5
>> machdep.cpu.feature_bits: 9221959987971750911
>> machdep.cpu.leaf7_feature_bits: 4072613871 1077960526
>> machdep.cpu.leaf7_feature_bits_edx: 3154117648
>> machdep.cpu.extfeature_bits: 1241984796928
>> machdep.cpu.signature: 460517
>> machdep.cpu.brand: 0
>> machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR 
>> PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 
>> PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 
>> SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 
>> RDRAND F16C
>> machdep.cpu.leaf7_features: RDWRFSGS TSC_THREAD_OFFSET SGX BMI1 AVX2 
>> FDPEO SMEP BMI2 ERMS INVPCID FPU_CSDS AVX512F AVX512DQ RDSEED ADX SMAP 
>> AVX512IFMA CLFSOPT IPT AVX512CD SHA AVX512BW AVX512VL AVX512VBMI UMIP PKU 
>> GFNI VAES VPCLMULQDQ AVX512VNNI AVX512BITALG AVX512VPOPCNTDQ RDPID SGXLC 
>> FSREPMOV MDCLEAR IBRS STIBP L1DF ACAPMSR SSBD
>> machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW 
>> RDTSCP TSCI
>> machdep.cpu.logical_per_package: 16
>> machdep.cpu.cores_per_package: 8
>> machdep.cpu.microcode_version: 160
>> machdep.cpu.processor_flag: 7
>> machdep.cpu.mwait.linesize_min: 64
>> machdep.cpu.mwait.linesize_max: 64
>> machdep.cpu.mwait.extensions: 3
>> machdep.cpu.mwait.sub_Cstates: 286396448
>> machdep.cpu.thermal.sensor: 1
>> machdep.cpu.thermal.dynamic_acceleration: 1
>> machdep.cpu.thermal.invariant_APIC_timer: 1
>> machdep.cpu.thermal.thresholds: 2
>> machdep.cpu.thermal.ACNT_MCNT: 1
>> machdep.cpu.thermal.core_power_limits: 1
>> machdep.cpu.thermal.fine_grain_clock_mod: 1
>> machdep.cpu.thermal.package_thermal_intr: 1
>> machdep.cpu.thermal.hardware_feedback: 0
>> machdep.cpu.thermal.energy_policy: 1
>> machdep.cpu.xsave.extended_state: 743 2688 2696 0
>> machdep.cpu.xsave.extended_state1: 15 2432 8448 0
>> machdep.cpu.arch_perf.version: 5
>> machdep.cpu.arch_perf.number: 8
>> machdep.cpu.arch_perf.width: 48
>> machdep.cpu.arch_perf.events_number: 8
>> machdep.cpu.arch_perf.events: 0
>> machdep.cpu.arch_perf.fixed_number: 4
>> machdep.cpu.arch_perf.fixed_width: 48
>> machdep.cpu.cache.linesize: 64
>> machdep.cpu.cache.L2_associativity: 8
>> machdep.cpu.cache.size: 256
>> machdep.cpu.address_bits.physical: 39
>> machdep.cpu.address_bits.virtual: 48
>> machdep.cpu.core_count: 4
>> machdep.cpu.thread_count: 8
>> machdep.cpu.tsc_ccc.numerator: 104
>> machdep.cpu.tsc_ccc.denominator: 2
>>
>> I would add that this is my development system. We have a test system 
>> which utilises VMWare ESXi and runs the OSv VM under firecracker on an 
>> Alpine Linux VMWare VM (so nested virtualisation again), and we also see 
>> the "bsd initializing" crash on startup occasionally in this configuration.
>>
>> I added the patch to dump the physical memory region information as you 
>> requested. Built again with: build fs=rofs image=native-example
>> then tested with: while true; do ./scripts/firecracker.py >> /tmp/running; done
>> (my firecracker.py script is slightly modified to add in the 
>> '--power-off-on-abort' cmd line argument, which I think is not important, 
>> but I mention it just in case).
>> On the 996th iteration, I hit the missing symbol failure. Below is the 
>> output for the last successful invocation followed by the failed 
>> invocation. The physical memory information dumped is consistent across 
>> all 996 invocations.
>>
>> OSv v0.55.0-237-g7cd33926
>> mempool: add range at:      ffff800000956434
>> mempool: add range of size: 00000000076a9bcc
>> mempool: add range at:      ffff800000000001
>> mempool: add range of size: 000000000009fbff
>> mempool: add range at:      ffff800000100000
>> mempool: add range of size: 0000000000100000
>> 1 CPUs detected
>> Firmware vendor: Unknown
>>
>> *percpu_l1 = 0x175040
>> bsd: initializing - done
>> VFS: mounting ramfs at /
>> VFS: mounting devfs at /dev
>> net: initializing - done
>> Detected virtio-mmio device: (2,0)
>> virtio-blk: Add blk device instances 0 as vblk0, devsize=6474752
>> random: intel drng, rdrand registered as a source.
>> random: <Software, Yarrow> initialized
>> VFS: unmounting /dev
>> VFS: mounting rofs at /rofs
>> VFS: mounting devfs at /dev
>> VFS: mounting procfs at /proc
>> VFS: mounting sysfs at /sys
>> VFS: mounting ramfs at /tmp
>> Booted up in 121.29 ms
>> Cmdline: /hello 
>> Hello from C code
>> VFS: unmounting /dev
>> VFS: unmounting /proc
>> VFS: unmounting /
>> ROFS: spent 1.78 ms reading from disk
>> ROFS: read 64 512-byte blocks from disk
>> ROFS: allocated 61 512-byte blocks of cache memory
>> ROFS: hit ratio is 93.88%
>> Powering off.
>> 2021-04-21T13:22:17.382768: Start
>> 2021-04-21T13:22:17.382851: API socket-less: True
>> 2021-04-21T13:22:17.382855: Firecracker ready
>> 2021-04-21T13:22:17.382940: Configured VM
>> 2021-04-21T13:22:17.382945: Added disk
>> 2021-04-21T13:22:17.382948: Created OSv VM with cmdline: --power-off-on-abort --verbose --nopci --rootfs=rofs /hello
>> {
>>    "machine-config": {
>>       "vcpu_count": 1,
>>       "mem_size_mib": 128,
>>       "ht_enabled": false
>>    },
>>    "drives": [
>>       {
>>          "drive_id": "rootfs",
>>          "path_on_host": 
>> "/home/david-griffin/projects/firecracker/osv/scripts/../build/last/usr.raw",
>>          "is_root_device": false,
>>          "is_read_only": false
>>       }
>>    ],
>>    "boot-source": {
>>       "kernel_image_path": 
>> "/home/david-griffin/projects/firecracker/osv/scripts/../build/last/kernel.elf",
>>       "boot_args": "--power-off-on-abort --verbose --nopci --rootfs=rofs 
>> /hello"
>>    }
>> }
>> 2021-04-21T13:22:17.385972: Waiting for firecracker process to terminate
>> 2021-04-21T13:22:17.586150: End
>> OSv v0.55.0-237-g7cd33926
>> mempool: add range at:      ffff800000956434
>> mempool: add range of size: 00000000076a9bcc
>> mempool: add range at:      ffff800000000001
>> mempool: add range of size: 000000000009fbff
>> mempool: add range at:      ffff800000100000
>> mempool: add range of size: 0000000000100000
>> 1 CPUs detected
>> Firmware vendor: Unknown
>>
>> *percpu_l1 = 0x175040
>> addr: 0x0000000000000008    mmu::page_size: 4096     elf:mspa: 0x0000000000000000
>> trying to execute or access missing symbol
>> [backtrace]
>> addr: 0x0000000000000008    mmu::page_size: 4096     elf:mspa: 0x0000000000000000
>>
>> Prior to adding the physical memory info patch, I set up a Fedora 33 
>> system (again under my VMWare Fusion environment), installed the same OSv 
>> sources, built the same image with the same commands and ran the same 
>> test. I was unable to reproduce either startup failure over two attempts, 
>> each running approx. 50K iterations.
>> Fedora 33 is running a later kernel (5.11.11) compared to both my Ubuntu 
>> 20.04 (5.8.0) and Alpine (5.10.13). Could there be some subtle kernel 
>> interaction? 
>>
>> On Tuesday, April 20, 2021 at 5:08:23 PM UTC+1 David Smith wrote:
>>
>>> Again, I appreciate you continuing to make suggestions. I was also on 
>>> vacation yesterday and am catching up today. I will respond to your 
>>> various points in more detail as soon as possible.
>>>
>>> On the question of nested virtualisation, the answer is yes, my dev 
>>> environment is Ubuntu 20.04 under VMWare Fusion, with Intel VT-x/EPT and 
>>> Code Profiling CPU options enabled in the VM settings.
>>>
>>> On Thursday, April 15, 2021 at 11:06:37 PM UTC+1 jwkoz...@gmail.com 
>>> wrote:
>>>
>>>> Another related possibility is that we have a bug somewhere where we 
>>>> read a configuration value - some memory address, size parameter, etc. - 
>>>> and we read it incorrectly: instead of, say, 4 bytes we read 8 bytes and 
>>>> get a garbage value, which is then used to set things up incorrectly. 
>>>> And we end up reading from or writing to some wrong places.
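>>>>
>>>> Purely as an illustration of what I mean (hypothetical names, not 
>>>> actual OSv code), a too-wide read pulls the neighbouring field into the 
>>>> value:
>>>>
>>>> #include <cstdint>
>>>> #include <cstdio>
>>>> #include <cstring>
>>>>
>>>> int main() {
>>>>     // Two adjacent 32-bit fields, as they might sit in a config table.
>>>>     uint8_t table[8];
>>>>     uint32_t mem_size   = 0x08000000; // the field we intend to read
>>>>     uint32_t next_field = 0xdeadbeef; // unrelated neighbouring field
>>>>     std::memcpy(table, &mem_size, 4);
>>>>     std::memcpy(table + 4, &next_field, 4);
>>>>
>>>>     uint32_t good;
>>>>     std::memcpy(&good, table, 4); // correct 4-byte read
>>>>     uint64_t bad;
>>>>     std::memcpy(&bad, table, 8);  // buggy 8-byte read of the same spot
>>>>
>>>>     std::printf("good = 0x%x\n", good); // 0x8000000
>>>>     // On x86 this prints 0xdeadbeef08000000 - garbage if used as a size.
>>>>     std::printf("bad  = 0x%llx\n", (unsigned long long)bad);
>>>>     return 0;
>>>> }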
>>>>
>>>> On Thursday, April 15, 2021 at 5:04:47 PM UTC-4 Waldek Kozaczuk wrote:
>>>>
>>>>> Sorry, I was away on vacation, so I am responding somewhat late.
>>>>>
>>>>> Yes, I did see your April 6th reply, which seems to prove that the 
>>>>> issue is unlikely to be related to the GCC version or other build 
>>>>> specifics.
>>>>>
>>>>> It is still most likely a bug in OSv that is triggered by some 
>>>>> runtime-specific element that we do not know. My hunch is that it is 
>>>>> memory-related, possibly in the memory setup, given we see two types 
>>>>> of crashes - the "bsd .." page fault and the missing symbol page 
>>>>> error. The second one is very weird, as the symbols from the kernel 
>>>>> should always be found, so most likely something is reading from the 
>>>>> wrong part of memory.
>>>>>
>>>>> Having said that, I do not know where to go from there, especially 
>>>>> given that I cannot reproduce it.
>>>>>
>>>>> Today I tried to reproduce it on my Ubuntu 20.10 machine, to no avail. 
>>>>> I looked at your email from Feb 26 where you wrote:
>>>>>
>>>>> "On the crash issue, I built an OSv image with the standard native 
>>>>> example: *./scripts/build image=native-example*
>>>>> Then ran continuous launches under firecracker using a bash loop: *while 
>>>>> true; do ./scripts/firecracker.py; done*
>>>>> After 1249 iterations, I got the same Assert at startup."
>>>>>
>>>>> which seems to prove that this error is not app-specific, and I used 
>>>>> exactly the same commands to build and run the example:
>>>>>
>>>>> With the original command:
>>>>>
>>>>> while true; do ./scripts/firecracker.py >> /tmp/running; done
>>>>>
>>>>> I could run it 71,700 times (over 71K) with no errors. 
>>>>>
>>>>>
>>>>> With verbose enabled:
>>>>>
>>>>> while true; do ./scripts/firecracker.py -V >> /tmp/running2; done
>>>>>
>>>>> I could run it 46,691 times (over 46K) with no errors.
>>>>>
>>>>>
>>>>> Let me share some info specific to my host machine (a 5-6 year old 
>>>>> MacBook Pro):
>>>>>
>>>>> more /proc/cpuinfo
>>>>>
>>>>> processor : 0
>>>>> vendor_id : GenuineIntel
>>>>> cpu family : 6
>>>>> model : 70
>>>>> model name : Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz
>>>>> stepping : 1
>>>>> microcode : 0x1c
>>>>> cpu MHz : 876.525
>>>>> cache size : 6144 KB
>>>>> physical id : 0
>>>>> siblings : 8
>>>>> core id : 0
>>>>> cpu cores : 4
>>>>> apicid : 0
>>>>> initial apicid : 0
>>>>> fpu : yes
>>>>> fpu_exception : yes
>>>>> cpuid level : 13
>>>>> wp : yes
>>>>> flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
>>>>> pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx 
>>>>> pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl 
>>>>> xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor 
>>>>> ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 
>>>>> sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c 
>>>>> rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb 
>>>>> stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust 
>>>>> bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts 
>>>>> md_clear flush_l1d
>>>>> vmx flags : vnmi preemption_timer invvpid ept_x_only ept_ad ept_1gb 
>>>>> flexpriority tsc_offset vtpr mtf vapic ept vpid unrestricted_guest 
>>>>> ple shadow_vmcs
>>>>> bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds 
>>>>> swapgs itlb_multihit srbds
>>>>> bogomips : 4589.68
>>>>> clflush size : 64
>>>>> cache_alignment : 64
>>>>> address sizes : 39 bits physical, 48 bits virtual
>>>>> power management:
>>>>>
>>>>> uname -a
>>>>> Linux wkoMacBookPro 5.8.0-49-generic #55-Ubuntu SMP Wed Mar 24 
>>>>> 14:45:45 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
>>>>>
>>>>> So my machine comes with Intel CPUs.
>>>>>
>>>>> Is yours Intel as well or AMD? I know that Firecracker had to add some 
>>>>> special support for AMD.
>>>>>
>>>>> Finally, given it might be a memory-setup-related issue, I wonder what 
>>>>> physical memory ranges you see.
>>>>>
>>>>> This patch:
>>>>> git diff core/
>>>>> diff --git a/core/mempool.cc b/core/mempool.cc
>>>>> index bd8e2fcf..52f94774 100644
>>>>> --- a/core/mempool.cc
>>>>> +++ b/core/mempool.cc
>>>>> @@ -1732,6 +1732,8 @@ void free_initial_memory_range(void* addr, size_t size)
>>>>>      if (!size) {
>>>>>          return;
>>>>>      }
>>>>> +    debug_early_u64("mempool: add range at:      ", (u64)addr);
>>>>> +    debug_early_u64("mempool: add range of size: ", size);
>>>>>      auto a = reinterpret_cast<uintptr_t>(addr);
>>>>>      auto delta = align_up(a, page_size) - a;
>>>>>      if (delta > size) {
>>>>>
>>>>> produces this extra info with verbose output:
>>>>>
>>>>> 2021-04-15T13:40:50.493125: Start
>>>>> 2021-04-15T13:40:50.493200: API socket-less: True
>>>>> 2021-04-15T13:40:50.493204: Firecracker ready
>>>>> 2021-04-15T13:40:50.493293: Configured VM
>>>>> 2021-04-15T13:40:50.493298: Added disk
>>>>> 2021-04-15T13:40:50.493301: Created OSv VM with cmdline: --verbose --nopci --rootfs=zfs /hello
>>>>> {
>>>>>    "machine-config": {
>>>>>       "vcpu_count": 1,
>>>>>       "mem_size_mib": 128,
>>>>>       "ht_enabled": false
>>>>>    },
>>>>>    "drives": [
>>>>>       {
>>>>>          "drive_id": "rootfs",
>>>>>          "path_on_host": 
>>>>> "/home/wkozaczuk/projects/osv-master/scripts/../build/last/usr.raw",
>>>>>          "is_root_device": false,
>>>>>          "is_read_only": false
>>>>>       }
>>>>>    ],
>>>>>    "boot-source": {
>>>>>       "kernel_image_path": 
>>>>> "/home/wkozaczuk/projects/osv-master/scripts/../build/last/kernel.elf",
>>>>>       "boot_args": "--verbose --nopci --rootfs=zfs /hello"
>>>>>    }
>>>>> }
>>>>> OSv v0.55.0-240-g9d1f5111
>>>>> *mempool: add range at:      ffff800000957434*
>>>>> *mempool: add range of size: 00000000076a8bcc*
>>>>> *mempool: add range at:      ffff800000000001*
>>>>> *mempool: add range of size: 000000000009fbff*
>>>>> *mempool: add range at:      ffff800000100000*
>>>>> *mempool: add range of size: 0000000000100000*
>>>>> 1 CPUs detected
>>>>> Firmware vendor: Unknown
>>>>> bsd: initializing - done
>>>>> VFS: mounting ramfs at /
>>>>> VFS: mounting devfs at /dev
>>>>> net: initializing - done
>>>>> Detected virtio-mmio device: (2,0)
>>>>> virtio-blk: Add blk device instances 0 as vblk0, devsize=268435456
>>>>> random: intel drng, rdrand registered as a source.
>>>>> random: <Software, Yarrow> initialized
>>>>> VFS: unmounting /dev
>>>>> VFS: mounting zfs at /zfs
>>>>> zfs: mounting osv/zfs from device /dev/vblk0.1
>>>>> VFS: mounting devfs at /dev
>>>>> VFS: mounting procfs at /proc
>>>>> VFS: mounting sysfs at /sys
>>>>> BSD shrinker: event handler list found: 0xffffa000009ad080
>>>>>         BSD shrinker found: 1
>>>>> BSD shrinker: unlocked, running
>>>>> Booted up in 47.39 ms
>>>>> Cmdline: /hello 
>>>>> Hello from C code
>>>>> VFS: unmounting /dev
>>>>> VFS: unmounting /proc
>>>>> VFS: unmounting /
>>>>> Powering off.
>>>>> 2021-04-15T13:40:50.495752: Waiting for firecracker process to terminate
>>>>> 2021-04-15T13:40:50.594645: End
>>>>>
>>>>> Also, in all 46,691 runs with verbose enabled I would always see the 
>>>>> same memory ranges.
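>>>>>
>>>>> (A quick way to confirm that on your side, assuming the loop output 
>>>>> was captured to a file as above, is something like:
>>>>>
>>>>> grep 'add range' /tmp/running2 | sort -u
>>>>>
>>>>> which should print just the six distinct lines if the ranges never 
>>>>> vary across runs.)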
>>>>>
>>>>> I wonder what the ranges are in your case. I know we used to have some 
>>>>> edge-case bugs in the memory setup around parsing e820 ranges.
>>>>>
>>>>> Maybe we have a bug where we read the e820 map incorrectly and then 
>>>>> create a wrong mapping or register less memory than provided.
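>>>>>
>>>>> For anyone following along, the e820 map is just an array of 
>>>>> (base, length, type) records. The sketch below (hypothetical code, not 
>>>>> the actual OSv parser) shows the kind of walk where a wrong stride or 
>>>>> a 32-bit truncation would silently corrupt or shrink a range:
>>>>>
>>>>> #include <cstddef>
>>>>> #include <cstdint>
>>>>> #include <cstdio>
>>>>>
>>>>> // One BIOS e820 record: 8-byte base, 8-byte length, 4-byte type.
>>>>> struct e820_entry {
>>>>>     uint64_t base;
>>>>>     uint64_t length;
>>>>>     uint32_t type; // 1 == usable RAM
>>>>> } __attribute__((packed));
>>>>>
>>>>> void scan_e820(const e820_entry* entries, size_t count) {
>>>>>     for (size_t i = 0; i < count; i++) {
>>>>>         if (entries[i].type != 1) {
>>>>>             continue; // skip reserved/ACPI ranges
>>>>>         }
>>>>>         // Iterating with the wrong stride, or truncating base/length
>>>>>         // to 32 bits here, would register a wrong or smaller range
>>>>>         // than the hypervisor actually provided.
>>>>>         std::printf("usable: base=%016llx len=%016llx\n",
>>>>>                     (unsigned long long)entries[i].base,
>>>>>                     (unsigned long long)entries[i].length);
>>>>>     }
>>>>> }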
>>>>>
>>>>> Are the ranges the same in your case, or do they possibly fluctuate, 
>>>>> and in the case of the crash are they somewhat different?
>>>>>
>>>>> Otherwise, I am out of any more ideas.
>>>>>
>>>>> Waldek
>>>>>
>>>>> PS. Do you possibly run your tests in a nested virtualization setup, 
>>>>> or perhaps on EC2 i3.metal instances?
>>>>>
>>>>>
>>>>> On Sunday, April 11, 2021 at 12:43:17 PM UTC-4 d787...@gmail.com 
>>>>> wrote:
>>>>>
>>>>>> The gcc 10.2.0 build already included the missing symbol change to 
>>>>>> disable setting up the missing symbols page.
>>>>>>
>>>>>> Did you see my April 6th reply?
>>>>>>
>>>>>>
>>>>>> On Sunday, April 11, 2021 at 4:45:59 AM UTC+1 jwkoz...@gmail.com 
>>>>>> wrote:
>>>>>>
>>>>>>> Interesting. What happens if you apply that other change to disable 
>>>>>>> setting up that missing symbols page and compile with gcc 10.2.0?
>>>>>>>
>>>>>>> On Wednesday, April 7, 2021 at 8:30:10 AM UTC-4 d787...@gmail.com 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Re: Your earlier point that it could be due to my gcc version 
>>>>>>>> revealing a gcc/OSv bug, I thought I would try upgrading to the latest 
>>>>>>>> gcc/g++ tools. So I'm now running:
>>>>>>>>
>>>>>>>> gcc (Ubuntu 10.2.0-5ubuntu1~20.04) 10.2.0
>>>>>>>>
>>>>>>>> I rebuilt my OSv image as follows:
>>>>>>>>
>>>>>>>> ./scripts/build clean
>>>>>>>> ./scripts/build -j4 image=golang-example
>>>>>>>>
>>>>>>>> Then re-ran my test. At iterations 591, 688 and 9326, I saw the 
>>>>>>>> missing symbol failure, but so far, no sign of the Assert failure.
>>>>>>>>
>>>>>>>
