> > make.conf is generally meant for setting global parameters, not things that are heavily parameterized and therefore differ from one situation to another.
> >
> > However, the kernel and modules are not built with CFLAGS but with COPTFLAGS, and if NO_CPU_COPTFLAGS is defined as well, the architecture-based settings for the specific processor are not added to COPTFLAGS automatically (so you can, or rather must, put them there yourself). This opens up the possibility of having separate flags for building the kernel and modules, which you put into COPTFLAGS together with the processor settings, while the flags for building everything else are set in the usual way.
> >
> > All of this applies only to building C/C++ sources, though. Assembler code and its build are affected neither by CFLAGS nor by COPTFLAGS, nor by any other architecture setting or anything else. Assembler sources are simply built with no way of influencing the options used.
> >
> > The compiler itself is determined by the CC variable, which you should always set to whichever compiler you think is needed at that moment.
> >
> > > Alternatively, do you have any experience with compiling the kernel with gcc 4.9?
> >
> > No, but I remember that somewhere in the Handbook or thereabouts, using your own optimization settings when building the kernel is regarded as something you do "at your own risk". Unsuitable optimization during the build can introduce race conditions, and the kernel may then crash randomly or show other "strange" behavior.
> >
> > So I have never embarked on this adventure.
Hi,

after some experimenting I have ended up, for now, with ~manual switching. I have two machines, one with an Atom D525 and the other with an i7 (see the dumps below). I was trying to come up with a reasonable kernel optimization that would let me enable some of the extended instruction sets and possibly improve performance. Maybe some of you will find this useful; in any case I would be interested in your ideas.

As Dan Lukes warned me, there can be compatibility problems between the compiler and the kernel, which may ultimately end in a non-working system. Such is life. Anyway, I still have not worked out how to switch the flags automatically (some kind of .if setting); at most I can script it.

The OS is FreeBSD 9.1.

For userland builds I used gcc49; the CPUTYPE values are taken from
http://gcc.gnu.org/onlinedocs/gcc/i386-and-x86_002d64-Options.html

The compiler for the kernel is 4.2.1, which is appropriate for preserving a certain level of safety, timing and other matters, but every coin has two sides: in this case, limited support for newer instruction sets.

Flags for GCC
CC= /usr/local/bin/gcc49
CXX= /usr/local/bin/g++49
CPP= /usr/local/bin/cpp49

I ran into problems building some packages that simply fail when a compiler other than the system one is used (not even configure passes), and packages that require the system compiler together with a matching CPUTYPE. So I have a rough procedure: build with the CPU optimization; if that fails, build with the kernel definition; if that fails, turn off GCC; and if that fails, fetch it from ports. This can be scripted. It is dishonest and unsporting, but so far it works.

CPUTYPE for D525
#userland
CPUTYPE?= atom
#kernel, world and some packages
CPUTYPE?= nocona

CPUTYPE for i7
#userland
CPUTYPE?= corei7-avx
#kernel, world and some packages
CPUTYPE?= core2

Default flags.
Originally I considered using the -march switch, but again, a lot of ports misbehaved and it did not solve the problem of choosing the compiler for the kernel and for ports.

CFLAGS= -O2 -pipe -fno-strict-aliasing
COPTFLAGS= -O2 -pipe -funroll-loops -ffast-math -fno-strict-aliasing

So far I have only done rough tests, but using the i7's capabilities definitely makes sense for VPNs and encryption in AES-CBC mode; the difference is quite significant. Compilation does not help there, though. What matters more is loading the aesni module (only on i5, i7 or newer, of course), either via kldload or in:

/boot/loader.conf
aesni_load="YES"

Apart from that, after changing the build to a different CPU type I saw a general drop in response times under load (e.g. building all ports finishes roughly 10-15% faster). As for the Atom, I did not notice any difference, so I will stick with building the kernel for nocona.

The one thing I unfortunately cannot measure is stability and safety; the fact that it works for me does not mean that everything is fine.

As someone once told me: "If only for that lovely moist feeling that mine is 0.0001% faster ...."

# openssl engine -c -tt
(cryptodev) BSD cryptodev engine
 [RSA, DSA, DH, AES-128-CBC]
     [ available ]
(dynamic) Dynamic engine loading support
     [ unavailable ]

# openssl speed aes-128-cbc
To get the most accurate results, try to run this program
when this computer is idle.
Doing aes-128 cbc for 3s on 16 size blocks: 21866431 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 64 size blocks: 5708626 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 256 size blocks: 1435293 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 1024 size blocks: 361581 aes-128 cbc's in 3.00s
Doing aes-128 cbc for 3s on 8192 size blocks: 45242 aes-128 cbc's in 3.00s
OpenSSL 0.9.8y 5 Feb 2013
built on: date not available
options:bn(64,64) md2(int) rc4(ptr,int) des(idx,cisc,16,int) aes(partial) blowfish(idx)
compiler: cc
available timing options: USE_TOD HZ=128 [sysconf value]
timing function used: getrusage
The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128 cbc    116507.29k   121780.37k   122464.86k   123397.48k   123518.22k

I am attaching the CPUID output.

# cpuid
Vendor ID: "GenuineIntel"; CPUID level 2

Intel-specific functions:
Version 000106ca:
Type 0 - Original OEM
Family 6 - Pentium Pro
Model 28 - Intel Atom processor, 45nm
Stepping 10
Reserved 0

Extended brand string: " Intel(R) Atom(TM) CPU D525 @ 1.80GHz"
CLFLUSH instruction cache line size: 8
Initial APIC ID: 2
Hyper threading siblings: 4

Feature flags: bfebfbff:
FPU    Floating Point Unit
VME    Virtual 8086 Mode Enhancements
DE     Debugging Extensions
PSE    Page Size Extensions
TSC    Time Stamp Counter
MSR    Model Specific Registers
PAE    Physical Address Extension
MCE    Machine Check Exception
CX8    COMPXCHG8B Instruction
APIC   On-chip Advanced Programmable Interrupt Controller present and enabled
SEP    Fast System Call
MTRR   Memory Type Range Registers
PGE    PTE Global Flag
MCA    Machine Check Architecture
CMOV   Conditional Move and Compare Instructions
FGPAT  Page Attribute Table
PSE-36 36-bit Page Size Extension
CLFSH  CFLUSH instruction
DS     Debug store
ACPI   Thermal Monitor and Clock Ctrl
MMX    MMX instruction set
FXSR   Fast FP/MMX Streaming SIMD Extensions save/restore
SSE    Streaming SIMD Extensions instruction set
SSE2   SSE2 extensions
SS     Self Snoop
HT     Hyper Threading
TM     Thermal monitor
31     Pending Break Enable

Feature flags set 2: 0040e31d:
SSE3     SSE3 extensions
DTES64   64-bit debug store
MONITOR  MONITOR/MWAIT instructions
DS-CPL   CPL Qualified Debug Store
TM2      Thermal Monitor 2
SSSE3    Supplemental Streaming SIMD Extension 3
CX16     CMPXCHG16B
xTPR     Send Task Priority messages
PDCM     Perfmon and debug capability
MOVBE    MOVBE instruction

Extended feature flags: 20100000:
XD-bit   Execution Disable bit
EM64T    Intel Extended Memory 64 Technology

Extended feature flags set 2: 00000001:
LAHF     LAHF/SAHF available in IA-32e mode

TLB and cache info:
59: unknown TLB/cache descriptor
ba: unknown TLB/cache descriptor
4f: unknown TLB/cache descriptor
c0: unknown TLB/cache descriptor
80: unknown TLB/cache descriptor
30: 1st-level instruction cache: 32-KB, 8-way set associative, 64-byte line size
0e: unknown TLB/cache descriptor

# cpuid
Vendor ID: "GenuineIntel"; CPUID level 13

Intel-specific functions:
Version 000306a9:
Type 0 - Original OEM
Family 6 - Pentium Pro
Model 58 -
Stepping 9
Reserved 0

Extended brand string: " Intel(R) Core(TM) i7-3612QE CPU @ 2.10GHz"
CLFLUSH instruction cache line size: 8
Initial APIC ID: 3
Hyper threading siblings: 16

Feature flags: bfebfbff:
FPU    Floating Point Unit
VME    Virtual 8086 Mode Enhancements
DE     Debugging Extensions
PSE    Page Size Extensions
TSC    Time Stamp Counter
MSR    Model Specific Registers
PAE    Physical Address Extension
MCE    Machine Check Exception
CX8    COMPXCHG8B Instruction
APIC   On-chip Advanced Programmable Interrupt Controller present and enabled
SEP    Fast System Call
MTRR   Memory Type Range Registers
PGE    PTE Global Flag
MCA    Machine Check Architecture
CMOV   Conditional Move and Compare Instructions
FGPAT  Page Attribute Table
PSE-36 36-bit Page Size Extension
CLFSH  CFLUSH instruction
DS     Debug store
ACPI   Thermal Monitor and Clock Ctrl
MMX    MMX instruction set
FXSR   Fast FP/MMX Streaming SIMD Extensions save/restore
SSE    Streaming SIMD Extensions instruction set
SSE2   SSE2 extensions
SS     Self Snoop
HT     Hyper Threading
TM     Thermal monitor
31     Pending Break Enable

Feature flags set 2: 7fbae3ff:
SSE3     SSE3 extensions
PCLMULDQ PCLMULDQ instruction
DTES64   64-bit debug store
MONITOR  MONITOR/MWAIT instructions
DS-CPL   CPL Qualified Debug Store
VMX      Virtual Machine Extensions
SMX      Safer Mode Extension
EST      Enhanced Intel SpeedStep Technology
TM2      Thermal Monitor 2
SSSE3    Supplemental Streaming SIMD Extension 3
CX16     CMPXCHG16B
xTPR     Send Task Priority messages
PDCM     Perfmon and debug capability
17 - unknown feature
SSE4.1   Streaming SIMD Extension 4.1
SSE4.2   Streaming SIMD Extension 4.2
x2APIC   Extended xAPIC support
POPCNT   POPCNT instruction
24 - unknown feature
AESNI    AES Instruction set
XSAVE    XSAVE/XSTOR states
OSXSAVE  OS-enabled extended state managerment
AVX      AVX extensions
29 - unknown feature
30 - unknown feature

Extended feature flags: 28100800:
SYSCALL  SYSCALL/SYSRET instructions
XD-bit   Execution Disable bit
RDTSCP   RDTSCP and IA32_TSC_AUX are available
EM64T    Intel Extended Memory 64 Technology

Extended feature flags set 2: 00000001:
LAHF     LAHF/SAHF available in IA-32e mode

TLB and cache info:
5a: Data TLB: 2MB or 4MB pages, 4-way set associative, 32 entries
03: Data TLB: 4KB pages, 4-way set assoc, 64 entries
76: unknown TLB/cache descriptor
ff: unknown TLB/cache descriptor
b2: Instruction TLB: 4-KB Pages, 4-way set associative, 64 entries
f0: 64-byte prefetching
ca: Shared 2nd-level TLB: 4-KB Pages, 4-way set associative, 512 entries
Processor serial: 0000-0000-0000-0000-0000-0000

--
FreeBSD mailing list ([email protected])
http://www.freebsd.cz/listserv/listinfo/users-l
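A note on scripting the manual switching between the two machines: the rough sketch below maps a CPU model string and a build target to the CPUTYPE values listed in the message. The function name pick_cputype, the two target names "userland" and "kernel", and the matching patterns are assumptions made for illustration only; nothing here has been tried on a real build.

```shell
#!/bin/sh
# Hypothetical helper (name and arguments are assumptions, not an existing tool).
# $1: CPU model string, e.g. the value printed by "sysctl -n hw.model"
# $2: build target, "userland" or "kernel" (kernel also covers world and
#     the packages that need the system compiler)
# Prints the CPUTYPE value from the message; returns 1 for an unknown CPU.
pick_cputype() {
    model=$1
    target=$2
    case "$model" in
        *Atom*)
            if [ "$target" = "userland" ]; then
                echo atom
            else
                echo nocona
            fi
            ;;
        *"Core(TM) i7"*)
            if [ "$target" = "userland" ]; then
                echo corei7-avx
            else
                echo core2
            fi
            ;;
        *)
            # Unknown CPU: let the caller fall back to the defaults.
            return 1
            ;;
    esac
}
```

On the machines themselves it could be called as pick_cputype "$(sysctl -n hw.model)" kernel and the result passed to make as CPUTYPE=... on the command line. Since make.conf sets the variable with ?=, a value given that way should take precedence, but that part is untested here.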
