Mixing amd64 kernel with i386 world
I have a system with 4GB RAM and hence need to use an amd64 kernel to use all the RAM (I can only access 3GB RAM with an i386 kernel). OTOH, amd64 processes are significantly (50-100%) larger than equivalent i386 processes and none of the applications I'll be running on the system need to be 64-bit. This implies that the optimal approach is an amd64 kernel with i386 userland (I'm ignoring PAE as a usable approach).

I've successfully run i386 jails on amd64 systems so I know this mostly works. I also know that there are some gotchas:
- kdump needs to match the kernel
- anything accessing /dev/mem or /dev/kmem (which implies anything that uses libkvm) probably needs to match the kernel.

Has anyone investigated this approach?

-- 
Peter Jeremy
Re: seeding randomness in zee cloud
On 2013-May-31 12:01:02 +0200, Dirk-Willem van Gulik di...@webweaving.org wrote:
> Thanks to a badly-written mngt script - we've recently noticed a freshly generated ssh-key on a new AWS instance to be identical to one seen a few months prior.
...
> I am surmising that perhaps the (micro-T) images do not have that much entropy on startup.

This is a fairly common issue - typically, the first thing a newly installed system does immediately after a boot (when it has the least entropy available) is to generate its SSH host keys.

> Now we happen to have very easy access to blocks of 1024bits of randomness from a remote server in already nicely PKI signed packages (as it is needed later for something else).

Obtaining entropy from another machine is an option but you need to ensure that the source is trustworthy, that you only use the entropy once and that the entropy can't be intercepted by anyone else.

> Or does this cause a loss/reset of all entropy gathered by the hardware so far?

As others have indicated, no. Writing to /dev/random can't reduce the available entropy.

> Or is there a cleaner way to add an additional seed as a one-off with disturbing as little as possible (in the few seconds just after the network is brought up).

If this needs to be done automatically, not really. If there's a person available, you could use the "please type a screen full of random junk" approach and feed both the inter-character timings (which should be done automatically via IRQ harvesting) and junk into /dev/random.

-- 
Peter Jeremy
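For the "feed it into /dev/random" step, a minimal sketch in C might look like the following. It assumes the seed block (e.g. the 1024 bits = 128 bytes mentioned above) has already been fetched and verified; mix_seed() is a hypothetical helper name, and the device path is a parameter purely so the routine can be tried against a scratch file.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Write len bytes of externally obtained seed material to a random
 * device such as "/dev/random".  Returns 0 on success, -1 on error.
 * (mix_seed is an illustrative name, not an existing API.)
 */
int
mix_seed(const char *path, const unsigned char *seed, size_t len)
{
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	while (len > 0) {
		ssize_t n = write(fd, seed, len);

		if (n <= 0) {		/* error or no progress */
			close(fd);
			return -1;
		}
		seed += n;
		len -= (size_t)n;
	}
	return close(fd);
}
```

This only covers the mixing step. Whether the seed is trustworthy, used once and not observable by anyone else is a separate problem that the code does nothing about.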
Re: GSOC 2013 project Kernel Size Reduction for Embedded System
On 2013-Apr-09 11:05:56 -0700, Freddie Cash fjwc...@gmail.com wrote:
> You have to look at the in-memory sizes, not the on-disk sizes.

Or, even better, look at the difference between installed physical RAM and how much RAM is available to userland processes.

-- 
Peter Jeremy
Re: Help porting Linux app - getting Free Memory and Real Memory
On 2013-Mar-29 20:27:27 -0400, Rod Person rodper...@rodperson.com wrote:
> Everything is going well except that the program gives warnings that there isn't enough free memory on the system to perform certain actions.

That premise sounds suspiciously like the upstream author doesn't understand how Unix VM works.

> int getSysCtl(int top_level, int next_level){
>     int mib[2], ctlvalue;
>     size_t len;
>
>     mib[0] = top_level;
>     mib[1] = next_level;
>     len = sizeof(ctlvalue);
>     sysctl(mib, 2, &ctlvalue, &len, NULL, 0);
>     return ctlvalue;
> }
>
> int main(void){
>     int realmem = getSysCtl(CTL_HW, HW_REALMEM);
>     int usermem = getSysCtl(CTL_HW, HW_USERMEM);

HW_REALMEM and HW_USERMEM return an unsigned long, not an int. That probably explains the nonsense value you are seeing in 'User Memory'.

>     printf("Total VM Memory: %i\n",vmsize.t_vm);
>     printf("Total Real Memory: %i\n",vmsize.t_rm);
>     printf("shared real memory: %i\n",vmsize.t_rmshr);
>     printf("active shared real memory: %i\n",vmsize.t_armshr);
>     printf("Total Free Memory pages: %i\n",vmsize.t_free);

And these numbers are all in pages as you surmise.

-- 
Peter Jeremy
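One way to avoid the int/unsigned long mismatch is to give sysctl(3) a buffer big enough for either type and then look at the length it reports. The following is only a sketch under that assumption: decode_sysctl_value() and get_sysctl_ulong() are hypothetical names, and the FreeBSD-specific part is guarded so the decoding helper can be compiled elsewhere.

```c
#include <stddef.h>
#include <string.h>

/*
 * Interpret a sysctl result according to the length the kernel
 * reported rather than assuming it is an int.
 * Returns 0 on success, -1 if the length matches neither type.
 */
int
decode_sysctl_value(const void *buf, size_t len, unsigned long *out)
{
	if (len == sizeof(unsigned long)) {
		unsigned long v;

		memcpy(&v, buf, sizeof(v));
		*out = v;
		return 0;
	}
	if (len == sizeof(unsigned int)) {
		unsigned int v;

		memcpy(&v, buf, sizeof(v));
		*out = v;
		return 0;
	}
	return -1;
}

#ifdef __FreeBSD__
#include <sys/types.h>
#include <sys/sysctl.h>

/* Fetch a two-level numeric sysctl such as { CTL_HW, HW_REALMEM }. */
int
get_sysctl_ulong(int top_level, int next_level, unsigned long *out)
{
	int mib[2] = { top_level, next_level };
	unsigned long raw = 0;	/* large enough for an int or a long */
	size_t len = sizeof(raw);

	if (sysctl(mib, 2, &raw, &len, NULL, 0) == -1)
		return -1;
	return decode_sysctl_value(&raw, len, out);
}
#endif
```

The result should then be printed with %lu rather than %i, and the return value of sysctl() checked instead of being ignored.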
Re: ZFS regimen: scrub, scrub, scrub and scrub again.
On 2013-Jan-21 12:12:45 +0100, Wojciech Puchar woj...@wojtek.tensor.gdynia.pl wrote:
> That's why i use properly tuned UFS, gmirror, and prefer not to use gstripe but have multiple filesystems

When I started using ZFS, I didn't fully trust it so I had a gmirrored UFS root (including a full src tree). Over time, I found that gmirror plus UFS was giving me more problems than ZFS. In particular, I was seeing behaviour that suggested that the mirrors were out of sync, even though gmirror insisted they were in sync. Unfortunately, there is no way to get gmirror to verify the mirroring or to get UFS to check correctness of data or metadata (fsck can only check metadata consistency). I've since moved to a ZFS root.

> Which is marketing, not truth. If you want bullet-proof recoverability, UFS beats everything i've ever seen.

I've seen the opposite. One big difference is that ZFS is designed to ensure it returns the data that was written to it whereas UFS just returns the bytes it finds where it thinks it wrote your data. One side effect of this is that ZFS is far fussier about hardware quality - since it checksums everything, it is likely to pick up glitches that UFS doesn't notice.

> If you want FAST crash recovery, use softupdates+journal, available in FreeBSD 9.

I'll admit that I haven't used SU+J but one downside of SU+J is that it prevents the use of snapshots, which in turn prevents the (safe) use of dump(8) (which is the official tool for UFS backups) on live filesystems.

>> of fuss. Even if you dislodge a drive ... so that it's missing the last 'n' transactions, ZFS seems to figure this out (which I thought was extra kudos).
> Yes this is marketing. practice is somehow different. as you discovered yourself.

Most of the time this works as designed. It's possible there are bugs in the implementation.

> While RAID-Z is already a king of bad performance,

I don't believe RAID-Z is any worse than RAID5. Do you have any actual measurements to back up your claim?

> i assume you mean two POOLS, not 2 RAID-Z sets. if you mixed 2 different RAID-Z pools you would spread load unevenly and make performance even worse.

There's no real reason why you couldn't have 2 different vdevs in the same pool.

>> A full scrub of my drives weighs in at 36 hours or so.
> which is funny as ZFS is marketed as doing this efficiently (like checking only used space).

It _does_ only check used space but it does so in logical order rather than physical order. For a fragmented pool, this means random accesses.

> Even better - use UFS.

Then you'll never know that your data has been corrupted.

> For both bullet proof recoverability and performance.

use ZFS.

-- 
Peter Jeremy
Re: IBM blade server abysmal disk write performances
On 2013-Jan-18 12:12:11 -0800, Dieter BSD dieter...@gmail.com wrote: adding hw.ata.wc=0 to /boot/loader.conf. The bigger problem is that FreeBSD does not support queuing on all controllers that support it. Not something that admins can fix, and inexcusable for an OS that claims to care about performance. Apart from continuous whinging and whining on mailing lists, what have you done to add support for queuing? -- Peter Jeremy pgpPelv8iAQPo.pgp Description: PGP signature
Re: FreeBSD for serious performance?
> three drivers lock out other drivers for too long when something unusual happens.

I'm using both siis and ahci and have never seen anything that points to a bug in those device drivers causing the system to lock up. And I don't recall (offhand) seeing other reports of it. This again points to a problem with your particular configuration, rather than FreeBSD.

> And other, non-disk drivers have the same problem of locking out other drivers, even during normal operation. And this happens on yet other drivers on other people's hardware, not just mine.

Can you provide mailing list or PR references to these?

-- 
Peter Jeremy
Re: FreeBSD for serious performance?
On 2012-Dec-11 15:43:21 -0500, Dieter BSD dieter...@engineer.com wrote:
> I care about data integrity, so things like ECC are on my must-have list.

Well, that's supported by all server CPUs (AMD Opteron, Intel Itanium, Intel Xeon, Oracle/Sun SPARC) and some desktop CPUs (most AMD x86 chips).

> A high clock rate doesn't help when some device driver does
>     block_all_interrupts();
>     while(1) DELAY(MIGHT_AS_WELL_BE_FOREVER);
> At least four device drivers have caused me to lose data this way.

Which device drivers? We can't fix problems we don't know about.

> Data integrity, and yes, reliability, that sort of thing.

Virtually everything except some embedded and consumer-grade x86 systems manage that.

> But without NCQ I'm only getting ~6% of what I should be getting.

So, in one sentence you state that ECC is a must have and then you complain that FreeBSD doesn't support NCQ on an old, low-end (consumer-grade) chipset that doesn't support ECC.

> It's not some rare, obscure chip. Lots of boxes have it.

None that support ECC, so you wouldn't be interested in any of them.

> I never found a way to boot from different partitions, much less different disks with GPT.

Yes, this is a limitation of FreeBSD's GPT loader. So far, no-one has written the code to support multiple boot partitions or disks. Note that most BIOSes allow you to select the boot disk - which is a workaround.

-- 
Peter Jeremy
Re: make -jN buildworld on 512MB ram
On 2012-Oct-31 12:58:18 -0700, Alfred Perlstein bri...@mu.org wrote: It seems like the new compiler likes to get up to ~200+MB resident when building some basic things in our tree. The killer I found was the ctfmerge(1) on the kernel - which exceeds ~400MB on i386. Under low RAM, that fails _without_ reporting any errors back to make(1), resulting in a corrupt new kernel (it booted but had virtually no devices so it couldn't find root). Doesn't our make(1) have some stuff to mitigate this? I would expect it to be a bit smarter about detecting the number of swaps/pages/faults of its children and taking into account the machine's total ram before forking off new processes. The difficulty I see is that the make process can't tell anything about the memory requirements of the pipeline it is about to spawn. As a rule of thumb, C++ needs more memory than C but that depends on what is being compiled - I have a machine-generated C program that makes gcc bloat to ~12GB. Any ideas? I mean a really simple algorithm could be devised that would be better than what we appear to have (which is nothing). If you can afford to waste CPU, one approach would be for make(1) to setrlimit(2) child processes and if the child dies, it retries that child by itself - but that will generate unnecessary retries. Another, more involved, approach would be for the scheduler to manage groups of processes - if a group of processes is causing memory pressure as a whole then the scheduler just stops scheduling some of them until the pressure reduces (effectively swap them out). (Yes, that's vague and lots of hand-waving that might not be realisable). -- Peter Jeremy pgpOxQEkEC3S2.pgp Description: PGP signature
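The setrlimit(2)-and-retry idea above could be sketched roughly as follows. This is an illustration only, not how make(1) behaves: run_limited() and run_with_retry() are hypothetical names, RLIMIT_AS is an assumed choice of resource, and "retry by itself" is approximated by re-running the command once with the soft limit lifted.

```c
#include <sys/types.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run argv[0] with its address-space soft limit capped at limit_bytes.
 * Returns the child's exit status, or -1 if it could not be waited for
 * or was killed by a signal.
 */
int
run_limited(char *const argv[], rlim_t limit_bytes)
{
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return -1;
	if (pid == 0) {
		struct rlimit rl;

		if (getrlimit(RLIMIT_AS, &rl) != 0)
			_exit(127);
		/* Only touch the soft limit; never exceed the hard limit. */
		rl.rlim_cur = limit_bytes < rl.rlim_max ?
		    limit_bytes : rl.rlim_max;
		if (setrlimit(RLIMIT_AS, &rl) != 0)
			_exit(127);
		execvp(argv[0], argv);
		_exit(127);		/* exec failed */
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Try under the limit first; on any failure retry once without it. */
int
run_with_retry(char *const argv[], rlim_t limit_bytes)
{
	int rc = run_limited(argv, limit_bytes);

	if (rc != 0)
		rc = run_limited(argv, RLIM_INFINITY);
	return rc;
}
```

Note that a command which fails for an ordinary reason (say, a syntax error) is also run twice here, which is exactly the "unnecessary retries" cost mentioned above.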
Re: make -jN buildworld on 512MB ram
On 2012-Oct-31 14:21:51 -0700, Alfred Perlstein bri...@mu.org wrote:
> Ah, but make(1) can delay spawning any new processes when it knows its children are paging.

That could work in some cases and may be worth implementing. Where it won't work is when make(1) initially hits a parallelisable block of big programs after a series of short, small tasks: System is OK so the first big program is spawned. ~100msec later, the next small task finishes. System is still OK (because the first big task is still growing and hasn't achieved peak bloat[*]) so it spawns another big task. Repeat a few times and you have a collection of big processes starting to thrash the system.

>> Another, more involved, approach would be for the scheduler to manage groups of processes - if a group of processes is causing memory pressure as a whole then the scheduler just stops scheduling some of them until the pressure reduces (effectively swap them out). (Yes, that's vague and lots of hand-waving that might not be realisable).
> I think that could be done, this is actually a very interesting idea. Another idea is for make(1) to start to kill -STOP a child when it detects a lot of child paging until other independent children complete running, which is basically what I do manually when my build explodes until it gets past some C++ bits.

This is roughly a userland variant of the scheduler change above. The downside is that make(1) can no longer just wait(2) for a process to exit and then decide what to do next. Instead, it needs to poll the system's paging activity and take action on one of its children. Some of the special cases it needs to handle are:
1) The offending process isn't a direct child but a more distant descendent - this will be the typical case: make(1) starts gcc(1) which spawns cc1plus which bloats.
2) Multiple (potentially independent) make(1) processes all detect that the system is too busy and stop their children. Soon after, the system is free so they all SIGCONT their children. Repeat.
(Note that any scheduler changes also need to cope with this).

[*] Typical cc1/cc1plus behaviour is to steadily grow as the input is processed. At higher optimisation levels, parse trees are not freed at the end of a function to allow global inlining and optimisation.

-- 
Peter Jeremy
Re: Building with WITH_DEBUG (-g) in make.conf
On 2012-Sep-04 23:50:35 +0200, Dimitry Andric d...@freebsd.org wrote:
> There's a difference between just using '-g', which should never change the behaviour of the program at runtime, and adding -DDEBUG or similar flags on the command line, which may or may not enable extra code, or even cause totally different code paths.

In theory, gcc should generate identical code with and without '-g' but, last time I looked, adding '-g' causes non-trivial changes in the gcc code paths so it's quite possible that different code is emitted.

> What is not different, is that both -g and other debugging options will generally cause compiling and linking to take longer, since these stages will have to process the additional debug information.

As well as being much larger - several times larger is not uncommon. This further slows things down due to the additional I/O and reduced cache effectiveness.

-- 
Peter Jeremy
Re: Unsigned Integer Encoding
On 2012-Aug-15 10:33:46 +0200, Daniel Grech dgre...@gmail.com wrote:
> Hi, I have what is probably a really elementary question but I can't seem to figure it out. In the following snippet of code, why is it that a and b do not have the same value in the end ? :

See http://en.wikipedia.org/wiki/Endian

> I'm asking this as I am currently encoding a protocol in which I receive data as a sequence of bytes. Casting for example 4 bytes from this stream leaves me with the situation in variable b, while the situation I am looking to accomplish is the one in A (i.e. the bytes are not encoded in reverse form).

I suggest you look at xdr(3) and rpcgen(1)

-- 
Peter Jeremy
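If pulling in xdr(3) is more than the protocol needs, the usual alternative to casting is to assemble the value from individual bytes, which gives the same answer on little- and big-endian hosts. A minimal sketch, assuming the protocol sends 32-bit values most-significant byte first (get_be32() and put_be32() are illustrative names):

```c
#include <stdint.h>

/* Decode 4 bytes sent most-significant byte first, whatever the host order. */
uint32_t
get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Encode a value in the same byte order for transmission. */
void
put_be32(unsigned char *p, uint32_t v)
{
	p[0] = (unsigned char)(v >> 24);
	p[1] = (unsigned char)(v >> 16);
	p[2] = (unsigned char)(v >> 8);
	p[3] = (unsigned char)v;
}
```

Unlike a pointer cast, this also avoids any alignment problems when the 4 bytes start at an odd offset in the receive buffer.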
Re: contigmalloc() breaking Xorg
On 2012-Aug-06 10:16:13 -0400, John Baldwin j...@freebsd.org wrote: On Thursday, July 12, 2012 8:26:05 am John Baldwin wrote: However, rather add a wiredmalloc(), I think you should just have bus_dmamem_alloc() call kmem_alloc_attr() directly in this case. One of the things I've been meaning to add to bus_dma is a way to allocate other memory types (e.g. WC memory), and in that case it would be best to just call kmem_alloc_attr() directly instead. After my recent changes, I think this would do the above. What do you think of this, and if it looks ok, can you test it? Index: busdma_machdep.c === --- busdma_machdep.c (revision 239020) +++ busdma_machdep.c (working copy) @@ -533,13 +533,14 @@ bus_dmamem_alloc(bus_dma_tag_t dmat, void** vaddr, dmat-lowaddr = ptoa((vm_paddr_t)Maxmem) attr == VM_MEMATTR_DEFAULT) { *vaddr = malloc(dmat-maxsize, M_DEVBUF, mflags); + } else if (dmat-nsegments = btoc(dmat-maxsize) + dmat-alignment = PAGE_SIZE + (dmat-boundary == 0 || dmat-boundary = dmat-lowaddr)) { + /* Page-based multi-segment allocations allowed */ + *vaddr = (void *)kmem_alloc_attr(kernel_map, dmat-maxsize, + mflags, 0ul, dmat-lowaddr, attr); + *mapp = contig_dmamap; } else { - /* - * XXX Use Contigmalloc until it is merged into this facility - * and handles multi-seg allocations. Nobody is doing - * multi-seg allocations yet though. - * XXX Certain AGP hardware does. - */ *vaddr = (void *)kmem_alloc_contig(kernel_map, dmat-maxsize, mflags, 0ul, dmat-lowaddr, dmat-alignment ? dmat-alignment : 1ul, dmat-boundary, attr); I've tried this on a -current/amd64 box (r239130) that has 2GB RAM and ZFS. I looped tar(1)s of several bits of /usr/src whilst doing a -j2 buildworld and killing X every 30s for about 7 hours. (The Xserver grabs a 8192-page dmamem tag when it starts). It all seemed to work OK. I haven't tried it on the box where I originally saw the problem because that's running 8.x. I'll have a look at backporting your and alc@'s fixes when I get some spare time. 
-- Peter Jeremy pgpNNsdrw2Bk7.pgp Description: PGP signature
Re: kqueue timer timeout period
On 2012-Jul-10 10:03:08 -0500, Paul Albrecht albre...@glccom.com wrote:
> I have a question about the kqueue timer timeout period ... what's data supposed to be? I thought it was supposed to be the period in milliseconds, but I seem to be off by one. For example, if I set data (the timeout period) to 20 milliseconds, I often wait 21 milliseconds which is very undesirable for my application.

FreeBSD is not a real-time OS. The timeouts specified in various syscalls (eg kevent(EVFILT_TIMER), nanosleep(), select(), poll()) specify minimum timeouts. Once the timeout (rounded up to the next tick) has expired, the process will be placed back into the queue of processes eligible to be run by the scheduler - which may impose a further arbitrary delay.

Periodic timers are somewhat better behaved: Scheduler delays only impact process scheduling after the timeout expires and the average rate should be very close to that requested.

-- 
Peter Jeremy
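The "rounded up to the next tick" part can be illustrated with a little arithmetic. This is only a sketch of the rounding itself, assuming a fixed tick of 1000000/hz microseconds; it is not the kernel's actual conversion routine, and round_up_to_tick_us() is a made-up name. Any scheduler delay comes on top of the value it returns.

```c
/*
 * Round a timeout in microseconds up to a whole number of ticks,
 * given hz ticks per second (hz must be non-zero).
 */
unsigned long
round_up_to_tick_us(unsigned long timeout_us, unsigned long hz)
{
	unsigned long tick_us = 1000000UL / hz;
	unsigned long ticks = (timeout_us + tick_us - 1) / tick_us;

	return ticks * tick_us;
}
```

For example, with hz=100 a 15ms request cannot expire before the 20ms tick boundary, so the requested value is a lower bound rather than a guarantee.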
Re: Replacing BIND with unbound (Was: Re: Pull in upstream before 9.1 code freeze?)
On 2012-Jul-09 14:15:13 +0200, in freebsd-security, Andrej (Andy) Brodnik and...@brodnik.org wrote:
> Excuse my ignorance - but is there a how-to paper on transition from bind to unbound for SOHO?

In particular, if unbound has no authoritative server capabilities, what suggestions are there for handling the private hosts in a SOHO environment?

-- 
Peter Jeremy
Re: Replacing BIND with unbound 9.1 code freeze?)
Firstly, I should note that I'm not against removing bind from base. I'm merely saying that users are going to need some guidance during the transition. On 2012-Jul-09 13:52:15 -0700, Doug Barton do...@freebsd.org wrote: On 07/09/2012 13:47, Peter Jeremy wrote: On 2012-Jul-09 14:15:13 +0200, in freebsd-security, Andrej (Andy) Brodnik and...@brodnik.org wrote: Excuse my ignorance - but is there a how-to paper on transition from bind to unbound for SOHO? You don't need to transition if you don't want to. Just install BIND from the ports. IMHO, this is a copout. If the default response to anyone asking a question about transitioning is install bind then we might as well leave bind in the base system. As I see it, FreeBSD systems fall roughly into 3 categories: 1) Client systems that need to lookup external DNS servers only. 2) SOHO systems that primarily do external lookups but need to be internally authoritative about their local network. 3) Systems that are primarily DNS servers. The third category is clearly a use ports case - there's no need for the base system to include all the tools necessary to build one of the root nameservers. The base system _must_ handle the first category - and I'll accept advice from dougb@ des@ that unbound is a good choice for this. The issues people seem to have with the change here are the user tools to interface with DNS - currently dig(1), host(1) and nslookup(1) - and des@ has now adequately covered this. I think the majority of the remaining unease in this thread comes from people who administer systems in the second category. I (and I expect lots of other people) use bind for this solely because it is in the base system, not because it is the best tool for the job. In particular, if unbound has no authoritative server capabilities, what suggestions are there for handling the private hosts in a SOHO environment? Stub and/or forward zones. The unbound docs have more information. But unfortunately no tutorial guides. 
Having looked at the online copy of unbound.conf(5), it appears that unbound _does_ have some limited server capabilities - this wasn't clear in the original proposal. It's not immediately clear to me whether it's adequate for my purposes and, if it isn't, what I should use. This is an area where I expect there will be community input - potentially via the FreeBSD wiki. -- Peter Jeremy pgp6vbMlLvV6G.pgp Description: PGP signature
Re: Replacing BIND with unbound
On 2012-Jul-10 00:40:07 +0200, Dag-Erling Smørgrav d...@des.no wrote:
> They are sufficiently similar that writing a wrapper that supports a significant subset of dig's command-line options and uses drill as a backend shouldn't take more than an afternoon for a reasonably experienced programmer.

I would further suggest that where a dig(1) option isn't emulated, the fallback error message should refer the user to drill(1).

> As for nslookup... it's been deprecated for a decade.

But old fogies might still use it. Can I suggest that something along the lines of the following be installed as /usr/bin/nslookup:

    #!/bin/sh
    echo "nslookup is no longer supported. Please see drill(1) or host(1)" >&2
    exit 1

-- 
Peter Jeremy
Re: contigmalloc() breaking Xorg
On 2012-Jul-03 21:17:53 +1000, Peter Jeremy pe...@server.rulingia.com wrote:
> I have a reasonably recent 8-stable/amd64 system (r237444) with a ATI Radeon HD 2400 Pro, xorg-server-1.10.6,1 and xf86-video-ati-6.14.3_1 8GB RAM and ZFS. I'm seeing fairly consistent problems with Xorg
...
> How difficult would it be to modify bus_dmamem_alloc() [at least on x86] to handle multi-segment allocations?

I think I've managed to create an amd64 bus_dmamem_alloc() that allows page-sized allocations as long as no boundary condition is specified and no more than page-sized alignment is required (porting it to other architectures would be trivial). I've given it a quick whirl inside a VBox and no smoke came out but I'd appreciate someone with a better understanding of bus_dma(9) and vm/vm_contig.c giving http://www.rulingia.com/bugs/patch-wiredmalloc a once-over. Note that this patch is against 8.x but there's only a trivial difference to head.

BTW, the comment in busdma_machdep.c:bus_dmamem_alloc()

     * XXX Use Contigmalloc until it is merged into this facility
     *     and handles multi-seg allocations.  Nobody is doing
     *     multi-seg allocations yet though.
     * XXX Certain AGP hardware does.

does not appear to be accurate. Apart from drm, quite a few drivers call bus_dma_tag_create(9) with multiple segments and also call bus_dmamem_alloc(9) [though I haven't verified that the calls share the same bus_dma_tag, so I can't be absolutely certain].

BTW(2): Whilst studying busdma_machdep.c for arm and mips, I've noticed they appear to potentially allocate substantial kernel stack under some conditions as several bus_dma(9) functions include:

    bus_dma_segment_t dm_segments[dmat->nsegments];

What prevents this overflowing the kernel stack?

-- 
Peter Jeremy
Re: Pull in upstream before 9.1 code freeze?
On 2012-Jul-05 09:22:25 +0200, Jonathan McKeown j.mcke...@ru.ac.za wrote:
> As for the idea that Linux refugees need extra help to migrate, that's the sort of thinking that led to things like:
>
> alias dir=ls

Whilst we're on the subject, can we please also have

    #define BEGIN {
    #define END }

wired into gcc to help people migrating from Algol and Pascal.

-- 
Peter Jeremy
Re: install-prompt for missing features (Was: Re: Pull in upstream before 9.1 code freeze?)
On 2012-Jul-04 19:10:08 -0400, Mike Meyer m...@mired.org wrote:
> My first thought was to suggest it be a port as well, but I'm not sure that can be done sanely.

The easiest way is probably to implement some form of generic "command not found" hook into sh(1) and tcsh(1) - in interactive mode, if a specific function exists, execute it rather than reporting an error message. The actual functionality to map a command name to a port and suggest it to the user could then be implemented separately as a port and the user would enable it by adding the appropriate function definition to their .profile/.login/.[t]cshrc files.

Note that I'm not currently interested in this functionality and am not volunteering to implement it.

-- 
Peter Jeremy
contigmalloc() breaking Xorg
I have a reasonably recent 8-stable/amd64 system (r237444) with a ATI Radeon HD 2400 Pro, xorg-server-1.10.6,1 and xf86-video-ati-6.14.3_1 8GB RAM and ZFS. I'm seeing fairly consistent problems with Xorg spinning in swwrt for long periods (I've seen ½hr) and then failing. The resultant Xorg.0.log shows (eg): [854259.962] (EE) RADEON(0): [pci] Out of memory (-12) That message comes from xf86-video-ati-6.14.3/src/radeon_dri.c:RADEONDRIPciInit() and the -12 indicates ENOMEM. That code (indirectly) issues DRM_IOCTL_SG_ALLOC and winds up in sys/dev/drm/drm_scatter.c:drm_sg_alloc(), which uses bus_dma_tag_create(), bus_dmamem_alloc() and bus_dmamap_load() to actually allocate memory below 4GB. Setting hw.dri.0.debug shows that it's trying to allocate 32MB: Jul 3 18:57:49 server kernel: [drm:pid72128:drm_ioctl] pid=72128, cmd=0xc0106438, nr=0x38, dev 0xff000246ee00, auth=1 Jul 3 18:57:49 server kernel: [drm:pid72128:drm_sg_alloc_ioctl] Jul 3 18:57:49 server kernel: [drm:pid72128:drm_sg_alloc] sg size=33554432 pages=8192 Jul 3 19:28:09 server kernel: [drm:pid72128:drm_ioctl] returning 12 [note the timestamps] Whilst drm_sg_alloc() allows non-contiguous allocation (the code just wants 8192 pages), bus_dma(9) states: The current implementation of bus_dmamem_alloc() will allocate all requests as a single segment. (and this is the same in 10-current). bus_dmamem_alloc() for a region greater than one page uses contigmalloc(). I believe that Xorg spinning in swwrt is a regression but I don't have a good idea for when it started (and http://lists.freebsd.org/pipermail/freebsd-stable/2011-February/061369.html suggests that it's been occurring for quite a while). For that matter contigmalloc() also seems to have a long history of causing problems with other parts of FreeBSD - I first crossed swords with it 7½ years ago (when it was causing panics in umass(4)). 
Previously, the work-around for contigmalloc() issues was to ensure that the appropriate contigmalloc() calls occurred shortly after a reboot - before RAM got too fragmented. That doesn't appear to work here because it looks like Xorg releases and (tries to) re-allocates the memory during a reset (ie on logout). It is a _serious_ nuisance having to reboot because I fumbled my password... Can anyone suggest a way forward? Note that additional RAM isn't an option for this box. How difficult would it be to modify bus_dmamem_alloc() [at least on x86] to handle multi-segment allocations? Does anyone have a tool that can display physical RAM allocation? This would at least allow me to identify offending allocations. http://lists.freebsd.org/pipermail/freebsd-hackers/2011-February/thread.html asks the same question but just peters out. -- Peter Jeremy pgpuclQzdFSg9.pgp Description: PGP signature
Re: contigmalloc() breaking Xorg
On 2012-Jul-03 21:17:53 +1000, Peter Jeremy pe...@rulingia.com wrote:
> Does anyone have a tool that can display physical RAM allocation? This would at least allow me to identify offending allocations. http://lists.freebsd.org/pipermail/freebsd-hackers/2011-February/thread.html asks the same question but just peters out.

That link should be http://lists.freebsd.org/pipermail/freebsd-hackers/2011-February/034321.html sorry about the mis-paste.

-- 
Peter Jeremy
Re: Replacing rc(8) (Was: FreeBSD Boot Times)
On 2012-Jun-18 19:18:57 -0400, Brandon Falk bfalk_...@brandonfa.lk wrote:
> As the original poster of this thread, I can also say that Doug is correct. The issue is not rc, it is the actual kernel boot process.

I've videoed my netbook rebooting and gone through the video in slow motion and that definitely doesn't match what I see.

> When it comes to a desktop/laptop/simple server and all you do with rc is configure a static IP, start dbus/hal/sshd, and maybe launch a few jails... at least 90-98% of the boot process is spent doing the kernel work.

In my case, starting from when the screen blanks after a reboot, the breakdown is:
   7.4s - POST + BIOS splash (including ~1s waiting for input)
   4.4s - boot0/1/2, starting loader (including boot spinners)
   1.5s - loading kernel
  11.0s - loader countdown
   7.0s - kernel startup
  32.0s - rc scripts (mounting root through VTY login prompt)
   5.0s - X + xdm startup

Note that the majority of kernel probe time is:
  2000ms - atkbd
   750ms - ata
   500ms - memory probe
   500ms - ath0
   450ms - psm0

So, in my case, rc scripts account for just under 50% of the total boot time and 50% of the remainder are various "waiting for input" timeouts. The kernel accounts for 10% of the total time and 50% of that is 4 devices. I intend to work through the rc process in more detail to see where I can reduce the elapsed time.

-- 
Peter Jeremy
Re: Replacing rc(8) (Was: FreeBSD Boot Times)
On 2012-Jun-21 10:09:01 -0700, Doug Barton do...@freebsd.org wrote:
> On 06/21/2012 05:28 AM, Peter Jeremy wrote:
>> 32.0s - rc scripts (mounting root through VTY login prompt)
> I think that there is some confusion about what I wrote originally, so let me clarify. From the time that /etc/rc starts through the time that the prompt appears almost all of the time is spent waiting for the services to start. There is very little time spent IN the rc scripts themselves (barring something that is poorly written of course).

Agreed - I (and I expect everyone else) am using "rc script" to cover the total wall time between exec()ing the script and it returning - in most cases, this is almost entirely synchronous service startup time. The end-user experience is governed by "how long does it take between rebooting or turning the power on and when I can login or interact with my service". Reducing this total time is going to require a combination of changes in multiple areas.

One point I'd make is that the rc scripts run with cold caches so reads cause physical I/O. There are somewhat over 150 scripts in /etc/rc.d and a variable number in /usr/local/etc/rc.d (I have between 6 and 33 on different systems). rcorder(8) needs to read each script so, on a system using spinning rust, this amounts to 2-3 seconds overhead.

> So the only way to improve the time from /etc/rc to usable system (whatever that means for the user) is to see what we can parallelize. The problem is that this is a really hard problem. :) And as someone pointed out, changing from a serial to a parallel process is going to be disruptive because it will uncover the inadequately specified dependencies that we have now which are hidden by the serial process ... (I mentioned this problem).

One (though intrusive) way would be to use the approach the ports system used when it enabled parallelism within port builds: Add new keyword(s) within each script to control parallelism for that script. Initially, the infrastructure would assume "serial" unless scripts were explicitly marked "parallel" or "background" but once the situation was sufficiently under control, it could be flipped to assume "parallel" unless a script specifically specified "serial". (Note that I haven't looked at the detail of actually implementing this).

As an aside, "usable system" is a relevant point. My Netbook originally came with Linpus and it took about 30s from poweron until the Linpus GUI was displayed and allowed user interaction. This looked quite impressive but it took another 30-60s before the system was actually usable because the GUI was started quite early before (eg) the network was up.

-- 
Peter Jeremy
Re: Replacing rc(8) (Was: FreeBSD Boot Times)
On 2012-Jun-21 00:17:11 +0200, Wojciech Puchar woj...@wojtek.tensor.gdynia.pl wrote:
>> - Lack of dependency handling for manual start/stop
> which is not really a problem and often an advantage.

In your opinion. IMO, runlevels are mostly a work-around to hide the lack of proper dependency handling.

One obvious use for dependencies is where I have an occasional need for a box to be a NFS server. Given proper dependency handling, I can say "service nfsserver onestart" and it will automatically start rpcbind, statd, lockd and mountd. Another case is doing single-user maintenance on a ZFS system. It would be nice to be able to just say "service zfs start" instead of having to remember to start hostname and hostid first.

I agree there are cases where you might want to ignore dependencies. This is easily handled with an "ignore dependencies" flag on service.

>> - No provision to automatically restart a daemon if it dies.
> but it should not be a part of rc subsystem at all.

Both the monitoring tool and rc subsystem have to interwork to ensure that services aren't inappropriately started or stopped. You can treat them as separate if you insist but the interactions make it much easier if they are designed together.

> First - daemons should not die without reason.

Agreed but sometimes they can for a wide variety of reasons.

> If they do, admin should clearly know it

Agreed. But this still requires infrastructure that is not currently available in the base system and is irrelevant to the issue of whether the daemon should be restarted automatically.

> and feel its effects,

Some of this can be automated. And a fallback of "try starting it a few times and complain loudly if that doesn't work" is easily implemented (init(8) does it now) and generally works.

> and after fixing a source problem , restart it manually.

This can present difficulties if you can't actually login remotely because it's sshd that has aborted unexpectedly. And it can be irritating when you get woken at 0300 just to restart some random service that glitched.

> in case when you are for some reason required (temporarily of course) to use daemons that often dies, then just make restart wrapper shell script and put it in place of actual daemon in rc.d script.

It's nothing to do with "often dies". Unless you have bug-free software, you can have "rarely-and-unexpectedly dies" - which is just as annoying if it occurs when you (or a customer) needs it. And, with this approach, you wind up with N protected daemons and a further N (or 2N if you're paranoid) monitoring daemons - each thrown together independently. A single, central process that can detect when a process dies (or fails some pre-defined "working normally" test) and optionally take some pre-defined action would seem preferable. (Hint: init(8) can already do a lot of this).

> there is IMHO already too much automata in default FreeBSD: default /etc/crontab, /etc/newsyslog.conf and /etc/periodic directory. All gets deleted by me as soon as i install FreeBSD.

You are free to disable or delete as much of FreeBSD as you like but I personally prefer my systems to reduce my workload by automating normal maintenance tasks.

--
Peter Jeremy
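The "try starting it a few times and complain loudly" fallback described above could be sketched roughly as follows. This is only an illustration of the idea, not how init(8) or rc(8) implement it; the function name supervise() and its parameters are made up for the example.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: start argv[0] and start it again each time it
 * exits with a failure, up to max_starts attempts in total.
 * Returns 0 once the child exits cleanly, -1 if every attempt failed or
 * fork/wait reported an error. If starts is non-NULL it receives the
 * number of times the child was actually started.
 */
int
supervise(char *const argv[], int max_starts, int *starts)
{
	int n = 0, rc = -1;

	while (n < max_starts) {
		pid_t pid = fork();
		int status;

		if (pid == -1)
			break;
		if (pid == 0) {
			execvp(argv[0], argv);
			_exit(127);	/* exec itself failed */
		}
		n++;
		if (waitpid(pid, &status, 0) == -1)
			break;
		if (WIFEXITED(status) && WEXITSTATUS(status) == 0) {
			rc = 0;
			break;
		}
		/* The "complain loudly" part. */
		fprintf(stderr, "%s stopped unexpectedly (wait status %#x), "
		    "start %d of %d\n", argv[0], (unsigned)status, n,
		    max_starts);
	}
	if (starts != NULL)
		*starts = n;
	return (rc);
}
```

A real central monitor would also need rate limiting, logging via syslog and coordination with the rc subsystem so that a deliberate "stop" is not treated as a crash.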
Re: Replacing rc(8) (Was: FreeBSD Boot Times)
On 2012-Jun-20 09:05:05 -0600, Daniel Robbins drobb...@funtoo.org wrote: I see a great potential for collaboration here between Gentoo, Funtoo (my current project, a derivative/fork of Gentoo), FreeBSD and OpenRC (which is now an independently-managed project, distinct from the upstream distros) The more different projects can share common code, the better. But if boot time isn't a huge priority, then maybe it is the wrong place to focus. Boot time is an issue for some people - even people with never rebooted servers need fast boot times when they _do_ need to reboot (hardware failures, kernel security fixes) to get that last '9' of uptime. I think the big benefit of OpenRC to FreeBSD is that we are looking to continually improve it and include you in the requirements-gathering process for future development efforts. Even if FreeBSD doesn't switch to OpenRC, it's definitely worth looking at the shortcomings of the current rc system and how it could be improved. The most obvious ones (IMHO) are: - Lack of dependency handling for manual start/stop - No provision to automatically restart a daemon if it dies. Solaris SMF has already been mentioned. As someone who has had the misfortune to use it, I would say that the underlying concept is nice but the implementation is a disaster. In particular, _everything_ is different to traditional Unix init systems. The systems administrator needs to learn a completely new mindset for interacting with the init system and the package developer has to develop completely different service management scripts. On 2012-Jun-20 17:28:45 +0200, Wojciech Puchar woj...@wojtek.tensor.gdynia.pl wrote: 1) can it be compatible with 2 ports already made for FreeBSD, where many of them install rc.d scripts in CURRENT format. The dependency information should already be encoded in both base and ports rc scripts. Unfortunately, it looks like OpenRC encodes the information in a different way, so it's not just a plugin replacement. 
One task for anyone wanting to integrate OpenRC would be working out how to handle this - preferably without rewriting every rc script. Since we already have dependency information, there is no technical reason why a tool like rcorder(8) couldn't indicate that particular scripts or groups of scripts could run in parallel. In practice, I expect that doing so will turn up a large number of scripts which have incorrect dependency information which has been masked by the current serial processing. Anyone implementing parallel rc processing will need to be able to distinguish between errors in their tools and errors in the actual rc scripts. I know dougb@ regularly picks up issues with new updated ports but it's not realistic to rely on him manually picking up every rc script error. -- Peter Jeremy pgpvt6oBaRAkM.pgp Description: PGP signature
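To illustrate how existing dependency information could be turned into groups that may run in parallel, here is a minimal sketch. It assumes the dependencies have already been parsed into a matrix; the matrix layout, MAXSCRIPTS and assign_levels() are invented for the example and are not part of rcorder(8).

```c
#include <stddef.h>

#define MAXSCRIPTS 64	/* hypothetical upper bound for this sketch */

/*
 * after[i][j] != 0 means script i must start after script j.
 * On success, level[i] is the earliest "wave" in which script i can be
 * started; scripts sharing a level have no ordering between them and
 * could be started in parallel. Returns the number of waves, or -1 if
 * the dependencies contain a cycle. n must not exceed MAXSCRIPTS.
 */
int
assign_levels(size_t n, unsigned char after[][MAXSCRIPTS], int level[])
{
	size_t i, j, done = 0;
	int wave = 0;

	for (i = 0; i < n; i++)
		level[i] = -1;		/* not yet placed */
	while (done < n) {
		size_t placed = 0;

		for (i = 0; i < n; i++) {
			if (level[i] != -1)
				continue;
			/* Every prerequisite must sit in an earlier wave. */
			for (j = 0; j < n; j++)
				if (after[i][j] &&
				    (level[j] == -1 || level[j] == wave))
					break;
			if (j == n) {
				level[i] = wave;
				placed++;
			}
		}
		if (placed == 0)
			return (-1);	/* nothing can start: cycle */
		done += placed;
		wave++;
	}
	return (wave);
}
```

A script whose dependency information is wrong would simply land in too early a wave, which is exactly the kind of error that serial processing currently masks.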
Re: FreeBSD Boot Times
On 2012-Jun-13 21:55:22 +0200, Hans Petter Selasky hsela...@c2i.net wrote:
> Try setting: sysctl hw.usb.no_boot_wait=1

Note that this is a tunable and will need to be specified in /boot/loader.conf to have any effect.

--
Peter Jeremy
Re: Upcoming release schedule - 8.4 ?
On 2012-Jun-14 08:09:30 +0100, Chris Rees utis...@gmail.com wrote:
> Except STABLE is no good for production, and the problem is EoL- updates and support stop.

There's nothing stopping you from running -stable in production. Obviously, you will need to do more extensive testing than you might need to for a -release.

As for EoL, all software goes EoL and support for that software stops. FreeBSD releases are typically supported for 4 years - IMO, that's excellent value for money.

On 2012-Jun-14 12:48:19 +0100, Chris Rees utis...@gmail.com wrote:
> On Jun 14, 2012 9:30 AM, Damien Fleuriot m...@my.gd wrote:
>> I've moved us to 8.3-STABLE recently and am quite happy with it, so far.
>
> Too strong wording perhaps; but you can't claim that an EOL stable branch will have the level of support afforded to live branches. That was supposed to be my point, as Mark has also explained.

You are the only person that is claiming that 8.x is EOL. I have not seen any official announcement to that effect. The absence of an announcement of 8.4-release does not make it EOL.

--
Peter Jeremy
Re: detailed map of WIRED memory under FreeBSD 9
On 2012-Jun-01 10:19:37 +0200, Wojciech Puchar woj...@wojtek.tensor.gdynia.pl wrote: what tool and how can be used to display detailed map what exactly wired memory on my system as it is far way too much (1.5GB out of 4GB RAM). The procfs map pseudo-file should give you this information on a per-process level. Unfortunately, the only documentation appears to be the source (sys/fs/procfs/procfs_map.c) -- Peter Jeremy pgpPIwypWnWVm.pgp Description: PGP signature
Re: proper newfs options for SSD disk
On 2012-May-25 22:46:10 +0200, Wojciech Puchar woj...@wojtek.tensor.gdynia.pl wrote:
>> Should I split /dev/ada1 into two separate partitions, one for real /usr/local and one for my HOME and only crypt this with geli(8)?
>
> i would make / on 16GB SSD (including /usr, /usr/local, don't divide to partitions) and geli encrypted home on 4GB SSD (whole device). if sometimes 4GB would be too little for /home then you would manually move things that don't need encryption somewhere else.

This may be a worthwhile suggestion.

> Why? Your laptop have most probably slow CPU and it will make everything too slow if you make everything encrypted.

I'd suggest some experiments - create a largish RAMdisk with and without GELI and see how the performance compares (this will be a lot faster than converting your SSD as well as saving a full-SSD erase/write cycle).

As for the overall SSD write rate, some time ago, I worked through minimising the write activity on the SSD in my laptop (I wrote a tool that monitors write transfers via devstat(3) and it would be possible to track down the actual modified files via kqueue(2) if necessary). I'm now down to about two chunks of about 13 transfers each per hour (due to entropy saving and ntp.drift updating). The changes were:
1) Mount the SSD filesystems as noatime
2) Turn off all local syslogging (syslog is directed to another system when my laptop is at home, lost otherwise).
3) Change maillog rotation to size instead of daily
4) Run save-entropy once per hour instead of roughly every 11 minutes. [Note that */11 means 0,11,22,33,44,55 not every 11 minutes]
5) Patch the save-entropy script to reduce the write load when it's run (see PR bin/134225).
6) Use a swap-backed /tmp

As for applications - firefox generates quite a heavy write load during normal use. Moving the cache to /tmp will help but I don't think there's any complete solution. Also, you're probably better off running a traditional lightweight window manager than something like KDE or Gnome.

--
Peter Jeremy
Re: proper newfs options for SSD disk
On 2012-May-18 22:54:43 +0200, Dimitry Andric d...@freebsd.org wrote:
> Be sure to use -t enable when creating the filesystem:

Only if your SSD supports TRIM. Some consumer-grade SSDs don't and get very confused if sent TRIM commands.

--
Peter Jeremy
Re: NFS - slow
On 2012-Apr-27 22:05:42 +0200, Wojciech Puchar woj...@wojtek.tensor.gdynia.pl wrote:
> is there any way to speed up NFS server? ... - write works terribly. it performs sync on every write IMHO,

You don't mention which NFS server or NFS version you are using but for traditional NFS, this is by design. The NFS server is stateless and NFS server failures are transparent (other than time-wise) to the client. This means that once the server acknowledges a write, it guarantees the client will be able to later retrieve that data, even if the server crashes. This implies that the server needs to do a synchronous write to disk before it can return the acknowledgement back to the client.

--
Peter Jeremy
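The ordering constraint described above (data on stable storage first, acknowledgement second) can be illustrated with ordinary file I/O. This is a sketch of the principle only, not NFS server code; write_then_commit() is a name invented for the example.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Write len bytes at the current file offset and then force them to
 * stable storage. Returns 0 only after fsync(2) has succeeded, which is
 * the earliest point at which a stateless server could safely send its
 * acknowledgement. Returns -1 on any write or fsync error.
 */
int
write_then_commit(int fd, const void *buf, size_t len)
{
	const char *p = buf;

	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return (-1);
		}
		p += n;
		len -= (size_t)n;
	}
	return (fsync(fd));
}
```

The fsync(2) call is where the time goes on spinning disks, which is why every acknowledged write looks slow from the client side.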
Re: Graphical Terminal Environment
On 2012-Mar-06 14:08:51 -0500, Brandon Falk bfalk_...@brandonfa.lk wrote: When I mention lines, circles, etc I was thinking moreso at the very low level of fonts being drawn by lines and dots (although I would like to branch it eventually to support 2d graphics where people could maybe make some 2d games, but keep the high-res terminal on the side to keep it minimal). I also may want to draw some lines to border terminal windows (screen would eliminate this obviously). If you're looking for something minimal, vector support should be one of the first things to go. At small sizes (in terms of dots), the best fonts are all bitmaps, rather than vector descriptions. One of the features of TrueType and Postscript is that a vendor can provide hand-tweaked bitmap glyphs for small sizes of a vector font. Likewise the VT100 demonstrated that you don't need vector line drawing to draw boxes. Some points to keep in mind: Anything beyond what is supported in your VESA BIOS requires custom support for your specific video chip. This is part of the code in x11-drivers/xf86-video-*. LCD monitors look fairly poor unless driven at their native resolution so, unless your VESA BIOS provide a mode that suits your monitor, you will need custom driver code. -- Peter Jeremy pgpavw3qB3HKy.pgp Description: PGP signature
Re: FreeBSD has serious problems with focus, longevity, and lifecycle
On 2012-Jan-21 02:07:35 +0100, Daniel Gerzo dan...@freebsd.org wrote:
> I have already said it in this thread - I believe we should consider issuing much more errata notices (i.e. -pX); with that I mean we should consider more bugs as major bugs. I don't really see a reason why not.

The problem is that one person's major bug is completely unimportant to another person. Increasing the number of errata notices would annoy just as many people as it would please.

> And we should also provide some mechanism to allow for cherry-picking individual errata notices to be applied on the given release (e.g. p1, p2, p5, but not p3, p4).

This is a nice idea and is something you might be able to do yourself (because the exact list of changes are in the errata notes) but I don't believe this is practical for the Project to support.

Errata updates need to be lightweight from the Project's point of view - it can't afford to spend months testing them. With the current model, p4 can only be applied on top of p3 so releasing p4 means:
1) verifying that applying the p4 changes to p3 corrects the bug(s) that caused p4 to be created.
2) doing regression tests to ensure that adding the p4 change to p3 doesn't break anything.

With your model, someone can apply p4 on top of any combination of p0, p1, p2, p3 - 8 possibilities. This means there's now 8 times the effort involved in releasing the errata update and it increases exponentially with the number of errata. And it gets more complicated if more than one update affects the same (or related) areas of code.

> I don't think it makes sense to issue errata notices for driver updates as in support for new devices because we don't ship installation media with these patches applied.

I'm not sure that's really an issue - except for people who can't install because the updated driver is needed to support their install media.
That said, driver updates can be problematic - someone who has a new chip that isn't supported by the current driver, or who is having a serious issue with the current driver will want the update. Someone who isn't having problems won't want to touch their driver in case it introduces problems with their system. -- Peter Jeremy pgpVm8HHD84Kg.pgp Description: PGP signature
Re: reducing compiler instances during buildkernel
On 2011-Nov-05 22:30:21 +, Alexander Best arun...@freebsd.org wrote: wouldn't it be possible to somehow spawn N gcc or clang instances (make -jN buildkernel) and then pipe the src to one of those N instances? just like with something like multics N processes were started and then people used the job control language to load binaries into those processes. This is likely to require very non-trivial changes to gcc or clang. The major issue is that the process needs to be in a known initial state before beginning a compile - and it's very unlikely that the compiler cleans itself up enough to return to that state. If you really want to trim low-hanging fruit, try disposing of libtool and GNU configure instead - their overheads are _many_ orders of magnitude higher than make exec()ing gcc. -- Peter Jeremy pgpi5RM6Xcqpb.pgp Description: PGP signature
Re: Porting FreeBSD to Raspberry Pi
On 2011-Nov-03 10:22:22 +0100, Lars Engels lars.eng...@0x20.net wrote: http://www.raspberrypi.org/archives/302 ... If someone is willing to port FreeBSD to the Raspberry, I'd try to get one of the boards and send it to the porter. Whilst FreeBSD on the Raspberry Pi would be attractive, I'm not sure how practical it would be. http://www.raspberrypi.org/archives/28 makes it fairly clear that the multimedia side will be all closed source and relevant datasheets will not be available. -- Peter Jeremy pgpjYASZmxhP3.pgp Description: PGP signature
Re: Measuring memory footprint in C/C++ code on FreeBSD
On 2011-Oct-20 19:57:31 +0200, Razmig K strontiu...@gmail.com wrote:
> I'd like to measure the memory footprint in C/C++ code for a program running under FreeBSD and Linux in terms of total process size including heap objects. Due to execution length, I'd like to avoid the use of valgrind.

It's not clear whether the program is attempting to determine its own (or a child's) memory footprint, or that of an arbitrary process. In the former case, getrusage() is the obvious choice. This is a portable interface. If you want to examine arbitrary processes, the best interface on FreeBSD would be kvm_getprocs(3).

BTW, since you mention heap objects, I presume you are aware that malloc() uses mmap(), rather than sbrk() to obtain memory.

--
Peter Jeremy
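For the "own or child's footprint" case, a minimal getrusage(2) sketch might look like this. Note the caveat: ru_maxrss is the peak resident set size, which is not the same thing as total virtual size, so whether it answers the original question depends on what "footprint" is meant to cover. The wrapper name peak_rss_kb() is invented for the example.

```c
#include <sys/time.h>
#include <sys/resource.h>

/*
 * Peak resident set size of the calling process (who == RUSAGE_SELF) or
 * of its waited-for children (who == RUSAGE_CHILDREN).
 * Returns -1 on error. FreeBSD and Linux both document ru_maxrss as
 * being in kilobytes; other systems may use different units.
 */
long
peak_rss_kb(int who)
{
	struct rusage ru;

	if (getrusage(who, &ru) == -1)
		return (-1);
	return (ru.ru_maxrss);
}
```

Calling peak_rss_kb(RUSAGE_SELF) just before exit gives a single high-water figure without the run-time cost of valgrind.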
Re: excessive use of gettimeofday(2) and other syscalls
On 2011-Sep-07 12:41:54 -0600, Manish Vachharajani mani...@lineratesystems.com wrote: This is great info, thanks. Is it worth having some kind of environment variable tunable (or even compile time tunable) to have a fast gettimeofday then? Maybe. rwatson@ produced a preloadable .so to do this - see http://www.watson.org/~robert/freebsd/clock/ Is there a complimentary body of code that assumes gettimeofday is precise? I'm not aware of any but it's not necessarily trivial to identify such code - it's unlikely to fail outright but instead may deliver results that may not be as accurate as the author intended. I think a better way of looking at the problem is that some code was designed on the assumption that certain operations were cheap and therefore uses those operations more freely than it would have had those operations been more expensive. -- Peter Jeremy pgp8RtgcBLrmz.pgp Description: PGP signature
Re: excessive use of gettimeofday(2) and other syscalls
On 2011-Sep-06 16:44:48 -0600, Manish Vachharajani mani...@lineratesystems.com wrote:
> Under 7.3 (haven't checked 8 or 9) this issue crops up because the time system call calls gettimeofday under the hood (see lib/libc/gen/time.c). As a result, the kernel tries to get an accurate subsecond resolution time that simply gets chopped to the nearest second.

Under 8.x and later, time(3) uses clock_gettime(CLOCK_SECOND,...) rather than gettimeofday(). This is intended to be much cheaper than gettimeofday().

On 2011-Sep-06 21:15:55 -0400, Rayson Ho raysonlo...@gmail.com wrote:
> IMO, the time returned by gettimeofday does not need to be high precision. There are higher resolution time APIs on Linux and I believe the application programmers know when to use the slower but more accurate clock API.

There are 3 standard APIs for returning time of day:
  time(3) provides second precision
  gettimeofday(2) provides microsecond precision
  clock_gettime(2) provides nanosecond precision

By default, FreeBSD attempts to provide resolution as close as possible to the precision - which makes the 2 system calls fairly expensive. In order to reduce the cost where the resolution isn't important, FreeBSD provides several non-standard clock types for clock_gettime(2).

This approach differs from Linux - and it seems that there is a non-trivial body of code that assumes that calling gettimeofday() is very cheap. There is probably a good case for an API that provides a resolution of the order of a tick but there is no standard for this.

--
Peter Jeremy
Re: ZFS installs on HD with 4k physical blocks without any warning as on 512 block size device
On 2011-Aug-22 12:45:08 +0200, Ivan Voras ivo...@freebsd.org wrote:
> It would be suboptimal but only for the slight waste of space that would have otherwise been reclaimed if the block or fragment size remained 512 or 2K. This waste of space is insignificant for the vast majority of users and there are no performance penalties, so it seems that switching to 4K sectors by default for all file systems would actually be a good idea.

This is heavily dependent on the size distribution. I can't quickly check for ZFS but I've done some quick checks on UFS. The following are sizes in MB for my copies of the listed trees with different UFS frag size. These include directories but not indirect blocks:

    1b   512b  1024b  2048b  4096b
  4430   4511   4631   4875   5457  /usr/ncvs
  4910   5027   5181   5499   6133  Old FreeBSD SVN repo
   299    370    485    733   1252  /usr/ports checked out from CVS
   467    485    509    557    656  /usr/src 8-stable checkout from CVS

Note that the ports tree grew by 50% going from 1K to 2K frags and will grow by another 70% going to 4KB frags. Similar issues will be seen when you have lots of small files.

--
Peter Jeremy
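The per-file arithmetic behind a table like this is just rounding each file up to a whole number of fragments. A rough sketch, ignoring inodes, directories and indirect blocks (the function names are invented for the example):

```c
#include <stdint.h>

/* Space a file of `size` bytes occupies when stored in whole fragments. */
uint64_t
allocated_bytes(uint64_t size, uint64_t fragsize)
{
	return ((size + fragsize - 1) / fragsize * fragsize);
}

/* Total allocation for n files at a given fragment size. */
uint64_t
tree_bytes(const uint64_t sizes[], unsigned n, uint64_t fragsize)
{
	uint64_t total = 0;

	for (unsigned i = 0; i < n; i++)
		total += allocated_bytes(sizes[i], fragsize);
	return (total);
}
```

For example, three files of 100, 700 and 3000 bytes need 4608 bytes with 512-byte frags but 12288 bytes with 4096-byte frags, which is the same effect the ports tree shows on a larger scale.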
Re: invalid argument in select() when peer socket is in FD_SET
On 2011-Jul-31 16:20:08 +0200, Christoph P.U. Kukulies k...@kukulies.org wrote:
> if (array_of_fds[i]) {
>     nfds = max(nfds, array_of_fds[i]) + 1;

I suspect you mean:
    nfds = max(nfds, array_of_fds[i] + 1);

>     FD_SET(array_of_fds[i], readfds);
> }

--
Peter Jeremy
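To spell out the difference: with descriptors 5 and then 3, the original expression gives max(0,5)+1 = 6 and then max(6,3)+1 = 7, so nfds creeps up by one on every iteration, whereas the corrected form stays at 6. A self-contained sketch of the corrected loop (fill_read_set() is a name made up for the example; like the quoted code it treats a zero entry as an unused slot):

```c
#include <sys/select.h>

/*
 * Add every in-use descriptor in fds[] to *set and return the value to
 * pass as select(2)'s first argument: the highest descriptor plus one.
 * Entries that are zero (unused) or too large for an fd_set are skipped.
 */
int
fill_read_set(const int fds[], int count, fd_set *set)
{
	int nfds = 0;

	FD_ZERO(set);
	for (int i = 0; i < count; i++) {
		if (fds[i] > 0 && fds[i] < FD_SETSIZE) {
			FD_SET(fds[i], set);
			if (fds[i] + 1 > nfds)
				nfds = fds[i] + 1;
		}
	}
	return (nfds);
}
```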
Re: [PATCH] bogus use of __linux__ in aicasm
On 2011-Jul-02 17:22:15 +0200, Robert Millan r...@debian.org wrote: Code in sys/dev/aic7xxx/aicasm/ contains a few checks on the __linux__ macro that actually break build on GNU systems (including Linux-based ones but also GNU/kFreeBSD). Thanks for all the patches. I wonder if you could use send-pr to formally log them so they don't get lost. -- Peter Jeremy pgpfo8hhIEWmy.pgp Description: PGP signature
Re: Active slice, only for a next boot
On 2011-May-30 17:42:39 +, Dieter BSD dieter...@engineer.com wrote: A better approach is to be able to boot whatever slice you want without having to change the active slice. NetBSD can do this. The MBR puts up a menu of the bootable slices on the disk being booted. You can allow the timer to time out and boot the default. Or you can enter the number of the slice you want to boot. Or you can type a function key F1 F2 ... to boot a different disk, and it will load the MBR from that disk and run it. There is an alternative for keyboards without function keys. So can FreeBSD - though only for MBR - this functionality doesn't seem to have made it into the GPT bootcode. And it works great. Except that one of the 27 stages of boot code that FreeBSD uses INSISTS on booting the active slice, so you can tell the MBR to boot slice 3 and slice 3's boot code sees that slice 4 is active and boots slice 4. Multibooting worked correctly when I last used it (a few years ago). Have you raised this as a PR? RS-232 console + hardware modem + POTS = remote console And even that doesn't fully work unless you have a serial-aware BIOS. -- Peter Jeremy pgpuWO4m4fFPp.pgp Description: PGP signature
Re: Prebind from OpenBSD
On 2011-Mar-27 20:54:18 +0100, Robert Watson rwat...@freebsd.org wrote:
> part of rc.d. I'd also investigate large applications like Firefox, Chrome, KDE, Gnome, etc. KDE already integrates prebinding tricks in its design, but I don't think the others do.

Improving startup time for large, infrequently started executables will enhance the user experience but not do a great deal for overall system performance. (And note that things like OOo, emacs and browsers have significant amounts of code in embedded scripting languages and it's unlikely that pre-binding will help much there).

I suspect a bigger overall win would be gained by speeding up small but very frequently started executables - /bin/sh is the most obvious candidate here, though there are probably other candidates in /bin and /usr/bin. In this case, you need to measure how frequently it is started as well as the per-startup savings. For some of these executables, it's easy to get a reasonably accurate estimate of the potential pre-binding savings by comparing the speed of the existing dynamically-linked executable in /bin with the same statically-linked executable in /rescue.

One thing that I'm not sure if you've taken into account is process-initiated library loading (using dlopen(3) and friends). Note that even /bin/sh can do this through things like locale handling.

--
Peter Jeremy
Re: [GSoc] Timeconter Performance Improvements
On 2011-Mar-25 08:18:38 -0400, John Baldwin j...@freebsd.org wrote: For modern Intel CPUs you can just assume that the TSCs are in sync across packages. They also have invariant TSC's meaning that the frequency doesn't change. Synchronised P-state invariant TSCs vastly simplify the problem but not everyone has them. Should the fallback be more complexity to support per-CPU TSC counts and varying frequencies or a fallback to reading the time via a syscall? I believe we already have a shared page (it holds the signal trampoline now) for at least the x86 platform (probably some others as well). r217151 for amd64 and r217400 for ppc. It doesn't appear to be supported on other platforms. My reading of the code is that there is a single shared page used by all processes/CPUs. In order to support non-synchronised TSCs, this would need to be changed to per-CPU. -- Peter Jeremy pgpTiRyo5tsg4.pgp Description: PGP signature
Re: [GSoc] Timeconter Performance Improvements
On 2011-Mar-24 17:00:02 +0800, Jing Huang jing.huang@gmail.com wrote: In this scenario, I plan to use both tsc and shared memory to calculate precise time in user mode. The shared memory includes system_time, tsc_system_time and factor_tsc-system_time. This sounds like a reasonable approach to me. Note that once we implement a shared page, there is probably a variety of other information we could usefully place on that page. SunOS 4.x included a page of shared memory per CPU. This was mapped as an array (indexed by CPU number) at one address and the page reflecting the current CPU was additionally mapped at another fixed address. This allowed a process to both refer to data on its CPU as well any CPU on the system. We also consider the CPU frequency, because tsc counter is related to it. When kernel changes CPU frequency, the shared memory should be update subsequently. Two issues with this, particularly on x86 without invariant TSC: - looking up the current CPU frequency may not be a cheap operation - the reported CPU frequency appears to be just an approximate value, rather than the actual TSC frequency. On 2011-Mar-24 21:34:35 +0800, Jing Huang jing.huang@gmail.com wrote: As I know, tsc counter is CPU specific. If the process running on a multi-core platform, we must consider switching problem. The one way, we can let the kernel to take of this. When switching to another CPU, the kernel will reset the shared memory according to the new CPU. I'm not sure what the cost of managing this page mapping will be. The second way, we can use CPUID instruction to get the info of current CPU, which can be executed in user mode ether. At the same time, the kernel maintains shared memory for each CPU. When invoke gettimeofday, the function will calculate precise time with current CPU's shared memory. 
This approach suffers from a race condition between the CPUID instruction and accessing the appropriate shared page - there is the potential for an interrupt causing the process to be switched to a different CPU, resulting in an incorrect page being accessed. -- Peter Jeremy pgpHImAnkRcSI.pgp Description: PGP signature
Re: Greetings
Hi Dheeraj, On 2010-Dec-01 04:19:46 +0530, dheeraj suthar dheerajsuthar2...@gmail.com wrote: Kindly do guide me(as I am new here.) and involve me in some programming project related to above mentioned fields. Also I am currently going through project lists on FreeBSD list and will apply soon to which suits my skill set. I suggest you have a read through http://www.freebsd.org/projects/ideas/ to find something that sounds interesting to you and then contact the relevant person. If no contact is shown then ask about it here. -- Peter Jeremy pgpa6M4hDHi0x.pgp Description: PGP signature
Re: Unhappy with cross-worlding
On 2010-Nov-16 03:24:22 +0100, rank1see...@gmail.com wrote: I decided to go for amd64 (it's name, is so deceiving, that I've just recently, accidentaly figured out, that it can be used, with intel CPUs, too) :P Well, FreeBSD had support for the amd64 before Intel licensed the architecture from AMD and renamed it. By the time Intel had called it EM64T, the FreeBSD Project decided it was too late to rename its port. That said, this has caused a degree of confusion over the years. -- Peter Jeremy pgp8z2gUE46vO.pgp Description: PGP signature
Re: dump cache performance
On 2010-Oct-27 20:17:06 +0200, Dag-Erling Smørgrav d...@des.no wrote:
> Peter Jeremy peterjer...@acm.org writes:
>> I've mostly converted to ZFS but still have UFS root (which is basically a full base install without /var but including /usr/src - 94k inodes and 1.7GB). I've run both the 8-stable (stable) and patched (jfd) dump alternately 4 times with 50/250MB cache with the following results:
>>
>> x stable
>> + jfd
>> [ministat distribution plot]
>>     N      Min      Max   Median      Avg     Stddev
>> x   4     9413     9673     9568   9555.5  107.12143
>> +   4    15359    15359    15359    15359          0
>
> 9413 what? Puppies?

Ooops, sorry - KB/sec as reported in the dump summary.

--
Peter Jeremy
Re: dump cache performance
On 2010-Oct-24 18:05:05 +0200, Jean-Francois Dockes j...@dockes.org wrote:
> It appears that modifying dump to use a shared cache in a very simple way (move the control structures to the shared segment and perform simple locking) yields substantial speed increases.

Indeed. That's better than I expected.

> Would someone be interested in reviewing the patch and/or perform more tests ?

I've mostly converted to ZFS but still have UFS root (which is basically a full base install without /var but including /usr/src - 94k inodes and 1.7GB). I've run both the 8-stable (stable) and patched (jfd) dump alternately 4 times with 50/250MB cache with the following results:

x stable
+ jfd
[ministat distribution plot]
    N      Min      Max   Median      Avg     Stddev
x   4     9413     9673     9568   9555.5  107.12143
+   4    15359    15359    15359    15359          0
Difference at 95.0% confidence
        5803.5 +/- 131.063
        60.7347% +/- 1.3716%
        (Student's t, pooled s = 75.7463)

--
Peter Jeremy
Re: Reading rtc on FreeBSD
Repeating your question will not encourage an answer.

On 2010-Aug-19 13:09:46 +0300, phil hefferan wdef...@gmail.com wrote:
> I've been looking around for how to read the cmos/rtc on FreeBSD. There is no hwclock utility in FreeBSD that I can read sources for to see how it is done.

The RTC is only accessed within the kernel (/sys/isa/atrtc.c for i386 and amd64) and read in /sys/kern/subr_rtc.c::inittodr()

> implies that, on FreeBSD, gettimeofday reads the software time and settimeofday sets the cmos clock. I read here http://www.mail-archive.com/freebsd-hardw...@freebsd.org/msg03414.html that settimeofday in fact sets both rtc and system time together.

gettimeofday(2) reads the software clock only. settimeofday(2) writes both the software clock and RTC.

> BUT the source to adjkerntz.c for FreeBSD seems to say that gettimeofday reads the CMOS clock not the system time:
>
> /* get local CMOS clock and possible kernel offset */
> if (gettimeofday(&tv, &tz)) {
>     syslog(LOG_ERR, "gettimeofday: %m");
>     return 1;
> }

That comment is incorrect.

> Which is it? Does gettimeofday read the cmos clock/rtc on FreeBSD? If not, how do I read the battery-backed clock on FreeBSD?

There is no managed access to the RTC in FreeBSD. Your only option to read the RTC is to directly access its IO port registers via io(4) or i386_set_ioperm(2).

--
Peter Jeremy
Re: disk I/O, VFS hirunningspace
Regarding vfs.lorunningspace and vfs.hirunningspace...

On 2010-Jul-15 13:52:43 -0500, Alan Cox alan.l@gmail.com wrote:
> Keep in mind that we still run on some fairly small systems with limited I/O capabilities, e.g., a typical arm platform. More generally, with the range of systems that FreeBSD runs on today, any particular choice of constants is going to perform poorly for someone. If nothing else, making these sysctls a function of the buffer cache size is probably better than any particular constants.

That sounds reasonable but brings up a related issue - the buffer cache. Given the unified VM system no longer needs a traditional Unix buffer cache, what is the buffer cache still used for? Is the current tuning formula still reasonable (for virtually all current systems it's basically 10MB + 10% RAM)? How can I measure the effectiveness of the buffer cache?

The buffer cache size is also very tightly constrained (vfs.hibufspace and vfs.lobufspace differ by 64KB) and at least one of the underlying tuning parameters has comments at variance with current reality:

In sys/param.h:
 * MAXBSIZE - Filesystems are made out of blocks of at most MAXBSIZE bytes
 *            per block. MAXBSIZE may be made larger without effecting
 ...
 *
 * BKVASIZE - Nominal buffer space per buffer, in bytes. BKVASIZE is the
 ...
 *            The default is 16384, roughly 2x the block size used by a
 *            normal UFS filesystem.
 */
#define MAXBSIZE        65536   /* must be power of 2 */
#define BKVASIZE        16384   /* must be power of 2 */

There's no mention of the 64KiB limit in newfs(8) and I recall seeing occasional comments from people who have either tried or suggested trying larger blocksizes. Likewise, the default UFS blocksize has been 16KiB for quite a while. Are the comments still valid and, if so, should BKVASIZE be doubled to 32768 and a suitable note added to newfs(8) regarding the maximum block size?

-- Peter Jeremy
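To put numbers on the rule of thumb and on the stale comment, here is a small sketch. It only encodes the approximation quoted in the message ("10MB + 10% RAM") and the two constants from the sys/param.h excerpt; it is not the kernel's actual sizing code, and the function name is made up for illustration.

```python
MIB = 1024 * 1024

MAXBSIZE = 65536               # from the sys/param.h excerpt in the message
BKVASIZE = 16384               # from the sys/param.h excerpt in the message
UFS_DEFAULT_BSIZE = 16 * 1024  # "default UFS blocksize has been 16KiB"

def approx_bufspace(ram_bytes):
    """Rule of thumb from the message: about 10MB plus 10% of RAM."""
    return 10 * MIB + ram_bytes // 10

for ram_gib in (1, 4, 16):
    ram = ram_gib * 1024 * MIB
    print(f"{ram_gib:2d} GiB RAM -> ~{approx_bufspace(ram) / MIB:.0f} MiB buffer cache")

# The header comment says BKVASIZE is "roughly 2x" the normal UFS block
# size, but with a 16 KiB default block size the ratio is only 1.
print("BKVASIZE / default block size =", BKVASIZE // UFS_DEFAULT_BSIZE)
print("ratio if BKVASIZE were doubled =", (2 * BKVASIZE) // UFS_DEFAULT_BSIZE)
```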
Re: Using lex in a shared library
On 2010-Jul-02 23:53:17 +0100, Philip Herron redbr...@gcc.gnu.org wrote:
> Although maybe not helpful but have you considered using automake/libtool instead makes it so much simpler in my opinion.
...
> Automake will auto-handle Lex and Yacc files too. And is extremely portable.

You are joking, right? Of all the supposedly portable build environment tools I've used, GNU autotools is by far the slowest, most bloated and least portable. And when you run into problems, you are faced with trying to follow hundreds of KB of opaque shellscript and obfuscated makefiles.

-- Peter Jeremy
Poor file(1) performance
I recently had reason to run file(1) (file-5.03 on FreeBSD 8-stable) on a large number of files and felt the performance wasn't up to par. When I investigated further, I found that about 95% of the runtime was spent in the two regexes that recognize REXX files:

# OS/2 batch files are REXX. the second regex is a bit generic, oh well
# the matched commands seem to be common in REXX and uncommon elsewhere
100 regex/c =^[\ \t]{0,10}call[\ \t]{1,10}rxfunc OS/2 REXX batch file text
100 regex/c =^[\ \t]{0,10}say\ ['] OS/2 REXX batch file text

Since REXX files are not present in my environment, I can avoid the issue by just commenting out the offending lines. Someone with more expertise in magic(5) might be able to suggest a better fix.

-- Peter Jeremy
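For anyone wanting to see what the two entries actually accept, the following is a rough Python translation. It is only a sketch: the `\ ` escapes are rewritten as plain spaces, the `/c` flag is taken to mean a case-insensitive match, the `[']` character class is copied exactly as it appears above, and `^` is assumed to anchor at the start of any line. Python's `re` is a different engine from the one file(1) uses, so this says nothing about the performance problem itself.

```python
import re

# Rough translations of the two magic(5) entries quoted in the message.
REXX_PATTERNS = [
    re.compile(r"^[ \t]{0,10}call[ \t]{1,10}rxfunc", re.IGNORECASE | re.MULTILINE),
    re.compile(r"^[ \t]{0,10}say [']", re.IGNORECASE | re.MULTILINE),
]

def looks_like_rexx(text):
    """Return True if either pattern matches at the start of some line."""
    return any(p.search(text) for p in REXX_PATTERNS)

# Hypothetical sample inputs, for illustration only.
samples = {
    "rexx": "/* REXX */\ncall RxFuncAdd 'SysLoadFuncs', 'RexxUtil', 'SysLoadFuncs'\n",
    "say": "  say 'hello'\n",
    "shell": "#!/bin/sh\necho hello\n",
}
for name, text in samples.items():
    print(name, looks_like_rexx(text))
```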
Re: GSoC: Making ports work with clang
On 2010-May-03 16:33:19 +0300, Andrius Morkūnas hinok...@gmail.com wrote:
> I wasn't talking about any specific port. What I meant is that new hardware won't stop coming out just because FreeBSD decided not to update their gcc. New CPUs may have new instructions and other things that are different from their predecessors in one way or another.

As an example of an increasingly common CPU that gcc 4.2 doesn't support, consider the Intel Atom. It supports the 'Core' (ie up to SSSE3) instructions but only does in-order execution (like the Pentium 1).

-- Peter Jeremy
Re: periodically save current time to time-of-day hardware
On 2010-Mar-27 01:38:36 +0100, Dag-Erling Smørgrav d...@des.no wrote:
> Peter Jeremy peterjer...@acm.org writes:
>> It's not especially important how regularly the RTC is updated, just that it _is_ updated. This suggests that an alternative approach would be for adjtime() / ntp_adjtime() to directly call resettodr() if it's more than P minutes since resettodr() was last called.
>
> It just occurred to me that resettodr() is very slow (it usually involves writing to NVRAM over an I2C bus), so it might not be a good idea to call it from adjtime().

Traditionally, the (PC) RTC is on the ISA bus (though it's possible it might use I2C on other architectures or LPC on current PCs). I thought about speed but only in terms of simulated ISA accesses and didn't think that adjtime() / ntp_adjtime() were especially time critical (resettodr() should occur after they have updated the kernel TOD parameters). The alternative would be a kthread to update the RTC and I didn't think that was worth it.

Note that resettodr() is currently called with Giant held so if it _is_ excessively slow, it might be worthwhile reviewing the existing code in kern_time.c::settime() and subr_clock.c::sysctl_machdep_adjkerntz().

>> As a general comment, whilst resettodr() needs to be serialised, there is no need for it to block. If thread B wants to call resettodr() whilst thread A is doing so, thread B can just skip the call because calling resettodr() twice in quick succession has no benefit.
>
> It does if thread B set the system clock before calling resettodr() (think ntpd -gq).

Yes - I hadn't considered resettodr() taking a non-trivial time to execute. This could allow the scenario: Thread A grabs the RTC update lock and begins updating the RTC and, whilst it's doing so, thread B updates the system clock and then calls resettodr() - which turns into a no-op because the update lock is held.

> Actually, it might be a good idea to call resettodr() any time the clock is stepped.

This should occur now via kern_time.c::settime().

Given that:
- resettodr() needs to be serialised;
- resettodr() may take a significant amount of time; and
- resettodr() should ideally be synchronised to the second boundary;
maybe creating a kthread to manage the RTC updating is reasonable.

A rough outline of my idea would be: A new kthread which sleeps on channel update_rtc. When woken, it checks to see if it's within (say) 50msec of a second boundary and, if so, it does a trylock on the (new) RTC mutex. If it grabs the mutex then it performs the update. If it was too far from the second boundary or it fails to grab the mutex then it sleeps until the next second boundary and tries again. The existing resettodr() would then turn into a wakeup(update_rtc).

Or is this overkill?

-- Peter Jeremy
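The loop body outlined in this message can be modelled in userland as follows. This is only a sketch of the decision logic, not kernel code: a `threading.Lock` stands in for the proposed RTC mutex, `write_rtc` is a hypothetical callback standing in for the hardware update, and the 50 ms window is the "(say) 50msec" from the text.

```python
import threading

RTC_WINDOW = 0.050            # seconds either side of a second boundary
rtc_mutex = threading.Lock()  # stands in for the proposed RTC mutex

def near_second_boundary(now, window=RTC_WINDOW):
    """True if `now` (float seconds) is within `window` of a whole second."""
    frac = now % 1.0
    return frac <= window or frac >= 1.0 - window

def seconds_until_boundary(now):
    """How long to sleep in order to wake at the next whole second."""
    return 1.0 - (now % 1.0)

def rtc_update_attempt(now, write_rtc):
    """One pass of the proposed loop body.

    Returns 0.0 if the RTC was written, otherwise the delay before the
    next attempt.
    """
    # Non-blocking acquire models the trylock: never wait for the mutex.
    if near_second_boundary(now) and rtc_mutex.acquire(blocking=False):
        try:
            write_rtc(now)
        finally:
            rtc_mutex.release()
        return 0.0
    return seconds_until_boundary(now)
```

A driver thread would call `rtc_update_attempt()` each time it is woken and sleep for the returned delay when the result is non-zero.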
Re: periodically save current time to time-of-day hardware
On 2010-Mar-26 17:30:14 +0100, Dag-Erling Smørgrav d...@des.no wrote:
> Andriy Gapon a...@icyb.net.ua writes:
> Dag-Erling Smørgrav d...@des.no writes:
> Andriy Gapon a...@icyb.net.ua writes:
> Also, I am aware that the period should be configurable (sysctl).
>
> Why?
>
> Because there would always be someone who would want a different value :)
>
> Although I can see an argument for a sysctl to turn it on or off.
>
> Good idea.
>
> You can combine the two - P == 0 means don't save, P > 0 means save every P minutes.
>
> IIRC, Linux saves the clock at shutdown, and every 11 minutes if and only if the system clock is synchronized to an external reference.

At least some versions of Linux also save a RTC drift approximation and last set timestamp whenever the RTC is updated. This allows the kernel to better set the system clock from the RTC at boot (ie, our inittodr()). The downside is that this needs to store 8-16 bytes of state somewhere non-volatile. Linux does this using an external program and a file - but finding a location for a regularly updated file that is read very early in the rc.d sequence might be problematic.

> I know how to add a shutdown hook (event handler), but I don't know how to check if time synchronization is taking place.
>
> adjtime() / adjtimex() sets a flag. I'm not sure if (or how) the flag is cleared when synchronization stops (i.e. /etc/rc.d/ntpd stop); perhaps the simplest solution is to set a T = monotime() every time adjtime() is called, and check that monotime() - (T * 60) < (P * 60).

It's not especially important how regularly the RTC is updated, just that it _is_ updated. This suggests that an alternative approach would be for adjtime() / ntp_adjtime() to directly call resettodr() if it's more than P minutes since resettodr() was last called.

As a general comment, whilst resettodr() needs to be serialised, there is no need for it to block. If thread B wants to call resettodr() whilst thread A is doing so, thread B can just skip the call because calling resettodr() twice in quick succession has no benefit. This means the serialisation can be a simple atomic_readandclear_int().

-- Peter Jeremy
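The policy discussed here (P == 0 disables saving, otherwise save at most once every P minutes, and skip rather than block if another caller is already saving) can be modelled as below. This is a userland sketch only: the class and method names are invented for illustration, a non-blocking `threading.Lock` stands in for the atomic flag, and `write_rtc` is a hypothetical callback in place of resettodr().

```python
import threading

class RtcSaver:
    """Model of 'save every P minutes, skip if someone else is saving'."""

    def __init__(self, period_minutes):
        self.period_minutes = period_minutes  # P; 0 means never save
        self.last_save = None                 # monotonic seconds of last save
        self._busy = threading.Lock()         # stands in for the atomic flag

    def due(self, now):
        """True if saving is enabled and at least P minutes have passed."""
        if self.period_minutes <= 0:
            return False
        if self.last_save is None:
            return True
        return now - self.last_save >= self.period_minutes * 60

    def maybe_save(self, now, write_rtc):
        """Called from the adjtime() path; True if the RTC was written."""
        if not self.due(now):
            return False
        if not self._busy.acquire(blocking=False):
            return False  # another caller is already writing; just skip
        try:
            write_rtc(now)
            self.last_save = now
            return True
        finally:
            self._busy.release()
```

For example, `RtcSaver(11).maybe_save(now, write_rtc)` would write at most once every 11 minutes, the interval mentioned for Linux in the quoted text.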
Re: Another tool for updating /etc -- lua||other script language bikeshed
On 2010-Mar-24 14:11:21 +0100, Ivan Voras ivo...@freebsd.org wrote:
> Since the issue comes around very rarely, I assume there are not many people who also get the shivers when they see a shell script (and then a posixy /bin/sh shell script) more than a 100 lines long? :)

With the specific exception of GNU configure and related horrors, I personally don't have anything against shell scripts. You can write good or bad code in any language.

> Wouldn't it be nice to have a blessed (i.e. present-in-base) script language interpreter with a syntax that has evolved since the 1970-ies?

There's awk (though it's somewhat restricted in its abilities to do anything more than text manipulation) but in principle, I agree. The requirements as I see them are (in no particular order):
- BSD-compatible license
- must be compatible with buildworld (primarily, it must be possible to cross-build)
- contains a critical mass of users in the FreeBSD developer (and ideally committer) community
- language must be reasonably stable - will a script written today still work correctly in (say) 5 years.
- must be acceptable to the vast majority of the user base (no religious wars allowed)

> There was once Perl in base and even though I personally dislike Perl at least it was a standard of sorts and guaranteed to be there if needed.

It was removed because it didn't support cross-building (buildworld is always done as a cross-build) and was evolving at a rate incompatible with the base system.

> As a possible alternative, or at least to learn about others' opinion on the subject, I'd like to suggest Lua (http://www.lua.org/).

As someone who has never used Lua, how well does it meet the requirements above?

-- Peter Jeremy
Re: Future CPUs - 128 threads
On 2010-Feb-11 20:18:04 +0100, Ivan Voras ivo...@freebsd.org wrote:
> Recent news:
> Niagara 3 - 128 hardware threads
> Power 7 - 32 hardware threads

I'm not sure how far off real Niagara-3 based products are but you can buy dual-socket T-2 systems (128 h/w threads) off the shelf. Sun announced some years ago that they would be doubling the number of threads per CPU socket every 2 years or so.

-- Peter Jeremy
Re: book on parallel programming
On 2010-Jan-29 04:24:13 -0500, Sergey Babkin bab...@verizon.net wrote:
> BTW, looks like DamonNews is dead? All there is left is the emblem and some strange blog. All the rest is gone, including the archives of old issues.

Seems it's been taken over by a squatter. Archives are available via the Wayback Machine at http://web.archive.org

-- Peter Jeremy
Re: Checksum mismatch -- will transfer entire file
On 2010-Jan-04 20:37:49 +0600, Victor Sudakov sudakov+free...@sibptus.tomsk.ru wrote:
> Erik Trulsson wrote:
>> It will add the tag to the file of course. (In CVS tags are stored inside each RCS file.)

Actually, FreeBSD's CVS checkin scripts were hacked many years ago to unexpand $FreeBSD$ on checkin so that the actual repo text part just includes $FreeBSD$ and doesn't update on each checkin.

> So, branching a native CVS repo would still produce a massive change and download of RCS files by cvsup?

Yes - because the RCS file includes all the metadata - ie tags. This is very visible when (eg) the ports tree is tagged.

-- Peter Jeremy
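The effect described here, where a tag changes the RCS file even though no revision text changes, can be illustrated with a toy model. This is a sketch under stated assumptions: the header built below is a much-simplified imitation of an RCS admin section, the tag names and revision numbers are made up, and MD5 is used purely as a stand-in checksum.

```python
import hashlib

def rcs_admin(symbols):
    """Build a much-simplified RCS admin header with the given tag list."""
    lines = ["head\t1.2;", "access;", "symbols"]
    lines += [f"\t{name}:{rev}" for name, rev in symbols]
    lines[-1] += ";"  # the symbols list is terminated by a semicolon
    lines += ["locks; strict;", "comment\t@# @;", ""]
    return "\n".join(lines)

def digest(text):
    """Checksum of the whole file, metadata included."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()

# Revision text is identical before and after tagging (mock content).
BODY = "\n1.2\nlog\n@second revision\n@\ntext\n@/* $FreeBSD$ */\nint x;\n@\n"

before = rcs_admin([("RELENG_8_0", "1.2.0.2")]) + BODY
after = rcs_admin([("RELENG_8_1", "1.2.0.4"), ("RELENG_8_0", "1.2.0.2")]) + BODY

print("before:", digest(before))
print("after: ", digest(after))
print("revision text unchanged:", before.endswith(BODY) and after.endswith(BODY))
```

Any tool that compares whole-file checksums sees the two versions as different files, which is consistent with every tagged file being re-transferred.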
Re: Checksum mismatch -- will transfer entire file
On 2010-Jan-03 21:45:03 +0600, Victor Sudakov sudakov+free...@sibptus.tomsk.ru wrote:
> I also see that many changes in the CVS seem to be useless: there are no changes other than file version increments.

SVN metadata changes will appear in CVS as version changes only. The most obvious/common case is branching - branching a native CVS repo just adds the branch tags. Branching a SVN repo replicates the tree and the SVN-CVS exporter turns the branch into a commit that touches each affected file.

-- Peter Jeremy
Re: Wine on amd64 in 32 bit jail
On 2009-Nov-19 17:12:19 -0600, Sam Fourman Jr. sfour...@gmail.com wrote:
> I would like to help get this working.. is there a howto somewhere to setup a i386 jail on amd64? I used the instructions on http://wiki.freebsd.org/Wine (and pointed the jail to /compat/i386)

I haven't tried wine, but I do have an i386 jail on my main amd64 server (primarily to build apps for my netbook) and have managed to build all the apps I want (including Firefox, OpenOffice.org and jdk15). I have a full i386 world installed in the jail and have the following overrides in my environment:

MACHINE=i386
UNAME_p=i386
UNAME_m=i386

I did run into problems initially because my i386 userland wasn't aligned with my amd64 kernel but rebuilding both fixed that (I'm running 8.0-RC1 and a bit).

Note that some tools that poke around in kernel innards won't work - ps and lsof are the most obvious. ktrace works but the resultant ktrace.out files need to be read with an amd64 kdump.

> Inside the jail uname -a still produces this:
> FreeBSD i386.puffybsd.com 8.0-RC3 FreeBSD 8.0-RC3 #0: Wed Nov 18 22:22:44 UTC 2009 root@:/usr/obj/usr/src/sys/WORKSTATION amd64

You are missing the UNAME_x environment variables.

> so trying to compile mesa-demos produces this

It will compile and run with the above environment changes.

-- Peter Jeremy
Re: Dump Utility cache efficiency analysis
On 2009-Jun-23 15:52:04 -0400, Nirmal Thacker thacker.nir...@gmail.com wrote:
> I would first like to understand the opinions of anyone who has looked at this problem or think this would be a worthwhile project to start off with.

I'm aware of the following references:
http://www.mavetju.org/mail/view_message.php?list=freebsd-hackers&id=375676
http://www.mavetju.org/mail/view_thread.php?list=freebsd-stable&id=1335519&thread=yes

> 1. Installing a stable FreeBSD build
> 2. Check out a version of the Build suitable for the project

Any changes will need to apply to FreeBSD -current, though they may be back-ported once tested. This means that you will need a -current system at some point. 8-current is reasonably stable at this point and would be my suggestion.

> 3. Pointers to begin studying the current implementation in the code-tree structure (would I expect it to lie in the fs/ directory?). I tried to find it in the FreeBSD cross reference (http://fxr.watson.org/)

The code is in src/sbin/dump. It references various system header files in order to understand the UFS on-disk format.

> Lastly- does this project require the know-how's of device drivers? If so, I would have to work harder.

No. Dump is completely userland.

-- Peter Jeremy
Re: Maybe confused about AMD64 / i386 compatibility
On 2009-Jun-13 15:55:29 -0500, Joe Greco jgr...@ns.sol.net wrote:
> Adding a SIL3112A gives us the SATA.

These are known to cause data corruption (check the archives). I wouldn't trust anything that has passed through a SIL chip without independent validation.

-- Peter Jeremy
Re: C99: Suggestions for style(9)
On 2009-Apr-29 12:10:44 -0700, John Gemignani john.gemign...@isilon.com wrote:
> Are local variables allocated on-the-fly on the stack or does the compiler preallocate the space on entry?

This is compiler and optimisation dependent. As a general rule, if a compiler is not performing any optimisation, it is likely to allocate all variables on the stack. An obvious code optimisation is to keep variables in registers instead of storing them on the stack.

In the specific case of older gcc versions with optimisation enabled (I'm not sure of the behaviour of gcc 4.x), gcc will immediately spill (allocate on the stack) local variables that aren't eligible to be held in registers (eg too large or the variable address is taken) and allocate remaining variables to virtual registers. During code generation, virtual registers are mapped to physical registers and any remaining virtual registers are allocated on the stack.

> If I have to delve into a crashdump, having the variables on the big entry allocation has been very helpful in the past.

OTOH, not caching variables in registers has a significant adverse impact on performance.

-- Peter Jeremy
Re: C99: Suggestions for style(9)
On 2009-Apr-26 09:02:36 +0200, Christoph Mallon christoph.mal...@gmx.de wrote:
> as some of you may have noticed, several years ago a new millenium started and a decade ago there was a new C standard.

Your implication that FreeBSD is therefore a decade behind the times is unfair. Whilst the C99 standard was published a decade ago, compilers implementing that standard are still not ubiquitous.

> HEAD recently switched to C99 as default (actually gnu99, but that's rather close).

Note that gcc 4.2 (the FreeBSD base compiler) states that it is not C99 compliant.

> Maybe using all of these changes is a bit too big change at once, but I'd like some of them applied to style(9). So, please consider the proposed changes individually and not as an all-or-nothing package.

One area you do not address is code consistency. Currently, the FreeBSD kernel (and, to a lesser extent, userland) are mostly style(9) compliant - ie the entire codebase is mostly written in the same style. Whilst you may not like it (and I don't know that anyone completely agrees with everything in style(9)), it does mean that the code is consistent.

Changing style(9) in a way that is not consistent with current code means that either existing code must be brought into line with the new standard - which incurs a one-off cost - or the code base becomes split into old and new style - incurring an ongoing maintenance cost as maintainers switch between styles. Both approaches incur a risk of introducing new bugs.

Note that I'm not saying that style(9) can never be changed, I'm just saying that any change _will_ incur a cost and the cost as well as the potential benefits need to be considered.

[Reduce declaration scope as far as possible, initialise variables where they are declared, sort by name]

Whilst my personal preference is for the style suggested by Christoph (and I generally agree with the points he makes), this is also the change that incurs the biggest stylistic change. This is not a change that can be practically retrofitted to existing code and therefore its implementation would result in a mixture of code styles - increasing the risk of bugs due to confusion as to which style was being used. I am not yet sure whether the benefits outweigh the costs.

[Don't parenthesize return values]
> Removed, because it does not improve maintainability in any way.

This change could be made and tested mechanically. But there is no justification for making it - stating that the existing rule is no better than the proposed rule is no reason to change.

[No K&R declarations]
> Removed, because there is no need to use them anymore.

Whilst this is a change that could be performed mechanically, there are some gotchas relating to type promotion (as you point out). The kernel already contains a mixture of ANSI & K&R declarations and style(9) recommends using ANSI. IMHO, there is no need to make this change until all K&R code is removed from FreeBSD.

[ Don't insert an empty line if the function has no local variables.]

This change could be made and tested mechanically. IMHO, this change has negligible risk and could be safely implemented.

> +.Sh LOCAL VARIABLES
> Last, but definitely not least, I added this paragraph about the use of local variables. This is to clarify, how today's compilers handle unaliased local variables.

Every version of gcc that FreeBSD has ever used would do this for primitive types when optimisation was enabled. This approach can become expensive in stack (and this is an issue for kernel code) when using non-primitive types or when optimisation is not enabled (though I'm not sure if this is supported). Assuming that gcc (and icc and clang) behaves as stated in all supported optimisation modes, this change would appear to be quite safe to make.

-- Peter Jeremy
Using bus_dma(9)
I'm currently trying to port some code that uses bus_dma(9) from OpenBSD to FreeBSD and am having some difficulties in following the way bus_dma is intended to be used on FreeBSD (and how it differs from Net/OpenBSD). Other than the man page and existing FreeBSD drivers, I am unable to locate any information on bus_dma care and feeding. Has anyone written any tutorial guide to using bus_dma?

The OpenBSD man page provides pseudo-code showing the basic cycle. Unfortunately, FreeBSD doesn't provide any similar pseudo-code and the functionality is distributed somewhat differently amongst the functions (and the drivers I've looked at tend to use a different order of calls).

So far, I've hit a number of issues that I'd like some advice on:

Firstly, the OpenBSD model only provides a single DMA tag for the device at attach() time, whereas FreeBSD provides the parent's DMA tag at attach time and allows the driver to create multiple tags. Rather than just creating a single tag for a device, many drivers create a device tag which is only used as the parent for additional tags to handle receive, transmit etc. Whilst the need for multiple tags is probably a consequence of moving much of the dmamap information from OpenBSD bus_dmamap_create() into FreeBSD bus_dma_tag_create(), the rationale behind multiple levels of tags is unclear. Is this solely to provide a single point where overall device DMA characteristics and limitations can be specified or is there another reason?

Secondly, bus_dma_tag_create() supports a BUS_DMA_ALLOCNOW flag that "pre-allocates enough resources to handle at least one map load operation on this tag". However it also states "[t]his should not be used for tags that only describe buffers that will be allocated with bus_dmamem_alloc()" - does this mean that only one of bus_dmamap_load() or bus_dmamem_alloc() should be used on a tag/mapping? Or is the sense backwards (ie don't specify BUS_DMA_ALLOCNOW for tags that are only used as the parent for other tags and never mapped themselves)? Or is there some other explanation?

Thirdly, bus_dmamap_load() uses a callback function to return the actual mapping details. According to the man page, there is no way to ensure that the callback occurs synchronously - a caller can only request that bus_dmamap_load() fail if resources are not immediately available. Despite this, many drivers pass 0 for flags (allowing an asynchronous invocation of the callback) and then fail (and cleanup) if bus_dmamap_load() returns EINPROGRESS. This appears to open a race condition where the callback and cleanup could occur simultaneously. Mitigating the race condition seems to rely on one of the following two behaviours:

a) The system is implicitly single-threaded when bus_dmamap_load() is called (generally as part of the device attach() function). Whilst this is true at boot time, it would not be true for a dynamically loaded module.

b) Passing BUS_DMA_ALLOCNOW to bus_dma_tag_create() guarantees that the first bus_dmamap_load() on that tag will be synchronous. Is this true? Whilst it appears to be implied, it's not explicitly stated.

Finally, what are the ordering requirements between the alloc, create, load and sync functions? OpenBSD implies that the normal ordering is create, alloc, load, sync whilst several FreeBSD drivers use tag_create, alloc, load and then create.

As a side-note, the manpage does not document the behaviour when bus_dmamap_destroy() or bus_dma_tag_destroy() are called whilst a bus_dmamap_load() callback is queued. Is the callback cancelled or do one or both destroy operations fail?

-- Peter Jeremy
Re: Improving the kernel/i386 timecounter performance (GSoC proposal)
On 2009-Mar-30 18:45:30 -0700, Maxim Sobolev sobo...@freebsd.org wrote:
> You don't really need to do it on every execve() unconditionally. It could be done on demand in libc, so that only when thread pass certain threshold, the common page optimization code kicks in and does its open/mmap/etc magic. Otherwise, normal syscall is performed.

This optimisation is premature. First step is to implement an approach that always maps (or whatever) the data and then gather some information about its overheads in the real world. If they are deemed excessive, only then do we start looking at how to improve things. And IMO, the first step would be to lazily map the page - so it's not mapped by default but mapped the first time any of the information in it is used.

> that for example gettimeofday() only gets optimized if threads calls it more frequently that 1 call/sec.

Whilst this thread started talking about timecounters, once you have a shared page, there is a variety of other information that could be exported - PID being the most obvious. If the page is exported as code rather than data (as has been suggested) then you also have the possibility of exporting CPU-dependent optimised versions of some library functions (ala Solaris). The more stuff you export, the less you gain from supporting an export threshold.

On 2009-Mar-30 18:31:06 -0700, Maxim Sobolev sobo...@freebsd.org wrote:
> It's not that easy, unless you can pin thread to a specific core before reading that page. I.e. imagine the case when your thread reads per-cpu page, get preempted and scheduled to a different core, then executes RDTSC there, still thinking it got TSC reading from the first core. Even if it does re-read from that page again after reading TSC to determine if he has read the correct TSC, still it's possible (though not very likely) that it has been preempted again and scheduled to the first core after reading the TSC.

Good point. If you export code, rather than data, then the scheduler can just special-case threads where the return address is inside the magic page (this is a fairly cheap test and only needs to occur once you have decided to re-schedule that thread - so you are already in the expensive part of the scheduler and a few more instructions won't be noticeable there). The most obvious approach would be to temporarily pin the thread whilst it's executing inside that page.

-- Peter Jeremy
Re: Improving the kernel/i386 timecounter performance (GSoC proposal)
On 2009-Mar-27 14:19:16 -0400, Alexander Sack pisym...@gmail.com wrote:
> On Fri, Mar 27, 2009 at 1:31 PM, Poul-Henning Kamp p...@phk.freebsd.dk wrote:
>> In message 49cd0405.1060...@samsco.org, Scott Long writes:
>>> I've been talking about this for years. All I need is help with the VM magic to create the page on fork. I also want two pages, one global for gettimeofday (and any other global data we can think of) and one per-process for static data like getpid/getgid.

gettimeofday is likely to be a mixture of global and per-core data so possibly a 3rd page containing per-core data is warranted.

> I'm assuming folks are still in love with the TSC because it still the cheapest as oppose ACPI-fast or HPET to even contemplate this?

That is its major advantage. It might be feasible to export all the data necessary to implement the complete CLOCK_*_FAST family.

> Also I thought at least PHK's comment (Sergey mentioned it) was true regardless of bus, that the TSC is not consistent across multiple packages (and for that matter I suppose cores) due to I *think* its ISA lineage so how does this work again?

TSC is nothing to do with ISA. The easiest way to build a counter that runs at CPU clock rate is to put it very close to the CPU/core and have different counters for each CPU/core, without any synchronisation between the different counters.

> Won't the rate in which you tick up be sporadic over the course of the process scheduled on different cores? (i.e. depending on what core RDTSC happened to land on)

RDTSC will wind up on the same core that your thread of execution is running on and this is defined by the scheduler. IE, it's up to the scheduler to ensure that the correct page of global (or per-cpu) data is mapped.

-- Peter Jeremy
Re: Improving the kernel/i386 timecounter performance (GSoC proposal)
On 2009-Mar-29 08:35:45 +0800, David Xu davi...@freebsd.org wrote:
> Julian Elischer wrote:
>> interestingly it is even feasible to have a per-thread page.. it requires that the scheduler change a page table entry though.
>
> I will knock his door at midnight if he added such a heavy weight task in the scheduler, TLB shutdown is horrible, and big code size squeezing out data from CPU cache is not idea model. scheduler should be as simple as just a context switching routine.

If the TSC is not consistent between all cores (which is probably the most common situation at present), then using the TSC implies knowing which core you are executing on. From a userland perspective, the easiest way to do this is to have a page of data that varies depending on which core you are executing on.

-- Peter Jeremy
Re: does Copyright on source files expire ?
On 2009-Mar-25 05:31:52 -0400, David Schultz d...@freebsd.org wrote:
> In the US, the rule that applies most of the time is that Copyright expires 70 years after the author dies, although there are many special cases where the term differs.

And the '70' gets regularly extended following pressure from the big content owners. As a rule of thumb, you can expect (eg) 'Mickey Mouse' to never be released from Copyright.

-- Peter Jeremy
Re: x11 status
On 2009-Feb-25 07:53:08 +0100, Ed Schouten e...@80386.nl wrote:
> The XFree86 project has been dying ever since almost all the active development moved to the Xorg-project. Xorg has many new features that XFree86 doesn't have, like hardware compositing and improved device detection.

And along the way, they've dropped things like integration testing, avoiding regressions and avoiding POLA violations.

>> latest cvs image from Xfree86, and it built FAR easier that xorg, far faster, far simpler to configure ...
>
> Why should it matter how easy it is to build a piece of software? You can just run `make -C /usr/ports/x11/xorg install clean' or `pkg_add -r xorg'.

Note that Chuck also mentioned faster (the conversion from imake to configure added something like 30% to the time to build X.org for absolutely no benefit - some pieces of X.org now take 4 times as long to configure as to build) and easier to configure.

Whilst the ease of building a port doesn't really affect the end user, it does affect the port maintainer - a port that needs lots of tender care and feeding will lead to more rapid maintainer burnout.

-- Peter Jeremy
Re: impossible packet length ...
On 2009-Feb-08 10:45:13 +0200, Danny Braniss da...@cs.huji.ac.il wrote: Feb 6 18:00:13 warhol-00.cs.huji.ac.il kernel: bce0: discard frame w/o leading ethernet header (len 0 pkt len 0) ... Feb 6 19:00:00 warhol-00.cs.huji.ac.il amd[715]: Unknown $ sequence in rhost:=${RHOST};type:=nfsl;fs:=${FS};rfs:=$huldigC0#^ZM-^KoM- abase Feb 6 19:00:00 warhol-00.cs.huji.ac.il kernel: impossible packet length (2068989523) from nfs server sunfire:/dist which seems to point fingers at bce... It does rather suggest that bce is not behaving. What happens if you turn off checksum off-loading? This should make the kernel drop the corrupt packets instead of trying to process them. If practical, you could also try (temporarily) plugging in a different NIC. -- Peter Jeremy
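One quick sanity check on a length like the one in the log above is to look at its four bytes as text. This is only a sketch and was not part of the thread: it assumes the number the kernel printed is the 32-bit big-endian length field as read from the wire, and `length_as_bytes` is a hypothetical helper name.

```python
def length_as_bytes(value: int) -> bytes:
    """Return the 4-byte big-endian representation of a 32-bit length field."""
    return value.to_bytes(4, "big")


# Value taken from the "impossible packet length" kernel message.
reported = 2068989523
print(hex(reported), length_as_bytes(reported))
```

The four bytes are printable ASCII (`{RFS`), which looks more like stray map text than a record length. That is at least consistent with corrupt data reaching the NFS code, as suggested in the reply.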
Re: critical floating point incompatibility
On 2009-Jan-28 11:24:21 -0800, Bakul Shah ba...@bitblocks.com wrote: On a mac, cc -m64 builds 64 bit binaries and cc -m32 builds 32 bit binaries. The following script makes it as easy to do so on a 64 bit FreeBSD -- at least on the few programs I tried. Ideally the right magic needs to be folded in gcc's builtin specs. #!/bin/sh args=/usr/bin/cc; while [ ".$1" != . ]; do a=$1; shift; case $a in -m32) args="$args -B/usr/lib32 -I/usr/include32 -m32";; *) args="$args $a";; esac; done; $args You also need to manually populate /usr/include32 since it doesn't exist by default and may still get bitten by stuff in /usr/local/include. Do you have a script (or installworld patches) to do this? amd64/112215 includes a first attempt at updating the gcc specs (though it's missing the include handling), as well as some of the remaining problems. Ideally x86_64 platforms run *all* i386 programs (that don't depend on a 32 bit kernel). Agreed. -- Peter Jeremy Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour.
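The argument rewriting done by the quoted wrapper script can be sketched as follows. This is an illustrative translation, not a replacement for the script: the paths /usr/lib32 and /usr/include32 come from the message, while the function name `rewrite_args` is hypothetical. Building a list rather than a string also avoids the word-splitting problems the shell version has with arguments that contain spaces.

```python
def rewrite_args(argv):
    """Build a cc command line, expanding -m32 into the 32-bit search paths."""
    cmd = ["/usr/bin/cc"]
    for arg in argv:
        if arg == "-m32":
            # Point the compiler at the 32-bit libraries and headers.
            cmd += ["-B/usr/lib32", "-I/usr/include32", "-m32"]
        else:
            cmd.append(arg)
    return cmd


# Show the expanded command line for a typical 32-bit build.
print(" ".join(rewrite_args(["-m32", "-o", "hello", "hello.c"])))
```

A real wrapper would pass the resulting list to something like `os.execv` instead of printing it.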
Re: 3x read to write ratio on dump/restore
On 2009-Jan-12 05:05:37 -0500, Yoshihiro Ota o...@j.email.ne.jp wrote: Jeremy, I thought you wrote this: http://lists.freebsd.org/pipermail/freebsd-hackers/2007-February/019666.html Yes, that is my message. I had forgotten it. If you dig back further, you'll find that I looked into the poor read behaviour of dump before the caching code existed (and one of the outcomes of that thread was the current caching code). -- Peter Jeremy
Re: kernel panic
On 2009-Jan-09 00:05:47 -0800, Kamlesh Patel shilp.ka...@yahoo.com wrote: How do I recover the system from this error? I can't reload the loader.old If you press any key during the first spinner, you should get a prompt similar to the following: FreeBSD/i386 BOOT Default: 0:ad(0,a)/boot/loader boot: You can then enter the name of the program you wish to run - eg /boot/loader.old (or directly load /boot/kernel/kernel). See the following for a more complete description: http://www.freebsd.org/cgi/man.cgi?query=boot&apropos=0&sektion=0&manpath=FreeBSD+7.1-RELEASE&format=html -- Peter Jeremy
Re: How to access kernel memory from user space
On 2008-Dec-22 18:05:34 -0600, Gerry Weaver ger...@compvia.com wrote: I am working on a driver that collects various network statistics via pfil. I have a simple array of structures that I use to store the statistics. I also have a user space process that needs to collect these statistics every second or so. The easiest (and hackiest) approach would be to use kldsym(2) to locate the symbol in KVM and then mmap(2) the relevant part of /dev/kmem. The biggest downside is that the userland process needs to be group kmem. The other approach would be for your kernel driver to grow a character device node and directly support mmap. -- Peter Jeremy
Re: [Testers wanted] /dev/console cleanups
On 2008-Nov-19 02:47:31 -0800, Jeremy Chadwick [EMAIL PROTECTED] wrote: There's a known issue with the kernel message buffer though: it's not NULL'd out upon reboot. This is deliberate. If the system panics, stuff that was in the message buffer (and might not be on disk) can be read when the system reboots. If there is no crashdump, this might be the only record of what happened. Meaning, in some cases (depends on the BIOS or system), the kernel message buffer from single-user mode is retained even after a reboot! A user can then do dmesg and see all the nifty stuff you've done during single-user, which could include unencrypted passwords if mergemaster was tinkering with passwd/master.passwd, etc. There shouldn't be unencrypted passwords, though there might be encrypted passwords visible. Rink Springer created a patch where the kernel message buffer will start with NULL to keep this from happening, but it needs to be made into a loader.conf tunable. I hope that never gets committed - it will make debugging kernel problems much harder. There is already a kern.msgbuf_clear sysctl and maybe people who are concerned about msgbuf leakage need to learn to use it. -- Peter Jeremy
Re: Asynchronous pipe I/O
On 2008-Nov-05 17:40:11 +0400, rihad [EMAIL PROTECTED] wrote: Imagine this shell pipeline: sh prog1 | sh prog2 As given above, prog1 blocks if prog2 hasn't yet read previously written data (actually, newline separated commands) or is busy. What I want is for prog1 to never block: sh prog1 | buffer | sh prog2 There's also misc/mbuffer which is supposed to be an enhancement of misc/buffer - though I haven't used either. I have a program I wrote to do this but it's not in a releasable state. Wouldn't such an intermediary tool be a great way to boost performance for certain types of solutions? I've found that for dump|restore or dump|gzip, I can get quite significant speedups by adding a buffer that is several hundred MB in the middle. -- Peter Jeremy
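The idea of an intermediate buffer stage can be sketched with a reader thread feeding a bounded in-memory queue. This is a minimal illustration only, not the unreleased program mentioned above and not how misc/buffer or misc/mbuffer are implemented; `pump`, `CHUNK` and `max_chunks` are hypothetical names, and the default sizes are arbitrary.

```python
import os
import queue
import threading

CHUNK = 64 * 1024  # bytes requested per read


def pump(src_fd, dst_fd, max_chunks=1024):
    """Copy src_fd to dst_fd through a queue holding up to max_chunks chunks."""
    pending = queue.Queue(maxsize=max_chunks)

    def reader():
        # Drain the producer as fast as possible; block only when the queue is full.
        while True:
            data = os.read(src_fd, CHUNK)
            pending.put(data)  # an empty bytes object marks end of input
            if not data:
                return

    worker = threading.Thread(target=reader, daemon=True)
    worker.start()
    while True:
        data = pending.get()
        if not data:
            break
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(dst_fd, view)
            view = view[written:]
    worker.join()
```

Saved as a script that ends with `pump(0, 1)`, this would sit in the middle of the pipeline in place of `buffer`. The writer in front of it then blocks only once the queue is full, so the queue size bounds both memory use and how far the producer can run ahead.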
Re: memtest86+ can not link: binutils issue?
On 2008-Oct-30 18:08:35 +0200, Andriy Gapon [EMAIL PROTECTED] wrote: 1. obtain and extract http://www.memtest.org/download/2.01/memtest86+-2.01.bin.gz This is a compressed bootable image and can't be compiled. Possibly you mean http://www.memtest.org/download/2.01/memtest86+-2.01.tar.gz 2. run gmake: $ gmake gcc -E -traditional head.S -o head.s as -32 -o head.o head.s gcc -c -Wall -march=i486 -m32 -Os -fomit-frame-pointer -fno-builtin -ffreestanding -fPIC -fno-strict-aliasing reloc.c gcc -Wall -march=i486 -m32 -Os -fomit-frame-pointer -fno-builtin -ffreestanding -fPIC -c -o main.o main.c gcc -c -Wall -march=i486 -m32 -Os -fomit-frame-pointer -fno-builtin -ffreestanding test.c Blows up at this point for me: gcc -c -Wall -march=i486 -m32 -Os -fomit-frame-pointer -fno-builtin -ffreestanding test.c test.c:14:20: error: sys/io.h: No such file or directory test.c: In function 'beep': test.c:1410: warning: implicit declaration of function 'outb_p' test.c:1410: warning: implicit declaration of function 'inb_p' test.c:1417: warning: implicit declaration of function 'outb' gmake: *** [test.o] Error 1 I can't find sys/io.h in CVS or any declarations for outb_p or inb_p in my source tree. ld --warn-constructors --warn-common -static -T memtest_shared.lds \ -o memtest_shared head.o reloc.o main.o test.o init.o lib.o patn.o screen_buffer.o config.o linuxbios.o memsize.o pci.o controller.o random.o extra.o spd.o error.o dmi.o \ ld -shared -Bsymbolic -T memtest_shared.lds -o memtest_shared head.o reloc.o main.o test.o init.o lib.o patn.o screen_buffer.o config.o linuxbios.o memsize.o pci.o controller.o random.o extra.o spd.o error.o dmi.o head.o(.text+0x7): In function `startup_32': : undefined reference to `_GLOBAL_OFFSET_TABLE_' Segmentation fault (core dumped) gmake: *** [memtest_shared] Error 139 I can't help here. _GLOBAL_OFFSET_TABLE_ is related to the binutils PIC support and it appears that the linker doesn't like the code (in head.S) is explicitly referencing it. 
Not only linking fails, but ld even crashes. I agree this shouldn't happen. Can anybody suggest anything about this problem? It looks like stand-alone PIC code on FreeBSD needs some different incantations to Linux. My understanding is that several of the i386 bootstraps are relocatable so you might like to peruse the code in /usr/src/sys/boot/i386 for ideas. -- Peter Jeremy
Re: Why does adding /usr/lib32 to LD_LIBRARY_PATH break 64-bit binaries?
On 2008-Oct-24 10:43:04 +0200, Wojciech Puchar [EMAIL PROTECTED] wrote: 6.1-RELEASE-amd64 machine. If I add /usr/lib32 to my LD_LIBRARY_PATH it breaks all of my binaries on my 64-bit machine. what do you expect else? Well, the rtld should be smart enough to recognize 32-bit .so's and skip them when binding a 64-bit executable. Whilst having /usr/lib32 in LD_LIBRARY_PATH doesn't make sense from a solely FreeBSD perspective, I have done similar things when writing cross-platform scripts (to avoid having to use platform-dependent code). this will make system trying to bind 32-bit libs to 64-bit program. it can't work rtld shouldn't attempt to bind 32-bit libs to 64-bit programs. -- Peter Jeremy
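Telling a 32-bit shared object from a 64-bit one only needs the first few bytes of the file, because the ELF identification block records the class. The sketch below illustrates that check; it is not how FreeBSD's rtld is written, and the helper names `elf_class` and `usable_by` are hypothetical. The magic number, the `EI_CLASS` index and the class values are the ones defined by the ELF format.

```python
ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4      # index of the class byte within e_ident
ELFCLASS32 = 1
ELFCLASS64 = 2


def elf_class(header: bytes):
    """Return 32 or 64 for an ELF header, or None if it is not a recognised ELF."""
    if len(header) <= EI_CLASS or header[:4] != ELF_MAGIC:
        return None
    return {ELFCLASS32: 32, ELFCLASS64: 64}.get(header[EI_CLASS])


def usable_by(header: bytes, want_bits: int) -> bool:
    """True if the object matches the word size of the program being linked."""
    return elf_class(header) == want_bits
```

A loader walking LD_LIBRARY_PATH could apply a test like `usable_by` to each candidate and move on to the next directory when it fails, which is the behaviour argued for above.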
Re: Rare problems in upgrade process (corrupted FS?)
On 2008-Sep-26 12:22:55 +0200, Jordi Espasa Clofent [EMAIL PROTECTED] wrote: 1) I do the sync process with csup(1); next I go into /usr/src/sys/amd64/conf to edit the GENERIC file (I use customized kernels) and this file doesn't exist. You might like to check your CVSup site against http://www.mavetju.org/unix/freebsd-mirrors/ to confirm it is updating correctly. GENERIC should exist. * I reboot the machine (because I suspect a very weird FS problem), boot in single user mode and do a 'fsck -fy'. Effectively, the fsck(8) found and repaired several errors. Especially, one error claims my attention: SUPERBLOCK. It might have been useful if you had kept a record of the exact messages. If you repeat the fsck, does it now report any problems? If you are using an up-to-date CVSup mirror, my next suggestion would be hardware problems. -- Peter Jeremy
Re: the future of sun4v
[cc list trimmed] On 2008-Sep-08 01:14:39 -0700, Darren Reed [EMAIL PROTECTED] wrote: The critical issue for freebsd (and any operating system for that matter) on rock is how well does the kernel scale to a system with that many concurrent threads? Right now it doesn't. And based on some previous threads, there is a lot of redesign to do before it can. But stability needs to come before scalability. There's no point in FreeBSD pretending it can support some arbitrary number of CPUs when it panics due to races in one of the CPU subsystems - which is my understanding of the current sun4v state. -- Peter Jeremy
Re: sun4v arch
On 2008-Aug-23 21:39:34 -0700, Kip Macy [EMAIL PROTECTED] wrote: There really isn't any magic to bringing up a port. You compile it, install it, and then run it until it breaks. Once it breaks you spend a lot of time instrumenting the code to track down what went wrong. About what I expected. I've just bumped into your bsdtalk interview: http://cisx1.uma.maine.edu/~wbackman/bsdtalk/bsdtalk086.mp3 This appears to give a useful overview of the sun4v port. One thing you mention is that you'd started work on a virtual network driver. How far did this get and can you point me to the code? It seems that the latest OpenBSD runs on sun4v. I haven't investigated how well supported it is. -- Peter Jeremy
Re: sun4v arch
On 2008-Aug-23 22:40:55 -0500, Mark Linimon [EMAIL PROTECTED] wrote: My understanding is that the port is in a pre-alpha state due to unfinished work in the kernel, so expecting there to be any userbase is premature. Except that the wiki gives a far more optimistic picture. All of our 'new' architectures which are in this state have so few non- developer users that there is hardly any reason to submit PRs. AFAICT the active developers already know what's missing :-) That makes it very difficult for someone outside that group to come up to speed. I can't find anything in the freebsd-sun4v archives. I was hoping that there would be a list somewhere of what state various subsystems were in and what remained to be done. wiki.freebsd.org sounds like the ideal place for this. On 2008-Aug-23 20:39:29 -0700, Garrett Cooper [EMAIL PROTECTED] wrote: Maybe some time should be spent looking at stuff from NetBSD to see whether or not they've solved some already critical porting pieces that FreeBSD lacks in this architecture? I can't find anything that suggests NetBSD runs on sun4v. Their sparc64 port only covers the US-I/II families and there's no mention of sun4v. -- Peter Jeremy
Re: the future of sun4v
[Replies re-directed to freebsd-sun4v] On 2008-Aug-21 14:42:55 -0700, Kip Macy [EMAIL PROTECTED] wrote: I believe that there is a general expectation by freebsd users and developers that unsupported code should not be in CVS. Although sun4v is a very interesting platform for developers doing SMP work, I simply do not have the time or energy to maintain it. If someone else would like to step up and try his hand I would be supportive of his efforts. In the likely event that no one steps forward by the time that 7.1 is released I will ask that it be moved to the Attic. Since there are no other current SPARC CPUs that FreeBSD can run on (the US-II has been obsolete for about 6 years and FreeBSD won't run on any more recent sun4u chips), that will also remove the justification for maintaining a SPARC64 port. I don't have the knowledge or available time to maintain the sun4v port by myself but would be happy to be part of a team doing so. One impediment I have is that I don't have a T-1 or T-2 system that I can dedicate to FreeBSD. I could work on FreeBSD in a guest domain - but since FreeBSD doesn't support either the virtual disk or virtual network, actually getting FreeBSD running there presents somewhat of a challenge. -- Peter Jeremy
Re: sun4v arch
On 2008-Aug-22 17:04:00 +0200, Kris Kennaway [EMAIL PROTECTED] wrote: Just so everyone is on the same page, what is needed to keep sun4v viable are people with experience with (or intention to learn about) low level architectural and implementation details of the FreeBSD kernel What documentation is currently accurate for this beyond the source code? The only things I can quickly find are: Design and Implementation of FreeBSD 5.2 and FreeBSD Architecture Handbook. The former is getting quite old and I'm not sure how up-to-date the latter is kept. the sun4v hardware platform, Is the documentation at http://www.opensparc.net/opensparc-t1/index.html and http://www.opensparc.net/opensparc-t2/index.html adequate for this or is there additional information that is needed? Is there any tutorial style documentation on the low-level T1/T2 details? who know their way around things like pmap.c and other MD places where the kernel interfaces with the bare metal, I've poked around the low-level details of FreeBSD/i386 and /Alpha in the past, though I'm nothing like an expert at it. sun4v/sun4v is only about twice the size of a 6th Edition kernel... and who are willing to make a long term (multi-year) commitment to supporting the platform. Yes. Is there a summary of the open issues somewhere? There are no sun4v PRs open. http://wiki.freebsd.org/FreeBSD/sun4v effectively hasn't been touched since November 2006 and suggests that the only critical issue is lack of serial port support. -- Peter Jeremy
Re: read with timeout ??
On 2008-Aug-08 16:30:49 -0400, Chuck Robey [EMAIL PROTECTED] wrote: such a email, I have to admit my own needs fall more into a plain-jane serial line, nothing a socket-oriented thing could help me with. If this is a normal serial port then termios(4) might help: Use non-canonical processing with VMIN = 0 and VTIME = 10 (see the section Noncanonical Mode Input Processing). -- Peter Jeremy
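The VMIN/VTIME suggestion can be sketched as below. It uses Python's termios module purely for illustration (the same fields exist in the C struct termios); the helper name `set_read_timeout` is hypothetical and the sketch changes only the settings mentioned above, leaving baud rate, echo and everything else untouched.

```python
import os
import termios


def set_read_timeout(fd, tenths=10):
    """Make read() on terminal fd return after at most tenths/10 seconds.

    With ICANON cleared, VMIN = 0 and VTIME = n, a read returns as soon as
    any data is available, or with no data once n tenths of a second pass.
    """
    attrs = termios.tcgetattr(fd)
    attrs[3] &= ~termios.ICANON       # index 3 is lflag: non-canonical input
    attrs[6][termios.VMIN] = 0        # index 6 is cc: no minimum byte count
    attrs[6][termios.VTIME] = tenths  # timeout in tenths of a second
    termios.tcsetattr(fd, termios.TCSANOW, attrs)
```

After this call, `os.read(fd, n)` on the serial device returns an empty bytes object when the timeout expires with nothing received, so the caller can distinguish a timeout from data without using select or signals.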
Re: Idea for FreeBSD
On 2008-Aug-06 19:14:51 -0400, [EMAIL PROTECTED] wrote: In Solaris 10 the Services Management Facility (SMF) was introduced. The main purpose of SMF appears to be to drum up business for Sun's training courses by radically changing Sol10 Administration for little benefit. Basically what it does, is take all the rc.d scripts and puts them into a database to manage. Everything is converted to XML and two basic commands (svcs and svcadm) are used to manage everything. So you take each line from inetd.conf (literally) and wrap it in several KB of XML. This definitely adds to bloat and doesn't even obey the spirit of XML (since the content of each inetd.conf entry remains opaque). I haven't looked at what happens to /etc/inittab or the rc.d scripts but I expect that it's similar. It's not clear what benefit this brings. The svcs and svcadm commands are among the most arcane I have bumped into during my 20 years of administering Unix. I agree that some of the process management facilities of SMF are better than what exists for most FreeBSD daemons but don't believe that all the other baggage is worth the improvement. With FreeBSD, I can configure virtually all the system via a single text file - which is easily found and kept under configuration control. With Sol10, there are random bits of configuration spread all over the system and there is no obvious way to control configuration. -- Peter Jeremy
Re: strange issue reading /dev/null
On 2008-Aug-07 12:19:20 -0500, Sean C. Farley [EMAIL PROTECTED] wrote: Grr! Optimization should not be a requirement for checking for uninitialized variables. Yes, gcc adds fun to development. This is documented: `-Wuninitialized' Warn if an automatic variable is used without first being initialized or if a variable may be clobbered by a `setjmp' call. These warnings are possible only in optimizing compilation, because they require data flow information that is computed only when optimizing. If you do not specify `-O', you will not get these warnings. Instead, GCC will issue a warning about `-Wuninitialized' requiring `-O'. That explanation makes sense. -- Peter Jeremy
Re: Laptop suggestions?
On 2008-Jul-27 17:23:46 -0400, Zaphod Beeblebrox [EMAIL PROTECTED] wrote: we'd need a method of remembering what file handles were connected to so that they could be reopened (in this, I envision some type of text string... maybe a URI/URL). As a bonus, this would give us process migration between systems, too (assuming the URI were portable between self same systems --- which isn't horribly hard with nfs mounts and whatnot). What you are describing here sounds more like the process checkpointing functionality that Softway (I think it was) developed sometime last century. There should be a paper on it in an AUUG Conference Proceedings somewhere. Process checkpointing is somewhat different to suspend/resume: With suspend/resume, you are saving the entire system state - which is basically a matter of dumping physical RAM to disk and being able to restore it later. You don't need to be able to isolate individual processes and there's no need to 'reopen' file handles because they will automatically re-instantiate when you restore the kernel state that included them being open. -- Peter Jeremy
Re: Laptop suggestions?
On 2008-Jul-25 13:36:37 +0300, Aggelidis Nikos [EMAIL PROTECTED] wrote: From my perspective freebsd should advertise(*) the laptops that work with it, out of the box, so that new users {like me} know what to buy; Who do you suggest is going to do this? Buying one of every type of laptop, installing (or working out how to install) FreeBSD and checking which bits of the laptop do/don't work with which version of FreeBSD is an expensive exercise in both time and effort. The best that exists at present is http://laptop.bsdgroup.de/freebsd/ - but it suffers from the problem that by the time someone has bought a laptop and checked it out, that model is obsolete. In my case, I bought my laptop because I knew someone who had the identical model. and large corporations have a benefit for promoting OS compatibility other than Windows(tm). The vendors don't seem interested in doing this - I suspect that they are pressured not to support anything other than Winbloze (you might notice that two very high profile Linux-only laptops have recently grown Winbloze variants). Successive generations of laptops have become less and less free-OS-friendly. -- Peter Jeremy
Re: profiling broken on RELENG_7/i386
On 2008-Jul-04 13:01:11 +0400, Dmitry Morozovsky [EMAIL PROTECTED] wrote: It seems we step on a bug in gcc in RELENG_7/i386 It is triggered at least by profiling program which uses getopt(3): I think it's actually in the profiling initialisation code. If you try to run sample code under gdb, you can see that .mcount() is not preserving %ecx, though main() assumes it does. (gdb) disas $eip Dump of assembler code for function main: 0x080481d0 main+0:lea0x4(%esp),%ecx 0x080481d4 main+4:and$0xfff0,%esp 0x080481d7 main+7:pushl 0xfffc(%ecx) 0x080481da main+10: push %ebp 0x080481db main+11: mov%esp,%ebp 0x080481dd main+13: push %ecx 0x080481de main+14: sub$0x14,%esp 0x080481e1 main+17: call 0x8051b50 .mcount 0x080481e6 main+22: mov0x4(%ecx),%eax 0x080481e9 main+25: mov(%eax),%eax 0x080481eb main+27: mov%eax,0x8(%esp) 0x080481ef main+31: mov(%ecx),%eax 0x080481f1 main+33: mov%eax,0x4(%esp) 0x080481f5 main+37: movl $0x8066b0a,(%esp) 0x080481fc main+44: call 0x8051b00 printf 0x08048201 main+49: mov$0x0,%eax 0x08048206 main+54: add$0x14,%esp 0x08048209 main+57: pop%ecx 0x0804820a main+58: pop%ebp 0x0804820b main+59: lea0xfffc(%ecx),%esp 0x0804820e main+62: ret End of assembler dump. (gdb) x/10x $esp 0xbfbfeadc: 0x0804815f 0x0001 0xbfbfeb08 0xbfbfeb10 0xbfbfeaec: 0x 0x 0x 0x 0xbfbfeafc: 0x 0x (gdb) info regi eax0xbfbfeb08 -1077941496 ecx0x1e968 125288 edx0x8051d1a134552858 ebx0x1 1 esp0xbfbfeadc 0xbfbfeadc ebp0xbfbfeb00 0xbfbfeb00 esi0xbfbfeb10 -1077941488 edi0x0 0 eip0x80481d00x80481d0 eflags 0x282642 cs 0x33 51 ss 0x3b 59 ds 0x3b 59 es 0x3b 59 fs 0x3b 59 gs 0x1b 27 ... [step through .mcount] ... (gdb) stepi main (argc=Error accessing memory address 0x1b: Bad address. 
) at x.c:4 4 printf(Hello %d %s\n, argc, argv[0]); (gdb) info regi eax0x1 1 ecx0x1b 27 edx0x804815f134512991 ebx0x1 1 esp0xbfbfeab0 0xbfbfeab0 ebp0xbfbfeac8 0xbfbfeac8 esi0xbfbfeb10 -1077941488 edi0x0 0 eip0x80481e60x80481e6 eflags 0x246582 cs 0x33 51 ss 0x3b 59 ds 0x3b 59 es 0x3b 59 fs 0x3b 59 gs 0x1b 27 -- Peter Jeremy Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour. pgpvlUdyjzYFW.pgp Description: PGP signature
Re: how can i get file name knowing its descriptor?
On 2008-Jul-03 14:08:12 +0300, Uladzislau Rezki [EMAIL PROTECTED] wrote: I've been writing a small kernel module, that provides information about modification of the filesystem to user_land/userspace through the character device. I'm using FreeBSD 4.10 4.10 has not been supported for several years now. I strongly recommend you look at upgrading to at least 6.3. So, my question is: Is there any way to get file name knowing its descriptor? The simple answer is no. That said, you could try having a look at how lsof works (whilst it runs in userland, it needs to grovel around in the kernel data structures much the same as your module would need to). -- Peter Jeremy
Re: Sysinstall is still inadequate after all of these years / sorry I started flame war
On 2008-Jul-03 23:04:10 -0700, Rob Lytle [EMAIL PROTECTED] wrote: FreeBSD partition, and install OpenBSD which has impeccable documentation. Having tried to make sense of the OpenBSD carp documentation, I can only assume that is meant as a joke. -- Peter Jeremy
TCP not being proactive about recovering lost packets
I am trying to ftp mysql-5.1.25-rc.tar.gz from ftp.easynet.be and noticed that progress appeared to have ceased and the ETA increasing. Looking at a tcpdump of the FTP data socket showed: 10:31:17.273106 IP minx.ftp.be.easynet.net.57796 myhost.mydomain.xxx.yyy.56432: . 4054413516:4054414976(1460) ack 635248902 win 92 10:31:17.372968 IP myhost.mydomain.xxx.yyy.56432 minx.ftp.be.easynet.net.57796: . ack 1460 win 28692 10:31:17.709750 IP minx.ftp.be.easynet.net.57796 myhost.mydomain.xxx.yyy.56432: . 118260:119720(1460) ack 1 win 92 10:31:17.709807 IP myhost.mydomain.xxx.yyy.56432 minx.ftp.be.easynet.net.57796: . ack 1460 win 28692 10:31:17.713318 IP minx.ftp.be.easynet.net.57796 myhost.mydomain.xxx.yyy.56432: . 119720:121180(1460) ack 1 win 92 10:31:17.713368 IP myhost.mydomain.xxx.yyy.56432 minx.ftp.be.easynet.net.57796: . ack 1460 win 28692 10:33:17.717063 IP minx.ftp.be.easynet.net.57796 myhost.mydomain.xxx.yyy.56432: . 1460:2920(1460) ack 1 win 92 10:33:17.816684 IP myhost.mydomain.xxx.yyy.56432 minx.ftp.be.easynet.net.57796: . ack 2920 win 28692 10:33:18.126643 IP minx.ftp.be.easynet.net.57796 myhost.mydomain.xxx.yyy.56432: . 121180:122640(1460) ack 1 win 92 10:33:18.12 IP myhost.mydomain.xxx.yyy.56432 minx.ftp.be.easynet.net.57796: . ack 2920 win 28692 10:33:18.128224 IP minx.ftp.be.easynet.net.57796 myhost.mydomain.xxx.yyy.56432: . 122640:124100(1460) ack 1 win 92 10:33:18.128239 IP myhost.mydomain.xxx.yyy.56432 minx.ftp.be.easynet.net.57796: . ack 2920 win 28692 10:35:18.130354 IP minx.ftp.be.easynet.net.57796 myhost.mydomain.xxx.yyy.56432: . 2920:4380(1460) ack 1 win 92 10:35:18.229382 IP myhost.mydomain.xxx.yyy.56432 minx.ftp.be.easynet.net.57796: . ack 4380 win 28692 10:35:18.549832 IP minx.ftp.be.easynet.net.57796 myhost.mydomain.xxx.yyy.56432: . 124100:125560(1460) ack 1 win 92 10:35:18.549855 IP myhost.mydomain.xxx.yyy.56432 minx.ftp.be.easynet.net.57796: . 
ack 4380 win 28692 10:35:18.552361 IP minx.ftp.be.easynet.net.57796 myhost.mydomain.xxx.yyy.56432: . 125560:127020(1460) ack 1 win 92 10:35:18.552376 IP myhost.mydomain.xxx.yyy.56432 minx.ftp.be.easynet.net.57796: . ack 4380 win 28692 The FTP server resends an old packet then 2 new packets. FreeBSD ACKs each packet with the next packet it wants. Then there's a 2 minute timeout before the FTP server responds. This has been going on for about 45 minutes now. The client is running 7-STABLE from mid-May. Shouldn't it continue to regularly send ACKs where it knows there is outstanding data? -- Peter Jeremy
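The receiver behaviour visible in the trace, where every incoming segment is answered with an ACK for the first missing byte, can be modelled with a small sketch. This is a simplified model of cumulative acknowledgement only, not FreeBSD's TCP code: it ignores SACK, windows and timers, the name `next_ack` is hypothetical, and the sequence numbers are the relative ones printed by tcpdump.

```python
def next_ack(expected, held, start, end):
    """Process segment [start, end) and return the new cumulative ACK.

    expected is the next in-order byte wanted; held is a set of
    out-of-order (start, end) segments kept until the gap is filled.
    """
    held.add((start, end))
    progressed = True
    while progressed:
        progressed = False
        for seg_start, seg_end in sorted(held):
            if seg_start <= expected:
                # Segment is contiguous with (or older than) the in-order data.
                held.discard((seg_start, seg_end))
                expected = max(expected, seg_end)
                progressed = True
                break
    return expected


# Replay part of the trace: two new segments, the retransmission, one more new.
held = set()
ack = 1460
for seg in [(118260, 119720), (119720, 121180), (1460, 2920), (121180, 122640)]:
    ack = next_ack(ack, held, *seg)
    print(seg, "-> ack", ack)
```

In this model the ACK stays at 1460 until the retransmitted 1460:2920 segment arrives and then moves only to 2920, because the next hole is still open. That matches the ACK values in the capture.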
Re: timestamping for kernel messages (like Solaris and Linux)
On 2008-Jun-08 10:24:53 +0300, Niki Denev [EMAIL PROTECTED] wrote: Has anyone thought about implementing an option to prepend all kernel console messages with timestamps, like Linux and Solaris do? The only time I've seen Solaris do this is when the console message is syslog'd - which FreeBSD also does. Is it just a matter of hacking up the kernel printf() implementation? Pretty much. Any possible caveats? The kernel works in UTC only and has only a very restricted ability to translate between epoch seconds and a human-readable date/time (it's currently only used to talk to the RTC). -- Peter Jeremy
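To give a feel for what translating epoch seconds into a UTC date/time involves when no library support is available, here is a sketch that uses integer arithmetic only. It is not the kernel's RTC conversion code; it uses a well-known days-to-civil-date calculation, and the function name `epoch_to_utc` is hypothetical. Leap seconds and time zones are ignored.

```python
def epoch_to_utc(secs):
    """Convert seconds since 1970-01-01 00:00:00 UTC to (Y, M, D, h, m, s)."""
    days, rem = divmod(secs, 86400)
    hour, rem = divmod(rem, 3600)
    minute, second = divmod(rem, 60)

    # Shift the epoch to 0000-03-01 so leap days fall at the end of a "year".
    z = days + 719468
    era = z // 146097                      # 400-year Gregorian cycles
    doe = z - era * 146097                 # day of era, 0..146096
    yoe = (doe - doe // 1460 + doe // 36524 - doe // 146096) // 365
    doy = doe - (365 * yoe + yoe // 4 - yoe // 100)   # day of March-based year
    mp = (5 * doy + 2) // 153              # month index, 0 = March
    day = doy - (153 * mp + 2) // 5 + 1
    month = mp + 3 if mp < 10 else mp - 9
    year = yoe + era * 400 + (1 if month <= 2 else 0)
    return year, month, day, hour, minute, second


# Timestamp of the quoted message (2008-Jun-08 10:24:53 +0300) as epoch seconds.
print("%04d-%02d-%02d %02d:%02d:%02d UTC" % epoch_to_utc(1212909893))
```

A timestamping printf() would need something of this size in the kernel, or it could simply print raw seconds and leave the conversion to userland.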
Re: ntpd jail problem
On 2008-Jun-08 11:32:54 +0100, [EMAIL PROTECTED] wrote: I'm running an openntpd instance on the host machine, which syncs the clock from the pool at pool.ntp.org. From the log output, ntpd claims to be synced and the time does seem to be correct. I'm then running another openntpd in a jail which doesn't set the time, just serves it to clients. I've never used openntpd but for the base ntpd, you should be able to just use 'server 127.127.1.0' to make it trust (and not alter) the base system time. Note that this openntpd will not have access to the stratum information from the main ntpd but will have a fixed value and may need to be adjusted using a 'fudge' command (or equivalent). I'd be interested in knowing why you chose this approach rather than just syncing clients to the [open]ntpd instance in the host machine. -- Peter Jeremy
Re: virtio drivers for freebsd
On 2008-Jun-06 14:26:24 +0200, Sylvain Desbureaux [EMAIL PROTECTED] wrote: I'm currently using linux KVM as an hypervisor and I would like to use an old freebsd4 machine as a guest. I agree that virtualising FreeBSD is useful but I suspect your chances of getting guest patches for FreeBSD 4.x are very low. FreeBSD 4.x is no longer supported by the FreeBSD project and there are significant changes in the kernel architecture between 4.x and later versions. For new features to be added to FreeBSD, they must be first added to the latest version and then back-ported to older versions - the backport to 4.x would probably require significant effort. KVM has now special drivers, based on the virtio specifications. These drivers are compiled for windows and linux but unfortunately not for BSD. I think that they are pretty simple in the guest side (75kb of code for linux for example, for network and block device) but I'm not very good in kernel driver compilation. Note that FreeBSD does not have block-mode devices. -- Peter Jeremy