Re: devstat overhead VS precision
On Sat, Apr 13, 2013 at 12:59:49PM +0300, Alexander Motin wrote:
> Hi.
>
> It has long been known that collecting disk and GEOM statistics may cause significant processing overhead under high IOPS. In my recent high-IOPS benchmarks the performance difference reached three times! Last time the situation improved a lot through more active use of the TSC, but there are still many systems where TSCs are not synchronized.
>
> I propose to switch these statistics from binuptime() to getbinuptime() to solve the problem globally. On one hand, getbinuptime() resolution is limited to 1 ms, but since time is usually averaged over many I/Os, additional sub-millisecond precision will come from sampling. Since most tools now show request processing times with 0.1 ms precision, that should be sufficient. I believe real disk performance is more important than the n-th digit in some statistics.
>
> The following patch makes the change and makes disk performance independent of timecounter performance:
> http://people.freebsd.org/~mav/devstat_time.patch
>
> Are there any objections to it?

No objections here, but I wonder if you were able to compare the results somehow before and after the change, so we have some hard numbers to show that we don't lose much by applying the change.

On a mostly unrelated note: when two threads (T0 and T1) call get*time() on two different cores, but T0 does that a bit earlier, is it possible that T0 gets a later time than T1?

-- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://mobter.com
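The sampling argument above can be illustrated with a small user-space C simulation. It is not devstat code.

The simulation rests on these assumptions:
- A clock that only advances once per 1 ms tick stands in for getbinuptime().
- Every request has the same true latency.
- Request start times fall at pseudo-random positions relative to the tick boundaries.

Each individual measurement is a whole number of ticks, yet the average converges to the true latency. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* xorshift64 PRNG (Marsaglia), used so the simulation is deterministic. */
static uint64_t xorshift64(uint64_t *state)
{
	uint64_t x = *state;

	x ^= x << 13;
	x ^= x >> 7;
	x ^= x << 17;
	*state = x;
	return x;
}

/* What a clock that only advances once per tick reports for time t_ns. */
static uint64_t coarse_time(uint64_t t_ns, uint64_t tick_ns)
{
	return (t_ns / tick_ns) * tick_ns;
}

/*
 * Mean latency as measured with the coarse clock over n requests, each
 * with a true latency of latency_ns and a start time at a pseudo-random
 * position relative to the tick boundaries.
 */
static double mean_coarse_latency(uint64_t latency_ns, uint64_t tick_ns,
    unsigned n, uint64_t seed)
{
	uint64_t state = seed;
	double sum = 0.0;

	assert(tick_ns > 0 && n > 0 && seed != 0);
	for (unsigned i = 0; i < n; i++) {
		/* Start somewhere within the first 1000 ticks. */
		uint64_t start = xorshift64(&state) % (tick_ns * 1000);
		uint64_t end = start + latency_ns;

		/* Each sample is a whole number of ticks (0 or 1 for short I/Os). */
		sum += (double)(coarse_time(end, tick_ns) -
		    coarse_time(start, tick_ns));
	}
	return sum / n;
}
```

For example, mean_coarse_latency(350000, 1000000, 1000000, 42) should land close to 350000 ns. This relies on request start times being uncorrelated with the tick phase. Per-request values, and anything derived from them such as a latency histogram, stay quantized to the tick.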
Re: devstat overhead VS precision
On Mon, Apr 15, 2013 at 10:18:15PM +0300, Konstantin Belousov wrote:
> On Mon, Apr 15, 2013 at 08:42:03PM +0200, Pawel Jakub Dawidek wrote:
> > On a mostly unrelated note: when two threads (T0 and T1) call get*time() on two different cores, but T0 does that a bit earlier, is it possible that T0 gets a later time than T1?
>
> Define "earlier" first. If you have taken sufficient measures to prevent preemption and interruption, e.g. by entering a spinlock before the fragment that calls get*, then no, it is impossible, at least not with any x86 timekeeping hardware we use. On the other hand, if interrupts are allowed, all bets are off.

So if we consider only one thread, it is not possible for it to obtain time t0, be scheduled to a different CPU and obtain t1 where t1 < t0?
Re: kmem_map auto-sizing and size dependencies
On Fri, Jan 18, 2013 at 08:26:04AM -0800, m...@freebsd.org wrote:
> Should it be set to a larger initial value based on min(physical,KVM) space available? It needs to be smaller than the physical space, [...]

Or larger, as the address space can get fragmented and you might not be able to allocate memory even if you have physical pages available.
Re: [RFQ] make witness panic an option
On Thu, Nov 15, 2012 at 04:39:55PM +, Attilio Rao wrote:
> On 11/15/12, Adrian Chadd adr...@freebsd.org wrote:
> > On 15 November 2012 05:27, Giovanni Trematerra giovanni.tremate...@gmail.com wrote:
> > > I really do think that is a very bad idea. When a locking assertion fails you just have to stop and think about what's wrong; there is no way to postpone this.
> >
> > Not all witness panics are actually fatal. A developer who is sufficiently cluey in their area is quite likely able to just stare at the code paths for a while and figure out why the incorrectness occurred.
>
> The problem is that such a mechanism can be abused, just like the BLESSING one, and that's why it is disabled by default.

WITNESS is a development tool. We don't ship production kernels with WITNESS even compiled in. What is a more efficient use of developer time: going through a full reboot cycle every time, or reading the warning from the console, unloading a module, fixing the bug and loading it again? And if this option is turned off by default, what is the problem?
Re: [RFQ] make witness panic an option
On Sun, Nov 25, 2012 at 12:42:16PM +, Attilio Rao wrote:
> On Sun, Nov 25, 2012 at 12:39 PM, Pawel Jakub Dawidek p...@freebsd.org wrote:
> > WITNESS is a development tool. [...] And if this option is turned off by default, what is the problem?
>
> Yes, so, why do you write here?

I'm trying to understand why you object. So far the only concern of yours that I have found is that you are afraid of it being abused. I don't see how it can be abused if it is turned off by default. If someone commits a change that turns it on by default, believe me, I'll unleash hell personally. As I said, WITNESS is a development tool, a very handy one. This doesn't mean we can't make it even more handy. It is there to help find bugs faster, right? Adrian is proposing a change that may help find and fix bugs even faster.

> Go ahead and fix BLESSED, make it the default, etc.

This is another story, but BLESSED is much less controversial to me. It is turned off by default on the assumption that all the code running in our kernel is developed for FreeBSD, which is not true. For example, ZFS is, I think, the biggest lock consumer in our kernel (around 120 locks). It wasn't originally developed for FreeBSD, and its lock order was verified using different tools. On FreeBSD it triggers a flood of LOR warnings from WITNESS, even though those are not bugs. At some point I verified many of them and they were all false positives, so I simply turned off WITNESS warnings for ZFS locks. Why? Because BLESSED is turned off for fear of abuse, and this in turn is the cause of the ZFS hack mentioned above.

> I have had enough of your (not referring to you particularly, but to the people who contributed to this and other threads) not being able to respect others' opinions. As I said, I cannot forbid you guys from doing anything; just go ahead, write the code and commit it, albeit completely bypassing other people's opinions.

I'm sorry, I wasn't aware that your opinions are set in stone. I hoped that with some new arguments you might want to reconsider :)
Re: [RFQ] make witness panic an option
On Sun, Nov 25, 2012 at 01:37:19PM +, Attilio Rao wrote:
> On Sun, Nov 25, 2012 at 1:12 PM, Pawel Jakub Dawidek p...@freebsd.org wrote:
> > [...]
> > I'm trying to understand why you object. So far the only concern of yours that I have found is that you are afraid of it being abused. I don't see how it can be abused if it is turned off by default. If someone commits a change that turns it on by default, believe me, I'll unleash hell personally.
>
> So I don't understand what you are proposing. You are not proposing to switch BLESSING on, and you are not proposing to import Adrian's patches, if I get it correctly. I don't understand then.

I propose to get Adrian's patches in and just leave the current behaviour as the default.
Re: [RFQ] make witness panic an option
On Sun, Nov 25, 2012 at 01:48:23PM +, Attilio Rao wrote:
> On Sun, Nov 25, 2012 at 1:47 PM, Pawel Jakub Dawidek p...@freebsd.org wrote:
> > [...]
> > I propose to get Adrian's patches in and just leave the current behaviour as the default.
>
> So if I say that I'm afraid this mechanism will be abused (and believe me, I really wanted to trim out the BLESSING stuff too, for the same reason) and you say you can't see how, there is not much we can discuss.

This is not what I said. I would see it as abuse if someone suddenly decided to turn off locking assertions by default in FreeBSD base. If he turns that off on his private machine, be it to speed up his development (a good thing) or to shut up an important lock assertion (a bad thing), this is entirely his decision. He can already do that, having all the source code; it's just more complex. Make tools, not policies.

BLESSING is a totally different subject. You were afraid that people would start to silence LORs they don't understand by committing blessed pairs to FreeBSD base. That situation is abuse, and I fully agree, but I still think BLESSING is useful, although I recognize it might be hard to prevent that abuse. In the case of Adrian's patch, nothing will change in how we enforce locking assertions in FreeBSD base.

> You know how I think; there is no need to wait for me to reconsider, because I don't believe this will happen with arguments like "I don't think", "I don't agree", etc.

I provide valid arguments with, I hope, proper explanations. You choose not to address them or to ignore them, and I hope this will change :)
Re: Training wheels for commandline (was Re: Pull in upstream before 9.1 code freeze?)
On Sat, Jul 07, 2012 at 11:25:07AM +0200, Wojciech Puchar wrote:
> > something they probably don't even know about, than for skilled users to turn it off. If this feature is going to print quite a few extra lines, let's just add one more line saying:
> >
> >     To disable this message run: echo set 31337mode >> ~/.tcshrc
>
> Should I, from now on, understand that this way of extending the OS (I mean going down to newbies instead of going up) is considered right by FreeBSD developers?

Not exactly. The "from now on" part is a bit misleading. This is not starting now; we have been trying hard to make FreeBSD easier to use, more consistent and friendlier in general for a long time. In your terminology, making FreeBSD easier for newcomers is "going down", which implies that "going up" means making it harder for newcomers. I hate to break it to you, but you are living upside down.

> Please answer; it is important for me, and for many other people, for the future.

You should definitely pay more attention, as this is happening every day. Everyone was a newcomer once. I didn't succeed on my first attempt to install FreeBSD, nor on the second. It took me a few tries to do it right. I knew nothing about UNIX back then. I consider myself someone who improved FreeBSD a bit, but I could just as easily have given up after the first two failed attempts and moved to something easier. How many people gave up after the first or second attempt and never looked back?
Re: Pull in upstream before 9.1 code freeze?
On Thu, Jul 05, 2012 at 12:10:17AM -0600, Warner Losh wrote:
> On Jul 4, 2012, at 4:08 PM, Doug Barton wrote:
> > On 07/04/2012 15:01, Mike Meyer wrote:
> > > On Wed, 04 Jul 2012 14:19:38 -0700 Doug Barton do...@freebsd.org wrote:
> > > > On 07/04/2012 11:51, Jason Hellenthal wrote:
> > > > > What would be really nice here is a command wrapper hooked into the shell, so that when you type a command that does not exist, it offers suggestions for what to install, somewhat like Fedora has done.
> > > >
> > > > I would also like to see this feature, which is pretty much universal in Linux at this point. It's very handy.
> > >
> > > I, on the other hand, count it as one of the many features of Linux that make me use FreeBSD.
> >
> > First, I agree that it should be possible to turn it off. But I can't help being curious... why would you *not* want a feature that tells you what to install if you type a command that doesn't exist on the system?
>
> Because I find that on Linux it often gets it wrong and winds up being useless noise. Mostly, though, it is because I mistype commands more often than I type commands that should be there but aren't.

It is even cooler than I initially thought. It punishes you for making typos :) Cool.

I think this is very useful for newcomers. The only thing missing is a one-liner on how to disable this feature, next to the instructions on how to install the package containing the missing command.
Re: Training wheels for commandline (was Re: Pull in upstream before 9.1 code freeze?)
On Thu, Jul 05, 2012 at 12:15:44PM +0200, Jonathan McKeown wrote:
> On Thursday 05 July 2012 11:03:32 Doug Barton wrote:
> > If the new feature gets created, and you don't want to use it, turn it off. No problem.
>
> No. I think this is entirely the wrong way round. If the new feature is created and you want it, turn it on. Don't make me turn off something I didn't want in the first place. [...]

This feature is targeted at new users. For them it is harder to turn on something they probably don't even know about than it is for skilled users to turn it off. If this feature is going to print quite a few extra lines, let's just add one more line saying:

    To disable this message run: echo set 31337mode >> ~/.tcshrc
Re: [CFC/CFT] large changes in the loader(8) code
On Thu, Jun 28, 2012 at 08:33:17AM -0700, Marcel Moolenaar wrote:
> On Jun 28, 2012, at 3:10 AM, Stefan Esser wrote:
> > All of the above is ugly, I'm afraid :(
>
> Indeed. The only sane way is to put the metadata in a partition of its own. Every compliant OS will respect that and consequently will not scribble over the data unintentionally. Any other scheme that puts valuable data in some undocumented or unregistered location is violating the GPT spec right away and is susceptible to being clobbered unintentionally.

If the user runs:

    # gpart create -s GPT /dev/mirror/foo

it is obvious to me that he wants to partition the mirror device and not the individual disks. The mirror was configured earlier. Do you expect gmirror to somehow detect that someone is writing GPT metadata later, magically place the GPT metadata on the raw disk and move the mirror's metadata to some magic partition? Not to mention that the mirror itself doesn't have to be configured on top of raw disks, and that the mirror may never be partitioned at all.

If in your opinion GPT is limited to raw disks only, then I guess the best way to fix that is to refuse to configure GPT on anything except raw disks (which Andrey already proposed?). In my opinion this is unacceptable, but I think this is what you are suggesting.

One of the GEOM design goals was flexibility: let the user decide in what order he wants to configure the various layers. How do you know that in every possible scenario software mirroring should come after partitioning, and encryption after mirroring? Why can't we provide flexible tools and let the user decide? Maybe GPT nesting violates standards, but why can't we support it as an extension, really?

I recognize the need to warn users if they use FreeBSD-specific features. We do that with non-standard APIs. So how about this: let's modify gpart(8) to print a warning if GPT is configured on something other than a raw disk. Let the warning say that such a configuration is non-standard and that problems are to be expected if the disk is shared with other OSes. In my opinion that's fair. With such a warning in place, I think we can let users decide on their own whether they really want that or not. Then we can also improve the FreeBSD boot loader to play nice with FreeBSD-specific extensions.
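A minimal sketch of the check such a gpart(8) warning could perform follows. It assumes the caller already knows the name of the GEOM class that produced the target provider, for example by looking it up through libgeom. It treats only the DISK class as a raw disk. The function name and message text are hypothetical; this is not gpart's actual code.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical check: return a warning when a GPT is about to be created
 * on a provider that was not produced by the DISK class, or NULL when the
 * target looks like a raw disk.
 */
static const char *gpt_target_warning(const char *provider_class)
{
	assert(provider_class != NULL);
	if (strcmp(provider_class, "DISK") == 0)
		return NULL;
	return "warning: GPT on a non-disk provider is a non-standard "
	    "configuration; expect problems if the disk is shared with "
	    "other operating systems";
}
```

The check only warns and never refuses. That matches the proposal to leave the final decision to the user.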
Re: [CFC/CFT] large changes in the loader(8) code
On Thu, Jun 28, 2012 at 02:54:43PM -0700, Marcel Moolenaar wrote:
> On Jun 28, 2012, at 12:49 PM, Alexander Leidinger wrote:
> > Or are you suggesting convincing all BIOS vendors to include the ability to boot from some kind of FreeBSD-private partitioning scheme (not MBR, as it is not suitable, and not GPT, as you are not OK with using it on a gmirror)?
>
> I would be having fewer problems if the mirroring didn't force the backup GPT header into anything but the last sector. [...]

The GPT backup header is placed in the last sector of the mirror device, just as the user asked. Gmirror doesn't force anything. The user decides to put GPT partitioning on the mirror device instead of the raw disk. Gmirror doesn't even know, and doesn't need to know, how the user uses the data area of the mirror device.

> [...] If the metadata was somewhere else, then we wouldn't need to kluge various places to deal with the ambiguity and visible interoperability problems of the various tools and OSes. [...]

Where is "somewhere else", exactly? If it is somewhere else on this disk, then where? At the beginning of the disk? Then you would complain that it keeps its metadata where the primary GPT header should be located, as well as the MBR metadata, BSDlabel metadata, etc. Somewhere in the middle of the disk? Some future GPTng may want to use the same spot. On top of that, a gmirror-unaware boot loader would see corrupted data (shifted by one sector). Come on...

If "somewhere else" is not on this disk, then I'm sorry, but this is totally impractical. Disks are the place where you store stuff. In 99% of cases there is no other place to store it but the disk itself. Should we ask users to dedicate an additional disk to keeping the mirror's metadata?

> [...] Thus, it's not that I object to the mirroring per se, just to the mirroring as it is currently implemented with gmirror.

Do you know of any software RAID (=1) or volume manager that doesn't keep metadata on the component disks?

PS. We are discussing two totally different things here:

1. Does placing GPT on anything but a raw disk violate the spec? I can agree that it does, and I'm happy with gpart(8) growing a warning.
2. How to do software mirroring. Despite trying really hard, I'm not sure what alternative you are proposing. Could you be more specific and describe how, in your opinion, gmirror should be implemented?

> What about multipathing? In case the disk is attached via two paths but multipath is not enabled, the OS sees the same disk (and the same identical unique disk identifier) multiple times. Is this a violation of the spec too? It's the same disk, isn't it? The OS can actually use the property of the ID to infer that it has already seen this disk and not create multiple device nodes.

You cannot trust an ID found on disk to be unique. All your assumptions break when the user decides to dd(1)-copy the contents of this disk to another disk, for example.
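The arithmetic behind "the last sector of the mirror device" can be sketched as follows. The sketch assumes that gmirror keeps its metadata in exactly one sector at the end of each component. It also assumes that the mirror's data area starts at the beginning of the component. Under those assumptions, a GPT created on the mirror puts its backup header one sector before the raw disk's last sector. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Raw-disk LBA of the GPT backup header when the GPT is created on a
 * provider that starts at LBA 0 of the disk and stops reserved_tail
 * sectors before its end: 0 for a GPT on the raw disk, 1 for a GPT on a
 * gmirror device (assuming one trailing metadata sector per component).
 */
static uint64_t gpt_backup_lba(uint64_t disk_sectors, uint64_t reserved_tail)
{
	assert(disk_sectors > reserved_tail);
	/* The backup header always goes in the provider's last LBA. */
	return disk_sectors - reserved_tail - 1;
}
```

Take a 1000-sector component as an example:
- A GPT created on the mirror device puts its backup header at raw LBA 998.
- LBA 999 holds gmirror's metadata.
- A gmirror-unaware reader that looks only at the raw disk's last sector finds gmirror metadata there instead of a GPT header.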
Re: [CFC/CFT] large changes in the loader(8) code
On Wed, Jun 27, 2012 at 08:22:25AM -0400, John Baldwin wrote:
> > I don't think so. The most common case is to configure partitions on top of a mirror. Mirroring partitions is less common, mostly because hardware RAIDs are popular. You don't expect a hardware RAID vendor to mirror partitions. Partition editors for other OSes won't work, but only because they don't support gmirror. If they didn't recognize and support some hardware (or pseudo-hardware) RAID, there would be the same problem.
>
> Hardware RAIDs hide the metadata from the disk that the BIOS (and disk editors) see. Thus, putting a GPT on a hardware RAID volume works fine, as the logical volume is always seen consistently by all OS's. [...]

Only as long as you don't connect this disk to a different controller.

> [...] The same is even true of the software RAID that graid supports, since the metadata is defined by the vendor and thus the logical volume is always seen consistently by other OS's.

But is it seen without the metadata by the boot loader?

What I'm trying to say is that it is fair to expect the user not to use a gmirror-configured disk with a different OS. If the user wants to use this disk with a different OS, then he has to use a format that is recognized by both. Because gmirror is supported by FreeBSD, we should improve that support by teaching the boot loader about it. Pretending gmirror is special, and recommending that it be used to mirror partitions instead of raw disks, is not the solution. I really can't see how gmirror is different in this regard from any other software RAID or volume manager. If you try to use a disk that contains unrecognized metadata, the behaviour is undefined (but hopefully not a panic).
Re: [CFC/CFT] large changes in the loader(8) code
On Wed, Jun 27, 2012 at 10:37:11AM -0700, Marcel Moolenaar wrote:
> On Jun 26, 2012, at 10:37 AM, John Baldwin wrote:
> > GPT really wants the backup header at the last LBA. I know you can set it, but I've interpreted that as a way to see if the primary header is correct or not. It seems to me that GPT tables created in this fashion (inside a GEOM provider) will not work properly with partition editors for other OS's. I'm hesitant to encourage the use of this, as I do think putting GPT inside of a gmirror violates the GPT spec.
>
> Agreed.

Guys, this doesn't violate the GPT spec in any way. The spec is narrow-minded if it talks only about raw disks, but you should think of gmirror as pseudo-hardware RAID. That's all. If putting GPT on top of a RAID array is a spec violation, then I guess we just have to live with it.

> While it is a nice trick to use the last sector for meta data, it does create 2 problems. 1 is mentioned above. [...]

It doesn't really matter where gmirror puts its metadata. If gmirror kept its metadata in the first sector, gpart/gpt would find the GPT backup header in the last sector and complain about a missing primary header.

> [...] The second is that when there's different metadata in the first *and* the last sector, you can't decide which is to take precedence without also looking at the other and knowing how to interpret it. We have not solved this second problem at all. We do get reports about the problems though. At best we're handwaving or kluging.

This is a different kind of problem. It took me a while to realize that, but now I know :) The real problem is that not all metadata formats are suitable for autodetection. That's all. The metadata formats I use in my GEOM classes play nice with autodetection. The solution is very easy: keep the size of the disk device in the metadata. This allows gmirror to figure out whether it is configured on a raw disk, on the last slice, on the last partition within the last slice, etc.

If GPT kept the disk size in its metadata, the second problem you mentioned would not exist. To be honest, GPT kind of does that by storing the backup header's LBA in the primary header. And this is fine as long as the primary header is valid.

The same problem exists with things like UFS labels. There is no way to support them properly using GEOM autodetection, because there is no provider size in the UFS superblock. The UFS superblock contains the file system size, but that is not the same thing, as one can create a file system smaller than the underlying disk device.

> I think it's unwise to depend on FreeBSD-specific extensions or features in industry-standard partitioning schemes and as such make the use of foreign tools hard if not impossible.

If you plan to use the given disk with FreeBSD only, what's the problem? Partitioning is not the end of the world. Even if you use industry-standard partitioning schemes, what file system are you going to use to actually access your data? FAT? Of course, if you do share your disk between various OSes, then your best bet is probably to use MBR or GPT on the raw disk and a FAT file system. But if you use your disk with FreeBSD only, then I see no reason not to leverage FreeBSD-specific features (be it gmirror, geli or zfs).

> A much more flexible approach is to support out-of-band configuration data. This allows us to mirror GPT disks without having to become non-standard, as it removes the need to use the last sector for meta-data. The ability to construct GEOM hierarchies unambiguously is very important and our current approach has proven to not deliver on that. This is actually impacting existing FreeBSD consumers already, like Juniper. So, we should not go deeper into this rabbit hole. We should finally solve this problem for real...

Marcel, nothing stops anyone from implementing a GEOM mirror class that uses no on-disk metadata. GEOM is not the limiting factor here. GEOM does provide a mechanism for autoconfiguration, but it is entirely optional, and a GEOM class may choose not to use it.

As an example, take a look at two other GEOM classes of mine, gconcat(8) and gstripe(8):
- The 'label' subcommand stores metadata on the component disks, which takes advantage of GEOM autodetection and autoconfiguration.
- The 'create' subcommand creates an ad hoc provider that stores no metadata and uses the entire disks. This also means it won't be created automatically on the next boot.

For Juniper it might be handier to use out-of-band configuration, as you know the hardware you are running on, so you know exactly where the disks are, etc. My company builds appliances too, so I have been there. For most of our users automatic configuration is simply better, as they can shuffle disks around and not wonder whether the system will boot or not.
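The "keep the provider size in the metadata" idea can be sketched as follows. The struct and function are simplified, hypothetical stand-ins, not gmirror's real on-disk format.

Metadata read from a provider's last sector is accepted only if two conditions hold:
- the signature matches;
- the size recorded in it matches the size of the provider being tasted.

The same sector is visible through the raw disk, its last slice and the last partition in that slice, so only the provider whose size was recorded claims it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MD_MAGIC	"GEOM::MIRROR"	/* signature used by this sketch */

/* Simplified, hypothetical metadata kept in a provider's last sector. */
struct mirror_md {
	char		magic[16];	/* class signature */
	uint64_t	provsize;	/* mediasize of the provider it was written for */
};

/*
 * Taste check: the last sector of a disk is also the last sector of its
 * last slice and of the last partition inside that slice. Comparing the
 * recorded size with the size of the provider being tasted lets only the
 * provider of the recorded size claim the metadata.
 */
static bool md_matches_provider(const struct mirror_md *md, uint64_t mediasize)
{
	assert(md != NULL);
	if (strncmp(md->magic, MD_MAGIC, sizeof(md->magic)) != 0)
		return false;
	return md->provsize == mediasize;
}
```

Consider a UFS superblock in this light. It records only the file system size, which can be smaller than the provider, so it cannot serve as the recorded size in a check like this.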
Re: [CFC/CFT] large changes in the loader(8) code
On Wed, Jun 27, 2012 at 10:45:35AM -0700, Marcel Moolenaar wrote:
> On Jun 26, 2012, at 2:43 PM, Pawel Jakub Dawidek wrote:
> > As for sharing the disk with other OSes: if you share the disk with an OS that doesn't support gmirror, you shouldn't use gmirror in the first place. You probably want to use only formats that are recognized by all your OSes.
>
> This statement is ridiculous by virtue of not being in touch with reality and by making gmirror useless for such a wide range of cases that one can question why we have it at all.
>
> Put differently: a mirroring class is a fairly basic and useful thing to have. Limiting its use is nothing but artificial and follows from having to use the underlying provider to store metadata. This then changes the view of the underlying provider to consumers above gmirror in a way that makes the presence or absence of gmirror visible. Solving the visibility problem makes gmirror useful all the time. I see that as a better way of looking at it than simply blurting out that you shouldn't use gmirror when certain awkward and artificial conditions apply.

I'm sorry, Marcel, but what you describe here has nothing to do with reality. To implement reliable mirroring you have to use on-disk metadata. There is no way around that. You can implement non-redundant GEOM classes without on-disk metadata, but out-of-band configuration for mirroring is simply naive. How do you detect that components are out of sync, for example?

And when it comes to visibility: are you suggesting that gmirror should present the entire underlying provider to the upper layers, including its metadata? I hope not, because we have been through that hell already. Remember UFS skipping the first 16 sectors because BSDlabel metadata might be there? The same for swap? I think I did a pretty good job of making the metadata as simple as possible: I use exactly one sector at the end of the target device. I'm really having a hard time thinking of a simpler format.
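Reliable mirroring needs on-disk metadata because, after a crash or a detached component, only a record kept on each component tells you which copy is current. The sketch below uses a hypothetical per-component counter ("syncid") to illustrate this. gmirror's real metadata has its own fields for this purpose, which this simplified version does not reproduce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical per-component record. The mirror increments syncid on the
 * components it is still writing to; a component that was absent or
 * detached keeps an older value on its own disk.
 */
struct component {
	uint32_t	syncid;	/* read from the component's metadata */
	int		stale;	/* output: 1 if it needs resynchronization */
};

/*
 * Mark components whose syncid is older than the newest one as stale.
 * Returns how many components need to be resynchronized.
 */
static size_t mark_stale(struct component *c, size_t n)
{
	uint32_t newest = 0;
	size_t nstale = 0;

	assert(c != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (c[i].syncid > newest)
			newest = c[i].syncid;
	for (size_t i = 0; i < n; i++) {
		c[i].stale = (c[i].syncid < newest);
		nstale += (size_t)c[i].stale;
	}
	return nstale;
}
```

With purely out-of-band configuration there is no per-component syncid to compare. After an unclean shutdown, nothing would tell the two halves of the mirror apart.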
Re: [CFC/CFT] large changes in the loader(8) code
On Tue, Jun 26, 2012 at 04:50:36PM +0400, Andrey V. Elsukov wrote: Hi All, Some time ago i have started reading the code in the sys/boot. Especially i'm interested in the partition tables handling. I found several problems: 1. There are several copies of the same code in the libi386/biosdisk.c and common/disk.c, and partially libpc98/biosdisk.c. 2. ZFS probing is very slow, because the ZFS code doesn't know how many disks and partitions the system has: http://www.freebsd.org/cgi/query-pr.cgi?pr=148296 http://www.freebsd.org/cgi/query-pr.cgi?pr=161897 3. The GPT support doesn't check CRC and even doesn't know anything about the secondary GPT header/table. Just a quick note here. At some point when I was adding GPT attributes to allow for test starts I greatly improved, at least parts of, the GPT implementation. I did implement support for both CRC checksum verification and fallback to backup GPT header when primary is broken. And the code is still in sys/boot/common/gpt.c. So my question would be what do you mean by this sentence? So, i have created the branch and committed the changes: http://svnweb.freebsd.org/base/user/ae/bootcode/ The patch is here: http://people.freebsd.org/~ae/boot.diff What i already did: 1. The partition tables handling now is machine independent, and it is compatible with the kernel's GEOM_PART implementation. There is new API for disk drivers in the loader to get information about partitions and tables: common/Makefile.inc common/part.c common/part.h 2. The similar and general code from the disk drivers merged in the disk.c: common/disk.c common/disk.h i386/libi386/libi386.h i386/libi386/biosdisk.c userboot/test/test.c userboot/userboot/userboot_disk.c userboot/userboot.h 3. ZFS code now uses new API and probing on the systems with many disks should be greatly increased: zfs/zfs.c i386/loader/main.c 4. The gptboot now searches the backup GPT header in the previous sectors, when it finds the GEOM:: signature in the last sector. 
The PMBR code also tries to do the same: common/gpt.c i386/pmbr/pmbr.s 5. Also, the pmbr image now contains one fake partition record. When the first several sectors are damaged, the kernel can't detect GPT (see the RECOVERING section in gpart(8)). We can restore the PMBR with the dd(1) command, but the old pmbr image has an empty partition table, and the loader isn't able to boot from GPT when there is no partition record in the PMBR. Now it will be able to. When pmbr is installed via the 'gpart bootcode' command, the kernel correctly modifies this partition record. So, this is only for the first rescue step. 6. I have changed the userboot interface. I guess there are no consumers except the one test program. But if that isn't the case, I can make it compatible. Any comments are welcome. -- WBR, Andrey V. Elsukov -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl
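The backup-header search in item 4 can be illustrated with a small C sketch. It is not the code in common/gpt.c or pmbr.s; it assumes the disk is an in-memory array of 512-byte sectors, that GEOM metadata is recognized only by a "GEOM::" prefix at the start of the last sector (gmirror writes "GEOM::MIRROR" there), and that MAX_SCAN, the number of sectors searched backwards, is a hypothetical limit.

```c
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define MAX_SCAN    16	/* hypothetical: how far back to look for the header */

/*
 * Return the LBA of the backup GPT header, or -1 if none is found.
 * "disk" points to nsectors * SECTOR_SIZE bytes.
 */
static int64_t
find_backup_gpt(const uint8_t *disk, int64_t nsectors)
{
	int64_t last, lba;

	if (nsectors < 2)
		return (-1);
	last = nsectors - 1;
	/* Normal case: the backup header lives in the last sector. */
	if (memcmp(disk + last * SECTOR_SIZE, "EFI PART", 8) == 0)
		return (last);
	/* Only keep looking if a GEOM class owns the last sector. */
	if (memcmp(disk + last * SECTOR_SIZE, "GEOM::", 6) != 0)
		return (-1);
	for (lba = last - 1; lba > 0 && lba >= last - MAX_SCAN; lba--) {
		if (memcmp(disk + lba * SECTOR_SIZE, "EFI PART", 8) == 0)
			return (lba);
	}
	return (-1);
}
```

The backward scan is bounded so a disk without any backup header does not cost a full-disk read.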
Re: [CFC/CFT] large changes in the loader(8) code
On Tue, Jun 26, 2012 at 06:01:26PM +0400, Andrey V. Elsukov wrote: On 26.06.2012 16:57, Pawel Jakub Dawidek wrote: On Tue, Jun 26, 2012 at 04:50:36PM +0400, Andrey V. Elsukov wrote: Hi All, Some time ago I started reading the code in sys/boot. I'm especially interested in the partition table handling. I found several problems: 1. There are several copies of the same code in libi386/biosdisk.c and common/disk.c, and partially in libpc98/biosdisk.c. 2. ZFS probing is very slow, because the ZFS code doesn't know how many disks and partitions the system has: http://www.freebsd.org/cgi/query-pr.cgi?pr=148296 http://www.freebsd.org/cgi/query-pr.cgi?pr=161897 3. The GPT support doesn't check the CRC and doesn't even know anything about the secondary GPT header/table. Just a quick note here. At some point, when I was adding GPT attributes to allow for test starts, I greatly improved, at least parts of, the GPT implementation. I did implement support for both CRC checksum verification and fallback to the backup GPT header when the primary is broken. And the code is still in sys/boot/common/gpt.c. So my question would be: what do you mean by this sentence? Yes, gptboot does that, but loader/zfsloader doesn't. So there might be a situation where gptboot does boot, but loader(8) can't. I see. I don't know if I'll find time for a proper review, but it is really great that you are working on cleaning up this huge mess. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl
Re: [CFC/CFT] large changes in the loader(8) code
On Tue, Jun 26, 2012 at 01:37:11PM -0400, John Baldwin wrote: 4. gptboot now searches for the backup GPT header in the preceding sectors when it finds the GEOM:: signature in the last sector. The PMBR code also tries to do the same: common/gpt.c i386/pmbr/pmbr.s GPT really wants the backup header at the last LBA. I know you can set it, but I've interpreted that as a way to see if the primary header is correct or not. [...] My interpretation is different: the way to verify whether the header is valid is to check its checksum, not to check whether the backup header location in the primary header points at the last LBA. Of course, if the primary header's checksum is incorrect, it is hard to trust that the backup header location is correct. And we need the backup header when the primary header is invalid... [...] It seems to me that GPT tables created in this fashion (inside a GEOM provider) will not work properly with partition editors for other OSes. I'm hesitant to encourage the use of this, as I do think putting GPT inside of a gmirror violates the GPT spec. I don't think so. The most common case is to configure partitions on top of a mirror. Mirroring partitions is less common, mostly because hardware RAIDs are popular. You don't expect a hardware RAID vendor to mirror partitions. Partition editors for other OSes won't work, but only because they don't support gmirror. If they didn't recognize and support some hardware (or pseudo-hardware) RAID, there would be the same problem. In other words, IMHO, our problem is that FreeBSD's boot code doesn't recognize/support gmirror's metadata. What Andrey is proposing is to recognize the metadata and act accordingly: in the case of gmirror, we simply need to skip it. In the future we will have the same problem with graid: until we add support for it to the boot code, we won't be able to boot from it. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am!
http://tupytaj.pl
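To make the checksum argument concrete, here is a minimal sketch of header validation that does not depend on where the header sits on disk. It assumes the UEFI header layout (signature "EFI PART" at offset 0, HeaderSize at offset 12, HeaderCRC32 at offset 16, CRC-32 taken over HeaderSize bytes with the CRC field zeroed); crc32_ieee() and gpt_header_valid() are hypothetical names, and this is not the FreeBSD implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Standard reflected CRC-32 (polynomial 0xEDB88320), as used by GPT. */
static uint32_t
crc32_ieee(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return (~crc);
}

/* Read a little-endian 32-bit field. */
static uint32_t
le32(const uint8_t *p)
{
	return ((uint32_t)p[0] | (uint32_t)p[1] << 8 |
	    (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24);
}

/*
 * Validate a GPT header held in one 512-byte sector: signature,
 * plausible header size, and HeaderCRC32 recomputed with the CRC
 * field zeroed.  The header's position on disk is never consulted.
 */
static bool
gpt_header_valid(const uint8_t *sector)
{
	uint8_t copy[512];
	uint32_t hdr_size = le32(sector + 12);

	if (memcmp(sector, "EFI PART", 8) != 0)
		return (false);
	if (hdr_size < 92 || hdr_size > sizeof(copy))
		return (false);
	memcpy(copy, sector, hdr_size);
	memset(copy + 16, 0, 4);
	return (crc32_ieee(copy, hdr_size) == le32(sector + 16));
}
```

A header that passes this check can be trusted whether it was found at the last LBA or a few sectors earlier, which is the point being made about headers inside a GEOM provider.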
Re: [CFC/CFT] large changes in the loader(8) code
On Tue, Jun 26, 2012 at 02:41:31PM -0700, Kevin Oberman wrote: Long ago I saw a proposal to create a dedicated partition on GPT to hold the metadata. With the large number of partitions available on GPT, tying up one just for GEOM seems like a low price, and it moves the device GEOM out of the realm of FreeBSD-unique and subject to serious issues when/if a disk is shared with some other OS. I have seen little comment on this and have never seen any argument that it could not work. I think this is an issue that will continue to bite users unless it is fixed. I don't really see how dedicating a partition for metadata can work or is a good idea, sorry. As for sharing a disk with another OS: if you share the disk with an OS that doesn't support gmirror, you shouldn't use gmirror in the first place. You probably want to use only formats that are recognized by all your OSes. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl
Re: [RFC] last(1) with security.bsd.see_other_uids support
On Tue, Jun 05, 2012 at 11:31:01PM +0200, Jilles Tjoelker wrote: Also, the attack surface of such a daemon may be smaller than that of a setuid/setgid program. Really? I don't see that. With the current patch and the binary setgid to utmp, the process can only read some files that don't even contain very sensitive data (like passwords). Any privileged daemon is a much bigger threat. Also, do we really want a daemon running all the time just to be able to parse utx files? Alternatively, the daemon could be a setgid program that is spawned by the utmpx APIs when needed. Still seems a bit too far for my taste. Spawning a daemon somewhere from within a library doesn't sound like a good idea to me... At least until we have something like launchd that can start such services on demand. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl
Re: [RFC] last(1) with security.bsd.see_other_uids support
(_SC_NGROUPS_MAX);
if (ngroups_max == -1)
ngroups_max = NGROUPS_MAX;
ngroups_max++;
+ if ((groups = malloc(sizeof(gid_t) * (ngroups_max))) == NULL)
+ err(1, "malloc");

When this goes into a library, you have to return an error here.

+ ngroups = ngroups_max;
+ (void) getgrouplist(pw->pw_name, pw->pw_gid, groups, &ngroups);

You know that getgrouplist(3) returns groups from the system files and not the actual process groups? Was that intended? IMHO you should use getgroups(2) here. And again you ignore the return value.

+ for (cnt = 0; cnt < ngroups; ++cnt) {
+ gid = groups[cnt];
+ group = getgrgid(gid);
+ /* User is in utmp or wheel group, they can see all */
+ if (strncmp("utmp", group->gr_name, 4) == 0 || strncmp("wheel", group->gr_name, 5) == 0) {

strncmp(3) is a bad idea here. If the user is a member of a "utmpfoo" group or a "wheelx" group, you turn off the restrictions. I'd really use getgroups(2) and look for GID_WHEEL or _UTMP_GID.

@@ -212,7 +255,30 @@ struct idtab {
/* Load the last entries from the file. */
if (setutxdb(UTXDB_LOG, file) != 0)
err(1, "%s", file);
+
+ /* drop setgid now that the db is open */

Style: Sentence should start with a capital letter and end with a period.

+ setgid(getgid());

And if setgid(2) fails?

+ /* Lookup current user information */

Style: Sentence should end with a period.

+ pw = getpwuid(getuid());

And if getpwuid(3) fails?

+ len = sizeof(see_other_uids);
+ if (sysctlbyname("security.bsd.see_other_uids", &see_other_uids, &len, NULL, 0))

sysctlbyname(3) doesn't return bool.

+ see_other_uids = 0;
+ restricted = is_user_restricted(pw, see_other_uids);
+ while ((ut = getutxent()) != NULL) {
+ /* Skip this entry if the invoking user is not permitted
+ * to see it */
+ if (restricted &&
+ !(ut->ut_type == BOOT_TIME ||
+ ut->ut_type == SHUTDOWN_TIME ||
+ ut->ut_type == OLD_TIME ||
+ ut->ut_type == NEW_TIME ||
+ ut->ut_type == INIT_PROCESS) &&
+ strncmp(ut->ut_user, pw->pw_name, sizeof(ut->ut_user)))

That's one complex if. And again strncmp(3) is used instead of strcmp(3).
Also, strncmp(3) doesn't return bool. If getpwuid(3) failed earlier, you have a NULL pointer dereference here. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://tupytaj.pl
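A sketch of the group check along the lines suggested in the review: getgroups(2) for the process's real group set and exact GID matches instead of strncmp(3) on names. It resolves the "utmp" and "wheel" groups with getgrnam(3) rather than the GID_WHEEL or _UTMP_GID constants mentioned above, and the simplified is_user_restricted() signature is hypothetical. It must only be called after setgid(getgid()), because otherwise the effective GID of a setgid-utmp binary would match.

```c
#include <grp.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * True if gid is the effective GID or appears in getgroups(2).
 * Any error yields "not a member", so callers fail closed.
 */
static bool
process_in_group(gid_t gid)
{
	gid_t *list;
	bool found = false;
	int n, i;

	if (getegid() == gid)
		return (true);
	if ((n = getgroups(0, NULL)) <= 0)
		return (false);
	if ((list = calloc((size_t)n, sizeof(*list))) == NULL)
		return (false);
	if ((n = getgroups(n, list)) < 0)
		n = 0;
	for (i = 0; i < n && !found; i++)
		found = (list[i] == gid);
	free(list);
	return (found);
}

/* Hypothetical replacement for the patch's is_user_restricted(). */
static bool
is_user_restricted(int see_other_uids)
{
	const char *privileged[] = { "utmp", "wheel" };
	struct group *gr;
	size_t i;

	if (see_other_uids != 0)
		return (false);
	for (i = 0; i < sizeof(privileged) / sizeof(privileged[0]); i++) {
		/* Exact lookup: "utmpfoo" or "wheelx" can never match. */
		gr = getgrnam(privileged[i]);
		if (gr != NULL && process_in_group(gr->gr_gid))
			return (false);
	}
	return (true);
}
```

Comparing GIDs rather than name prefixes removes the "utmpfoo"/"wheelx" problem entirely, and checking the actual process groups avoids trusting whatever the group database says about the user.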
Re: NFS mount inside jail fails
On Tue, May 17, 2011 at 10:17:12PM +0200, Alexander Leidinger wrote: On Tue, 17 May 2011 12:56:40 -0700 Sean Bruno sean...@yahoo-inc.com wrote: Silly thing I ran into today. A user wanted to NFS mount a dir inside a jail. After I groaned about the security implications of this, I noted that there is a sysctl that looks like it should allow this, namely security.jail.mount_allowed. I noted that setting this follows a path that *should* have allowed this silly thing to happen, except that the credentials in the nfsclient were not set up correctly. As you noticed, this is supposed to allow mounting inside a jail IF the FS you want to mount is marked as secure/safe to do so. Almost no FS is marked as such, as nobody wants to guarantee that it is safe (root in a jail should not be able to panic a system by trying to mount a corrupt/malicious FS image) and secure (not possible to get elevated access/privileges). For NFS there is theoretically the problem that the outgoing address on requests could be the one of the physical host instead of the IP of the jail. Whether this is true in practice, I do not know. This could be the reason why NFS is not marked with VFCF_JAIL. It is not marked with VFCF_JAIL because I just had no time to audit that it is safe. It might be safe in theory. There are some file system types that can't be securely mounted within a jail no matter what, like UFS, MSDOSFS, EXTFS, XFS, REISERFS, NTFS, etc., because the user mounting it has access to raw storage and can corrupt it in a way that will panic the entire system. There are other file systems that don't require access to raw storage for the user doing the mount, and chances are they are safe to mount from within a jail, like ZFS (the user can have access to ZFS datasets, but doesn't need access to the ZFS pool), NFS, SMBFS, NULLFS, UNIONFS, PROCFS, FDESCFS, etc. I added the VFCF_JAIL flag, so there is a general mechanism to mark file systems as jail-friendly, but back then I only needed it for ZFS.
-- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com
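A user-space model of the policy described above, for illustration only: a jailed caller may mount a file system type only if jail mounting is enabled (what security.jail.mount_allowed controls) and the type carries the jail-safe flag. MODEL_VFCF_JAIL, the table, and mount_permitted() are hypothetical; in the kernel the real VFCF_JAIL flag is declared per file system type, and NFS is listed here without it, as described in the message.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_VFCF_JAIL 0x1	/* hypothetical stand-in for VFCF_JAIL */

struct fs_type {
	const char *name;
	int flags;
};

/* Illustrative table: only types audited as jail-safe carry the flag. */
static const struct fs_type fs_types[] = {
	{ "ufs", 0 },		/* needs raw storage: never jail-safe */
	{ "msdosfs", 0 },
	{ "nfs", 0 },		/* possibly safe, but not audited yet */
	{ "zfs", MODEL_VFCF_JAIL },
};

/* Model of the mount-time check; returns 0 or an errno value. */
static int
mount_permitted(const char *fstype, bool jailed, bool jail_mount_allowed)
{
	size_t i;

	for (i = 0; i < sizeof(fs_types) / sizeof(fs_types[0]); i++) {
		if (strcmp(fs_types[i].name, fstype) != 0)
			continue;
		if (jailed && !jail_mount_allowed)
			return (EPERM);
		if (jailed && (fs_types[i].flags & MODEL_VFCF_JAIL) == 0)
			return (EPERM);
		return (0);
	}
	return (ENODEV);	/* unknown file system type */
}
```

Keeping the decision as a per-type flag means enabling NFS later only requires setting the flag once it has been audited, not changing the jail code.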
Re: Add SUM sysctl
On Mon, Apr 18, 2011 at 08:24:57AM -0400, John Baldwin wrote: On Saturday, April 16, 2011 10:24:44 am rank1see...@gmail.com wrote: After compilation of kernel and world in MUM (multi-user mode), the kernel is installed in MUM, but to install world, we reboot into SUM (single-user mode), then install world. (HANDBOOK) Now, in case of GELI usage AND if an upgrade is taking place, i.e., 8.2 -> 8.3, once you reboot into SUM to install world, you are doomed, BECAUSE ... the kernel will bitch (GELI part) about a world-kernel mismatch and you won't be able to install world, as you can't decrypt the GEOM providers!! The only way to save yourself in that case is to restore /boot/kernel.old, or one is doomed. This seems broken to me. An 8.3 kernel+modules should be able to handle GELI devices with an 8.2 world. If they can't, it means someone broke the ABI. Even a 9.0 kernel should work fine with an 8.x-stable world. It is not generally expected that only a bit of the system is encrypted in a way that needs userland to attach it. You either have the whole root encrypted, and no userland is involved in attaching it, or you have some secure partition encrypted. I don't fully understand how you can boot your system and then need to attach a GELI provider to be able to install world. If you booted fine, then your system is available and not encrypted. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com
Re: looking for error codes
On Fri, Apr 01, 2011 at 06:18:54PM +0300, Andriy Gapon wrote: on 01/04/2011 18:04 Andrew Duane said the following: AFAIK, FreeBSD does not really detect read-only media. This was something I had to add as a small project here at work, and was considering cleaning up to try to get into CURRENT. If there's a real need for it, I could speed that up. Yes, that's exactly the problem that I am looking at. So if you have anything to share, it will be greatly appreciated, at least by me. But I think many more people could benefit from it (e.g. those having SD/SDHC/etc cards). Once you detect read-only media, I suggest implementing the support by adding a new DISKFLAG_READONLY flag to the disk(9) API and simply denying write access in g_disk_access() when DISKFLAG_READONLY is set. -- Pawel Jakub Dawidek http://www.wheelsystems.com FreeBSD committer http://www.FreeBSD.org Am I Evil? Yes, I Am! http://yomoli.com
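A minimal model of the proposed check, assuming GEOM's convention that an access request passes the changes (deltas) to the read, write and exclusive counts. MODEL_DISKFLAG_READONLY, struct model_disk and model_disk_access() are hypothetical stand-ins for the proposed DISKFLAG_READONLY, struct disk and g_disk_access(), and returning EROFS is an assumed choice of error.

```c
#include <errno.h>

#define MODEL_DISKFLAG_READONLY 0x1000	/* hypothetical flag value */

struct model_disk {
	int d_flags;
};

/*
 * r, w and e are the requested changes of the read, write and
 * exclusive access counts.  Only attempts to gain write access are
 * refused, so read-only media can still be opened for reading, and
 * dropping write access (w < 0) is always allowed.
 */
static int
model_disk_access(const struct model_disk *dp, int r, int w, int e)
{
	(void)r;
	(void)e;
	if (w > 0 && (dp->d_flags & MODEL_DISKFLAG_READONLY) != 0)
		return (EROFS);
	return (0);
}
```

Refusing the write at access time means consumers such as file systems learn about read-only media when they open the provider, rather than through failing I/O later.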
Re: glabel force sectorsize patch
On Sun, Aug 08, 2010 at 03:57:44AM +0200, Ivan Voras wrote: Hi, In order to help users having 4k-sector drives which the system recognizes as 512-byte-sector drives, I'm proposing a patch to glabel which enables it to use a forced sector size for its native-labeled providers. It is naturally only usable with glabel-native labels (those created by glabel label) and not partition and file system labels, because we cannot add arbitrary new fields to the metadata of those types. The patch is here: http://people.freebsd.org/~ivoras/diffs/glabel_ssize.patch [...] This mechanism is a band-aid until there's a better way of dealing with 4k drives. So why do you want to obfuscate glabel with it? For people to start depending on it? Once we start supporting 4kB sectors, what do we do with such a change? Remove it and decrease the version number? What will people do with providers already labeled this way? If it's temporary, just allow listing the providers whose sector size you want to increase in /boot/loader.conf. Once we start supporting it properly, people might simply remove it from loader.conf and it should just work. glabel is not for that, and I don't agree with such obfuscation. -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: glabel force sectorsize patch
On Sun, Aug 08, 2010 at 02:02:17PM +0200, Ivan Voras wrote: On 8.8.2010 12:30, Pawel Jakub Dawidek wrote: So why do you want to obfuscate glabel with it? For people to start depending on it? Once we start supporting 4kB sectors, what do we do with such a change? Remove it and decrease the version number? What will people do with providers already labeled this way? If it's temporary, just allow listing the providers whose sector size you want to increase in /boot/loader.conf. Once we start supporting it properly, people might simply remove it from loader.conf and it should just work. glabel is not for that, and I don't agree with such obfuscation. Of course, there are good and bad sides to it. My take on it is that the only bad side is that it really isn't glabel's primary function to (optionally) fix up geometry, while the good sides are: It isn't its secondary function either. * glabel is in GENERIC and, judging by the mailing lists' traffic, it is one of the better-used parts of the system, so people are familiar with it. It is also already used as a perfectly valid fixup for device renaming, making both UFS and ZFS more stable for usage. That's an excellent argument. But you know what? em(4) is also in GENERIC; why not add it there? * You can't really make people depend on glabel, both because it is in GENERIC and because it stores its metadata in the last sector, making the rest of the drive completely usable without it in the event native 4k sector support is grown. I never said that. I do want people to depend on glabel, because it is free of such ugly hacks, so I know it won't bite them in the future. I don't want people to start depending on the fact that glabel supports changing sector sizes. Once we start supporting 4kB sectors properly, people's configurations will stop working, because glabel won't be able to read its metadata anymore. Your hack will break all configurations that started to depend on it.
In what I proposed, the GEOM provider will be presented to glabel (or any other GEOM class) as a 4kB provider and everything will just work, also after adding proper support for 4kB sectors. I'd like to hear comments from the wider audience. With respect to your comment, I will compromise: as 4k sector drives became available over the counter more than 6 months ago, and so far I think this is the first effort to give some support for them, I will commit this patch before the 9.0 code freeze only if no other support gets developed. I'll repeat. You won't commit this patch, because it is a totally wrong solution and can only do a lot of damage in the future. If you look forward, even temporary solutions can be done right. -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: glabel force sectorsize patch
On Sun, Aug 08, 2010 at 02:57:20PM +0200, Marius Nünnerich wrote: On Sun, Aug 8, 2010 at 14:02, Ivan Voras ivo...@freebsd.org wrote: I'd like to hear comments from the wider audience. With respect to your comment, I will compromise: as 4k sector drives became available over the counter more than 6 months ago, and so far I think this is the first effort to give some support for them, I will commit this patch before the 9.0 code freeze only if no other support gets developed. I do not like this at all, even if it's just for the KISS and POLA principles. A geom should do one thing and do it right, imo. Why not write a new geom class that does what you want? A new GEOM class only for sector size conversion that can operate on metadata would be useful, and not only to solve this particular problem. Keep in mind, though, that if at some point disks are detected and presented to GEOM as 4kB providers, this class won't be able to find its metadata anymore (as it was stored in the last 512 bytes, not in the last 4 kilobytes). -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
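The metadata-location problem is plain arithmetic. The sketch below assumes, as GEOM classes commonly do, that metadata is decoded from the start of the provider's last sector, and that the media size is a multiple of the new sector size; the helper names are hypothetical. For a 1 GiB disk, the last 512-byte sector starts at byte 1073741312, but the last 4096-byte sector starts at byte 1073737728, so the old metadata sits 3584 bytes into the sector the class would now read.

```c
#include <stdint.h>

/* Byte offset of the last sector of a provider. */
static uint64_t
last_sector_offset(uint64_t mediasize, uint32_t sectorsize)
{
	return (mediasize - sectorsize);
}

/*
 * Position of metadata written into the last 512-byte sector, measured
 * from the start of the last sector once the same disk is presented
 * with a larger sector size.  A class that decodes metadata from the
 * start of the last sector only finds it if this returns 0.
 */
static uint32_t
old_metadata_offset_in_new_sector(uint64_t mediasize, uint32_t newsize)
{
	return ((uint32_t)(last_sector_offset(mediasize, 512) -
	    last_sector_offset(mediasize, newsize)));
}
```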
Re: GEOM_ULZMA
On Tue, Mar 02, 2010 at 08:32:20PM +0100, Dimitry Andric wrote: On 2010-03-02 09:47, Alexandr Rybalko wrote: Definitely separately, not sure where. There is an ongoing discussion somewhere on importing this algorithm into the base for tar(1) to use; it would be best to have only one copy of the code in the tree. I have already said that it would be good for embedded platforms to have only one copy of the code for the kernel and userland. I haven't yet thought about how to do it. I think Pawel means the *source* code in this case, not the executable code. E.g. lzma source should most likely go under /usr/src/contrib, and be built separately for kernel and userland. If it is going to be used by the kernel, it has to be under sys/. And yes, I was talking about one copy of the source, not the executable. I think it would be a bad idea to do compression in the kernel for userland applications, for many reasons; the most important one is security. Look at projects like Capsicum, where Robert confined, for example, gzip in a tight sandbox, even though gzip is not even set-uid. Giving it a chance to gain kernel access when a bug is found is very, very bad. Another reason is performance. You can see how much faster, e.g., OpenSSL crypto is when done in userland than when it is forced to use software crypto from the opencrypto kernel framework. -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: GEOM_ULZMA
On Fri, Feb 19, 2010 at 04:36:44PM +0200, Alexandr Rybalko wrote: Hi, I wrote a module GEOM_ULZMA (like GEOM_UZIP, but with lzma compression), [...] Wouldn't it be better to modify geom_uzip to be a universal decompression class with various algorithms implemented as plugins? This is basically what I did for the LABEL class; before that, we had the VOL_FFS class only for UFS labels. [...] In connection with this, there is a question: is it best to leave the lzma code in geom_ulzma.c, or to store the lzma library separately? If separately, then where is best? Definitely separately, not sure where. There is an ongoing discussion somewhere on importing this algorithm into the base for tar(1) to use; it would be best to have only one copy of the code in the tree. -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
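One way to read the "universal decompression class with plugins" suggestion is a table of algorithm descriptors looked up by name, similar in spirit to what the message describes for the LABEL class. Everything below is hypothetical: struct decomp_method, the two toy methods (a stored copy and a trivial run-length decoder standing in for zlib and LZMA), and decomp_lookup().

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical plugin descriptor: a name, as it might appear in the
 * image header, and a function that expands one compressed block.
 * decompress() returns the number of bytes written, or -1 on error.
 */
struct decomp_method {
	const char *name;
	long (*decompress)(const uint8_t *in, size_t inlen,
	    uint8_t *out, size_t outlen);
};

/* "store": the block is not compressed at all. */
static long
store_decompress(const uint8_t *in, size_t inlen, uint8_t *out, size_t outlen)
{
	if (inlen > outlen)
		return (-1);
	memcpy(out, in, inlen);
	return ((long)inlen);
}

/* Toy run-length method: input is a sequence of (count, byte) pairs. */
static long
rle_decompress(const uint8_t *in, size_t inlen, uint8_t *out, size_t outlen)
{
	size_t i, o = 0;

	if (inlen % 2 != 0)
		return (-1);
	for (i = 0; i < inlen; i += 2) {
		if (o + in[i] > outlen)
			return (-1);
		memset(out + o, in[i + 1], in[i]);
		o += in[i];
	}
	return ((long)o);
}

/* Real zlib or LZMA back ends would be further entries in this table. */
static const struct decomp_method methods[] = {
	{ "store", store_decompress },
	{ "rle", rle_decompress },
};

static const struct decomp_method *
decomp_lookup(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(methods) / sizeof(methods[0]); i++)
		if (strcmp(methods[i].name, name) == 0)
			return (&methods[i]);
	return (NULL);
}
```

With this shape, the GEOM glue (tasting, block index, caching) is written once, and adding an algorithm only means adding a table entry.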
Re: Deadlock between GEOM and devfs device destroy and process exit.
On Sat, Jan 30, 2010 at 12:44:51PM +0100, Pawel Jakub Dawidek wrote: Maybe I'll add how I understand what's going on: GEOM calls destroy_dev() while holding the topology lock. destroy_dev() wants to destroy the device, but can't, because there are threads that still have it open. The threads can't close it, because to close it they need the topology lock. The deadlock is quite obvious, IMHO. Guys, changing destroy_dev() to destroy_dev_sched() in geom_dev.c fixes the problem for me (at least it makes the race window so small that I can't reproduce it). Is there anyone who isn't happy with such a change? -- Pawel Jakub Dawidek http://www.wheel.pl p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
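The lock cycle, and why deferring the destruction helps, can be modelled in user space with POSIX threads. This is an illustration under simplifying assumptions, not kernel code: one mutex stands in for the GEOM topology lock, a counter for the device's open references, and a separate thread for the deferred work that destroy_dev_sched() would schedule. pthread_create() error handling is omitted for brevity.

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t topology = PTHREAD_MUTEX_INITIALIZER;	/* topology lock */
static pthread_mutex_t dev_mtx = PTHREAD_MUTEX_INITIALIZER;	/* device state */
static pthread_cond_t dev_cv = PTHREAD_COND_INITIALIZER;
static int open_count;
static bool destroyed;

/* Like g_dev_close(): dropping a reference needs the topology lock. */
static void *
closer(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&topology);
	pthread_mutex_lock(&dev_mtx);
	open_count--;
	pthread_cond_broadcast(&dev_cv);
	pthread_mutex_unlock(&dev_mtx);
	pthread_mutex_unlock(&topology);
	return (NULL);
}

/* Deferred destruction: waits for the last close WITHOUT the topology lock. */
static void *
destroyer(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&dev_mtx);
	while (open_count > 0)
		pthread_cond_wait(&dev_cv, &dev_mtx);
	destroyed = true;
	pthread_mutex_unlock(&dev_mtx);
	return (NULL);
}

/*
 * Like g_dev_orphan() in the event thread.  Waiting for open_count to
 * reach 0 here, with the topology lock held, would deadlock against
 * closer(); handing the wait to another thread lets the closers finish.
 */
static bool
run_orphan_scenario(int openers)
{
	pthread_t d, c[8];
	int i;

	if (openers < 0 || openers > 8)
		return (false);
	open_count = openers;
	destroyed = false;
	pthread_mutex_lock(&topology);
	pthread_create(&d, NULL, destroyer, NULL);	/* "schedule" the destroy */
	pthread_mutex_unlock(&topology);
	for (i = 0; i < openers; i++)
		pthread_create(&c[i], NULL, closer, NULL);
	for (i = 0; i < openers; i++)
		pthread_join(c[i], NULL);
	pthread_join(d, NULL);
	return (destroyed);
}
```

The key property is that the only thread waiting for the last close holds no lock that the closing threads need.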
Re: Deadlock between GEOM and devfs device destroy and process exit.
On Sat, Jan 30, 2010 at 12:58:26AM +0200, Alexander Motin wrote: Hi. Experimenting with SATA hot-plug, I've found a quite repeatable deadlock case. The problem is observed when several SATA devices, opened via devfs, disappear at exactly the same time. In my case, at the time of unplugging a SATA Port Multiplier with several disks behind it. All I have to do is to run several `dd if=/dev/adaX of=/dev/null bs=1m &` commands and unplug the multiplier. That causes predictable I/O errors and device destruction. But with high probability several dd processes get stuck in the kernel. [...] I observed the same thing yesterday while stress-testing HAST: 3659 2504 3659 0 DE+ GEOM top 0x8079a348 dd 3658 2102 2102 0 DE+ GEOM top 0x8079a348 hastd 2 0 0 0 DL devdrn 0x85b1bc68 [g_event] Both dd(1) and hastd(8) wait for the GEOM topology lock in the exit path, which is already held by the g_event thread. Interesting backtraces: db bt 2 [...] _sleep(85b1bc68,8079aab8,4c,80711ab3,64,...) at _sleep+0x339 destroy_devl(5,0,80711c53,85b1bcb0,804945cd,...) at destroy_devl+0x20f destroy_dev(86a10a00,8070ea93,86a09800,860888e0,0,...) at destroy_dev+0x2f g_dev_orphan(86a09800,8070f424,871038d8,90,6,...) at g_dev_orphan+0x6d g_run_events(8079a378,0,4c,8070c221,64,...) at g_run_events+0x1c0 g_event_procbody(0,85b1bd38,80713228,343,85d0b7f8,...) at g_event_procbody+0x8a [...] db bt 3658 [...] sleepq_wait(8079a348,0,8070f822,3,0,...) at sleepq_wait+0x63 _sx_xlock_hard(8079a348,86974240,0,8070ea66,c8,...) at _sx_xlock_hard+0x496 _sx_xlock(8079a348,0,8070ea66,c8,2000,...) at _sx_xlock+0xc0 g_dev_close(85f8ee00,4003,2000,86974240,86974240,...) at g_dev_close+0xbd devfs_close(dc49eaac,80745707,8,8,868be984,...) at devfs_close+0x2b2 VOP_CLOSE_APV(80753ac0,dc49eaac,80726500,128,2,...) at VOP_CLOSE_APV+0xc5 vn_close(868be984,4003,85fd5500,86974240,0,...) at vn_close+0x190 vn_closefile(86a20968,86974240,86a20968,0,dc49eb5c,...) at vn_closefile+0xe4 devfs_close_f(86a20968,86974240,0,0,86a20968,...)
at devfs_close_f+0x2b _fdrop(86a20968,86974240,14,80719d1a,0,dc49eb98,1,86975000,8635c22c,8635c22c,721,8071264b,dc49ebb8,804f87d0,8635c22c,8,8071264b,721) at _fdrop+0x43 closef(86a20968,86974240,721,71e,869742e4,...) at closef+0x290 fdfree(86974240,0,80712fdd,107,864c4330,...) at fdfree+0x3ea exit1(86974240,0,dc49ed2c,806d830a,86974240,...) at exit1+0x513 sys_exit(86974240,dc49ecf8,86974240,dc49ed2c,202,...) at sys_exit+0x1d [...] db bt 3659 [...] sleepq_wait(8079a348,0,8070f822,3,0,...) at sleepq_wait+0x63 _sx_xlock_hard(8079a348,863e06c0,0,8070ea66,c8,...) at _sx_xlock_hard+0x496 _sx_xlock(8079a348,0,8070ea66,c8,2000,...) at _sx_xlock+0xc0 g_dev_close(86a10a00,3,2000,863e06c0,863e06c0,...) at g_dev_close+0xbd devfs_close(dc4f6aac,80745707,8,8,86aa6c3c,...) at devfs_close+0x2b2 VOP_CLOSE_APV(80753ac0,dc4f6aac,80726500,128,2,...) at VOP_CLOSE_APV+0xc5 vn_close(86aa6c3c,3,870d4080,863e06c0,80cbac08,...) at vn_close+0x190 vn_closefile(871028f8,863e06c0,871028f8,0,dc4f6b5c,...) at vn_closefile+0xe4 devfs_close_f(871028f8,863e06c0,0,0,871028f8,...) at devfs_close_f+0x2b _fdrop(871028f8,863e06c0,8071809c,40e,0,805354ab,8071809c,8071df19,8635d42c,8635d42c,721,8071264b,dc4f6bb8,804f87d0,8635d42c,8,8071264b,721) at _fdrop+0x43 closef(871028f8,863e06c0,721,71e,863e0764,...) at closef+0x290 fdfree(863e06c0,0,80712fdd,107,86153088,...) at fdfree+0x3ea exit1(863e06c0,100,dc4f6d2c,806d830a,863e06c0,...) at exit1+0x513 sys_exit(863e06c0,dc4f6cf8,863e06c0,dc4f6d2c,202,...) at sys_exit+0x1d [...] db show lock 0x8079a348 class: sx name: GEOM topology state: XLOCK: 0x85d0d000 (tid 18, pid 2, g_event) waiters: exclusive -- Pawel Jakub Dawidek http://www.wheel.pl p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: Deadlock between GEOM and devfs device destroy and process exit.
On Sat, Jan 30, 2010 at 12:27:49PM +0100, Pawel Jakub Dawidek wrote: On Sat, Jan 30, 2010 at 12:58:26AM +0200, Alexander Motin wrote: Hi. Experimenting with SATA hot-plug, I've found a quite repeatable deadlock case. The problem is observed when several SATA devices, opened via devfs, disappear at exactly the same time. In my case, at the time of unplugging a SATA Port Multiplier with several disks behind it. All I have to do is to run several `dd if=/dev/adaX of=/dev/null bs=1m &` commands and unplug the multiplier. That causes predictable I/O errors and device destruction. But with high probability several dd processes get stuck in the kernel. [...] I observed the same thing yesterday while stress-testing HAST: 3659 2504 3659 0 DE+ GEOM top 0x8079a348 dd 3658 2102 2102 0 DE+ GEOM top 0x8079a348 hastd 2 0 0 0 DL devdrn 0x85b1bc68 [g_event] Both dd(1) and hastd(8) wait for the GEOM topology lock in the exit path, which is already held by the g_event thread. Maybe I'll add how I understand what's going on: GEOM calls destroy_dev() while holding the topology lock. destroy_dev() wants to destroy the device, but can't, because there are threads that still have it open. The threads can't close it, because to close it they need the topology lock. The deadlock is quite obvious, IMHO. I believe the problem could be solved by dropping the topology lock in g_dev_orphan() when calling destroy_dev(dev), but it is hard to say if it is safe to drop the topology lock there. Maybe Poul-Henning could take a look. -- Pawel Jakub Dawidek http://www.wheel.pl p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: ZFS group ownership
On Sat, Sep 12, 2009 at 01:49:36PM +0200, Giulio Ferro wrote: [...] Now I try to do the same on a ZFS partition on the same machine. This is what I see with ls:

---
ls -la
total 4
drwxrwx--- 3 www www 4 Sep 12 13:43 .
drwxr-xr-x 4 root wheel 4 Sep 12 13:43 ..
drwxrwx--- 2 gferro gferro 2 Sep 12 13:43 asda
-rw-rw 1 gferro gferro 0 Sep 12 13:43 qweq
---

As you can see, both the file and the directory now belong to gferro and not www. This means that other users won't even be able to read my files/dirs, let alone modify them. What I ask now is: is this a bug or a feature? This is a bug. I changed the default ZFS behaviour (which is SysV) to match BSD behaviour (i.e., inherit group ownership from the parent directory), but it became broken during the v6 -> v13 switch. Could you file a PR for this? I should be able to fix it before 8.0-RELEASE. -- Pawel Jakub Dawidek http://www.wheel.pl p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
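A quick way to check which semantics a file system gives is to create a file and compare its group with the directory's. The helper below is a hypothetical sketch (the scratch file name "grpcheck.tmp" is arbitrary). Run by a member of www inside a www-owned directory like the one above, it should return 1 under BSD semantics (e.g. on UFS) and, per the report, 0 on the affected ZFS.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create a scratch file in dir and report whether it received the
 * directory's group (1, BSD semantics) or some other group, such as
 * the creator's effective GID (0, SysV semantics).  Returns -1 on error.
 */
static int
new_file_inherits_dir_group(const char *dir)
{
	char path[4096];
	struct stat dst, fst;
	int fd, n, ret;

	n = snprintf(path, sizeof(path), "%s/grpcheck.tmp", dir);
	if (n < 0 || (size_t)n >= sizeof(path))
		return (-1);
	if (stat(dir, &dst) != 0)
		return (-1);
	if ((fd = open(path, O_CREAT | O_EXCL | O_WRONLY, 0600)) < 0)
		return (-1);
	ret = (fstat(fd, &fst) == 0) ? (fst.st_gid == dst.st_gid) : -1;
	close(fd);
	unlink(path);
	return (ret);
}
```

Note that in a directory owned by the user's own primary group both semantics give the same answer, so the test is only meaningful in a directory whose group differs from the creator's effective GID.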
Re: sosend() and mbuf
On Mon, Aug 03, 2009 at 09:25:27PM +, Maslan wrote: No, my code doesn't work. I thought it might be because soaccept() (which is not found in man 9) is non-blocking, so I have to put my code in a thread. Now I have another problem: when I open a text file from this thread, the kernel crashes. I'm sure that it's the thread.

kthread_create((void *)thread_main, NULL, NULL, RFNOWAIT, 0, "thread");

void thread_main(){
    struct thread *td = curthread;
    int ret;
    int fd;
    ret = f_open("/path/to/file.txt", &fd);
    printf("%d\n", ret);
    tsleep(td, PDROP, "test tsleep", 10*hz);
    f_close(fd);
    kthread_exit(0);
}

int f_open(char *filename, int *fd){
    struct thread *td = curthread;
    int ret = kern_open(td, filename, UIO_SYSSPACE, O_RDONLY, FREAD);
    if(!ret){
        *fd = td->td_retval[0];
        return 1;
    }
    return 0;
}

I have to solve this problem before going back to the first one. Can you figure out what's wrong with this code? It works when I call thread_main() rather than kthread_create((void *)thread_main, .

When you called kern_open() without creating a kernel thread, it worked, because kern_open() used the file descriptor table of your current (userland) process. In FreeBSD 7.x, kthread_create() creates a process without a file descriptor table, so you can't use kern_open(), and actually you shouldn't do this either. Take a look at sys/cddl/compat/opensolaris/kern/opensolaris_kobj.c, where you can find functions to do what you want. I guess you already considered doing all this in userland? :) -- Pawel Jakub Dawidek http://www.wheel.pl p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Linker deadlock.
Hi. The linker can easily deadlock when we try to load the same kernel module from two processes at the same time. This is because we drop kld_sx in linker_load_file() and reacquire it, which leads to a LOR (lock order reversal), because we already hold the vnode lock at this point. Interesting backtraces below. First process: db tr 3066 Tracing pid 3066 tid 100090 td 0x8514b240 sched_switch(8514b240,0,104,177,bb6bbb2e,...) at sched_switch+0x40e mi_switch(104,0,80681605,1ca,0,...) at mi_switch+0x200 sleepq_switch(8514b240,0,80681605,237,80a281ec,...) at sleepq_switch+0x14d sleepq_wait(80a281ec,0,8067a18b,3,0,...) at sleepq_wait+0x63 _sx_xlock_hard(80a281ec,8514b240,0,8067a1cf,1a0,...) at _sx_xlock_hard+0x2c6 _sx_xlock(80a281ec,0,8067a1cf,1a0,0,...) at _sx_xlock+0x99 linker_load_module(853a1264,0,83ba8940,83ba893c,83ba8938,...) at linker_load_module+0xa4a linker_load_dependencies(84fb8500,bb74,8539f000,2adc,156000,...) at linker_load_dependencies+0x194 link_elf_load_file(806b74e0,8557e4c0,83ba8c24,17c,0,...) at link_elf_load_file+0x4f0 linker_load_module(0,83ba8c4c,8067a1cf,3cd,280cb730,...) at linker_load_module+0x8db kern_kldload(8514b240,8592d400,83ba8c70,0,b395eb11,...) at kern_kldload+0xc8 [...] db show lock 0x80a281ec class: sx name: kernel linker state: XLOCK: 0x8514bd80 (tid 100117, pid 3065, zpool) waiters: exclusive Second process: db tr 3065 Tracing pid 3065 tid 100117 td 0x8514bd80 sched_switch(8514bd80,0,104,177,bb7e358b,...) at sched_switch+0x40e mi_switch(104,0,80681605,1ca,50,...) at mi_switch+0x200 sleepq_switch(8514bd80,0,80681605,237,8523d9c0,...) at sleepq_switch+0x14d sleepq_wait(8523d9c0,50,806906bb,4,0,...) at sleepq_wait+0x63 __lockmgr_args(8523d9c0,80100,8523da28,0,0,...) at __lockmgr_args+0x9a5 vop_stdlock(83bd2660,8508aa80,2,80100,8523d968,...) at vop_stdlock+0x65 VOP_LOCK1_APV(806c3560,83bd2660,806d2ac0,8523d968,80100,...) at VOP_LOCK1_APV+0xa5 _vn_lock(8523d968,80100,8068815b,802,804c9cb4,...) at _vn_lock+0x5e vget(8523d968,80100,8514bd80,1b7,8065d00f,...)
at vget+0xc9 cache_lookup(85090158,83bd2a00,83bd2a14,0,84f3b400,...) at cache_lookup+0x4c2 nfs_lookup(83bd2838,80688e43,806d2720,8,85090158,...) at nfs_lookup+0x101 VOP_LOOKUP_APV(806c3560,83bd2838,8068783d,1bd,83bd2a00,...) at VOP_LOOKUP_APV+0xe5 lookup(83bd29e8,8068783d,e0,c0,8506e52c,...) at lookup+0x52e namei(83bd29e8,81159a38,80a352b4,4,8067be1f,...) at namei+0x48b vn_open_cred(83bd29e8,83bd2a4c,0,84f3b400,0,...) at vn_open_cred+0x2ba vn_open(83bd29e8,83bd2a4c,0,0,806b2a00,...) at vn_open+0x33 linker_lookup_file(3,0,3,8514bd80,0,...) at linker_lookup_file+0x163 linker_load_module(0,83bd2c4c,8067a1cf,3cd,280cb730,...) at linker_load_module+0x7bd kern_kldload(8514bd80,85a7e400,83bd2c70,0,b395eb11,...) at kern_kldload+0xc8 [...] db show vnode 0x8523d968 vnode 0x8523d968: tag nfs, type VREG usecount 1, writecount 0, refcount 189 mountedhere 0 flags () v_object 0x852489b0 ref 0 pages 372 lock type nfs: EXCL by thread 0x8514b240 (pid 3066) with exclusive waiters pending #0 0x804c2e5d at __lockmgr_args+0xa6d #1 0x80546c85 at vop_stdlock+0x65 #2 0x8065dcd5 at VOP_LOCK1_APV+0xa5 #3 0x805627ee at _vn_lock+0x5e #4 0x80557419 at vget+0xc9 #5 0x805444b2 at cache_lookup+0x4c2 #6 0x805c3b51 at nfs_lookup+0x101 #7 0x8065ee65 at VOP_LOOKUP_APV+0xe5 #8 0x8054a9be at lookup+0x52e #9 0x8054b5eb at namei+0x48b #10 0x805621da at vn_open_cred+0x2ba #11 0x80562463 at vn_open+0x33 #12 0x804f45e8 at link_elf_load_file+0x68 #13 0x804c0f9b at linker_load_module+0x8db #14 0x804c1568 at kern_kldload+0xc8 #15 0x804c1624 at kldload+0x74 #16 0x80650513 at syscall+0x283 #17 0x80634e40 at Xint0x80_syscall+0x20 [...] -- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgp7AAbGMNK4D.pgp Description: PGP signature
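The deadlock above is the classic ABBA pattern: pid 3065 holds the kernel linker sx lock and waits for the module's vnode lock, while pid 3066 holds that vnode lock and waits for the linker lock. A debugging aid like WITNESS(4) catches this by remembering the order in which pairs of locks were first taken. The toy checker below is a hypothetical sketch of that idea only (the enum and function names are invented, and it is not how WITNESS is implemented): it records the first order seen for each pair and reports any later acquisition in the opposite order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers for the two locks in this report. */
enum lock_id { L_LINKER, L_VNODE, NLOCKS };

/* order[a][b] becomes true once b has been acquired while a was held. */
static bool order[NLOCKS][NLOCKS];

/*
 * Record acquiring `want` while `held` is held.  Returns false if the
 * opposite order was recorded earlier, i.e. a lock order reversal.
 */
static bool
lock_order_check(enum lock_id held, enum lock_id want)
{
	assert(held != want);
	if (order[want][held])
		return (false);
	order[held][want] = true;
	return (true);
}
```

Feeding it the two traces in order, lock_order_check(L_LINKER, L_VNODE) for pid 3065 succeeds and records linker before vnode, so the later lock_order_check(L_VNODE, L_LINKER) for pid 3066 returns false: that second acquisition is the reversal.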
Re: Linker deadlock.
On Sun, Aug 03, 2008 at 02:09:26PM +0300, Kostik Belousov wrote: Source line backtraces would be nicer, since the gcc inliner forces me to make a guess. It seems that linker_load_module() calls linker_load_file(), which drops and reacquires the linker lock. Then, it seems that dropping the module's vnode lock around the call to linker_load_dependencies() should help. Yes, it doesn't deadlock now, thanks!

diff --git a/sys/kern/link_elf.c b/sys/kern/link_elf.c
index 2664ba9..52b3f8f 100644
--- a/sys/kern/link_elf.c
+++ b/sys/kern/link_elf.c
@@ -802,7 +802,9 @@ link_elf_load_file(linker_class_t cls, const char* filename,
 		goto out;
 	link_elf_reloc_local(lf);
+	VOP_UNLOCK(nd.ni_vp, 0);
 	error = linker_load_dependencies(lf);
+	vn_lock(nd.ni_vp, LK_EXCLUSIVE | LK_RETRY);
 	if (error)
 		goto out;
 #if 0	/* this will be more trouble than it's worth for now */
diff --git a/sys/kern/link_elf_obj.c b/sys/kern/link_elf_obj.c
index d8e9219..657dd0e 100644
--- a/sys/kern/link_elf_obj.c
+++ b/sys/kern/link_elf_obj.c
@@ -798,7 +798,9 @@ link_elf_load_file(linker_class_t cls, const char *filename,
 	link_elf_reloc_local(lf);
 	/* Pull in dependencies */
+	VOP_UNLOCK(nd.ni_vp, 0);
 	error = linker_load_dependencies(lf);
+	vn_lock(nd.ni_vp, LK_EXCLUSIVE | LK_RETRY);
 	if (error)
 		goto out;

-- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
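The patch applies the usual cure for an ABBA deadlock: release the lock that sits later in the order (the module's vnode lock) before calling code that may sleep on the earlier one (kld_sx), then reacquire it afterwards. A userland analogue with POSIX mutexes, as a sketch only: the two mutexes are hypothetical stand-ins for the vnode lock and the linker sx lock, and the function names merely mirror link_elf_load_file() and linker_load_dependencies().

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for kld_sx and the module file's vnode lock. */
static pthread_mutex_t linker_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t vnode_lock = PTHREAD_MUTEX_INITIALIZER;

/* Like linker_load_dependencies(): may take the linker lock. */
static int
load_dependencies(void)
{
	pthread_mutex_lock(&linker_lock);
	/* ... look up and load dependent modules ... */
	pthread_mutex_unlock(&linker_lock);
	return (0);
}

/* Called with vnode_lock held, mirroring link_elf_load_file(). */
static int
load_file_locked(void)
{
	int error;

	/* Drop the vnode lock: never wait for linker_lock while holding it. */
	error = pthread_mutex_unlock(&vnode_lock);
	assert(error == 0);
	error = load_dependencies();
	/* Reacquire before touching the file again, like vn_lock(..., LK_RETRY). */
	pthread_mutex_lock(&vnode_lock);
	return (error);
}
```

The price of this pattern is that while the vnode lock is dropped another thread may act on the vnode, so anything the caller derived from it before the unlock may need rechecking afterwards.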
Re: crypto(9) and maxoplen
On Mon, Jul 21, 2008 at 02:10:00PM +0200, Patrick Lamaizière wrote: On Sun, 20 Jul 2008 21:39:55 +0200, Pawel Jakub Dawidek [EMAIL PROTECTED] wrote: Hello, In the opencrypto framework the function crypto_register() has an argument 'maxoplen'. http://fxr.watson.org/fxr/source/opencrypto/crypto.c#L625 Does somebody know what the goal of this parameter was? It is not used by the framework. The man page of crypto(9) says: For each algorithm the driver supports, it must then call crypto_register(). The first two arguments are the driver and algorithm identifiers. The next two arguments specify the largest possible operator length (in bits, important for public key operations) and flags for this algorithm. I'm asking if it can help with this problem: the glxsb driver can perform the AES-CBC algorithm only with a 128-bit key, and maybe 'maxoplen' was intended for this case. Without something to specify the key length, the driver is selected by the framework even for keys != 128 bits, so it fails when the session is opened. This prevents setkey/IPsec from working with key lengths != 128 bits if the driver is loaded. If I read the code properly, there is currently no way for a driver to tell the opencrypto framework that only AES-CBC with a 128-bit key is supported. A driver can only state that it supports AES-CBC, that's all. As a workaround the driver should implement AES-CBC-192 and AES-CBC-256 in software. Yes, but my question is about the maxoplen parameter. Was it intended for this case? Why do we keep this parameter? Can't help here, no idea. Even so, it isn't something I'd like to see implemented. 'maxoplen' is just a little better than what we have now. And what if a driver supports 192 or 256 bits only? IMHO, it is far easier to hack the OCF to use this parameter than to implement a workaround. It would be a better solution; for example, we may want to use the driver for AES-128 and other hardware that provides AES-192/256. Another (the best?) solution would be for the crypto framework to select another driver if the driver's newsession() fails. There are many improvements that could be made in the opencrypto framework, believe me. One of the things that annoys me a lot is that if you want to use IPsec with a driver that supports only encryption, you have to implement the hash functions in software for the given driver. Feel free to work on this, but be sure to avoid solutions like this maxoplen thing, which basically isn't really a step forward. Choosing another driver on newsession failure sounds reasonable, although we may lose information like 'the caller wanted hardware crypto only'.

-- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
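Patrick's alternative, where the framework falls back to the next driver when a driver's newsession() fails, could look roughly like the loop below. This is a hypothetical sketch, not the opencrypto API: struct drv, the driver table and the hw_only flag are invented for illustration (the names only echo glxsb and the software driver cryptosoft). The flag shows one way to keep the "caller wanted hardware crypto only" information Pawel worries about losing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver descriptor; not the opencrypto data structures. */
struct drv {
	const char *name;
	bool hardware;
	int (*newsession)(int keybits);	/* 0 on success, errno on failure */
};

/* A glxsb-like engine: AES-128-CBC only. */
static int
hw128_newsession(int keybits)
{
	return (keybits == 128 ? 0 : EINVAL);
}

/* A software engine: all standard AES key sizes. */
static int
soft_newsession(int keybits)
{
	return (keybits == 128 || keybits == 192 || keybits == 256 ? 0 : EINVAL);
}

static const struct drv drivers[] = {
	{ "glxsb", true, hw128_newsession },
	{ "cryptosoft", false, soft_newsession },
};

/* Try each eligible driver in turn; return the first that accepts. */
static const struct drv *
session_open(int keybits, bool hw_only)
{
	assert(keybits > 0);
	for (size_t i = 0; i < sizeof(drivers) / sizeof(drivers[0]); i++) {
		if (hw_only && !drivers[i].hardware)
			continue;	/* respect "hardware only" requests */
		if (drivers[i].newsession(keybits) == 0)
			return (&drivers[i]);
	}
	return (NULL);
}
```

With this table a 128-bit key lands on the hardware driver, a 256-bit key falls through to the software one, and a 256-bit request restricted to hardware fails instead of silently using software.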
Re: crypto(9) and maxoplen
On Sat, Jul 19, 2008 at 12:58:13AM +0200, Patrick Lamaizière wrote: Hello, In the opencrypto framework the function crypto_register() has an argument 'maxoplen'. http://fxr.watson.org/fxr/source/opencrypto/crypto.c#L625 Does somebody know what the goal of this parameter was? It is not used by the framework. The man page of crypto(9) says: For each algorithm the driver supports, it must then call crypto_register(). The first two arguments are the driver and algorithm identifiers. The next two arguments specify the largest possible operator length (in bits, important for public key operations) and flags for this algorithm. I'm asking if it can help with this problem: the glxsb driver can perform the AES-CBC algorithm only with a 128-bit key, and maybe 'maxoplen' was intended for this case. Without something to specify the key length, the driver is selected by the framework even for keys != 128 bits, so it fails when the session is opened. This prevents setkey/IPsec from working with key lengths != 128 bits if the driver is loaded. If I read the code properly, there is currently no way for a driver to tell the opencrypto framework that only AES-CBC with a 128-bit key is supported. A driver can only state that it supports AES-CBC, that's all. As a workaround the driver should implement AES-CBC-192 and AES-CBC-256 in software.

-- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: Is there any way to increase the KVM?
On Thu, Jun 05, 2008 at 04:00:13PM +0200, Ivan Voras wrote: Pawel Jakub Dawidek wrote: If we're comparing who has bigger... :)

beast:root:~# zpool list
NAME   SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
tank   732G   604G   128G   82%  ONLINE  -

but:

beast:root:~# zfs list | wc -l
1932

No panics. PS. I'm quite sure the ZFS version I have in Perforce will fix most if not all 'kmem_map too small' panics. It's not yet committed, but I do want to MFC it into RELENG_7. At the risk of sounding repetitive, can you try a simple test on your ZFS pools, to see if you can panic the kernel? Do this:

* install blogbench and bonnie++ from ports/benchmarks
* run
    blogbench -c 100 -d . -i 30 -r 50 -W 10 -w 10
    bonnie++ -d . -s 16G -n 80
  in parallel, until completion or crash.

It shouldn't take too long to complete the above benchmarks, so you probably won't invest too much time in it even if it doesn't crash. Both completed successfully (i386, 1GB of RAM, dual core CPU). Can you now go and revert all the FUD you spread? You probably need to invest much more time than that.

-- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: Is there any way to increase the KVM?
On Thu, Jun 05, 2008 at 02:10:02PM +0100, Hugo Silva wrote: Pawel Jakub Dawidek wrote: PS. I'm quite sure the ZFS version I have in Perforce will fix most if not all 'kmem_map too small' panics. It's not yet committed, but I do want to MFC it into RELENG_7. Any guesstimate as to when the MFC will happen? Hard to tell, really. The number of changes is huge, so it's hard to predict how much I'd need to fix after the commit to HEAD. Two months sounds possible. I'll probably provide patches for RELENG_7 earlier.

-- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: AMD Geode LX crypto accelerator (glxsb)
On Fri, Jun 06, 2008 at 11:41:35PM +0200, Patrick Lamaizière wrote: Dear all, I'm trying to port the glxsb driver from OpenBSD to FreeBSD 7-STABLE (via the NetBSD port). Cool. The glxsb driver supports the security block of the Geode LX series processors. The Geode LX is a member of the AMD Geode family of integrated x86 system chips. Driven by periodic checks for available data from the generator, glxsb supplies entropy to the random(4) driver for common usage. glxsb also supports acceleration of AES-128-CBC operations for crypto(4). I think that most of the work is done, except the random generator. Source in progress for 7-STABLE: http://user.lamaiziere.net/patrick/glxsb.c http://user.lamaiziere.net/patrick/glxsb.tar.gz (c+Makefile) Credits to OpenBSD and NetBSD, Thanks! Well, it seems to work, but I've got a few problems testing the module: - How can I check the encryption/decryption? OpenSSL seems OK; I get roughly the same results as NetBSD on a Soekris net5501 box. But I must use -engine cryptodev, why? This is OK, as you may not want to use it, right?

$ openssl speed -evp aes-128-cbc -engine cryptodev -elapsed
engine cryptodev set.
...CUT...
type          16 bytes    64 bytes   256 bytes  1024 bytes  8192 bytes
aes-128-cbc   1151.08k    4134.25k   11936.49k   22504.83k   25576.36k

When I test ssh -c aes128-cbc hostname, ssh does not use the crypto device. I receive a crypto_newsession() followed by a crypto_freesession(); I mean, I don't receive any crypto_process(). Have you tried adding some debugging to opencrypto? I believe OpenSSH should use it automatically; at least this was the case some time ago, AFAIR. So how can I be sure that the data is correctly encrypted? Try comparing the result of openssl encryption with and without '-engine cryptodev'. Remember to use -nosalt (and maybe -raw) to prevent openssl from putting a salt in front of the ciphertext. Also, I've got some questions about finishing the driver: - Between arc4rand() and read_random(), which function shall I use? arc4rand() is preferred.
- Shall I lock the sessions? The padlock driver uses a mutex to lock the sessions: http://fxr.watson.org/fxr/source/crypto/via/padlock.c?v=FREEBSD7#L211 Is it useful? The ubsec, safe and hifn drivers don't lock the sessions at all. You should, and they should as well. - During crypto_process() the driver uses s = splnet();. I'm not sure about this. Drop this one. - The driver does a busy wait to check for completion of the encryption. I think it would be better to use the interrupt. I will look later. I remember looking at that code some time ago and that bit is really lame, so lame that I think they would have done it differently if that were possible. Maybe it's worth contacting OpenBSD/NetBSD and asking? There might be a good reason for that. - Any comment is welcome; this is my first work on a driver. Looks good :) I can do a final review and commit once you are done, if I'm able to start my Soekris and test it.

-- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
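On the session-locking question: if newsession(), freesession() and process() can run concurrently, the session table needs a lock, and process() should copy what it needs out while holding it. A userland sketch of the idea using a pthread mutex; the struct and function names are hypothetical, and a kernel driver would use mtx(9) instead of pthreads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define MAXSESS 64

/* Hypothetical per-session state. */
struct sess {
	bool used;
	int keybits;
};

static struct sess sessions[MAXSESS];
static pthread_mutex_t sess_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Allocate a session id, or return -1 if the table is full. */
static int
sess_new(int keybits)
{
	int id = -1;

	pthread_mutex_lock(&sess_mtx);
	for (int i = 0; i < MAXSESS; i++) {
		if (!sessions[i].used) {
			sessions[i].used = true;
			sessions[i].keybits = keybits;
			id = i;
			break;
		}
	}
	pthread_mutex_unlock(&sess_mtx);
	return (id);
}

static void
sess_free(int id)
{
	assert(id >= 0 && id < MAXSESS);
	pthread_mutex_lock(&sess_mtx);
	sessions[id].used = false;
	pthread_mutex_unlock(&sess_mtx);
}

/* Copy out a session's key size; returns false for a stale id. */
static bool
sess_lookup(int id, int *keybits)
{
	bool ok = false;

	if (id < 0 || id >= MAXSESS)
		return (false);
	pthread_mutex_lock(&sess_mtx);
	if (sessions[id].used) {
		*keybits = sessions[id].keybits;
		ok = true;
	}
	pthread_mutex_unlock(&sess_mtx);
	return (ok);
}
```

Copying the fields out under the lock means a process() call racing with freesession() either sees a valid snapshot or gets a clean "no such session" answer, never a half-cleared slot.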
Re: Is there any way to increase the KVM?
On Thu, Jun 05, 2008 at 01:53:37AM +0800, Tz-Huan Huang wrote: On Thu, Jun 5, 2008 at 12:31 AM, Dag-Erling Smørgrav [EMAIL PROTECTED] wrote: Tz-Huan Huang [EMAIL PROTECTED] writes: vfs.zfs.arc_max was originally set to 512M; the machine survived for 4 days and panicked this morning. Now vfs.zfs.arc_max is set to 64M per Oliver's suggestion; let's see how long it will survive. :-)

[EMAIL PROTECTED] ~% uname -a
FreeBSD ds4.des.no 8.0-CURRENT FreeBSD 8.0-CURRENT #27: Sat Feb 23 01:24:32 CET 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/ds4 amd64
[EMAIL PROTECTED] ~% sysctl -h vm.kmem_size_min vm.kmem_size_max vm.kmem_size vfs.zfs.arc_min vfs.zfs.arc_max
vm.kmem_size_min: 1,073,741,824
vm.kmem_size_max: 1,073,741,824
vm.kmem_size: 1,073,741,824
vfs.zfs.arc_min: 67,108,864
vfs.zfs.arc_max: 536,870,912
[EMAIL PROTECTED] ~% zpool list
NAME   SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
raid  1.45T   435G  1.03T   29%  ONLINE  -
[EMAIL PROTECTED] ~% zfs list | wc -l
210

Haven't had a single panic in over six months. Thanks for your information; the major difference is that we run 7-STABLE and our ZFS pool is much bigger. I don't think the panics are related to pool size, but more to the load and characteristics of your workload.

[EMAIL PROTECTED] uname -a
FreeBSD cml2.csie.ntu.edu.tw 7.0-STABLE FreeBSD 7.0-STABLE #40: Sat May 31 10:29:16 CST 2008 [EMAIL PROTECTED]:/usr/local/obj/usr/local/src/sys/CML2 amd64
[EMAIL PROTECTED] sysctl -h vm.kmem_size_min vm.kmem_size_max vm.kmem_size vfs.zfs.arc_min vfs.zfs.arc_max
vm.kmem_size_min: 0
vm.kmem_size_max: 1,610,612,736
vm.kmem_size: 1,610,612,736
vfs.zfs.arc_min: 16,777,216
vfs.zfs.arc_max: 67,108,864
[EMAIL PROTECTED] zpool list
NAME   SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
sun   11.3T  9.03T  2.30T   79%  ONLINE  -
[EMAIL PROTECTED] zfs list | wc -l
295

If we're comparing who has bigger... :)

beast:root:~# zpool list
NAME   SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
tank   732G   604G   128G   82%  ONLINE  -

but:

beast:root:~# zfs list | wc -l
1932

No panics.

PS. I'm quite sure the ZFS version I have in Perforce will fix most if not all 'kmem_map too small' panics. It's not yet committed, but I do want to MFC it into RELENG_7.

-- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: Is there any way to increase the KVM?
On Sat, May 31, 2008 at 01:52:56PM +0800, Tz-Huan Huang wrote: Hi, Our NFS server is running 7-STABLE/amd64 with 8G of RAM, and the ZFS pool is 12T. We have set vm.kmem_size and vm.kmem_size_max to 1.5G, but the kernel still often panics with 'kmem_map too small'. Could you also try to decrease vfs.zfs.arc_max?

-- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: Security Flaw in Popular Disk Encryption Technologies
On Sat, Feb 23, 2008 at 02:08:54PM +1300, Atom Smasher wrote: Article below. Does anyone know how this affects eli/geli? From the geli man page: detach - Detach the given providers, which means remove the devfs entry and clear the keys from memory. Does that mean that geli properly wipes keys from RAM when a laptop is turned off? Yes, geli tries to clear sensitive information on detach (mostly keys). I use a script to suspend my laptop, which detaches my encrypted partition before suspending. In Perforce I have suspend/resume geli(8) subcommands that help a bit here: on 'geli suspend' the keys are cleared and all I/O requests are suspended until 'geli resume' provides the proper keys. This way one doesn't have to unmount file systems to allow 'geli detach' to succeed. Of course, even if the keys are cleared there could still be important data in RAM (e.g. the file system's buffer cache).

-- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
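In userland code, the same "clear the keys" step has a well-known pitfall: a plain memset() on a buffer that is never read again may be removed by the optimizer as a dead store. One portable way around it is to write through a volatile pointer, as in the sketch below; explicit_bzero(3) or memset_s() serve the same purpose where available. This is a general illustration with a hypothetical helper name, not how geli(8) clears its keys inside the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Zero a key buffer in a way the compiler may not elide as a dead store. */
static void
wipe_key(void *buf, size_t len)
{
	volatile unsigned char *p = buf;

	assert(buf != NULL || len == 0);
	while (len-- > 0)
		*p++ = 0;
}
```

As the mail notes, wiping keys does not wipe everything: plaintext already sitting in caches stays in RAM until it is overwritten.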
Re: A TrustedBSD voluntary sandbox policy.
On Wed, Nov 07, 2007 at 10:20:28PM -0500, [EMAIL PROTECTED] wrote: I'm considering developing a policy/module for TrustedBSD loosely based on the systrace concept - a process loads a policy and then executes another program in a sandbox with fine-grained control over what that program can do. I'm aiming for a much simpler implementation, however. No interaction. No privilege elevation (only restriction). No system call rewriting, only access control. The interface will look something like this:

(cat <<EOF
deny all
allow file_open /etc/passwd
allow file_open /dev/tty
allow sock_connect 127.0.0.1 80
allow sock_connect 208.77.188.166 80
rlimit core 0
rlimit cpu 20
rlimit nofile 10
EOF
) | sandbox /bin/ls -alF /bin

Please note that the 'policy' given on the command line is purely for the sake of example; no syntax or semantics have been decided upon. The implementation appears to be simple, as far as I'm aware. I'm sure there will be thorns and problems - that's what I'm here to find out. The 'sandbox' process compiles the policy text into a binary structure in userland, loads the binary structure into the kernel module via a system call implemented with mac_syscall(), sets various rlimits and then runs /bin/ls with execve(). When the process exits, the memory for the binary structure is freed. I would like, at this stage, to know if the above model is seriously incompatible with the way the MAC framework works; it's not entirely clear either way, having read other policies such as mac_biba, mac_stub, etc. For example - how do I know when a process has exited? Policy for an executed process would be kept in a small hash table, indexed by process id. The policy will be enabled when the process successfully calls execve() for the first time and will be destroyed when the process exits. If we're not notified when a process has exited, we can't remove its policy from the table. Also, what should be done when a process decides to fork() or execve()?
It'd be rather unfortunate if the process could break out of the sandbox just by executing another process, but blocking all attempts to fork() or execve() would make whole classes of programs unusable. The first problem is that it is hard to operate on file paths. MAC passes a locked vnode to you and you cannot easily go from there to a file name. You could do it by comparison: call VOP_GETATTR(9) on the given vnode, do the same for /etc/passwd and the others, and compare their inode numbers and file system IDs. The performance hit may be significant for complex policies. You can register for the process_exit, process_fork and process_exec in-kernel events and do your cleanups from your event handler. Take a look at EVENTHANDLER(9).

-- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: kern.ngroups (non) setting ... new bounty ?
On Tue, Sep 25, 2007 at 09:51:06AM -0700, rsync.net wrote: It has been impossible to change kern.ngroups - at least for several years now. It was not fixed in either 5.x or 6.x: http://lists.freebsd.org/pipermail/freebsd-bugs/2007-January/022140.html It is seemingly a difficult problem: http://www.atm.tut.fi/list-archive/freebsd-stable/msg09969.html [1] However, it should be solved - we can't be the only ones out there trying to add a UID to more than 16 groups... - The rsync.net code bounties have been fairly successful this year - two of the five projects have been completed, and the large vmware 6 on FreeBSD project is now underway. We'd like to add a new bounty for this kern.ngroups issue. We are posting to -hackers today to get some feedback on how long this will take and how much money might reasonably be expected to lure this work. --rsync.net Support [1] Is it indeed true that these programs are broken by not following NGROUPS_MAX from syslimits.h? I don't see how they can be broken. They may not see more than 16 groups, but they shouldn't blow up. The only possibility of bad usage I see is something like this:

    gid_t gids[NGROUPS_MAX];
    int gidsetlen;

    gidsetlen = getgroups(0, NULL);
    /* Overflows gids[] if the kernel reports more than NGROUPS_MAX groups. */
    getgroups(gidsetlen, gids);

But I guess the most common use is:

    gid_t gids[NGROUPS_MAX];
    int gidsetlen;

    gidsetlen = getgroups(NGROUPS_MAX, gids);

Binaries using the latter method should be just fine. BTW, the latter method is what all utilities in the base system use.

-- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
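A variant that does not depend on NGROUPS_MAX at all first asks for the current count and then sizes the buffer dynamically. POSIX specifies that getgroups(0, NULL) returns the number of supplementary groups without storing any, so this works with whatever limit the kernel is configured for. A sketch, where get_groups() is a hypothetical helper name; in a program that may call setgroups(2) concurrently, a retry on EINVAL would also be needed, since the set could grow between the two calls.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Fetch the supplementary group list into a malloc'd array.
 * Returns the number of groups (caller frees *out) or -1 on error.
 */
static int
get_groups(gid_t **out)
{
	int n, got;
	gid_t *gids;

	assert(out != NULL);
	*out = NULL;
	n = getgroups(0, NULL);	/* count only; nothing is stored */
	if (n <= 0)
		return (n);
	gids = malloc((size_t)n * sizeof(*gids));
	if (gids == NULL)
		return (-1);
	got = getgroups(n, gids);
	if (got < 0) {
		free(gids);
		return (-1);
	}
	*out = gids;
	return (got);
}
```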
Re: Hierarchical jails - any current work?
On Wed, Sep 19, 2007 at 01:30:44PM -0600, James Gritton wrote: Pawel Jakub Dawidek wrote: Something like this: http://garage.freebsd.pl/mljail.README I did it some time ago, and this is one of the features of the new jail implementation, which is being designed. Yes, that's just the thing I'm talking about, so it looks like I have indeed been reinventing something. (The jail scheduling work of cdjones is something else I'm interested in, but for another time.) Now the question becomes: how much jail work is out there, and what's the likelihood of it seeing the light of day in a released kernel? I hate to be coding stuff that's been done before (well, actually I enjoy coding it, but you know...), but I only ever see snippets of jail work mentioned here and there and nothing ever seems to get anywhere official. I figured the place to talk about this was the freebsd-jail mailing list, but it seems to be mostly for stuff like getting app X to work in a jail, or how the current jail rc scripts have this or that deficiency. That's why I cross-posted to freebsd-hackers - maybe that's more appropriate? Where's the secret place people really go to communicate this kind of thing? I've done a lot of work in the general jail-like area, and while much of it is the same as others' work, I'd like to share what isn't. Of course, with other people's jail-related projects staying on the sidelines so long - and those by people with @freebsd.org stature - one wonders if there's a point. I don't mean to sound down on anything, just wondering what the state of the jail community is. Or where it is. We are not hiding anything, don't worry :) We just had a developer summit in Denmark where we talked about future jail design. We also talked about this at the developer summit in Milan last year. Currently we have the big picture and quite a few details; I wouldn't call it a finished project, because it's not, but we have definitely moved forward.
Once we polish the notes taken at the devsummit we will publish them on a wiki page and give the community some time to comment on them. If you want to work on jails, I would hold off until the wiki page is ready, because I suspect there will be a lot of work to do.

-- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: Hierarchical jails - any current work?
On Tue, Sep 18, 2007 at 03:03:12PM -0600, James Gritton wrote: I've been doing some work on a hierarchical jail setup, but I've got this nagging feeling it's been done before. Does anyone know of such an existing project? If not, I'll put forward my own code. Something like this: http://garage.freebsd.pl/mljail.README I did it some time ago, and this is one of the features of the new jail implementation, which is being designed.

-- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: VFS locking questions
On Fri, Aug 03, 2007 at 09:29:33PM +0200, Ulf Lilleengen wrote: Hi, I have a couple of questions regarding VFS, since I'm trying to SMPify the fdescfs code in an effort to get some experience with VFS and FreeBSD locking... What is LK_INTERLOCK really? When should it be used? When should one acquire it (with VI_LOCK, I assume), and what are the semantics? The vnode internal lock (v_interlock, VI_LOCK()) is used to protect various fields in the vnode structure (those marked with the letter 'i' in vnode.h). You pass the LK_INTERLOCK flag to functions like lockmgr(), vn_lock() and VOP_UNLOCK() when you already hold the vnode's interlock. This way, if one of those functions needs the vnode's interlock internally, it knows whether you already hold it or whether it has to acquire it on its own. We could probably just use mtx_owned() inside those functions. Let's say I have a function that should return a locked vnode. I lock the hash table with a regular mutex. Then, when I traverse the list, I check if the entry is what I'm looking for. If it is, I call VI_LOCK() on the vnode, use vget to increment the refcount, and then use vn_lock(vp, LK_EXCLUSIVE...) to lock the vnode before the function returns. Is this correct behaviour? Instead of doing what you suggest:

    VI_LOCK(vp);
    vget(vp, LK_INTERLOCK, td);
    vn_lock(vp, LK_EXCLUSIVE, td);

you can simply call:

    vget(vp, LK_EXCLUSIVE, td);

This is why:
- You haven't passed LK_INTERLOCK, so vget() will acquire the interlock by itself if needed (it does need it).
- You passed LK_EXCLUSIVE, so vget() will return a locked vnode.

The LK_INTERLOCK bothers me a bit, because I'm not 100% sure how it works. It is probably mostly an optimization, and probably protection against some races, so you can call various functions with the vnode's interlock already held.

-- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
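The LK_INTERLOCK convention is essentially a flag that tells the callee "I already hold the interlock; consume it". The userland model below is a simplified, hypothetical sketch: struct obj, obj_get() and OBJ_INTERLOCK stand in for the vnode, vget() and LK_INTERLOCK, and two pthread mutexes stand in for v_interlock and the vnode lock. It is not the real vget() logic, only the calling convention.

```c
#include <assert.h>
#include <pthread.h>

#define OBJ_INTERLOCK 0x1	/* caller already holds o->interlock */

struct obj {
	pthread_mutex_t interlock;	/* protects refcnt, like v_interlock */
	pthread_mutex_t lock;		/* the sleepable "vnode lock" */
	int refcnt;
};

/*
 * Take a reference and the object lock, like vget(vp, LK_EXCLUSIVE).
 * With OBJ_INTERLOCK the caller holds the interlock and it is released here.
 */
static void
obj_get(struct obj *o, int flags)
{
	assert(o != NULL);
	if ((flags & OBJ_INTERLOCK) == 0)
		pthread_mutex_lock(&o->interlock);
	o->refcnt++;
	pthread_mutex_unlock(&o->interlock);
	pthread_mutex_lock(&o->lock);
}
```

Calling obj_get(o, 0) corresponds to the recommended single vget(vp, LK_EXCLUSIVE, td) call; the OBJ_INTERLOCK path only pays off when the caller must first inspect interlock-protected fields before deciding to take the lock.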
Re: ufs_rename: fvp == tvp (can't happen), but it did
On Sun, Jan 14, 2007 at 07:18:04PM +0100, Attila Nagy wrote: On 2007.01.12. 20:06, Pawel Jakub Dawidek wrote: Silent data corruption happens; look, for example, at the 'problem with 4T volume under FreeBSD' thread on [EMAIL PROTECTED] I'd suggest configuring geli with data authentication on top of the FC array. geli will detect silent data corruption. Data corruption was the first thing that came to my mind; I am currently trying to reproduce this on another machine. geli's data authentication is a good thing, but ZFS's ability to actually correct the errors (in this case, at least) is even better. :) Is there a newer patch for ZFS than this: http://people.freebsd.org/~pjd/patches/zfs_20061117.patch.bz2 ? As far as I can see, you've put a tremendous amount of work into it in Perforce... There is no newer patch yet. It's quite time consuming to create such a patch, test it, etc., so I'm trying to avoid doing it :)

-- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: ufs_rename: fvp == tvp (can't happen), but it did
On Fri, Jan 12, 2007 at 11:35:44AM +0100, Attila Nagy wrote: On 01/11/07 02:01, Kris Kennaway wrote: On Sun, Jan 07, 2007 at 12:44:39PM +0100, Attila Nagy wrote: On 2007.01.07. 1:11, [EMAIL PROTECTED] wrote: It sounds as if the caller of ufs_rename() is confused. You could try setting a breakpoint on the printf(), or change it to a panic() to get a dump, and try to figure out who the caller is and what is going on. Yes, this would be very good, especially if this weren't a production machine or if I could reproduce this on a test system. But neither of these is true. :( Maybe I will try it on a sleepless night, in the maintenance window; thanks for the idea. Try forcing an fsck; sometimes bizarre FS panics are due to file system corruption. I've already thought of that, but in that case the FC array must be bad, since when going with only the locally attached disks in the mirror, the error doesn't appear... Silent data corruption happens; look, for example, at the 'problem with 4T volume under FreeBSD' thread on [EMAIL PROTECTED] I'd suggest configuring geli with data authentication on top of the FC array. geli will detect silent data corruption.

-- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
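An authentication layer catches silent corruption because every sector is stored together with a tag computed over its contents, and the tag is recomputed and compared on every read. The toy sketch below shows only that detect-on-read idea, using an unkeyed 64-bit FNV-1a hash. This is a deliberate simplification: geli(8) uses a keyed HMAC, which also defeats deliberate tampering, whereas FNV-1a only catches accidents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a; unkeyed, so it detects accidents, not attacks. */
static uint64_t
fnv1a64(const unsigned char *p, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV offset basis */

	while (len-- > 0) {
		h ^= *p++;
		h *= 0x100000001b3ULL;		/* FNV prime */
	}
	return (h);
}

/* On read: recompute the tag and compare it with the one stored at write time. */
static bool
sector_ok(const unsigned char *data, size_t len, uint64_t stored_tag)
{
	assert(data != NULL);
	return (fnv1a64(data, len) == stored_tag);
}
```

Detection is all such a layer offers; unlike ZFS with redundancy, it can report a bad sector but has no second copy to repair it from.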
Re: iSCSI disconnects dilema
On Tue, Jan 09, 2007 at 09:06:46AM +0200, Danny Braniss wrote: Hi, While I think I have almost solved the problem of network disconnects, it dawned on me that there is a major problem: when a 'local' disk crashes, the kernel will probably hang/panic/crash. If I don't try to recover, then there is no change in the above scenario. If I try to recover, then the client does not know that it should umount/fsck/mount. While all this seems familiar (removing a floppy/disk-on-key while it's mounted), there we could always say "you shouldn't have done that!"; with a network connection it can happen very often - rebooting the target, a network hiccup, etc. So, any ideas? In my opinion it should be done this way: you have a queue of I/O requests. You send them to the other end and wait for confirmation. Until confirmation is received, you keep the requests queued. If the other end dies, you try to reconnect (until some timeout expires; the processes which sent those requests will just wait). If you reconnect successfully, you resend the unconfirmed requests; if you are not able to reconnect, you just pass the errors up. This is what I did in ggate and it seems to work.

-- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
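The scheme described here (keep each request queued until the other end confirms it, resend the unconfirmed ones after a reconnect, and fail them only once reconnecting has timed out) can be sketched as a small pending table. Everything below is hypothetical and is not ggate's code: send_fn and count_send() stand in for the transport, and a real initiator would also need locking and sequence numbers to recognise duplicate replies after a resend.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAXPEND 128

struct req {
	bool pending;	/* sent or waiting to be sent, not yet confirmed */
	bool done;	/* completed, successfully or not */
	int error;	/* result passed up to the caller once done */
};

static struct req reqs[MAXPEND];

/* Hypothetical transport hook; returns 0 if the request went out. */
typedef int (*send_fn)(int id);

/* Stub transport for illustration: counts transmissions. */
static int sent;
static int
count_send(int id)
{
	(void)id;
	sent++;
	return (0);
}

static void
req_submit(int id, send_fn send)
{
	assert(id >= 0 && id < MAXPEND);
	reqs[id] = (struct req){ .pending = true };
	(void)send(id);	/* if this fails, the request is resent after reconnect */
}

/* The other end confirmed request `id`: only now is it dequeued. */
static void
req_confirmed(int id)
{
	reqs[id] = (struct req){ .done = true, .error = 0 };
}

/* Connection re-established: resend everything still unconfirmed. */
static int
on_reconnect(send_fn send)
{
	int n = 0;

	for (int id = 0; id < MAXPEND; id++)
		if (reqs[id].pending && send(id) == 0)
			n++;
	return (n);
}

/* Reconnect timed out: only now are errors passed up. */
static void
on_giveup(void)
{
	for (int id = 0; id < MAXPEND; id++)
		if (reqs[id].pending)
			reqs[id] = (struct req){ .done = true, .error = EIO };
}
```

Because nothing is completed until it is confirmed, a short outage costs only latency; the callers see an error only if the target stays unreachable past the timeout.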
Re: Best practices for using gjournal with gmirror?
On Wed, Jan 10, 2007 at 11:21:01PM -0500, John Nielsen wrote: I have a few questions for pjd (or anyone else) about using gjournal, particularly when used with gmirror. 1) I'm running 6-STABLE and plan to test with gjournal6_20061030.patch (from the mailing list; an updated version of 20061024 that applies cleanly). Is there a better/newer version for -STABLE that I should use instead? There probably should be a newer version, as there were some minor changes after I committed the code to HEAD. I'll try to create a new patch during the weekend. 2) When using gjournal for a gmirror volume, does the journal need to be mirrored as well to maintain redundancy? If so, when storing the journal on the same physical disks as the mirror, is it better to mirror at the slice level (journal and fs on different partitions in the same mirror) or at the partition level (journal and fs each have their own mirror), or does it matter? The problem with mirroring each partition/slice separately is that after a crash, on boot, gmirror will start to rebuild all partitions at once, which may be problematic. On the other hand, when you mirror each partition/slice separately and some partitions weren't modified in the last few seconds before the crash, gmirror will not resync them on boot, so not the entire disk has to be synchronized. When you run gjournal on top of gmirror/graid3 there is no need for a resync after a crash, so basically the arguments both against mirroring whole disks and against mirroring partitions no longer apply. Both configurations will work the same. In that case I'd suggest mirroring the whole disks, because when one of your disks dies, you can just replace it and be done with it. If you mirror partitions separately, you first have to create the partitions and insert each of them into its mirror, which is more complex than a simple 'gmirror insert foo newdisk'.
3) I remember reading where pjd said that gjournal plus gmirror or graid3 would eliminate the need to re-sync the array after a crash. While clearly a design goal, is that actually the case with the version of the patch mentioned above? If so, are any config changes needed or will it just happen automagically? No, you need to run: # gmirror configure -F mirror_name 4) In the same vein as 3), does a gjournal volume need to be fsck'ed after a crash? If not, will it just work (e.g. fsck -p sees that the filesystem is clean) or does it need to be disabled somehow? A gjournaled file system still has to be fscked, but only to handle orphaned files. Such an fsck on a multi-terabyte provider takes seconds, not hours. 5) Finally, how dangerous is this code? I realize it's experimental and only plan to use it with data that has recent backups, but how much should I worry about it blowing up my system or corrupting my files? I'm using it in production, a customer of mine is using it in production on a large number of FreeBSD servers, and I've already heard many success stories, BUT I still consider the code experimental. -- Pawel Jakub Dawidek
Re: FW: FreeBSD: driver for ssl hardware accelerator board based on broadcom bcm5825, bcm5862 chips
On Sun, Dec 17, 2006 at 07:38:30AM +0200, Alex Aronson wrote: -----Original Message----- From: Alex Aronson [mailto:[EMAIL PROTECTED] Sent: Wednesday, December 13, 2006 10:48 AM To: '[EMAIL PROTECTED]'; '[EMAIL PROTECTED]'; '[EMAIL PROTECTED]'; '[EMAIL PROTECTED]' Subject: FreeBSD: driver for ssl hardware accelerator board based on broadcom bcm5825, bcm5862 chips Hello, I am working on a FreeBSD driver for a bcm5825 (5862) based board. Would you please help me? First of all I tried to work with a bcm5820 based board. I installed FreeBSD 6.1 and loaded the ubsec module: kldload ubsec In dmesg I saw that the module recognized the board (ubsec0: Broadcom 5820); the crypto module was also loaded. After that I ran the openssl test (openssl version 0.9.7e-p1 25 Oct 2004): openssl speed rsa1024 -engine ubsec can't use that engine 830:error:2507006C:DSO support routines:DSO_load:functionality not supported:/usr/src/secure/lib/libcrypto/../../../crypto/openssl/crypto/dso/dso_lib.c:239: 830:error:84069067:ubsec engine:UBSEC_INIT:dso failure:/usr/src/secure/lib/libcrypto/../../../crypto/openssl/crypto/engine/hw_ubsec.c:390: 830:error:260B806D:engine routines:ENGINE_TABLE_REGISTER:init failed:/usr/src/secure/lib/libcrypto/../../../crypto/openssl/crypto/engine/eng_table.c:182: What am I missing? There is no libubsec.so in the system. Any help will be appreciated. '-engine ubsec' tries to use the userland driver. If you loaded ubsec.ko and cryptodev.ko, you should use '-engine cryptodev'. -- Pawel Jakub Dawidek
Re: SEEK_HOLE and SEEK_DATA for sparse files any takers?
On Wed, Nov 08, 2006 at 02:38:36AM +0100, Pedro F Giffuni wrote: Hi; From http://blogs.sun.com/bonwick/date/200512 At this writing, SEEK_HOLE and SEEK_DATA are Solaris-specific. I encourage (implore? beg?) other operating systems to adopt these lseek(2) extensions verbatim (100% tax-free) so that sparse file navigation becomes a ubiquitous feature that every backup and archiving program can rely on. It's long overdue. It should be mentioned that Linux adopted them and they would help the ZFS port. I have some initial code for this and I'm planning to implement them, at least for ZFS. -- Pawel Jakub Dawidek
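For reference, this is roughly how a backup or archiving program can use the two lseek(2) extensions to visit only the data regions of a sparse file. It is a sketch that assumes a system exposing SEEK_DATA and SEEK_HOLE through <unistd.h> (glibc needs _GNU_SOURCE for that); the helper name walk_data_regions() is hypothetical. With the Solaris semantics, SEEK_DATA fails with ENXIO when there is no data at or after the given offset, and SEEK_HOLE always finds at least the implicit hole at end of file, so a file system without hole support simply reports the whole file as one data region.

```c
#define _GNU_SOURCE /* glibc needs this for SEEK_DATA/SEEK_HOLE */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: call cb(start, len, arg) for every data region of fd,
 * skipping holes. Returns 0 on success, -1 with errno set on failure.
 */
static int
walk_data_regions(int fd, void (*cb)(off_t, off_t, void *), void *arg)
{
	off_t end = lseek(fd, 0, SEEK_END);
	off_t pos = 0;

	if (end == -1)
		return (-1);
	while (pos < end) {
		off_t data = lseek(fd, pos, SEEK_DATA);
		if (data == -1) {
			if (errno == ENXIO) /* only a hole remains up to EOF */
				break;
			return (-1);
		}
		off_t hole = lseek(fd, data, SEEK_HOLE); /* end of this data region */
		if (hole == -1)
			return (-1);
		cb(data, hole - data, arg);
		pos = hole;
	}
	return (0);
}
```

Data regions are reported at the file system's allocation granularity, so a program copying them should expect regions larger than what was actually written.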
Re: Yet another magic symlinks implementation
On Sat, Nov 04, 2006 at 11:56:29AM +0300, Andrey V. Elsukov wrote: Hi, All! I've ported the NetBSD magic symlinks implementation to FreeBSD. The description of magiclinks can be found here: http://www.daemon-systems.org/man/symlink.7.html Patch here: http://butcher.heavennet.ru/patches/kernel/magiclinks/ From what I know, NetBSD removed the mount flag and switched to a global sysctl to enable/disable this feature. It would be good to know why, and possibly do the same. I like the idea and I can probably work on getting it into the tree. Creating a Perforce account for you would be a good start. Would you like to work there? -- Pawel Jakub Dawidek
Re: fsync: giving up on dirty
On Fri, Aug 25, 2006 at 09:22:26PM -0500, Eric Anderson wrote: I got this error today, while some very heavy disk access was occurring: Aug 25 13:47:07 snapshot1 kernel: fsync: giving up on dirty Aug 25 13:47:07 snapshot1 kernel: 0xff01bbb99a20: tag devfs, type VCHR Aug 25 13:47:07 snapshot1 kernel: usecount 1, writecount 0, refcount 445 mountedhere 0xff023ee20800 Aug 25 13:47:07 snapshot1 kernel: flags () Aug 25 13:47:07 snapshot1 kernel: v_object 0xff01c34afb60 ref 0 pages 16386 Aug 25 13:47:07 snapshot1 kernel: lock type devfs: EXCL (count 1) by thread 0xff023f11d980 (pid 46) #0 0x803eeaa6 at lockmgr+0x5f6 Aug 25 13:47:07 snapshot1 kernel: #1 0x8065e8d1 at VOP_LOCK_APV+0x81 Aug 25 13:47:07 snapshot1 kernel: #2 0x8047015b at vn_lock+0x6b Aug 25 13:47:07 snapshot1 kernel: #3 0x805719be at ffs_sync+0x1fe Aug 25 13:47:07 snapshot1 kernel: #4 0x80472045 at vfs_write_suspend+0x95 Aug 25 13:47:07 snapshot1 kernel: #5 0x80b794a5 at g_journal_switcher+0xa55 Aug 25 13:47:07 snapshot1 kernel: #6 0x803e3cdb at fork_exit+0xbb Aug 25 13:47:07 snapshot1 kernel: #7 0x805f39ce at fork_trampoline+0xe Aug 25 13:47:07 snapshot1 kernel: Aug 25 13:47:07 snapshot1 kernel: dev label/vol11-data.journal Aug 25 13:47:07 snapshot1 kernel: GEOM_JOURNAL: Cannot suspend file system /vol11 (error=35). I'm aware of this, but it is harmless. When gjournal cannot synchronize the file system on a journal switch, it simply tries again later. It should probably be logged more appropriately (as a warning). -- Pawel Jakub Dawidek
Re: 6-STABLE snapshot (background fsck) lock-up
On Sat, Aug 26, 2006 at 07:23:36AM -0500, Eric Anderson wrote: Hmm - had another panic. Again, screen shots are here: http://www.googlebit.com/freebsd/snapshots/gjournal_panic2/ I can't find the panic message. What was it? -- Pawel Jakub Dawidek
Re: 6-STABLE snapshot (background fsck) lock-up
On Sat, Aug 26, 2006 at 08:19:40PM -0500, Eric Anderson wrote: On 08/26/06 07:44, Pawel Jakub Dawidek wrote: On Sat, Aug 26, 2006 at 07:23:36AM -0500, Eric Anderson wrote: Hmm - had another panic. Again, screen shots are here: http://www.googlebit.com/freebsd/snapshots/gjournal_panic2/ I can't find the panic message. What was it? It was a deadlock. This looks like a VM-related problem: the g_event thread is waiting for free pages, but it never gets them. Are you able to connect a serial console to this machine and also provide the output of 'alltrace' if it happens again? -- Pawel Jakub Dawidek
Re: 6-STABLE snapshot (background fsck) lock-up
On Tue, Aug 22, 2006 at 03:38:15PM -0500, Eric Anderson wrote: Did you get a chance to look at those screenshots? I'm curious to know if you also think it is gjournal related. I've stopped loading gjournal, and I've had no other related deadlocks. This patch hasn't been merged to RELENG_6 yet; can you try it? http://people.freebsd.org/~pjd/patches/vfs_subr.c.3.patch -- Pawel Jakub Dawidek
Re: 6-STABLE snapshot (background fsck) lock-up
On Tue, Aug 22, 2006 at 03:38:15PM -0500, Eric Anderson wrote: Did you get a chance to look at those screenshots? I'm curious to know if you also think it is gjournal related. I've stopped loading gjournal, and I've had no other related deadlocks. I'm out of town tomorrow; I'll try to take a look when I'm back. We saw snapshot/gjournal-related deadlocks, but all of them were fixed; maybe there is a fix which wasn't committed. -- Pawel Jakub Dawidek
Re: ENOMEM @ RELENG_6 graid3
On Mon, Jun 26, 2006 at 07:34:51PM +0400, Dmitry Morozovsky wrote: Dear colleagues, turning on bootverbose reveals additional info to "ad10: FAILURE - out of memory in start": under load this machine (5 ATA disks, most of their space allocated to 2 graid3's) prints many messages like ENOMEM 0xc6e834a4 on 0xc493c080(ad8) ENOMEM 0xc703fdec on 0xc4960480(ad10) ENOMEM 0xc6b49528 on 0xc4901400(ad0) ENOMEM 0xc6c378c4 on 0xc493ca80(ad4g) ENOMEM 0xc662b210 on 0xc4900b00(ad0f) ENOMEM 0xc6b33630 on 0xc493c380(ad4) ENOMEM 0xc7320d68 on 0xc4901400(ad0) ENOMEM 0xc6bd6948 on 0xc493c380(ad4) ENOMEM 0xc7299dec on 0xc493c200(ad6) ENOMEM 0xc6d91528 on 0xc495f700(ad6g) ENOMEM 0xc47b07bc on 0xc4960480(ad10) ENOMEM 0xc7c22bdc on 0xc493c080(ad8) The machine is rather stable; however, it has panicked two or three times with: /ftp: bad dir ino 3454117 at offset 444: mangled entry panic: ufs_dirbad: bad dir Any hints on how to debug this? I hope the ENOMEM errors are not related to your panic, because on ENOMEM GEOM should retry the request a bit later. It would be good to know if you get similar panics without graid3, for example on a plain disk but with a 2kB sector size (you can do that with gnop(8)). You can also try gstripe(8)-ing your disks with a small stripe size, e.g. 512 bytes, and use gnop(8) on top of it to change the sector size, so that all disks are used, in case there is a problem with your controller. -- Pawel Jakub Dawidek
Re: freebsd 5.3, gmirror raid 1, PROBLEM
On Mon, May 29, 2006 at 05:44:04PM -0400, sara lidgey wrote: + Hi All, + + I've been running a server using FreeBSD 5.3 and gmirror to mirror two identical IDE hard drives. It's been running great for over a year. But recently everything went down and when I reboot and put a monitor on it I get the following errors on screen: + + GEOM_MIRROR: Device gm0: provider ad1 disconnected + GEOM_MIRROR: Device gm0: provider mirror/gm0 destroyed + GEOM_MIRROR: Device gm0: rebuilding provider ad0 stopped + + Fatal trap 12: page fault while in kernel mode... (this is followed by details about the fault) + + These errors are preceded by other related error information that flies by on the screen and I have no way of seeing them again. + + Does anyone know what steps I should take to figure out what is going on and try to recover data or get the machine to boot? Can you provide more info? There should be more interesting messages before the ones you pasted. There were a lot of fixes to gmirror in 6.1, so you may want to consider an upgrade. -- Pawel Jakub Dawidek
Re: bus_dmamap_load_uio and uiomove
On Tue, May 30, 2006 at 02:49:22PM +0200, Jimmy Olgeni wrote: + + Hello, + + Just a quick busdma question... + + I'm currently upgrading a custom device driver to use bus_dmamap_load_uio rather than uiomove. Everything works fine, but calls to write fail unless I set uio->uio_resid + to 0 by hand (as I'm not using uiomove anymore). + + Am I supposed to set uio_resid by hand when using bus_dmamap_load_uio, or is there a better way to signal that all the data in uio was used? From what I see, bus_dmamap_load_uio() uses uio_resid as the number of bytes to process, so it has to be set before the call. -- Pawel Jakub Dawidek
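To illustrate why uio_resid has to be cleared by hand once uiomove() is no longer used, here is a small userland model. It assumes that the generic write path reports as written the number of bytes by which uio_resid decreased while the driver's write routine ran; struct model_uio, model_dma_write() and model_write() are hypothetical stand-ins, not kernel interfaces, and the DMA itself is elided.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical userland model of the relevant part of struct uio. */
struct model_uio {
	size_t resid; /* bytes still to be transferred, like uio_resid */
};

/*
 * Model of a driver write routine that moves the data by DMA. The real
 * driver would hand the uio to bus_dmamap_load_uio(), which uses the
 * residual count to decide how many bytes to map, and then start the
 * transfer; that part is elided here.
 */
static int
model_dma_write(struct model_uio *uio, int clear_resid)
{
	if (clear_resid)
		uio->resid = 0; /* what uiomove() would have done implicitly */
	return (0);
}

/* Model of the generic write path: bytes written = resid before - resid after. */
static ssize_t
model_write(size_t nbytes, int clear_resid)
{
	struct model_uio uio = { .resid = nbytes };

	if (model_dma_write(&uio, clear_resid) != 0)
		return (-1);
	return ((ssize_t)(nbytes - uio.resid));
}
```

With clear_resid set to 0 the model reports a zero-byte write even though all data was transferred, which is consistent with the symptom described in the question.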
Re: Fingerprint Authentication
On Fri, May 05, 2006 at 03:58:06PM +0200, Fredrik Lindberg wrote: + Alin-Adrian Anton wrote: + Fredrik Lindberg wrote: + + But that would sort of defeat the whole purpose of biometric authentication and you could really just use public keys instead + which would be a lot faster and easier than scanning your finger + at each login. :) + + Unless you locally encrypt your private key with information gathered by the fingerprint reader, as a password. + + That's exactly the problem with, at least, UPEK's driver. If you scan + one of your fingers twice you'll get two different BioAPI records. + That's different as in two binary data blobs which aren't equal. + To match these records with each other, you hand them over to the + driver which, as far as I know, hands them over to the hardware + which in turn performs some black magic and then tells you if + the records match or not. That's right, but the idea of using asymmetric crypto is a good one. Such a fingerprint reader should have a secure chip holding your private key; on authentication, you would provide the data from your finger scan plus the data to sign, and on a match it would return the signed data, which you can then use to continue the authentication process. -- Pawel Jakub Dawidek
Re: Using open system call in KLD
On Mon, Mar 06, 2006 at 02:10:10AM -0800, Anupam Deshpande wrote: + I successfully created a file using kern_open(). + Now I want to 'write to' or 'read from' the file. What functions should + I use for that purpose? This is not as trivial as it is in userland (but you already know that :)). Here are the functions I created for one of my projects: http://people.freebsd.org/~pjd/misc/kernio/subr_kernio.c http://people.freebsd.org/~pjd/misc/kernio/kernio.h There are only open/close/write functions; there is no read function, as I didn't need one, so you'll have to write it yourself. -- Pawel Jakub Dawidek
Re: File creation using KLD
On Sun, Feb 05, 2006 at 11:42:04PM +0530, Pranav Sawargaonkar wrote: + Hi + I want to create a file on disk using a KLD and then try out some reading and + writing stuff on that file, so can anyone suggest a solution, i.e. + functions to use and locks which I need to carry this out? This is a bit tricky, i.e. there is no clean API for this, but it is of course possible. There are a few frameworks in the kernel that do exactly this. One of them is alq(9), so take a look at sys/kern/kern_alq.c. -- Pawel Jakub Dawidek
More user developers friendly memguard.
Here is the patch: http://people.freebsd.org/~pjd/patches/kern_malloc.c.3.patch It allows you to configure which memory type to debug without recompiling the kernel. It also allows debugging kernel modules with memguard. The rules: 1. If the memory type is compiled into the kernel, vm.memguard_desc should be set in /boot/loader.conf. 2. If the memory type is defined in a kernel module, the vm.memguard_desc sysctl should be set before loading the module. -- Pawel Jakub Dawidek
Re: accessing NetBSD filesystem
On Sun, Dec 18, 2005 at 04:16:16PM +0100, [EMAIL PROTECTED] wrote: + On Sun, Dec 18, 2005 at 01:54:18AM +0100, Gilbert Fernandes wrote: + + The FreeBSD UFS is the FFS accessed through the VFS layer, but basically + the format is the same. If you want to have access, from FreeBSD, to + NetBSD partitions, make sure the NetBSD partitions have been formatted + using FFSv2 which is the port of UFS to NetBSD. There are some + differences though: no ACL support nor snapshots available there. + + FFS v1 and v2 are both working. I'm using that every day. The one part + which needs attention is soft updates: FreeBSD / DragonFly have it as + permanent flag, NetBSD as mount option. Interesting. In FreeBSD, fsck(8) works differently for an SU-enabled file system, so having SU as a mount option wouldn't be possible (if we want to protect our users from shooting themselves in the foot). Because of the way SU works, it is possible to run fsck in the background, as the only problems left are unreferenced objects (inodes, blocks, etc.). -- Pawel Jakub Dawidek
Re: unable to build geom_gate
On Fri, Dec 16, 2005 at 05:27:11PM +0700, Vitaliy Ovsyannikov wrote: + Hello, freebsd-hackers. + + Please, look at the output and help if you can: + + # tar -yxf geom_gate.tbz + # cd geom_gate + # make [...] Why don't you just use ggate from the base system? -- Pawel Jakub Dawidek
Re: SSH From within a Jail
On Sun, Nov 13, 2005 at 09:26:05PM +0100, Koen Martens wrote: + Just remembered something else: do you jexec into the jail, or do + you do a proper logon (e.g. ssh into the jail)? I think that if you + jexec into the jail and then try to ssh, you might have a problem + because you aren't really logged in to the jail and thus have no + (pseudo) tty associated with your session. I just saw this thread. Yes, you are right, I can confirm this. To be able to ssh to another server from within a jail, you need to log in to the jail properly (so that you have a terminal), so jexec won't work here. Try to ssh into the jail and then ssh to another box. -- Pawel Jakub Dawidek
Re: GEOM for multipath? How?
On Thu, Nov 10, 2005 at 02:48:15PM +0100, Poul-Henning Kamp wrote: + In message [EMAIL PROTECTED], Sergey + Babkin writes: + From: Danny Howard [EMAIL PROTECTED] + + Hey ... yes, I recall there being issues with the QLogic drivers ... I + wonder if anyone has given the mpt drivers a shot? I was able to speak + with an engineer at Engenio (now owned by LSI) and she said there were + some issues with the QLogic dual-port cards that were interesting to + her, but the LSI dual-port cards behaved differently ... + + QLogic worked fine in multi-path configuration with UnixWare. + I think LSI and Adaptec did too. The only trick is to make sure + that the IRQs of the cards are not shared between the cards or with + any other device. + + I suspect it is not the card as much as the driver, but I am not sure. I was able to modify the driver so that multipathing started to work (no more hanging requests when a path was disconnected). It was hackish, but it worked, so I'm quite sure it's the driver's fault. -- Pawel Jakub Dawidek
Re: [PATCH] IPv6 support for ggate
On Thu, Oct 27, 2005 at 11:04:50PM -0500, Craig Boston wrote: + Hi hackers: + + Today I had a need to run ggate over an IPv6-only network. I was a + little surprised that it didn't seem to like that, but not discouraged. + So here's a patch that adds IPv6 support for ggated(8) and ggatec(8) + ;) Thanks a lot! Unfortunately I don't have time to set up a test environment (I don't use IPv6 at all), and it may take a while before I'm ready to commit this (unless someone else beats me to it). I'd be grateful if you could file a PR and send me its number. Thanks! -- Pawel Jakub Dawidek
Re: Kernel Source Divergence, Security (was: booting gbde-encrypted filesystem)
On Sun, Jul 31, 2005 at 04:07:27PM +0200, Poul-Henning Kamp wrote: + In message [EMAIL PROTECTED], Allan Fields writes: + + Yes, this is all very nice, but when is someone actually going to + commit it? ;) + + I'm (as always) short of time, and GBDE is not the top priority + for me for the time being. + + So I am more than happy to see people band together and improve + gbde. + + The main work necessary is to polish the userland program and that + is relatively trivial programming, so anyone should be able to pick + that up: just go for it. + + Giving gbde a taste function so that the root filesystem can be + protected by GBDE, this is also OK by me in principle, but I'd like + to review the patch before it gets committed because there are a + large number of dragons. + + In P4:phk_gbde there is the beginning of hw-crypto support through + opencrypto(9), if somebody wants to work on that, get in touch with + me. I'm starting to wonder if we couldn't create a common storage-crypto base and rewrite gbde and geli on top of it. geli(8) is complete, i.e. you can use any command on attached and detached providers, you can back up your metadata, protect your passphrase with PKCS#5v2, use files as part of the key, etc. gbde(8) (the userland tool) is not finished (all the things I already have in geli are on its TODO list). I have a plan for another crypto-storage class, which will provide privacy and integrity verification (the very thing we are missing now). I want a separate class because it will be slower than geli in terms of both crypto time and disk access time. Another possibility is to integrate the two classes and let the user decide whether he wants privacy, integrity verification or both. If someone can spend time on integrating the gbde crypto scheme into geli, where the userland part is complete, crypto(9) is already used, etc., that'd be cool.
The truth is that the main difference between gbde and geli is how crypto is used on disk; the other elements (managing keys, protecting passphrases, metadata backups, encrypted root partition, etc.) are, or could be, the same. -- Pawel Jakub Dawidek
Re: booting gbde-encrypted filesystem
On Fri, Jul 29, 2005 at 01:18:10PM +0800, Ronnel P. Maglasang wrote: + Hello, + + I think there was already a thread on this. I just + want to raise the question again if anyone has successfully + booted a gbde-encrypted filesystem (everything encrypted except + the bootloader). The passphrase is entered at the bootloader prompt + or embedded in the bootloader. This is not possible with the current GBDE. I have patches which allow this here: http://people.freebsd.org/~pjd/patches/gbde.patch -- Pawel Jakub Dawidek
Re: booting gbde-encrypted filesystem
On Fri, Jul 29, 2005 at 09:56:18AM +0200, Jeremie Le Hen wrote: + This is not possible with current GBDE. + I've patches which allows this here: + + http://people.freebsd.org/~pjd/patches/gbde.patch + + This is great. Do you intend to commit it someday? I know the GELI + framework allows using an encrypted root partition, but it would be + interesting for GBDE users to be provided such functionality. I sent those patches to phk@ a few months ago. If he decides to add such functionality, he is welcome to use them :) I'm not going to commit them myself. -- Pawel Jakub Dawidek
Re: Google SoC idea
On Mon, Jun 06, 2005 at 06:11:23PM +0200, Ivan Voras wrote: + I have an idea that I could implement through Google's Summer of Code + project, but I have little experience with the stuff it involves (kernel + programming / disks / filesystem optimization), so I expect any answer + from "It won't work" or "It's useless" to "It can't be done". :) + + The idea is this: to implement a sort of GEOM-layer disk data journaling + system. I imagine it to be a GEOM class using two lower-level devices: + one for data and one for the journal (this way, the journal device can + be on a fast and small disk). Such a journaled device could be used to + host any filesystem, probably mounted with synchronous access, and it + will result in faster write access by keeping the writes sequential in + the journal device. Journal information will be committed to the data + disk periodically by a separate log-writer thread, or when it gets full. + The data disk will be consistent so it can be used without its + journal part (after a clean disconnect/rebuild) if needed. In the + worst case, I think this will help performance in cases when there's a + burst of write activity followed by a period of IO idleness. + + I've made the above idea more-or-less from my head in one afternoon, so + it's perfectly possible that I'm missing some vital point or that it's + complete nonsense :) + + Does it make sense to do it this way? Is it worth applying for the SoC? Not sure. Basically this is similar to what soft updates do, I think. On the other hand, soft updates are only available for UFS. You probably want to hear scottl's and phk's opinions (CCed). -- Pawel Jakub Dawidek
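A toy in-memory model of the proposed journaling layer may help frame the discussion: writes are only appended to the journal, reads must prefer the newest journaled copy of a block, and a flush copies the journal to the data area oldest-first so that the newest copy wins. Everything here is a hypothetical sketch (struct jdev, jdev_write(), jdev_read(), jdev_flush(), tiny fixed block and journal sizes); a real GEOM class would issue bios asynchronously, keep the journal on its own provider and flush from a separate log-writer thread rather than synchronously when the journal fills up.

```c
#include <stdint.h>
#include <string.h>

#define BLK_SIZE   16 /* tiny blocks, just for the model */
#define DATA_BLKS  64
#define JRNL_SLOTS 8

struct jentry {
	uint32_t blkno;          /* home location on the data device */
	uint8_t  data[BLK_SIZE];
};

/* Hypothetical journaled device: a "data" area plus a sequentially written journal. */
struct jdev {
	uint8_t       data[DATA_BLKS][BLK_SIZE];
	struct jentry jrnl[JRNL_SLOTS];
	unsigned      jcount; /* entries appended since the last flush */
};

/* Copy journaled blocks to their home locations, oldest first, then empty the journal. */
static void
jdev_flush(struct jdev *d)
{
	for (unsigned i = 0; i < d->jcount; i++)
		memcpy(d->data[d->jrnl[i].blkno], d->jrnl[i].data, BLK_SIZE);
	d->jcount = 0;
}

/* A write only appends to the journal; returns -1 for an out-of-range block. */
static int
jdev_write(struct jdev *d, uint32_t blkno, const uint8_t *buf)
{
	if (blkno >= DATA_BLKS)
		return (-1);
	if (d->jcount == JRNL_SLOTS) /* journal full: flush synchronously in this model */
		jdev_flush(d);
	d->jrnl[d->jcount].blkno = blkno;
	memcpy(d->jrnl[d->jcount].data, buf, BLK_SIZE);
	d->jcount++;
	return (0);
}

/* A read must return the newest journaled copy if there is one. */
static int
jdev_read(const struct jdev *d, uint32_t blkno, uint8_t *buf)
{
	if (blkno >= DATA_BLKS)
		return (-1);
	for (unsigned i = d->jcount; i > 0; i--) {
		if (d->jrnl[i - 1].blkno == blkno) {
			memcpy(buf, d->jrnl[i - 1].data, BLK_SIZE);
			return (0);
		}
	}
	memcpy(buf, d->data[blkno], BLK_SIZE);
	return (0);
}
```

The read path is the part that is easy to forget: until a flush happens, the data device alone is stale, which is why the proposal's "usable without its journal" property only holds after a clean disconnect.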
Re: ggate failures.
On Sat, Apr 09, 2005 at 05:19:43PM -0400, David Gilbert wrote: + I have two systems, each with 4 300 gig SATA disks. Let's call them + m0 and m1. M1 exports its disks with ggated ... on two private GigE + networks. M0, on those same two GigE networks, imports them with + ggatec. M0 then does the following: + + Mirror: Disks + s0: ggate0 da0s1g + s1: ggate1 da1s1g + s2: ggate2 da2s1g + s3: ggate3 da3s1g + + And then: + + Concat: Disks + v0: s0 s1 s2 s3 + + (so v0 is a concatenation of 4 mirrors that consist of a local and + remote disk, each) + + Now... This all works, and we create a filesystem on v0. The problem + arises that whenever a lot of activity occurs on v0 (untarring a copy + of /usr is sufficient), the ggate links break down. An example + message from the dmesg: + + GEOM_MIRROR: Request failed (error=5). ggate2[WRITE(offset=25989184, length=8192)] + + Now... I don't know a lot about ggate, but this appears trivial to + trigger. Has anyone tried similar configurations and is there any + wisdom about ggate configurations? Set kern.geom.gate.debug to 1 and send the output that is generated on failures. I've improved ggate a lot in Perforce, but it still needs some polishing... -- Pawel Jakub Dawidek
Re: JKH Task: Stack saving/tracing functionality.
On Mon, Apr 11, 2005 at 09:08:33AM -0400, Jeff Roberson wrote: + I have proprietary code from a previous employer of mine that implements + some really useful debugging features. I'm looking for someone who is + interested in cleaning it up, making it architecture independent, and + getting it running on current. The code basically allows you to save and + manipulate stack information. + + This would be very useful for things like lockmgr, which right now we + can't really pass file:line information down to without making an #ifdef mess + of all of the APIs as options DEBUG_LOCKS does somewhat today. Lockmgr + would have a buffer which contained the last N EIPs up the callstack, and + this information could be queried and printed using a simple api. + + Interested parties please email me. We can discuss this and I can provide + source. It would probably be useful for witness too: when a lock order is first recorded, the stack could be stored with it, and on a LOR both backtraces could be shown. -- Pawel Jakub Dawidek
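The per-lock stack record described above could look roughly like the sketch below, which also covers the witness use case: keep one record when a lock order is first seen and print it next to the current one on a LOR. The names (struct saved_stack, stack_save(), stack_print(), report_lor(), STACK_DEPTH) are hypothetical, and how the program counters are collected (frame-pointer walk, unwinder) is deliberately left to the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define STACK_DEPTH 8 /* hypothetical "last N" frames kept per record */

/* Hypothetical saved call stack, e.g. embedded in a lock or a witness order entry. */
struct saved_stack {
	uintptr_t pcs[STACK_DEPTH]; /* program counters, innermost frame first */
	size_t    depth;            /* number of valid entries in pcs[] */
};

/* Keep at most STACK_DEPTH of the n program counters supplied by the caller. */
static void
stack_save(struct saved_stack *st, const uintptr_t *pcs, size_t n)
{
	st->depth = n < STACK_DEPTH ? n : STACK_DEPTH;
	for (size_t i = 0; i < st->depth; i++)
		st->pcs[i] = pcs[i];
}

/* Print one frame per line; a kernel version would also resolve symbol names. */
static void
stack_print(const struct saved_stack *st, FILE *fp)
{
	for (size_t i = 0; i < st->depth; i++)
		fprintf(fp, "#%zu %#jx\n", i, (uintmax_t)st->pcs[i]);
}

/* On a lock order reversal, show where the order was first recorded and where it was broken. */
static void
report_lor(const struct saved_stack *first, const struct saved_stack *now, FILE *fp)
{
	fprintf(fp, "lock order first established at:\n");
	stack_print(first, fp);
	fprintf(fp, "lock order reversed at:\n");
	stack_print(now, fp);
}
```

Storing raw program counters and resolving them only when printing keeps the cost on the locking fast path to a bounded copy.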
Re: Idea about skeleton jail
On Mon, Jan 31, 2005 at 11:13:04PM -0800, Justin Hopper wrote: + We are considering open sourcing all of our stuff, to contribute back + what we can to the OS that allowed us to build our entire company. I'd + really like to see what others have done to make jails more manageable, + as it seems like there is so much that can be done but not many people + are working on it. It seems jails have the potential to become an + incredible way to virtually partition servers, and it would not be that + hard to implement solid tools for managing them. We have things like + JID-aware top and tools for automated jail builds, but it would be great + to work with some FreeBSD heavies to finish up clean development of + things like jail resource restrictions (CPU,MEM,#PROCS,etc) and perhaps + a clean and universally useful way to easily configure and launch full + jail environments. Yes, it would be useful (I mean CPU/MEM/#PROCS limits), but as I understand it there are two schools of thought about jails. The first is that jails should be extended to allow creating a real virtual server, and the second is that they should stay lightweight. + Pawel had some really interesting ideas for jails, but it seems that + he's too busy to work on them at the moment. Speaking of which, his + multiple IPs patch for 5.3 is still broken, and I haven't been able to + find what the problem is =( Could you describe the breakage? I made some fixes a week or so ago; I just created a patch against HEAD if you want to try it: http://people.freebsd.org/~pjd/patches/jail_2005020101.patch There may still be some remaining issues, but I don't have time for more detailed tests. Something that could be useful IMHO is the ability to use reboot(8)/shutdown(8), etc. inside a jail, but... unfortunately I'm too busy with other (probably less interesting, but profitable) projects. -- Pawel Jakub Dawidek
Re: Idea about skeleton jail
On Wed, Feb 02, 2005 at 12:52:17AM +0800, Xin LI wrote: + On Tue, 2005-02-01 at 11:40 +0100, Pawel Jakub Dawidek wrote: + The thing that can be useful IMHO is possibility to use + reboot(8)/shutdown(8), etc. inside a jail, but... + I'm unfortunately too busy with other (probably less interesting, but + profitable) projects. + + Quick question: Does this mean we can have init(8) running in a jail? Yes, I started a branch for this work (pjd_jailinit), but... -- Pawel Jakub Dawidek
Re: Idea about skeleton jail
On Tue, Feb 01, 2005 at 01:31:11PM -0800, Justin Hopper wrote: + I've made some fixes a week or something + ago, I just created a patch against HEAD if you want to try it: + + http://people.freebsd.org/~pjd/patches/jail_2005020101.patch + + There can still be some remaining issues, but I don't have time for more + detailed tests. + + Excellent, I'll try the patch here in a couple of minutes. Can you tell + me what the known issues are with the patch? Perhaps I can lend a hand + on helping to resolve them. Frankly, I don't know. It just needs detailed testing. -- Pawel Jakub Dawidek
Re: 5.3-STABLE: handle_workitem_freefile panic
On Sun, Jan 23, 2005 at 05:38:55PM +0300, Dmitry Morozovsky wrote: + On Sun, 23 Jan 2005, Dmitry Morozovsky wrote: + + DM I'm building a debug kernel now and enabling kernel dumps, just to be sure, but + DM it seems some sporadic file system inconsistencies... + + Hmmm... it reveals geom_mirror I use (over two SATA drives) both does not + support dumping, and hides underlying ad*[a-h] partitions. Should I gmirror + distinct partitions instead of whole ad4 and ad6? You cannot dump to GEOM providers, especially from rank 1 geoms. It is better to create a separate mirror for every partition (or slice) you want to mirror instead of mirroring the whole disk. One of the benefits is that gmirror marks a mirror as clean if there were no WRITE requests for a few seconds, so a mirror that was idle when the power failed does not need resynchronization. When you have many smaller mirrors, after an unclean shutdown you probably don't need to rebuild all of them. The argument against could be that when you synchronize many mirrors on the same disks in parallel, your disks are less happy (in the one-big-mirror scenario, the disk heads don't have to jump from one place to another as often). -- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
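A toy model in C of the clean-marking argument above, assuming a mirror counts as clean once no WRITE has arrived for a fixed idle period. The struct, the function names and the 5-second threshold used in the usage are hypothetical and do not come from gmirror's source; the sketch only shows why splitting one disk into several mirrors can shrink the set that needs a rebuild after an unclean shutdown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one mirror and the time of its last WRITE (seconds). */
struct toy_mirror {
	const char *name;
	long last_write;
};

/*
 * Dirty if a WRITE arrived less than idle_secs before the crash; a mirror
 * idle for longer is assumed to have been marked clean already.
 */
static int
needs_resync(const struct toy_mirror *m, long crash_time, long idle_secs)
{
	return (crash_time - m->last_write < idle_secs);
}

/* Number of mirrors that would have to be rebuilt after the crash. */
static size_t
count_resyncs(const struct toy_mirror *ms, size_t n, long crash_time,
    long idle_secs)
{
	size_t i, dirty = 0;

	for (i = 0; i < n; i++)
		if (needs_resync(&ms[i], crash_time, idle_secs))
			dirty++;
	return (dirty);
}
```

With a single whole-disk mirror, one recent write to any partition makes the entire disk dirty; with per-partition mirrors, only the partitions written within the idle window would be rebuilt.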
Re: 5.3-STABLE: handle_workitem_freefile panic
On Sun, Jan 23, 2005 at 07:24:26PM +0300, Dmitry Morozovsky wrote: + However: how can the following goal be achieved: have mirrored swap (to + keep redundancy and HA) and a place to dump panic images to, modulo having + scratch disk and/or scratch unused partition? When you have a dedicated mirror only for swap (e.g. a mirror on ad0s1b and ad2s1b), you should probably be able to dump into ad[02]s1b (but I haven't tested it). -- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: geom mirror and gbde
On Fri, Jan 21, 2005 at 09:56:58AM +0100, Attila Nagy wrote: + Hello, + + I would like to use gbde on a geom mirror, but /etc/rc.d/gbde fails if + there is a slash in the device name. + + I don't know what would be the clean solution, I used the attached diff + to solve the problem. + + Please review it and if there is a better solution, commit it. Aha! I fixed gbde(8) to accept devices with / in them, but forgot about rc.d/gbde. + @@ -81,16 +81,17 @@ + for device in $gbde_devices; do + parent=${device%.bde} + parent=${parent#/dev/} + -eval lock=\${gbde_lock_${parent}-\${gbde_lockdir}/${parent}.lock\} + +parent_=`echo ${parent} | sed s/\//_/g` + +eval lock=\${gbde_lock_${parent_}-\${gbde_lockdir}/${parent_}.lock\} + if [ -e /dev/${parent} -a ! -e /dev/${parent}.bde ]; then + echo Configuring Disk Encryption for ${parent}. Only this part is needed. Committed to HEAD, MFC after 1 week. Thanks! -- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
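The reason for the sed(1) call in the patch is that a device name such as mirror/m0 contains a slash, which cannot appear in a shell variable name, so the per-device override has to be looked up as gbde_lock_mirror_m0. The same transformation is sketched in C below purely for illustration; the function name is hypothetical and nothing in rc.d/gbde uses it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy a device name such as "mirror/m0" into buf, replacing every '/'
 * with '_' (what the sed expression in the patch does), so the result is
 * usable as a variable-name suffix.
 * Returns 0 on success, -1 if buf is too small.
 */
static int
devname_to_varsuffix(const char *dev, char *buf, size_t buflen)
{
	size_t i, len = strlen(dev);

	if (len + 1 > buflen)
		return (-1);
	for (i = 0; i <= len; i++)	/* <= also copies the terminating NUL */
		buf[i] = (dev[i] == '/') ? '_' : dev[i];
	return (0);
}
```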
Re: Freeze when using atapicam
On Wed, Jan 05, 2005 at 09:26:00PM +0100, Olivier Certner wrote: + Hello all, + + Would someone have the time to look at my previous post dated January 4th, + 15:43 GMT on the freebsd-questions mailing list? (I haven't read your post on questions@, but...) I get a hang on boot when I use atapicam with my DVD-RW. With a CD-ROM everything is OK. I'm able to boot and work with my DVD-RW without any problems only with ATAPI DMA turned off in /boot/loader.conf: hw.ata.atapi_dma=0 -- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: Odd geom behaviour.
On Mon, Dec 20, 2004 at 10:51:11PM -0500, David Gilbert wrote: + I have a set of 12 disks. 2x9G and 10x4.5G. I have a setup whereby I + run a gmirror on each pair of disks and then a gconcat on the + mirrors. Attached is a copy of the gmirror and gconcat lists. + + Now... I shutdown -r this machine (which happens to be an alpha) and + it shuts down happily. However, _every_ time it reboots, it wishes to + rebuild the mirrors: + + GEOM_MIRROR: Device m1 created (id=4055141955). + GEOM_MIRROR: Device m1: provider da1 detected. + GEOM_MIRROR: Device m1: provider da2 detected. + GEOM_MIRROR: Device m1: provider da2 activated. + GEOM_MIRROR: Device m1: provider mirror/m1 launched. + GEOM_MIRROR: Device m1: rebuilding provider da1. + + (x5 more for the other mirrors). Now this isn't particularly bad, I + suppose, except that the machine is occupied for some number of + minutes after boot with this activity. No fsck ... the filesystem is + happy. + + The machine is available for testing should someone want to look at + it. In fact, the machine is part of my retrocluster of hardware + running FreeBSD and NetBSD (if someone needs hardware with serial + consoles to debug, this is the purpose of the retrocluster). + + Anyways... ideas? + + (note that in these files, the mirrors are still rebuilding) What system version are you using? If this is 5.3, you should add swapoff="YES" to your /etc/rc.conf and use the shutdown(8) command to reboot/turn off your machine. This is already fixed in HEAD in a much cleaner way. -- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: rc.shutdown and jails
On Sat, Dec 11, 2004 at 12:44:12AM -0800, Julian Elischer wrote: + I think we should introduce an init process for jails.. + + It would be responsible for all that the normal init is responsible for + except for being the default parent.. (some might argue for that too). + Sending it a particular signal would notify it to + send shutdown signals to all its compatriots in the jail etc. I started working on this in Perforce: pjd_jailinit. -- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: Hand on gmirror (Was: Re: gmirror bugs, how many?)
On Wed, Dec 08, 2004 at 02:10:02AM +0200, Alexandr Kovalenko wrote: + Hello, Pawel Jakub Dawidek! + + This is a known race, which is already fixed in HEAD. I want to commit it + soon. + + Any plans on backporting it to RELENG_5 (RELENG_5_3 maybe?)? I'm going to MFC it to RELENG_5, probably this weekend; RELENG_5_3 is closed to changes like this one. + To be on original topic - is there any way to make a mirror from live + system? I mean I have running FreeBSD on da0 and I want to make a + gmirror on it (I'm planning to add second drive soon). How to avoid + those disklabel warnings correctly? You still need to reboot. You will find the complete instructions in the freebsd-geom@ mailing list archives; I wrote about this a few times, AFAIR. -- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: Hand on gmirror (Was: Re: gmirror bugs, how many?)
On Wed, Dec 08, 2004 at 11:40:44AM +0200, Alexandr Kovalenko wrote: + + To be on original topic - is there any way to make a mirror from live + + system? I mean I have running FreeBSD on da0 and I want to make a + + gmirror on it (I'm planning to add second drive soon). How to avoid + + those disklabel warnings correctly? + + You still need to reboot. You will find the complete instructions in the + freebsd-geom@ mailing list archives; I wrote about this a few times, AFAIR. + + Could you please recall the subject of that thread? + + I was able to make a gmirror on live system using + kern.geom.debugflags=16, but problem with disklabel remains. You cannot do this on a live system, because you need to mount the root file system on top of the mirror, and remounting the root file system is not possible. You need to create the mirror on the 2nd disk first, etc. Even if you store metadata on the disk (with debugflags=16), changes will not be mirrored to the 2nd disk, because I/O requests go to the disk provider, not to the mirror provider. -- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: Multiple IPs in jail
On Wed, Dec 08, 2004 at 09:51:31AM -0800, Justin Hopper wrote: + Thanks for understanding my question, Devon. I guess at this point I'll + patch a system here and begin testing with it, and hopefully PJD or PHK + or somebody else @freebsd will respond with any plans to roll this + functionality into the base system. It's really not a problem if there + are no plans to do it, I just don't want to spend a lot of time fiddling + with a patch and then find it in the base system in 5.4 or something. My patch still has some issues. I updated the patch against HEAD a minute ago: http://people.freebsd.org/~pjd/patches/jail_2004120901.patch I don't have time to work on this right now, so I can't say if/when it'll be committed. -- Pawel Jakub Dawidek http://www.wheel.pl [EMAIL PROTECTED] http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am!
Re: Hand on gmirror (Was: Re: gmirror bugs, how many?)
On Mon, Nov 29, 2004 at 12:52:40PM -0200, João Carlos Mendes Luís wrote: + + Indeed, the -h option is what I wanted and the bug is in the + + manual. What would happen if I change the disc ID in this case? + + Your disk will not be detected as a mirror component, because the hardcoded + name is different. + + Oops. Is there a check for that? For example, let's say that ad0s1 got + renamed to ad1s1, and hardcoded a reference to ad0s1. In this case, + there is a disk called ad0s1 in the system. Is gmirror smart enough in + this case? In this case ad1s1 will not be connected to the mirror (but don't worry, ad0s1 will not be connected either). + + sigesc::root jcmendes [553] disklabel mirror/vol0 + + # /dev/mirror/vol0: + + 8 partitions: + + # size offset fstype [fsize bsize bps/cpg] + + a: 16498864 16 unused 0 0 + + c: 16498880 0 unused 0 0 # "raw" part, + + don't edit + + sigesc::root jcmendes [554] + + + + Seems good until now. Except for the offset 16 of the a partition. + + Is this necessary? The man page says that the only sector reserved + + for metadata is the provider's last one. + + Ehh, blame disklabel(8). The first 16 sectors are reserved for boot code. + + And why this does not happen with ad0s1, etc? I think it should; it's just that sysinstall does not allocate those sectors when you use it for this. Anyway, it has nothing to do with gmirror. -- Pawel Jakub Dawidek http://www.FreeBSD.org [EMAIL PROTECTED] http://garage.freebsd.pl FreeBSD committer Am I Evil? Yes, I Am!
Re: Hand on gmirror (Was: Re: gmirror bugs, how many?)
On Mon, Nov 29, 2004 at 07:27:51PM -0200, João Carlos Mendes Luís wrote: + I finally got the system to boot with gmirror fully enabled. But I got + this during boot: + + + GEOM_MIRROR: Device vol0 created (id=3592859320). + GEOM_MIRROR: Device vol0: provider ad0s1 detected. + GEOM_MIRROR: Device vol0: provider ad1s1 detected. + GEOM_MIRROR: Device vol0: provider ad1s1 activated. + GEOM_MIRROR: Cannot update metadata on disk ad0s1 (error=1). + GEOM_MIRROR: Device vol0: provider ad0s1 activated. + GEOM_MIRROR: Device vol0: provider mirror/vol0 launched. + GEOM_MIRROR: Cannot update metadata on disk ad0s1 (error=1). + GEOM_MIRROR: Device vol0: provider ad0s1 disconnected. This is a known race, which is already fixed in HEAD. I want to commit it soon. -- Pawel Jakub Dawidek http://www.FreeBSD.org [EMAIL PROTECTED] http://garage.freebsd.pl FreeBSD committer Am I Evil? Yes, I Am!
Re: Hand on gmirror (Was: Re: gmirror bugs, how many?)
On Fri, Nov 26, 2004 at 05:55:51PM -0200, João Carlos Mendes Luís wrote: + Pawel Jakub Dawidek wrote: [...] + What error do you get when you try to do this? + + Step by step: + + - The system has started with a preloaded geom_mirror: [...] + - There is a running mirror partition: [...] + - Now let's try to remove (disable was my intention, a bad idea): + + sigesc::root jcmendes [524] gmirror unload + Could not unload module: Device not configured. + sigesc::root jcmendes [525] gmirror list + sigesc::root jcmendes [526] gmirror load + Command 'load' not available. + sigesc::root jcmendes [527] gmirror list + sigesc::root jcmendes [528] kldstat + Id Refs Address Size Name + 1 13 0xc0400000 3126c4 kernel + 2 1 0xc0713000 10be8 geom_mirror.ko + 3 14 0xc0724000 59340 acpi.ko + 4 1 0xc106a000 6000 linprocfs.ko + 5 1 0xc1070000 18000 linux.ko + 6 1 0xc1183000 2000 fade_saver.ko + sigesc::root jcmendes [529] ls -l /dev/mirror/ + total 1 + dr-xr-xr-x 2 root wheel 512 Nov 26 12:19 . + dr-xr-xr-x 5 root wheel 512 Nov 26 12:19 .. + sigesc::root jcmendes [530] + + - Well, something not good happened. The device did not unload, and do + not list any device anymore. Trying to reload it has no effect. + - This used to work before preloading it in loader.conf, but then I + would not be able to boot a mirror partition. [...] The 'unload' command not working is caused by a bug in GEOM. Currently, to avoid a deadlock, you get an error (ENXIO), but the mirror will be destroyed anyway. The next 'unload' should be OK. To avoid those errors, you should first stop all mirrors (using the 'stop' command) and then unload the kernel module. BTW, there is no 'reload' command. + Indeed, the -h option is what I wanted and the bug is in the + manual. What would happen if I change the disc ID in this case? Your disk will not be detected as a mirror component, because the hardcoded name is different.
+ sigesc::root jcmendes [553] disklabel mirror/vol0 + # /dev/mirror/vol0: + 8 partitions: + # size offset fstype [fsize bsize bps/cpg] + a: 16498864 16 unused 0 0 + c: 16498880 0 unused 0 0 # "raw" part, + don't edit + sigesc::root jcmendes [554] + + Seems good until now. Except for the offset 16 of the a partition. + Is this necessary? The man page says that the only sector reserved + for metadata is the provider's last one. Ehh, blame disklabel(8). The first 16 sectors are reserved for boot code. -- Pawel Jakub Dawidek http://www.FreeBSD.org [EMAIL PROTECTED] http://garage.freebsd.pl FreeBSD committer Am I Evil? Yes, I Am!
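The numbers in the quoted label are consistent with that answer. A small C check, assuming 512-byte sectors and an 8 KB boot area (the 16 sectors mentioned above); the struct and helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

#define TOY_SECTOR_SIZE		512u	/* assumed sector size */
#define TOY_BOOT_AREA_BYTES	8192u	/* assumed boot area: 16 x 512 bytes */

/* Offset and size of a partition, both in sectors. */
struct toy_part {
	uint64_t offset;
	uint64_t size;
};

/* First partition placed right after the boot area, filling the rest. */
static struct toy_part
toy_first_partition(uint64_t raw_sectors)
{
	struct toy_part p;

	p.offset = TOY_BOOT_AREA_BYTES / TOY_SECTOR_SIZE;
	p.size = raw_sectors - p.offset;
	return (p);
}
```

Feeding in the size of the c partition (16498880 sectors) yields offset 16 and size 16498864, matching the a line in the output.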
Re: gmirror bugs, how many?
On Fri, Nov 26, 2004 at 02:56:15AM -0200, João Carlos Mendes Luís wrote: + Pawel Jakub Dawidek wrote: + First mistake - wrong order. Create a mirror, then partition the mirror + provider. + + Is this a constraint in the design? In my point of view, geom would + treat all block devices equally, no matter if they are whole disks or + single partitions. You're right, and it does so. + If this is not the case, then maybe this should be noted in the man page. + + Note that sometimes it is not necessary to have a whole disk redundant. + I could use part of it for temporary data, for example. I've done this + with vinum in 4-stable more than once: I get two disks, each with a copy + of the root partition (which I intended to mirror with gmirror), a swap + partition, a mirror vinum subdisk and a stripe vinum subdisk. Note that + in this case the data integrity and cost is more important than + continuous operation. If a disk fails, the server will stop, but no + *important* data will get lost. This is the scenario which I was testing. You can do that with gmirror. All I'm saying is that you should first create a mirror, then create slices and partitions on top of the mirror provider, because you want to use mirror/vol0s1a, not ad0s1a. Note that mirror/vol0 is one sector shorter than ad0, and imagine a situation where gmirror stores its metadata in the same place where BSD stores its label: if you first create a mirror and then create partitions on ad0, you'll overwrite the gmirror metadata. It is very painful that with MBR the metadata is visible on traffic providers, and I don't want to repeat that mistake. + + Now, let's reboot. I could not unload geom_mirror, since it was + + preloaded during boot, is this expected? The device could not be + + unloaded, but the volume disappeared (gmirror list, ls /dev/mirror). + + If there is no mirror configured, you should be able to unload it. + + Before putting it in /boot/loader.conf, unload worked, even with mirror + devices configured, IIRC.
+ Only after loader.conf preloading this + problem appeared. What error do you get when you try to do this? + The man page says only: + + -h Hardcode providers' names in metadata. + + and does not explain when I should use this. + + Do you mean that if I want it to use ad1s1 as the provider, and not ad1, + -h is what I want? Only when you share the last sector between those two providers. You can still create ad1s1, which is one sector shorter. + + + Is there any gmirror hacker around to fix these? + + There is nothing to fix. + + Surely there is. At least the manual. I have to agree here. :) + And even if gmirror is correct, there's also the problem shown with + disklabel in my previous email. What problem is there when you do things in the proper order? -- Pawel Jakub Dawidek http://www.FreeBSD.org [EMAIL PROTECTED] http://garage.freebsd.pl FreeBSD committer Am I Evil? Yes, I Am!
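A sketch of the layout behind the "create the mirror first" advice, assuming gmirror keeps its metadata in the last sector of each component, as the man page cited in this thread says. The function names are hypothetical. The mirror provider ends where that sector begins, so anything sized for the raw component and written to it can run into the metadata.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Size of the mirror provider: the component minus its last sector, which
 * is assumed to hold the gmirror metadata.  The metadata sector therefore
 * starts at exactly this byte offset.
 */
static uint64_t
toy_mirror_mediasize(uint64_t comp_mediasize, uint32_t sectorsize)
{
	return (comp_mediasize - sectorsize);
}

/* Would writing len bytes at byte offset off on the raw component hit the metadata sector? */
static int
toy_clobbers_metadata(uint64_t off, uint64_t len, uint64_t comp_mediasize,
    uint32_t sectorsize)
{
	uint64_t md = toy_mirror_mediasize(comp_mediasize, sectorsize);

	return (off < md + sectorsize && off + len > md);
}
```

A write that stays within the mirror's size never overlaps the metadata sector, while one that covers the whole raw disk does.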
Re: FreeBSD Kernel buffer overflow
On Sat, Sep 18, 2004 at 09:13:42PM -0700, Julian Elischer wrote: + +#if (__i386__) && (INVARIANTS) + + KASSERT(new_sysent->nargs >= 0 && new_sysent->nargs <= + i386_SYS_ARGS, + + "invalid number of syscalls"); + +#endif + + +*old_sysent = sysent[*offset]; +sysent[*offset] = *new_sysent; +return 0; + + + Why panic the machine at this point? Just refuse to install the syscall + and return an error. + + and the test for INVARIANTS is un-needed.. KASSERT only compiles to anything + when INVARIANTS is defined. ...and it should be '#ifdef', not '#if'. ...and the panic message should be inside (). -- Pawel Jakub Dawidek http://www.FreeBSD.org [EMAIL PROTECTED] http://garage.freebsd.pl FreeBSD committer Am I Evil? Yes, I Am!
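Why the message needs its own parentheses: FreeBSD's KASSERT(exp, msg) historically expands to roughly `if (!(exp)) panic msg;`, so msg is pasted in as the argument list of a printf-like call. The userland sketch below reproduces that shape with a hypothetical TOY_KASSERT that records the formatted message instead of halting, so it can be exercised without a kernel.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static int toy_failures;		/* how many checks have failed */
static char toy_last_msg[128];		/* last formatted failure message */

/* Stand-in for panic(9): printf-like, but records instead of halting. */
static void
toy_panic(const char *fmt, ...)
{
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(toy_last_msg, sizeof(toy_last_msg), fmt, ap);
	va_end(ap);
	toy_failures++;
}

/* msg is pasted right after toy_panic, so it must be a parenthesized argument list. */
#define TOY_KASSERT(exp, msg) do {	\
	if (!(exp))			\
		toy_panic msg;		\
} while (0)
```

Written as TOY_KASSERT(x, "msg") instead, the expansion would be `toy_panic "msg";`, which does not compile; the parentheses also allow format arguments such as the offending value.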
Re: FreeBSD Kernel buffer overflow
On Fri, Sep 17, 2004 at 12:37:12PM +0300, Giorgos Keramidas wrote: + % +#ifdef INVARIANTS + % + KASSERT(0 <= narg && narg <= 8, ("invalid number of syscall args")); + % +#endif Maybe: KASSERT(0 <= narg && narg <= sizeof(args) / sizeof(args[0]), ("invalid number of syscall args")); That way, if we decide to increase/decrease it someday, we don't have to remember to update this KASSERT(). -- Pawel Jakub Dawidek http://www.FreeBSD.org [EMAIL PROTECTED] http://garage.freebsd.pl FreeBSD committer Am I Evil? Yes, I Am!
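The idea of deriving the bound from the array itself, sketched in userland C. The frame struct and the TOY_NITEMS macro are hypothetical stand-ins for the real syscall argument buffer (later FreeBSD versions ship a similar nitems() macro in <sys/param.h>). Note that the sizeof trick only works on a real array, not on a pointer to its first element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of elements in a true array (not a pointer). */
#define TOY_NITEMS(a)	(sizeof(a) / sizeof((a)[0]))

/* Hypothetical fixed-size argument buffer, like args[] in the patch. */
struct toy_frame {
	uint64_t args[8];
};

/* The upper bound follows args[] automatically if its size ever changes. */
static int
toy_narg_ok(const struct toy_frame *f, int narg)
{
	return (narg >= 0 && (size_t)narg <= TOY_NITEMS(f->args));
}
```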
Re: FreeBSD Kernel buffer overflow
On Sat, Sep 18, 2004 at 02:18:55AM -0700, Don Lewis wrote: + On 18 Sep, Pawel Jakub Dawidek wrote: + On Fri, Sep 17, 2004 at 12:37:12PM +0300, Giorgos Keramidas wrote: + + % +#ifdef INVARIANTS + + % + KASSERT(0 <= narg && narg <= 8, ("invalid number of syscall args")); + + % +#endif + + Maybe: + KASSERT(0 <= narg && narg <= sizeof(args) / sizeof(args[0]), + ("invalid number of syscall args")); + + That way, if we decide to increase/decrease it someday, we don't have to remember + to update this KASSERT(). + + What keeps the attacker from installing two syscalls, the first of which + pokes NOPs over the KASSERT code, and the second of which accepts too + many arguments? First of all, this is not a protection against an attacker, but a help for bad programmers. + If you think we really need this bit of extra security, why not just + prevent the syscall with too many arguments from being registered by + syscall_register()? At least that keeps the check out of the most + frequently executed path. Good point, that is a much better place for it. -- Pawel Jakub Dawidek http://www.FreeBSD.org [EMAIL PROTECTED] http://garage.freebsd.pl FreeBSD committer Am I Evil? Yes, I Am!
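A sketch of Don Lewis's suggestion: validate the argument count once, when the entry is installed, and refuse a bad entry with an error (as Julian proposed) instead of checking on every call. The table, the limits and toy_syscall_register() are hypothetical; the kernel code quoted earlier in the thread indexes sysent[*offset], i.e. it takes the offset by pointer and installs into the real sysent table.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAXARGS	8	/* hypothetical: size of the dispatcher's argument buffer */
#define TOY_NSYSCALLS	16	/* hypothetical table size */

struct toy_sysent {
	int narg;
	int (*call)(void);
};

static struct toy_sysent toy_table[TOY_NSYSCALLS];

/*
 * Validate once, when the entry is installed, so the per-call dispatch
 * path needs no check; a bad entry is refused with EINVAL, not a panic.
 */
static int
toy_syscall_register(int offset, const struct toy_sysent *new_sysent,
    struct toy_sysent *old_sysent)
{
	if (offset < 0 || offset >= TOY_NSYSCALLS)
		return (EINVAL);
	if (new_sysent->narg < 0 || new_sysent->narg > TOY_MAXARGS)
		return (EINVAL);
	*old_sysent = toy_table[offset];
	toy_table[offset] = *new_sysent;
	return (0);
}
```

A rejected registration leaves the previous table entry untouched, so a buggy module fails to load instead of taking the machine down.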
Re: kern___getcwd() returns ENOTDIR
On Sun, Jun 27, 2004 at 11:12:20AM -0700, David Schultz wrote: + On Sun, Jun 27, 2004, Kentucky Mandeloid Mo. wrote: + I'm writing a small kernel module that catches file access syscalls. + At every syscall I need a full name of file is being passed to a syscall. + I'm getting it with a path passed to syscall and if path is not starting + with / I get current working directory of process using kern___getcwd(). + In every syscall all works just fine except rmdir unlink. + Sometimes in unlink and everytime in rmdir it returns not a directory error. + I know already that kern___getcwd() works through vnode cache and this method + is not a reliable way to get file names. + So is there any other way get cwd of a proccess? + + linux_getcwd() works in more cases than kern___getcwd(), but it + has other problems. What problems does it have? Could you provide more details? Was it discussed when the patch replacing kern___getcwd() with linux_getcwd() was introduced? -- Pawel Jakub Dawidek http://www.FreeBSD.org [EMAIL PROTECTED] http://garage.freebsd.pl FreeBSD committer Am I Evil? Yes, I Am!
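Once a working directory string is available, the resolution step the question describes is plain string handling; the hard part, as the thread notes, is obtaining a reliable cwd in the first place, since kern___getcwd() depends on the vnode name cache. A userland sketch with a hypothetical helper, which is purely textual and does not resolve "..", symlinks, or a cwd that changes concurrently:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Keep paths that already start with '/', otherwise prefix cwd.
 * Returns 0 on success, -1 if the result would not fit in buf.
 */
static int
toy_abspath(const char *cwd, const char *path, char *buf, size_t buflen)
{
	int n;

	if (path[0] == '/')
		n = snprintf(buf, buflen, "%s", path);
	else if (strcmp(cwd, "/") == 0)
		n = snprintf(buf, buflen, "/%s", path);	/* avoid "//name" */
	else
		n = snprintf(buf, buflen, "%s/%s", cwd, path);
	return ((n < 0 || (size_t)n >= buflen) ? -1 : 0);
}
```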
Re: api for sharing memory from kernel to userspace?
On Wed, May 19, 2004 at 05:29:07AM -0700, Alfred Perlstein wrote: + I need to share about 100megs of memory between kernel and userspace. + + The memory can not be paged and should appear contig in the process's + address space. Any suggestions? + + I need a way to either: + map user memory into the kernel's address space. + map kernel memory into the user's address space. + + I was looking at pmap_qenter() but it didn't seem attractive because + it's for short term mappings, this mapping will exist for quite a + while. I am interested in mapping kernel memory into a user's address space as well, for GEOM Gate and other evil projects. -- Pawel Jakub Dawidek http://www.FreeBSD.org [EMAIL PROTECTED] http://garage.freebsd.pl FreeBSD committer Am I Evil? Yes, I Am!