Re: Why is TUNABLE_INT discouraged?
2010/8/7 Ivan Voras ivo...@freebsd.org: 2010/8/8 Dag-Erling Smørgrav d...@des.no: Garrett Cooper gcoo...@freebsd.org writes: Dag-Erling Smørgrav d...@des.no writes: Perhaps. I don't remember all the details; I can't find a discussion in the list archives (other than me announcing the change in response to a bug report), but there must have been one, either on IRC or in Karlsruhe. In any case, I never removed TUNABLE_INT(), so... It does matter for integers on 64-bit vs 32-bit architectures though, right Not sure what you mean. The original issue was that someone had used TUNABLE_INT() for something that was actually a memory address. I changed it to TUNABLE_ULONG(). Of course, if your tunable is a boolean value or something like maxprocs, an int is fine - but so is a long. Why would someone express a tunable in a memory address (not being sarcastic... I just don't see why it makes sense right now, but if there's a valid reason I'm more than happy to be educated :)..)? Semantically valid but using TUNABLE_INT to hold pointers is a developer bug, not the fault of the API, and there's nothing wrong with int as a data type in this context. Unless there is a real hidden danger in using TUNABLE_INT (and/or adding TUNABLE_UINT etc.) in the expected way, I'd vote for either removing the cautioning comment or rewriting it to say something like developers are hereby warned that ints cannot hold pointers on all architectures, if it is indeed such a little known fact among kernel developers :P *grins cheekily in agreement* :). Thanks, -Garrett ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: glabel force sectorsize patch
On Sun, Aug 08, 2010 at 03:57:44AM +0200, Ivan Voras wrote: Hi, In order to help users having 4k sector drives which the system recognizes as 512 byte sector drives, I'm proposing a patch to glabel which enables it to use a forced sector size for its native-labeled providers. It is naturally only usable with glabel-native labels (those created by glabel label) and not partition and file system labels because we cannot add arbitrary new fields to metadata of those types. The patch is here: http://people.freebsd.org/~ivoras/diffs/glabel_ssize.patch [...] This mechanism is a band-aid until there's a better way of dealing with 4k drives. So why do you want to obfuscate glabel with it? For people to start depend on it? Once we start supporting 4kB sectors what do we do with such a change? Remove it and decrease version number? What people will do with providers already labeled this way? If its temporary, just allow to list providers you want to increase sector size in /boot/loader.conf. Once we start supporting it properly people might simply remove it from loader.conf and it should just work. Glabel is not for that and I don't agree for such obfuscation. -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgp9g74Rergrb.pgp Description: PGP signature
Re: glabel force sectorsize patch
On Sun, Aug 08, 2010 at 02:02:17PM +0200, Ivan Voras wrote: On 8.8.2010 12:30, Pawel Jakub Dawidek wrote: So why do you want to obfuscate glabel with it? For people to start depend on it? Once we start supporting 4kB sectors what do we do with such a change? Remove it and decrease version number? What people will do with providers already labeled this way? If its temporary, just allow to list providers you want to increase sector size in /boot/loader.conf. Once we start supporting it properly people might simply remove it from loader.conf and it should just work. Glabel is not for that and I don't agree for such obfuscation. Of course, there are good and bad sides to it. My take on it is that the only bad side is that it really isn't glabel's primary function to (optionally) fixup geometry, while the good sides are: It isn't its secondary function either. * glabel is in GENERIC and judging by the mailing lists' traffic it is one of the better used parts of the system so people are familiar with it. It is also already used as a perfectly valid fixup for device renaming, making both UFS and ZFS more stable for usage. That's an excellent argument. But you know what? The em(4) is also in GENERIC, why not to add it in there? * You can't really make people depend on glabel both because it is in GENERIC and because of it storing metadata in the last sector, making the rest of the drive completely usable without it in the event native 4k sector support is grown. I never said that. I do want people to depend on glabel, because it is free of such ugly hacks, so I know it won't bite them in the future. I don't want people to start depend on the fact that glabel supports changing sector sizes. Once we start supporting 4kB sectors properly people configuration will stop working, because glabel won't be able to read its metadata anymore. Your hack will break all configurations that started to depend on your hack. In what I proposed, GEOM provider will be presented to glabel (or any other GEOM class) as 4kB provider and everything will just work, also after adding proper support for 4kB sectors. I'd like to hear comments from the wider audience. In respect with your comment, I will compromise: as 4k sector drives have become available over the counter more than 6 months ago and so far I think this is the first effort to give some support for them, I will commit this patch before 9.0 code freeze only if no other support gets developed. I'll repeat. You won't commit this patch, because it is totally wrong solution and can only do a lot of damage in the future. If you look forward, even temporary solutions can be done right. -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgp8NDzCMjfAM.pgp Description: PGP signature
Re: Why is TUNABLE_INT discouraged?
On Sun, Aug 08, 2010 at 01:30:19AM +0200, Ivan Voras wrote: 2010/8/8 Dag-Erling Sm??rgrav d...@des.no: Garrett Cooper gcoo...@freebsd.org writes: Dag-Erling Sm??rgrav d...@des.no writes: Perhaps. ??I don't remember all the details; I can't find a discussion in the list archives (other than me announcing the change in response to a bug report), but there must have been one, either on IRC or in Karlsruhe. In any case, I never removed TUNABLE_INT(), so... It does matter for integers on 64-bit vs 32-bit architectures though, right Not sure what you mean. ??The original issue was that someone had used TUNABLE_INT() for something that was actually a memory address. ??I changed it to TUNABLE_ULONG(). ??Of course, if your tunable is a boolean value or something like maxprocs, an int is fine - but so is a long. Semantically valid but using TUNABLE_INT to hold pointers is a developer bug, not the fault of the API, and there's nothing wrong with int as a data type in this context. I agree with Ivan here. If we can't find an actual reason to universally prefer long I'd like to remove this comment. As a counterpoint to the preference for long I can offer a reason to prefer int: the same variable is often exposed by both a tunable and a sysctls, and a sysctl using long can have 32-bit compat issues. (That is, a 32-bit app using sysctlbyname will try to use 4 bytes for a long.) -Ed ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: glabel force sectorsize patch
On Sun, Aug 08, 2010 at 02:57:20PM +0200, Marius Nünnerich wrote: On Sun, Aug 8, 2010 at 14:02, Ivan Voras ivo...@freebsd.org wrote: I'd like to hear comments from the wider audience. In respect with your comment, I will compromise: as 4k sector drives have become available over the counter more than 6 months ago and so far I think this is the first effort to give some support for them, I will commit this patch before 9.0 code freeze only if no other support gets developed. I do not like this at all. Even if it's just for the KISS and POLA principles. A geom should do one thing and do it right imo. Why not write a new geom class that does what you want? New GEOM class only for sectorsize conversion that can operate on metadata will be useful, not only to solve this particular problem. Although keep in mind that if at some point disks will be detected and presented as 4kB providers to the GEOM, this class won't be able to find its metadata anymore (as it was stored in the last 512 bytes, not in the last 4 kilobytes). -- Pawel Jakub Dawidek http://www.wheelsystems.com p...@freebsd.org http://www.FreeBSD.org FreeBSD committer Am I Evil? Yes, I Am! pgpLef8MwEhAp.pgp Description: PGP signature
Re: glabel force sectorsize patch
On Sun, Aug 8, 2010 at 14:02, Ivan Voras ivo...@freebsd.org wrote: On 8.8.2010 12:30, Pawel Jakub Dawidek wrote: On Sun, Aug 08, 2010 at 03:57:44AM +0200, Ivan Voras wrote: Hi, In order to help users having 4k sector drives which the system recognizes as 512 byte sector drives, I'm proposing a patch to glabel which enables it to use a forced sector size for its native-labeled providers. It is naturally only usable with glabel-native labels (those created by glabel label) and not partition and file system labels because we cannot add arbitrary new fields to metadata of those types. The patch is here: http://people.freebsd.org/~ivoras/diffs/glabel_ssize.patch [...] This mechanism is a band-aid until there's a better way of dealing with 4k drives. So why do you want to obfuscate glabel with it? For people to start depend on it? Once we start supporting 4kB sectors what do we do with such a change? Remove it and decrease version number? What people will do with providers already labeled this way? If its temporary, just allow to list providers you want to increase sector size in /boot/loader.conf. Once we start supporting it properly people might simply remove it from loader.conf and it should just work. Glabel is not for that and I don't agree for such obfuscation. Of course, there are good and bad sides to it. My take on it is that the only bad side is that it really isn't glabel's primary function to (optionally) fixup geometry, while the good sides are: * glabel is in GENERIC and judging by the mailing lists' traffic it is one of the better used parts of the system so people are familiar with it. It is also already used as a perfectly valid fixup for device renaming, making both UFS and ZFS more stable for usage. * You can't really make people depend on glabel both because it is in GENERIC and because of it storing metadata in the last sector, making the rest of the drive completely usable without it in the event native 4k sector support is grown. I'd like to hear comments from the wider audience. In respect with your comment, I will compromise: as 4k sector drives have become available over the counter more than 6 months ago and so far I think this is the first effort to give some support for them, I will commit this patch before 9.0 code freeze only if no other support gets developed. I do not like this at all. Even if it's just for the KISS and POLA principles. A geom should do one thing and do it right imo. Why not write a new geom class that does what you want? ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: sched_pin() versus PCPU_GET
2010/8/4 m...@freebsd.org: On Fri, Jul 30, 2010 at 2:31 PM, John Baldwin j...@freebsd.org wrote: On Friday, July 30, 2010 10:08:22 am John Baldwin wrote: On Thursday, July 29, 2010 7:39:02 pm m...@freebsd.org wrote: We've seen a few instances at work where witness_warn() in ast() indicates the sched lock is still held, but the place it claims it was held by is in fact sometimes not possible to keep the lock, like: thread_lock(td); td-td_flags = ~TDF_SELECT; thread_unlock(td); What I was wondering is, even though the assembly I see in objdump -S for witness_warn has the increment of td_pinned before the PCPU_GET: 802db210: 65 48 8b 1c 25 00 00 mov %gs:0x0,%rbx 802db217: 00 00 802db219: ff 83 04 01 00 00 incl 0x104(%rbx) * Pin the thread in order to avoid problems with thread migration. * Once that all verifies are passed about spinlocks ownership, * the thread is in a safe path and it can be unpinned. */ sched_pin(); lock_list = PCPU_GET(spinlocks); 802db21f: 65 48 8b 04 25 48 00 mov %gs:0x48,%rax 802db226: 00 00 if (lock_list != NULL lock_list-ll_count != 0) { 802db228: 48 85 c0 test %rax,%rax * Pin the thread in order to avoid problems with thread migration. * Once that all verifies are passed about spinlocks ownership, * the thread is in a safe path and it can be unpinned. */ sched_pin(); lock_list = PCPU_GET(spinlocks); 802db22b: 48 89 85 f0 fe ff ff mov %rax,-0x110(%rbp) 802db232: 48 89 85 f8 fe ff ff mov %rax,-0x108(%rbp) if (lock_list != NULL lock_list-ll_count != 0) { 802db239: 0f 84 ff 00 00 00 je 802db33e witness_warn+0x30e 802db23f: 44 8b 60 50 mov 0x50(%rax),%r12d is it possible for the hardware to do any re-ordering here? The reason I'm suspicious is not just that the code doesn't have a lock leak at the indicated point, but in one instance I can see in the dump that the lock_list local from witness_warn is from the pcpu structure for CPU 0 (and I was warned about sched lock 0), but the thread id in panic_cpu is 2. So clearly the thread was being migrated right around panic time. This is the amd64 kernel on stable/7. I'm not sure exactly what kind of hardware; it's a 4-way Intel chip from about 3 or 4 years ago IIRC. So... do we need some kind of barrier in the code for sched_pin() for it to really do what it claims? Could the hardware have re-ordered the mov %gs:0x48,%rax PCPU_GET to before the sched_pin() increment? Hmmm, I think it might be able to because they refer to different locations. Note this rule in section 8.2.2 of Volume 3A: • Reads may be reordered with older writes to different locations but not with older writes to the same location. It is certainly true that sparc64 could reorder with RMO. I believe ia64 could reorder as well. Since sched_pin/unpin are frequently used to provide this sort of synchronization, we could use memory barriers in pin/unpin like so: sched_pin() { td-td_pinned = atomic_load_acq_int(td-td_pinned) + 1; } sched_unpin() { atomic_store_rel_int(td-td_pinned, td-td_pinned - 1); } We could also just use atomic_add_acq_int() and atomic_sub_rel_int(), but they are slightly more heavyweight, though it would be more clear what is happening I think. However, to actually get a race you'd have to have an interrupt fire and migrate you so that the speculative read was from the other CPU. However, I don't think the speculative read would be preserved in that case. The CPU has to return to a specific PC when it returns from the interrupt and it has no way of storing the state for what speculative reordering it might be doing, so presumably it is thrown away? I suppose it is possible that it actually retires both instructions (but reordered) and then returns to the PC value after the read of listlocks after the interrupt. However, in that case the scheduler would not migrate as it would see td_pinned != 0. To get the race you have to have the interrupt take effect prior to modifying td_pinned, so I think the processor would have to discard the reordered read of listlocks so it could safely resume execution at the 'incl' instruction. The other nit there on x86 at least is that the incl instruction is doing both a read and a write and another rule in the section 8.2.2 is this: • Reads are not reordered with other reads. That would seem to prevent the read of listlocks from passing the read of td_pinned in the incl instruction on x86. I wonder how that's interpreted in the microcode, though? I.e. if the incr instruction decodes to load, add, store, does the h/w allow the later reads to pass the final store? I added the
Re: sched_pin() versus PCPU_GET
On Sun, Aug 8, 2010 at 2:43 PM, Attilio Rao atti...@freebsd.org wrote: 2010/8/4 m...@freebsd.org: On Fri, Jul 30, 2010 at 2:31 PM, John Baldwin j...@freebsd.org wrote: On Friday, July 30, 2010 10:08:22 am John Baldwin wrote: On Thursday, July 29, 2010 7:39:02 pm m...@freebsd.org wrote: We've seen a few instances at work where witness_warn() in ast() indicates the sched lock is still held, but the place it claims it was held by is in fact sometimes not possible to keep the lock, like: thread_lock(td); td-td_flags = ~TDF_SELECT; thread_unlock(td); What I was wondering is, even though the assembly I see in objdump -S for witness_warn has the increment of td_pinned before the PCPU_GET: 802db210: 65 48 8b 1c 25 00 00 mov %gs:0x0,%rbx 802db217: 00 00 802db219: ff 83 04 01 00 00 incl 0x104(%rbx) * Pin the thread in order to avoid problems with thread migration. * Once that all verifies are passed about spinlocks ownership, * the thread is in a safe path and it can be unpinned. */ sched_pin(); lock_list = PCPU_GET(spinlocks); 802db21f: 65 48 8b 04 25 48 00 mov %gs:0x48,%rax 802db226: 00 00 if (lock_list != NULL lock_list-ll_count != 0) { 802db228: 48 85 c0 test %rax,%rax * Pin the thread in order to avoid problems with thread migration. * Once that all verifies are passed about spinlocks ownership, * the thread is in a safe path and it can be unpinned. */ sched_pin(); lock_list = PCPU_GET(spinlocks); 802db22b: 48 89 85 f0 fe ff ff mov %rax,-0x110(%rbp) 802db232: 48 89 85 f8 fe ff ff mov %rax,-0x108(%rbp) if (lock_list != NULL lock_list-ll_count != 0) { 802db239: 0f 84 ff 00 00 00 je 802db33e witness_warn+0x30e 802db23f: 44 8b 60 50 mov 0x50(%rax),%r12d is it possible for the hardware to do any re-ordering here? The reason I'm suspicious is not just that the code doesn't have a lock leak at the indicated point, but in one instance I can see in the dump that the lock_list local from witness_warn is from the pcpu structure for CPU 0 (and I was warned about sched lock 0), but the thread id in panic_cpu is 2. So clearly the thread was being migrated right around panic time. This is the amd64 kernel on stable/7. I'm not sure exactly what kind of hardware; it's a 4-way Intel chip from about 3 or 4 years ago IIRC. So... do we need some kind of barrier in the code for sched_pin() for it to really do what it claims? Could the hardware have re-ordered the mov %gs:0x48,%rax PCPU_GET to before the sched_pin() increment? Hmmm, I think it might be able to because they refer to different locations. Note this rule in section 8.2.2 of Volume 3A: • Reads may be reordered with older writes to different locations but not with older writes to the same location. It is certainly true that sparc64 could reorder with RMO. I believe ia64 could reorder as well. Since sched_pin/unpin are frequently used to provide this sort of synchronization, we could use memory barriers in pin/unpin like so: sched_pin() { td-td_pinned = atomic_load_acq_int(td-td_pinned) + 1; } sched_unpin() { atomic_store_rel_int(td-td_pinned, td-td_pinned - 1); } We could also just use atomic_add_acq_int() and atomic_sub_rel_int(), but they are slightly more heavyweight, though it would be more clear what is happening I think. However, to actually get a race you'd have to have an interrupt fire and migrate you so that the speculative read was from the other CPU. However, I don't think the speculative read would be preserved in that case. The CPU has to return to a specific PC when it returns from the interrupt and it has no way of storing the state for what speculative reordering it might be doing, so presumably it is thrown away? I suppose it is possible that it actually retires both instructions (but reordered) and then returns to the PC value after the read of listlocks after the interrupt. However, in that case the scheduler would not migrate as it would see td_pinned != 0. To get the race you have to have the interrupt take effect prior to modifying td_pinned, so I think the processor would have to discard the reordered read of listlocks so it could safely resume execution at the 'incl' instruction. The other nit there on x86 at least is that the incl instruction is doing both a read and a write and another rule in the section 8.2.2 is this: • Reads are not reordered with other reads. That would seem to prevent the read of listlocks from passing the read of td_pinned in the incl instruction on x86. I wonder how that's interpreted in the microcode, though? I.e. if the incr instruction decodes to load, add,
Re: Why is TUNABLE_INT discouraged?
In message: 20100808130624.gb40...@sandvine.com Ed Maste ema...@freebsd.org writes: : On Sun, Aug 08, 2010 at 01:30:19AM +0200, Ivan Voras wrote: : : 2010/8/8 Dag-Erling Sm??rgrav d...@des.no: : Garrett Cooper gcoo...@freebsd.org writes: : Dag-Erling Sm??rgrav d...@des.no writes: :Perhaps. ??I don't remember all the details; I can't find a discussion in :the list archives (other than me announcing the change in response to a :bug report), but there must have been one, either on IRC or in Karlsruhe. :In any case, I never removed TUNABLE_INT(), so... : It does matter for integers on 64-bit vs 32-bit architectures though, : right : : Not sure what you mean. ??The original issue was that someone had used : TUNABLE_INT() for something that was actually a memory address. ??I : changed it to TUNABLE_ULONG(). ??Of course, if your tunable is a boolean : value or something like maxprocs, an int is fine - but so is a long. : : Semantically valid but using TUNABLE_INT to hold pointers is a : developer bug, not the fault of the API, and there's nothing wrong : with int as a data type in this context. : : I agree with Ivan here. If we can't find an actual reason to : universally prefer long I'd like to remove this comment. : : As a counterpoint to the preference for long I can offer a reason to : prefer int: the same variable is often exposed by both a tunable and a : sysctls, and a sysctl using long can have 32-bit compat issues. (That : is, a 32-bit app using sysctlbyname will try to use 4 bytes for a long.) There's absolutely no reason to not use TUNABLE_INT for simple integers. Back in the deep, dark, dim past, there was no TUNABLE_*LONG. TUNABLE_INT was introduce in r77900 by peter. DES added TUNABLE_LONG in r137099, as well as adding the comment. The comment is bogus, or at least not quite specific enough. It has been routinely ignored since it was added. There's absolutely nothing wrong with TUNABLE_INT. We really should have a TUNABLE_ADDRESS for this case, although there's at least three types of address that we need to worry about: Physical Addresses, Virtual Addresses and Bus Addresses. You don't know if ULONG or LONG is the right type for an address, or if you need a quad (for example, 32-bit MIPS can address, through PTE entries, addresses beyond the 32-bit barrier, so you'd need a QUAD). I'm in favor of changing the comment to: /* * Please do not use for addresses or pointers */ Warner ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: glabel force sectorsize patch
On Sun, Aug 8, 2010 at 21:08, Ivan Voras ivo...@freebsd.org wrote: On 8.8.2010 14:57, Marius Nünnerich wrote: On Sun, Aug 8, 2010 at 14:02, Ivan Voras ivo...@freebsd.org wrote: This mechanism is a band-aid until there's a better way of dealing with 4k drives. I do not like this at all. Even if it's just for the KISS and POLA principles. A geom should do one thing and do it right imo. As the addition will not modify general behaviour of glabel but add a new feature (which is actually clean and trivial to implement) invisible to most of the users, I don't think either KISS nor POLA are in any danger here. Adding a new feature maps directly to KISS, especially if the feature is in the wrong module. POLA: I wouldn't guess that a blocksize resizing is hidden in a _part_ of glabel. I am not using the native glabel part at all, just the named GPT partitions as most of the users seem to prefer nower days (and I guess will get even more traction after Dan's blog post). I do agree that it shouldn't be glabel's job to do this but also am *very* strongly against shipping 9.0 without any support for 4k drives, and the way I've chosen is the lesser of two evils. I am against workarounds for stupid hardware vendors most of the time. Especially if it's just a minority, they break pola intentionally and is fixed easily without this kludge. Afaik if you align your Partitions to higher values (I use 1MB for example) ufs is not having any performance issues (I have not benchmarked this myself). Code and patches by others are of course welcome. I'm hoping this discussion will trigger someone with experience in the lower levels of kernel to go and finally add the drive info parsing so it gets solved the right way :) Why not write a new geom class that does what you want? I'm not against this approach also. Technically, if we go this way, the new GEOM class will be almost a line-for-line copy-paste of glabel with this single metadata field added, so I'd rather fold it into glabel. I did not think of a new GEOM class that looks like glabel but one that has no metadata stored on disk . It is then activated and controlled by loader.conf variables. (Maybe like gnop? If I remember correctly, I did not take a look at that class for ages). This way you would get: - Your feature - no KISS violation, that class should be really simple - no POLA violation, feature is in a class with a discriptive name, glabel is left alone - no metadata store problem (is it in the last 512 or 4K bytes?) - No problem with future compatibilty, a user would have to active the class and it's configuration by hand, no magic here Marius ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: Why is TUNABLE_INT discouraged?
On Sun, Aug 8, 2010 at 11:14 AM, M. Warner Losh i...@bsdimp.com wrote: In message: 20100808130624.gb40...@sandvine.com Ed Maste ema...@freebsd.org writes: : On Sun, Aug 08, 2010 at 01:30:19AM +0200, Ivan Voras wrote: : : 2010/8/8 Dag-Erling Sm??rgrav d...@des.no: : Garrett Cooper gcoo...@freebsd.org writes: : Dag-Erling Sm??rgrav d...@des.no writes: :Perhaps. ??I don't remember all the details; I can't find a discussion in :the list archives (other than me announcing the change in response to a :bug report), but there must have been one, either on IRC or in Karlsruhe. :In any case, I never removed TUNABLE_INT(), so... : It does matter for integers on 64-bit vs 32-bit architectures though, : right : : Not sure what you mean. ??The original issue was that someone had used : TUNABLE_INT() for something that was actually a memory address. ??I : changed it to TUNABLE_ULONG(). ??Of course, if your tunable is a boolean : value or something like maxprocs, an int is fine - but so is a long. : : Semantically valid but using TUNABLE_INT to hold pointers is a : developer bug, not the fault of the API, and there's nothing wrong : with int as a data type in this context. : : I agree with Ivan here. If we can't find an actual reason to : universally prefer long I'd like to remove this comment. : : As a counterpoint to the preference for long I can offer a reason to : prefer int: the same variable is often exposed by both a tunable and a : sysctls, and a sysctl using long can have 32-bit compat issues. (That : is, a 32-bit app using sysctlbyname will try to use 4 bytes for a long.) There's absolutely no reason to not use TUNABLE_INT for simple integers. Back in the deep, dark, dim past, there was no TUNABLE_*LONG. TUNABLE_INT was introduce in r77900 by peter. DES added TUNABLE_LONG in r137099, as well as adding the comment. The comment is bogus, or at least not quite specific enough. It has been routinely ignored since it was added. There's absolutely nothing wrong with TUNABLE_INT. We really should have a TUNABLE_ADDRESS for this case, although there's at least three types of address that we need to worry about: Physical Addresses, Virtual Addresses and Bus Addresses. You don't know if ULONG or LONG is the right type for an address, or if you need a quad (for example, 32-bit MIPS can address, through PTE entries, addresses beyond the 32-bit barrier, so you'd need a QUAD). I'm in favor of changing the comment to: /* * Please do not use for addresses or pointers */ Then comes my next request: could someone please review this patch and commit it if it's ok? This adds comments to TUNABLE_INT (recommended by Warner) and also adds functionality for TUNABLE_UINT as well. I have a simple module and test script attached for it. Adding this functionality would make things more consistent with the TUNABLE KPIs, make my experimenting with sound(4) easier, and could help improve the sound(4) subsystem's code (I'll have to discuss some things with ariff@, because I'm not sure whether or not some quantities need to be signed or not). Thanks! -Garrett uint_tunable.diff Description: Binary data ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: NFS server hangs (was no subject)
I have a similar problem. I have a NFS server (8.0 upgraded a couple times since Feb 2010) that locks up and requires a reboot. The clients are busy vm's from VMWare ESXi using the NFS server for vmdk virtual disk storage. The ESXi reports nfs server inactive and all the vm's post disk write errors when trying to write to their disk. /etc/rc.d/nfsd restart fails to work (it can not kill the nfsd process) The nfsd process runs at 100% cpu at rc_lo state in top. reboot is the only fix. It has only happened under two circumstances. 1) Installation of a VM using Windows 2008. 2) Migrating 16 million mail messages from a physical server to a VM running FreeBSD with ZFS file system as a VM on the ESXi box that uses NFS to store the VM's ZFS disk. The NFS server uses ZFS also. I don't think what you are seeing is the same as what others have reported. (I have a hunch that your problem might be a replay cache problem.) Please try the attached patch and make sure that your sys/rpc/svc.c is at r205562 (upgrade if it isn't). If this patch doesn't help, you could try using the experimental nfs server (which doesn't use the generic replay cache), by adding -e to mountd and nfsd. Please let me know if the patch or switching to the experimental nfs server helps, rick --- rpc/replay.c.sav 2010-08-08 18:05:50.0 -0400 +++ rpc/replay.c 2010-08-08 18:16:43.0 -0400 @@ -90,8 +90,10 @@ replay_setsize(struct replay_cache *rc, size_t newmaxsize) { + mtx_lock(rc-rc_lock); rc-rc_maxsize = newmaxsize; replay_prune(rc); + mtx_unlock(rc-rc_lock); } void @@ -144,8 +146,8 @@ bool_t freed_one; if (rc-rc_count = REPLAY_MAX || rc-rc_size rc-rc_maxsize) { - freed_one = FALSE; do { + freed_one = FALSE; /* * Try to free an entry. Don't free in-progress entries */ ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org