Re: NUMA, cpuset and malloc
On 2013/04/21 06:43, Robert Waksmundzki wrote:
> On NUMA systems allocated memory is striped across local and non-local banks in order to have consistent performance in case the task is rescheduled to a different CPU socket. When a process is pinned to a single CPU socket with cpuset, having the memory allocator prefer local banks would probably improve performance. Default system behavior would stay the same, and the optimization would only be triggered on big multi-socket systems when the administrator used cpuset (a command mostly used for performance optimization anyway).
>
> Is this something currently implemented in FreeBSD? Is this even a good idea?

Do you mean something like numactl on Linux? AFAIK, there is no such feature in FreeBSD.

Regards,
David Xu
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Fixing grep -D skip
I am trying to fix a bug in GNU grep: the -D skip option is supposed to skip FIFO files, but it does not work. For example:

    grep -D skip aaa .

gets stuck on a FIFO file. Here is the patch:

http://people.freebsd.org/~davidxu/patch/grep.c.diff2

Is it fine to commit?

Regards,
David Xu
Re: Is it possible to block pending queued RealTime signals (AIO originating)?
On 2013/01/09 07:14, Richard Sharpe wrote:
> On Tue, 2013-01-08 at 08:14 -0800, Richard Sharpe wrote:
>> On Tue, 2013-01-08 at 15:02 +0800, David Xu wrote:
>>> On 2013/01/08 14:33, Richard Sharpe wrote:
>>>> On Tue, 2013-01-08 at 10:46 +0800, David Xu wrote:
>>>>> On 2013/01/08 09:27, Richard Sharpe wrote:
>>>>>> Hi folks,
>>>>>>
>>>>>> I am running into a problem with AIO in Samba 3.6.x under FreeBSD 8.0 and I want to check if the assumptions made by the original coder are correct.
>>>>>>
>>>>>> Essentially, the code queues a number of AIO requests (up to 100) and specifies an RT signal to be sent upon completion with siginfo_t. These are placed into an array.
>>>>>>
>>>>>> The code assumes that when handling one of these signals, if it has already received N such siginfo_t structures, it can BLOCK further instances of the signal while these structures are drained by the main code in Samba.
>>>>>>
>>>>>> However, my debugging suggests that if a bunch of signals have already been queued, you cannot block those undelivered but already queued signals.
>>>>>>
>>>>>> I am certain that they are all being delivered to the main thread and that they keep coming despite the code trying to stop them at 64 (they get all the way up to the 100 that were queued).
>>>>>>
>>>>>> Can someone confirm whether I have this correct or not?
>>>>>
>>>>> I am curious how the code BLOCKs the signal in its signal handler. AFAIK, after the signal handler returns, the original signal mask is restored, which re-enables delivery of the signal, unless you change it in ucontext.uc_sigmask.
>>>>
>>>> It does try to block the signals in the signal handler using the following code (in the signal handler):
>>>>
>>>>     if (count+1 == TEVENT_SA_INFO_QUEUE_COUNT) {
>>>>         /* we've filled the info array - block this signal until
>>>>            these ones are delivered */
>>>>         sigset_t set;
>>>>         sigemptyset(&set);
>>>>         sigaddset(&set, signum);
>>>>         sigprocmask(SIG_BLOCK, &set, NULL);
>>>>
>>>> However, I also added pthread_sigmask with the same parameters to see if that made any difference and it seemed not to.
>>>
>>> This code won't work. As I said, after the signal handler returns, the kernel copies the signal mask contained in the ucontext into kernel space and uses it for future signal delivery. The code should be modified as follows:
>>>
>>>     void handler(int signum, siginfo_t *info, ucontext_t *uap)
>>>     {
>>>         ...
>>>         if (count + 1 == TEVENT_SA_INFO_QUEUE_COUNT) {
>>>             sigaddset(&uap->uc_sigmask, signum);
>>
>> Hmmm, this seems unlikely because the signal handler is operating in user mode and has no access to kernel-mode variables.
>
> Well, it turns out that your suggestion was correct.
>
> I did some more searching and found another similar suggestion, so I gave it a whirl, and it works.
>
> Now, my problem is that Jeremy Allison thinks that it is a fugly hack. This means that I will probably have big problems getting a patch for this into Samba.
>
> I guess a couple of questions I have now are:
>
> 1. Is this the same for all versions of FreeBSD since Posix RT Signals were introduced?

I have checked the source code and found that RT signals are supported from FreeBSD 7.0 on, and the aio code uses the signal queue.

> 2. Which (interpretation of which) combination of standards require such an approach?

The way I described is standard:
http://pubs.opengroup.org/onlinepubs/007904975/functions/sigaction.html

I quote some text here:

    When a signal is caught by a signal-catching function installed by sigaction(), a new signal mask is calculated and installed for the duration of the signal-catching function (or until a call to either sigprocmask() or sigsuspend() is made). This mask is formed by taking the union of the current signal mask and the value of the sa_mask for the signal being delivered [XSI] [Option Start] unless SA_NODEFER or SA_RESETHAND is set, [Option End] and then including the signal being delivered. If and when the user's signal handler returns normally, the original signal mask is restored.

    ...

    When the signal handler returns, the receiving thread resumes execution at the point it was interrupted unless the signal handler makes other arrangements. If longjmp() or _longjmp() is used to leave the signal handler, then the signal mask must be explicitly restored.

    This volume of IEEE Std 1003.1-2001 defines the third argument of a signal handling function when SA_SIGINFO is set as a void * instead of a ucontext_t *, but without requiring type checking. New applications should explicitly cast the third argument of the signal handling function to ucontext_t *.

The above means that the third parameter points to a ucontext_t, which is used to restore the previously interrupted context. That context contains a signal mask, which is also restored.

http://pubs.opengroup.org/onlinepubs/007904975/basedefs/ucontext.h.html

Regards,
David Xu
Re: Is it possible to block pending queued RealTime signals (AIO originating)?
On 2013/01/09 11:14, Daniel Eischen wrote:
> On Tue, 8 Jan 2013, Richard Sharpe wrote:
>
> [ ... ]
>
>> Well, it turns out that your suggestion was correct.
>>
>> I did some more searching and found another similar suggestion, so I gave it a whirl, and it works.
>>
>> Now, my problem is that Jeremy Allison thinks that it is a fugly hack. This means that I will probably have big problems getting a patch for this into Samba.
>
> I don't understand why JA thinks this is a hack. Their current method doesn't work, or at least isn't portable. I've tried this on Solaris 10, and it works just as it does in FreeBSD. Test program included after signature.
>
> $ ./test_sigprocmask
> Sending signal 16
> Got signal 16, blocked: true
> Blocking signal 16 using method 0
> Handled signal 16, blocked: false
> Sending signal 16
> Got signal 16, blocked: true
> Blocking signal 16 using method 1
> Handled signal 16, blocked: true

Yeah, people think that a signal handler is normal code. This is a misunderstanding; in fact, it really works like an interrupt service routine.
Re: Is it possible to block pending queued RealTime signals (AIO originating)?
On 2013/01/08 09:27, Richard Sharpe wrote:
> Hi folks,
>
> I am running into a problem with AIO in Samba 3.6.x under FreeBSD 8.0 and I want to check if the assumptions made by the original coder are correct.
>
> Essentially, the code queues a number of AIO requests (up to 100) and specifies an RT signal to be sent upon completion with siginfo_t. These are placed into an array.
>
> The code assumes that when handling one of these signals, if it has already received N such siginfo_t structures, it can BLOCK further instances of the signal while these structures are drained by the main code in Samba.
>
> However, my debugging suggests that if a bunch of signals have already been queued, you cannot block those undelivered but already queued signals.
>
> I am certain that they are all being delivered to the main thread and that they keep coming despite the code trying to stop them at 64 (they get all the way up to the 100 that were queued).
>
> Can someone confirm whether I have this correct or not?

I am curious how the code BLOCKs the signal in its signal handler. AFAIK, after the signal handler returns, the original signal mask is restored, which re-enables delivery of the signal, unless you change it in ucontext.uc_sigmask.

Regards,
David Xu
Re: Is it possible to block pending queued RealTime signals (AIO originating)?
On 2013/01/08 14:33, Richard Sharpe wrote:
> On Tue, 2013-01-08 at 10:46 +0800, David Xu wrote:
>> On 2013/01/08 09:27, Richard Sharpe wrote:
>>> Hi folks,
>>>
>>> I am running into a problem with AIO in Samba 3.6.x under FreeBSD 8.0 and I want to check if the assumptions made by the original coder are correct.
>>>
>>> Essentially, the code queues a number of AIO requests (up to 100) and specifies an RT signal to be sent upon completion with siginfo_t. These are placed into an array.
>>>
>>> The code assumes that when handling one of these signals, if it has already received N such siginfo_t structures, it can BLOCK further instances of the signal while these structures are drained by the main code in Samba.
>>>
>>> However, my debugging suggests that if a bunch of signals have already been queued, you cannot block those undelivered but already queued signals.
>>>
>>> I am certain that they are all being delivered to the main thread and that they keep coming despite the code trying to stop them at 64 (they get all the way up to the 100 that were queued).
>>>
>>> Can someone confirm whether I have this correct or not?
>>
>> I am curious how the code BLOCKs the signal in its signal handler. AFAIK, after the signal handler returns, the original signal mask is restored, which re-enables delivery of the signal, unless you change it in ucontext.uc_sigmask.
>
> It does try to block the signals in the signal handler using the following code (in the signal handler):
>
>     if (count+1 == TEVENT_SA_INFO_QUEUE_COUNT) {
>         /* we've filled the info array - block this signal until
>            these ones are delivered */
>         sigset_t set;
>         sigemptyset(&set);
>         sigaddset(&set, signum);
>         sigprocmask(SIG_BLOCK, &set, NULL);
>
> However, I also added pthread_sigmask with the same parameters to see if that made any difference and it seemed not to.

This code won't work. As I said, after the signal handler returns, the kernel copies the signal mask contained in the ucontext into kernel space and uses it for future signal delivery. The code should be modified as follows:

    void handler(int signum, siginfo_t *info, ucontext_t *uap)
    {
        ...
        if (count + 1 == TEVENT_SA_INFO_QUEUE_COUNT) {
            sigaddset(&uap->uc_sigmask, signum);
        ...

Here, the sigprocmask call should be removed.
Re: Is it possible to block pending queued RealTime signals (AIO originating)?
On 2013/01/08 15:02, David Xu wrote:
> On 2013/01/08 14:33, Richard Sharpe wrote:
>> On Tue, 2013-01-08 at 10:46 +0800, David Xu wrote:
>>> On 2013/01/08 09:27, Richard Sharpe wrote:
>>>> Hi folks,
>>>>
>>>> I am running into a problem with AIO in Samba 3.6.x under FreeBSD 8.0 and I want to check if the assumptions made by the original coder are correct.
>>>>
>>>> Essentially, the code queues a number of AIO requests (up to 100) and specifies an RT signal to be sent upon completion with siginfo_t. These are placed into an array.
>>>>
>>>> The code assumes that when handling one of these signals, if it has already received N such siginfo_t structures, it can BLOCK further instances of the signal while these structures are drained by the main code in Samba.
>>>>
>>>> However, my debugging suggests that if a bunch of signals have already been queued, you cannot block those undelivered but already queued signals.
>>>>
>>>> I am certain that they are all being delivered to the main thread and that they keep coming despite the code trying to stop them at 64 (they get all the way up to the 100 that were queued).
>>>>
>>>> Can someone confirm whether I have this correct or not?
>>>
>>> I am curious how the code BLOCKs the signal in its signal handler. AFAIK, after the signal handler returns, the original signal mask is restored, which re-enables delivery of the signal, unless you change it in ucontext.uc_sigmask.
>>
>> It does try to block the signals in the signal handler using the following code (in the signal handler):
>>
>>     if (count+1 == TEVENT_SA_INFO_QUEUE_COUNT) {
>>         /* we've filled the info array - block this signal until
>>            these ones are delivered */
>>         sigset_t set;
>>         sigemptyset(&set);
>>         sigaddset(&set, signum);
>>         sigprocmask(SIG_BLOCK, &set, NULL);
>>
>> However, I also added pthread_sigmask with the same parameters to see if that made any difference and it seemed not to.
>
> This code won't work. As I said, after the signal handler returns, the kernel copies the signal mask contained in the ucontext into kernel space and uses it for future signal delivery. The code should be modified as follows:
>
>     void handler(int signum, siginfo_t *info, ucontext_t *uap)
>     {
>         ...
>         if (count + 1 == TEVENT_SA_INFO_QUEUE_COUNT) {
>             sigaddset(&uap->uc_sigmask, signum);
>         ...
>
> Here, the sigprocmask call should be removed.

Note that this code may only work in single-threaded mode. If there are multiple threads in the process, the kernel is free to deliver the signal to any thread that is not masking it.
Re: ule+smp: small optimization for turnstile priority lending
On 2012/11/06 19:03, Attilio Rao wrote:
> On 9/20/12, David Xu davi...@freebsd.org wrote:
>> On 2012/09/18 22:05, Andriy Gapon wrote:
>>>
>>> Here is a snippet that demonstrates the issue on a supposedly fully loaded 2-processor system:
>>>
>>> 136794 0 3670427870244462 KTRGRAPH group:thread, id:Xorg tid 102818, state:running, attributes: prio:122
>>> 136793 0 3670427870241000 KTRGRAPH group:thread, id:cc1plus tid 111916, state:yielding, attributes: prio:183, wmesg:(null), lockname:(null)
>>> 136792 1 3670427870240829 KTRGRAPH group:thread, id:idle: cpu1 tid 14, state:running, attributes: prio:255
>>> 136791 1 3670427870239520 KTRGRAPH group:load, id:CPU 1 load, counter:0, attributes: none
>>> 136790 1 3670427870239248 KTRGRAPH group:thread, id:firefox tid 113473, state:blocked, attributes: prio:122, wmesg:(null), lockname:unp_mtx
>>> 136789 1 3670427870237697 KTRGRAPH group:load, id:CPU 0 load, counter:2, attributes: none
>>> 136788 1 3670427870236394 KTRGRAPH group:thread, id:firefox tid 113473, point:wokeup, attributes: linkedto:Xorg tid 102818
>>> 136787 1 3670427870236145 KTRGRAPH group:thread, id:Xorg tid 102818, state:runq add, attributes: prio:122, linkedto:firefox tid 113473
>>> 136786 1 3670427870235981 KTRGRAPH group:load, id:CPU 1 load, counter:1, attributes: none
>>> 136785 1 3670427870235707 KTRGRAPH group:thread, id:Xorg tid 102818, state:runq rem, attributes: prio:176
>>> 136784 1 3670427870235423 KTRGRAPH group:thread, id:Xorg tid 102818, point:prio, attributes: prio:176, new prio:122, linkedto:firefox tid 113473
>>> 136783 1 3670427870202392 KTRGRAPH group:thread, id:firefox tid 113473, state:running, attributes: prio:104
>>>
>>> See how the Xorg thread was forced from CPU 1 to CPU 0 where it preempted cc1plus thread (I do have preemption enabled) only to leave CPU 1 with zero load.
>>>
>>> Here is a proposed solution:
>>>
>>>     turnstile_wait: optimize priority lending to a thread on a runqueue
>>>
>>>     As the current thread is definitely going into mi_switch, it now removes its load before doing priority propagation which can potentially result in sched_add. In the SMP ULE case the latter searches for the least loaded CPU to place a boosted thread, which is supposedly about to run.
>>>
>>> diff --git a/sys/kern/sched_ule.c b/sys/kern/sched_ule.c
>>> index 8e466cd..3299cae 100644
>>> --- a/sys/kern/sched_ule.c
>>> +++ b/sys/kern/sched_ule.c
>>> @@ -1878,7 +1878,10 @@ sched_switch(struct thread *td, struct thread *newtd, int flags)
>>>  		/* This thread must be going to sleep. */
>>>  		TDQ_LOCK(tdq);
>>>  		mtx = thread_lock_block(td);
>>> -		tdq_load_rem(tdq, td);
>>> +#if defined(SMP)
>>> +		if ((flags & SW_TYPE_MASK) != SWT_TURNSTILE)
>>> +#endif
>>> +			tdq_load_rem(tdq, td);
>>>  	}
>>>  	/*
>>>  	 * We enter here with the thread blocked and assigned to the
>>> @@ -2412,6 +2415,21 @@ sched_rem(struct thread *td)
>>>  		tdq_setlowpri(tdq, NULL);
>>>  }
>>>
>>> +void
>>> +sched_load_rem(struct thread *td)
>>> +{
>>> +	struct tdq *tdq;
>>> +
>>> +	KASSERT(td == curthread,
>>> +	    ("sched_rem_load: only curthread is supported"));
>>> +	KASSERT(td->td_oncpu == td->td_sched->ts_cpu,
>>> +	    ("thread running on cpu different from ts_cpu"));
>>> +	tdq = TDQ_CPU(td->td_sched->ts_cpu);
>>> +	TDQ_LOCK_ASSERT(tdq, MA_OWNED);
>>> +	MPASS(td->td_lock == TDQ_LOCKPTR(tdq));
>>> +	tdq_load_rem(tdq, td);
>>> +}
>>> +
>>>  /*
>>>   * Fetch cpu utilization information. Updates on demand.
>>>   */
>>> diff --git a/sys/kern/subr_turnstile.c b/sys/kern/subr_turnstile.c
>>> index 31d16fe..d1d68e9 100644
>>> --- a/sys/kern/subr_turnstile.c
>>> +++ b/sys/kern/subr_turnstile.c
>>> @@ -731,6 +731,13 @@ turnstile_wait(struct turnstile *ts, struct thread *owner, int queue)
>>>  		LIST_INSERT_HEAD(&ts->ts_free, td->td_turnstile, ts_hash);
>>>  	}
>>>  	thread_lock(td);
>>> +#if defined(SCHED_ULE) && defined(SMP)
>>> +	/*
>>> +	 * Remove load earlier so that it does not affect cpu selection
>>> +	 * for a thread waken up due to priority lending, if any.
>>> +	 */
>>> +	sched_load_rem(td);
>>> +#endif
>>>  	thread_lock_set(td, &ts->ts_lock);
>>>  	td->td_turnstile = NULL;
>>>
>>> diff --git a/sys/sys/sched.h b/sys/sys/sched.h
>>> index 4b8387c..b1ead1b 100644
>>> --- a/sys/sys/sched.h
>>> +++ b/sys/sys/sched.h
>>> @@ -110,6 +110,9 @@ void	sched_preempt(struct thread *td);
>>>  void	sched_add(struct thread *td, int flags);
>>>  void	sched_clock(struct thread *td);
>>>  void	sched_rem(struct thread *td);
>>> +#if defined(SCHED_ULE) && defined(SMP)
>>> +void	sched_load_rem(struct thread *td);
>>> +#endif
>>>  void	sched_tick(int cnt);
>>>  void	sched_relinquish(struct thread *td);
>>>  struct thread *sched_choose(void);
>>
>> I found another scenario in taskqueue: in the function taskqueue_terminate, the current thread tries to wake another thread up and then sleeps immediately; the tq_mutex is sometimes a spinlock. So if you remove the current thread's load from the current cpu before the wakeup, the resumed thread may be put on the same cpu, which will optimize the cpu scheduling too.
>
> I
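The effect of the ordering can be seen in a toy model. This is a deliberately simplified sketch and not the sched_ule.c logic: the functions pick_least_loaded and place_woken_thread are hypothetical, load is a plain per-CPU counter, and ties go to the lowest CPU index.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: index of the CPU with the smallest load (ties: lowest index). */
int
pick_least_loaded(const int *load, size_t ncpu)
{
	size_t best = 0;
	size_t i;

	assert(load != NULL && ncpu > 0);
	for (i = 1; i < ncpu; i++)
		if (load[i] < load[best])
			best = i;
	return (int)best;
}

/*
 * Place a woken thread while the thread on cur_cpu is about to block.
 * With remove_first set, the blocker's load is dropped before the
 * placement decision; otherwise it is dropped afterwards.
 */
int
place_woken_thread(int *load, size_t ncpu, size_t cur_cpu, int remove_first)
{
	int target;

	assert(cur_cpu < ncpu);
	if (remove_first)
		load[cur_cpu]--;	/* blocker leaves before placement */
	target = pick_least_loaded(load, ncpu);
	load[target]++;
	if (!remove_first)
		load[cur_cpu]--;	/* blocker leaves after placement */
	return target;
}
```

Starting from one runnable thread per CPU with the blocker on CPU 1, the late removal ends with loads of 2 and 0, matching the trace, while the early removal keeps one thread on each CPU.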
Re: ule+smp: small optimization for turnstile priority lending
On 2012/11/07 14:17, Jeff Roberson wrote:
> On Wed, 7 Nov 2012, David Xu wrote:
>> On 2012/11/06 19:03, Attilio Rao wrote:
>>> On 9/20/12, David Xu davi...@freebsd.org wrote:
>>>> On 2012/09/18 22:05, Andriy Gapon wrote:
>>>>>
>>>>> [ ... ]
>>>>
>>>> I found another scenario in taskqueue: in the function taskqueue_terminate, the current thread tries to wake another thread up and then sleeps immediately; the tq_mutex is sometimes a spinlock. So if you remove the current thread's load from the current cpu before the wakeup, the resumed thread may be put on the same cpu, which will optimize the cpu scheduling too.
>>>
>>> I think
Re: Threaded 6.4 code compiled under 9.0 uses a lot more memory?..
On 2012/10/31 22:44, Karl Pielorz wrote:
> --On 31 October 2012 16:06 +0200 Konstantin Belousov kostik...@gmail.com wrote:
>
>> Since you neglected to provide the verbatim output of procstat, nothing conclusive can be said. Obviously, you can make an investigation on your own.
>
> Sorry - when I ran it this morning the output was several hundred lines - I didn't want to post all of that to the list; 99% of the lines are very similar. I can email it you off-list if having the whole lot will help?
>
>>> Then there's a bunch of 'large' blocks e.g..
>>>
>>> PID START END PRT RES PRES REF SHD FL TP PATH
>>> 20100x801c00x80280 rw- 28690 4 0 df
>>> 20100x802800x80340 rw- 18800 1 0
>>
>> Most likely, these are malloc arenas.
>
> Ok, that's the heaviest usage.
>
>>> Then lots of 'little' blocks,
>>>
>>> 2010 0x70161000 0x70181000 rw- 160 1 0 ---D df
>>
>> And those are thread stacks.
>
> Ok, lots of those (lots of threads going on) - but they're all pretty small.

Note that libc_r's thread stack is 64K, while libthr uses 1M bytes per thread.

> My code only has a single call to malloc, which allocates around 20k per thread.
>
> Obviously there's other libraries and stuff running with the code - so would I be correct in guessing that they are more than likely for most of these large blocks?
>
> -Karl
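If the larger libthr stacks turn out to matter for a program with many threads, the stack size can be set explicitly per thread through the standard pthread attribute interface. The sketch below is only an illustration: the helper name init_attr_with_stack is hypothetical, and a suitable size depends entirely on how deep the thread's call chains and local buffers go.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Initialize attr with an explicit per-thread stack size.
 * Returns 0 on success or an errno-style value on failure.
 */
int
init_attr_with_stack(pthread_attr_t *attr, size_t stack_bytes)
{
	int error;

	assert(attr != NULL);
	error = pthread_attr_init(attr);
	if (error != 0)
		return error;
	error = pthread_attr_setstacksize(attr, stack_bytes);
	if (error != 0)
		pthread_attr_destroy(attr);	/* do not leak on failure */
	return error;
}
```

The initialized attribute object is then passed as the second argument of pthread_create() and released with pthread_attr_destroy() once the threads have been created.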
Re: ule+smp: small optimization for turnstile priority lending
On 2012/09/18 22:05, Andriy Gapon wrote:
>
> Here is a snippet that demonstrates the issue on a supposedly fully loaded 2-processor system:
>
> 136794 0 3670427870244462 KTRGRAPH group:thread, id:Xorg tid 102818, state:running, attributes: prio:122
> 136793 0 3670427870241000 KTRGRAPH group:thread, id:cc1plus tid 111916, state:yielding, attributes: prio:183, wmesg:(null), lockname:(null)
> 136792 1 3670427870240829 KTRGRAPH group:thread, id:idle: cpu1 tid 14, state:running, attributes: prio:255
> 136791 1 3670427870239520 KTRGRAPH group:load, id:CPU 1 load, counter:0, attributes: none
> 136790 1 3670427870239248 KTRGRAPH group:thread, id:firefox tid 113473, state:blocked, attributes: prio:122, wmesg:(null), lockname:unp_mtx
> 136789 1 3670427870237697 KTRGRAPH group:load, id:CPU 0 load, counter:2, attributes: none
> 136788 1 3670427870236394 KTRGRAPH group:thread, id:firefox tid 113473, point:wokeup, attributes: linkedto:Xorg tid 102818
> 136787 1 3670427870236145 KTRGRAPH group:thread, id:Xorg tid 102818, state:runq add, attributes: prio:122, linkedto:firefox tid 113473
> 136786 1 3670427870235981 KTRGRAPH group:load, id:CPU 1 load, counter:1, attributes: none
> 136785 1 3670427870235707 KTRGRAPH group:thread, id:Xorg tid 102818, state:runq rem, attributes: prio:176
> 136784 1 3670427870235423 KTRGRAPH group:thread, id:Xorg tid 102818, point:prio, attributes: prio:176, new prio:122, linkedto:firefox tid 113473
> 136783 1 3670427870202392 KTRGRAPH group:thread, id:firefox tid 113473, state:running, attributes: prio:104
>
> See how the Xorg thread was forced from CPU 1 to CPU 0 where it preempted cc1plus thread (I do have preemption enabled) only to leave CPU 1 with zero load.
>
> Here is a proposed solution:
>
>     turnstile_wait: optimize priority lending to a thread on a runqueue
>
>     As the current thread is definitely going into mi_switch, it now removes its load before doing priority propagation which can potentially result in sched_add. In the SMP ULE case the latter searches for the least loaded CPU to place a boosted thread, which is supposedly about to run.
>
> diff --git a/sys/kern/sched_ule.c b/sys/kern/sched_ule.c
> index 8e466cd..3299cae 100644
> --- a/sys/kern/sched_ule.c
> +++ b/sys/kern/sched_ule.c
> @@ -1878,7 +1878,10 @@ sched_switch(struct thread *td, struct thread *newtd, int flags)
>  		/* This thread must be going to sleep. */
>  		TDQ_LOCK(tdq);
>  		mtx = thread_lock_block(td);
> -		tdq_load_rem(tdq, td);
> +#if defined(SMP)
> +		if ((flags & SW_TYPE_MASK) != SWT_TURNSTILE)
> +#endif
> +			tdq_load_rem(tdq, td);
>  	}
>  	/*
>  	 * We enter here with the thread blocked and assigned to the
> @@ -2412,6 +2415,21 @@ sched_rem(struct thread *td)
>  		tdq_setlowpri(tdq, NULL);
>  }
>
> +void
> +sched_load_rem(struct thread *td)
> +{
> +	struct tdq *tdq;
> +
> +	KASSERT(td == curthread,
> +	    ("sched_rem_load: only curthread is supported"));
> +	KASSERT(td->td_oncpu == td->td_sched->ts_cpu,
> +	    ("thread running on cpu different from ts_cpu"));
> +	tdq = TDQ_CPU(td->td_sched->ts_cpu);
> +	TDQ_LOCK_ASSERT(tdq, MA_OWNED);
> +	MPASS(td->td_lock == TDQ_LOCKPTR(tdq));
> +	tdq_load_rem(tdq, td);
> +}
> +
>  /*
>   * Fetch cpu utilization information. Updates on demand.
>   */
> diff --git a/sys/kern/subr_turnstile.c b/sys/kern/subr_turnstile.c
> index 31d16fe..d1d68e9 100644
> --- a/sys/kern/subr_turnstile.c
> +++ b/sys/kern/subr_turnstile.c
> @@ -731,6 +731,13 @@ turnstile_wait(struct turnstile *ts, struct thread *owner, int queue)
>  		LIST_INSERT_HEAD(&ts->ts_free, td->td_turnstile, ts_hash);
>  	}
>  	thread_lock(td);
> +#if defined(SCHED_ULE) && defined(SMP)
> +	/*
> +	 * Remove load earlier so that it does not affect cpu selection
> +	 * for a thread waken up due to priority lending, if any.
> +	 */
> +	sched_load_rem(td);
> +#endif
>  	thread_lock_set(td, &ts->ts_lock);
>  	td->td_turnstile = NULL;
>
> diff --git a/sys/sys/sched.h b/sys/sys/sched.h
> index 4b8387c..b1ead1b 100644
> --- a/sys/sys/sched.h
> +++ b/sys/sys/sched.h
> @@ -110,6 +110,9 @@ void	sched_preempt(struct thread *td);
>  void	sched_add(struct thread *td, int flags);
>  void	sched_clock(struct thread *td);
>  void	sched_rem(struct thread *td);
> +#if defined(SCHED_ULE) && defined(SMP)
> +void	sched_load_rem(struct thread *td);
> +#endif
>  void	sched_tick(int cnt);
>  void	sched_relinquish(struct thread *td);
>  struct thread *sched_choose(void);

I found another scenario in taskqueue: in the function taskqueue_terminate, the current thread tries to wake another thread up and then sleeps immediately; the tq_mutex is sometimes a spinlock. So if you remove the current thread's load from the current cpu before the wakeup, the resumed thread may be put on the same cpu, which will optimize the cpu scheduling too.

/*
 * Signal a taskqueue thread to terminate.
 */
static void
taskqueue_terminate(struct thread **pp,
Re: system() using vfork() or posix_spawn() and libthr
On 2012/08/16 01:46, Konstantin Belousov wrote:
> On Tue, Aug 14, 2012 at 11:15:06PM +0800, David Xu wrote:
>> You are requiring the thread library to implement mutexes and other locks such that, after vfork(), they must still work across processes: the PTHREAD_PROCESS_PRIVATE type of mutex and other locks now need to work in a PTHREAD_PROCESS_SHARED mode.
>
> In fact, yes. In my patch I achieve this by single-threading the parent,

I still think single-threading is excessive. vfork should be fast, and because the parent thread is already waiting for the child process, there is no problem with reusing the parent's stack in the child process; it is compatible.

> which means that existing _PRIVATE mutexes are enough.

Well, you forget that if the private mutex's sleep-wakeup queue is in the kernel, you can only see it from the same process; otherwise it is a security problem. It works now because I implemented umtx in such a way that the kernel umtx code compares the two vmspace pointers and treats two threads as being in the same process if the pointers are equal. But there are implementations that do not work this way: they simply look up the lwpid in the same process, and if it is not found, the mutex is broken. Process-private and process-shared locks work in very different ways, so your assumption is false.
Re: system() using vfork() or posix_spawn() and libthr
On 2012/08/16 07:57, David Xu wrote:
> On 2012/08/16 01:46, Konstantin Belousov wrote:
>> On Tue, Aug 14, 2012 at 11:15:06PM +0800, David Xu wrote:
>>> You are requiring the thread library to implement mutexes and other locks such that, after vfork(), they must still work across processes: the PTHREAD_PROCESS_PRIVATE type of mutex and other locks now need to work in a PTHREAD_PROCESS_SHARED mode.
>>
>> In fact, yes. In my patch I achieve this by single-threading the parent,
>
> I still think single-threading is excessive. vfork should be fast, and because the parent thread is already waiting for the child process, there is no problem with reusing the parent's stack in the child process; it is compatible.
>
>> which means that existing _PRIVATE mutexes are enough.
>
> Well, you forget that if the private mutex's sleep-wakeup queue is in the kernel, you can only see it from the same process; otherwise it is a security problem. It works now because I implemented umtx in such a way that the kernel umtx code compares the two vmspace pointers and treats two threads as being in the same process if the pointers are equal. But there are implementations that do not work this way: they simply look up the lwpid in the same process, and if it is not found, the mutex is broken. Process-private and process-shared locks work in very different ways, so your assumption is false.

I must say that my implementation working this way is luck, not intention.
Re: system() using vfork() or posix_spawn() and libthr
On 2012/08/16 01:49, Konstantin Belousov wrote: On Wed, Aug 15, 2012 at 07:40:04AM +0800, David Xu wrote: On 2012/08/15 05:09, Jilles Tjoelker wrote: On Tue, Aug 14, 2012 at 11:15:06PM +0800, David Xu wrote: But in real word, pthread atfork handlers are not async-signal safe, they mostly do mutex locking and unlocking to keep consistent state, mutex is not async-signal safe. The malloc prefork and postfork handlers happen to work because I have some hacking code in library for malloc locks. Otherwise, you even can not use fork() in signal handler. This problem was also reported to the Austin Group at http://austingroupbugs.net/view.php?id=62 Atfork handlers are inherently async-signal-unsafe. An interpretation was issued suggesting to remove fork() from the list of async-signal-safe functions and add a new async-signal-safe function _Fork() which does not call the atfork handlers. This change will however not be in POSIX.1-2008 TC1 but only in the next issue (SUSv5). A slightly earlier report http://austingroupbugs.net/view.php?id=18 just requested the _Fork() function because an existing application deadlocked when calling fork() from a signal handler. Thanks, although SUSv5 will have _Fork(), but application will not catch up. One solution for this problem is thread library does not execute atfork handler when fork() is called from signal handler, but it requires some work to be done in thread library's signal wrapper, for example, set a flag that the thread is executing signal handler, but siglongjmp can mess the flag, so I have to tweak sigsetjmp and siglongjmp to save/restore the flag, I have such a patch: it fetches target stack pointer stored in jmpbuf, and compare it with top most stack pointer when a first signal was delivered to the thread, if the target stack pointer is larger than the top most stack pointer, the flag is cleared. I do not understand how this interacts with altstacks. Also, there are longjmp()s implemented outside the base, e.g. 
in libunwind, which cannot be fixed this way. Also, there are language runtimes that rely heavily on (synchronous) signals and use their own internal analogues of stack unwinding, which would again be broken by such an approach.
My patch is very experimental. setcontext and getcontext can also break it. Another solution would be to save a flag in the jmpbuf or ucontext indicating that a signal handler is executing. A setjmp or getcontext executed in normal context would not have the flag set, but if they execute in a signal handler, the per-thread flag would be saved. However, this requires a lot of changes, and setcontext and getcontext are syscalls; the kernel does not know about such a userland flag unless it is shared between kernel and userland.
Re: system() using vfork() or posix_spawn() and libthr
On 2012/08/14 16:18, Konstantin Belousov wrote: On Tue, Aug 14, 2012 at 12:42:15PM +0800, David Xu wrote: I simply duplicated idea from OpenSolaris, here is my patch which has similar feature as your patch, and it also tries to prevent vforked child from corrupting parent's data: http://people.freebsd.org/~davidxu/patch/libthr-vfork.diff You shall not return from vfork() frame in the child. Otherwise, the same frame is appears to be destroyed in parent, and parent dies. More often on !x86, but right combination of events on x86 is deadly too. If pid or curthread local variables are spilled into stack save area, then child will override them, and e.g. parent could see pid == 0, returning it to caller. This was the reason why I went to asm wrapper for vfork. OK. Also, it seems that in mt process, malloc and rtld are still broken, or am I missing something ? I will not call it as broken, malloc and rtld are working properly. vfork is not a normal function, for MT process, fork() is even very special too. POSIX says a lot about multi-threaded: http://pubs.opengroup.org/onlinepubs/95399/functions/fork.html I quoted some POSIX document here: A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called. [THR] Fork handlers may be established by means of the pthread_atfork() function in order to maintain application invariants across fork() calls. This means child process should only do very simple things, and quickly call execv(). For mt process, fork() is already a very complicated problem, one of problems I still remembered is that when fork() is called in signal handler, should the thread library execute pthread_atfork handler ? 
If it should, none of the locks are async-signal-safe; our internal rwlock can be used in a signal handler, but that is not supported by POSIX. Also, are those atfork handlers prepared to be executed in a signal handler? That is undefined. POSIX left the door open here. This is just one of the complicated problems; vfork() is even more restrictive than fork(). If possible, I would avoid the complications that vfork() would cause. Regards, David Xu
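To make the atfork concern above concrete, here is a minimal sketch (not code from libthr or from either patch) of the usual pthread_atfork() pattern. The names state_lock and fork_with_handlers are hypothetical and exist only for this illustration. The prepare handler takes a mutex, which is exactly the kind of operation that is not async-signal-safe: if fork() were called from a signal handler that interrupted a thread already holding state_lock, the prepare handler would deadlock.

```c
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical application lock protecting some shared state. */
static pthread_mutex_t state_lock = PTHREAD_MUTEX_INITIALIZER;

/* Runs in the parent before fork(): take the lock so the state is consistent. */
static void
prepare_handler(void)
{
	pthread_mutex_lock(&state_lock);
}

/* Runs in the parent after fork(): release the lock again. */
static void
parent_handler(void)
{
	pthread_mutex_unlock(&state_lock);
}

/* Runs in the child after fork(): release the child's copy of the lock. */
static void
child_handler(void)
{
	pthread_mutex_unlock(&state_lock);
}

/*
 * Register the handlers once, fork, and return the child's exit status
 * (or -1 on error).  The child only calls _exit(), which is
 * async-signal-safe.
 */
int
fork_with_handlers(void)
{
	static int registered;
	pid_t pid;
	int status;

	if (!registered) {
		if (pthread_atfork(prepare_handler, parent_handler,
		    child_handler) != 0)
			return (-1);
		registered = 1;
	}
	pid = fork();
	if (pid == -1)
		return (-1);
	if (pid == 0)
		_exit(7);
	if (waitpid(pid, &status, 0) == -1)
		return (-1);
	return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}
```

Called from normal context this works as expected; the sketch says nothing about what a particular thread library does when the same call is made inside a signal handler.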
Re: system() using vfork() or posix_spawn() and libthr
On 2012/08/14 17:41, Konstantin Belousov wrote: On Tue, Aug 14, 2012 at 05:16:56PM +0800, David Xu wrote: On 2012/08/14 16:18, Konstantin Belousov wrote: On Tue, Aug 14, 2012 at 12:42:15PM +0800, David Xu wrote: I simply duplicated idea from OpenSolaris, here is my patch which has similar feature as your patch, and it also tries to prevent vforked child from corrupting parent's data: http://people.freebsd.org/~davidxu/patch/libthr-vfork.diff You shall not return from vfork() frame in the child. Otherwise, the same frame is appears to be destroyed in parent, and parent dies. More often on !x86, but right combination of events on x86 is deadly too. If pid or curthread local variables are spilled into stack save area, then child will override them, and e.g. parent could see pid == 0, returning it to caller. This was the reason why I went to asm wrapper for vfork. OK. Also, it seems that in mt process, malloc and rtld are still broken, or am I missing something ? I will not call it as broken, malloc and rtld are working properly. vfork is not a normal function, for MT process, fork() is even very special too. POSIX says a lot about multi-threaded: http://pubs.opengroup.org/onlinepubs/95399/functions/fork.html I quoted some POSIX document here: A process shall be created with a single thread. If a multi-threaded process calls fork(), the new process shall contain a replica of the calling thread and its entire address space, possibly including the states of mutexes and other resources. Consequently, to avoid errors, the child process may only execute async-signal-safe operations until such time as one of the exec functions is called. [THR] Fork handlers may be established by means of the pthread_atfork() function in order to maintain application invariants across fork() calls. This means child process should only do very simple things, and quickly call execv(). Sure. But to call execv*, the child may need a working rtld, so we fixed it. 
User code routinely called malloc() or even created threads in the child, so we fixed that too. Otherwise, we have nothing to answer for the demands, and 'other' OSes do support such usage. It is beyond POSIX, but this does not matter, since the feature is expected to be available by application writers. For mt process, fork() is already a very complicated problem, one of problems I still remembered is that when fork() is called in signal handler, should the thread library execute pthread_atfork handler ? if it should do, but none of lock is async-signal safe, though our internal rwlock allows to be used in signal handler, but it is not supported by POSIX. Also are those atfork handler prepared to be executed in signal handler ? it is undefined. POSIX had opened a door here. POSIX authors were aware of this problem. In the rationale for SUSv4, they wrote While the fork() function is async-signal-safe, there is no way for an implementation to determine whether the fork handlers established by pthread_atfork() are async-signal-safe. The fork handlers may attempt to execute portions of the implementation that are not async-signal-safe, such as those that are protected by mutexes, leading to a deadlock condition. It is therefore undefined for the fork handlers to execute functions that are not async-signal-safe when fork() is called from a signal handler. IMO, since fork() is specified to be async-signal safe, and since fork() is specified to call atfork() handlers, SUSv4 requires, without any misinterpreations, that atfork calling machinery must be async-signal safe. The only possibility for undefined behaviour is the application code registering non-async safe handlers. But in real word, pthread atfork handlers are not async-signal safe, they mostly do mutex locking and unlocking to keep consistent state, mutex is not async-signal safe. The malloc prefork and postfork handlers happen to work because I have some hacking code in library for malloc locks. 
Otherwise, you even can not use fork() in signal handler. Above is one of complicated problem, the vfork is even more restrictive than fork(). If it is possible, I would avoid such a complicated problem which vfork() would cause. I fully agree that the issues caused by vfork() in multithreaded code are complicated, but ignoring them lowers the quality of our implementation. Fixing vfork in multithreaded process is not trivial, but it is possible. My patch aims at working rtld and malloc in child. As I said earlier, we might even try to call parent atfork handlers in child. Sure, if child dies at wrong time, then rtld and malloc locks and data structures can be left in unusable state for parent, but currently we do not work even if child is relatively well-behaving. You are requiring the thread library to implement such a mutex and other locks, that after vfork(), the mutex and other lock types must still work across processes, the PTHREAD_PROCESS_PRIVATE type of mutex and other locks now need
Re: system() using vfork() or posix_spawn() and libthr
On 2012/08/15 05:09, Jilles Tjoelker wrote: On Tue, Aug 14, 2012 at 11:15:06PM +0800, David Xu wrote: But in the real world, pthread atfork handlers are not async-signal-safe; they mostly lock and unlock mutexes to keep state consistent, and a mutex is not async-signal-safe. The malloc prefork and postfork handlers happen to work because I have some hack code in the library for the malloc locks. Otherwise, you could not even use fork() in a signal handler.
This problem was also reported to the Austin Group at http://austingroupbugs.net/view.php?id=62 Atfork handlers are inherently async-signal-unsafe. An interpretation was issued suggesting to remove fork() from the list of async-signal-safe functions and add a new async-signal-safe function _Fork() which does not call the atfork handlers. This change will however not be in POSIX.1-2008 TC1 but only in the next issue (SUSv5). A slightly earlier report http://austingroupbugs.net/view.php?id=18 just requested the _Fork() function because an existing application deadlocked when calling fork() from a signal handler.
Thanks. Although SUSv5 will have _Fork(), applications will not catch up. One solution is for the thread library not to execute atfork handlers when fork() is called from a signal handler, but this requires some work in the thread library's signal wrapper: for example, setting a flag that the thread is executing a signal handler. siglongjmp can mess up the flag, so I have to tweak sigsetjmp and siglongjmp to save and restore it. I have such a patch: it fetches the target stack pointer stored in the jmpbuf and compares it with the topmost stack pointer recorded when the first signal was delivered to the thread; if the target stack pointer is larger than the topmost stack pointer, the flag is cleared.
Re: system() using vfork() or posix_spawn() and libthr
On 2012/08/11 21:10, Jilles Tjoelker wrote: On Fri, Aug 10, 2012 at 10:16:04AM +0800, David Xu wrote: On 2012/08/09 18:56, Jilles Tjoelker wrote: On Mon, Aug 06, 2012 at 11:25:35AM +0300, Konstantin Belousov wrote: On Sun, Aug 05, 2012 at 11:54:32PM +0200, Jilles Tjoelker wrote: On Mon, Jul 30, 2012 at 01:53:03PM +0300, Konstantin Belousov wrote: On Mon, Jul 30, 2012 at 12:24:08PM +0200, Jilles Tjoelker wrote: People sometimes use system() from large address spaces where it would improve performance greatly to use vfork() instead of fork(). A simple approach is to change fork() to vfork(), although I have not tried this. It seems safe enough to use sigaction and sigprocmask system calls in the vforked process. Alternatively, we can have posix_spawn() do the vfork() with signal changes. This avoids possible whining from compilers and static analyzers about using vfork() in system.c. However, I do not like the tricky code for signals and that it adds lines of code. This is lightly tested. It is interesting to note that for some time our vfork(2) no longer stops the whole forked process (parent), only the forking thread is waiting for the child exit or exec. I am not sure is this point important for system(3), but determined code can notice the difference from the fork-vfork switch. Neither fork nor vfork call thread_single(SINGLE_BOUNDARY), so this is not a difference. It is the difference, because vforked child shares parent address space. Thread singling may be noticeable from a failing execve() (but only in the process doing execve()) and in the rare case of rfork() without RFPROC. No, other running threads in parent affect vforked child till exec or exit. In fact, I would classify this as bug, but not a serious one. There are some ugly ways this parallel execution is depended on. If the vforked child calls sigaction() while another thread is also in sigaction() for that signal, the vforked child needs to wait for the other thread to release the lock. 
This uses a per-process lock to synchronize threads in different processes, which may not work properly. If the vforked child is killed (such as by SIGKILL) while holding the lock, the parent is not killed but its _thr_sigact is damaged. These problems could be avoided in libthr by skipping the lock in _sigaction() if a signal action is being set to SIG_DFL or SIG_IGN and the old action is not queried. In those cases, _thr_sigact is not touched so no lock is required. This change also helps applications, provided they call sigaction() and not signal(). Alternatively, posix_spawn() and system() could use the sigaction system call directly, bypassing libthr (if present). However, this will not help applications that call vfork() and sigaction() themselves (such as a shell that wants to implement ... using vfork()). posix_spawn() likely still needs some adjustment so that having it reset all signals (sigfillset()) to the default action will not cause it to fail with [EINVAL] because libthr does not allow changing SIGTHR's disposition.
Index: lib/libthr/thread/thr_sig.c
===================================================================
--- lib/libthr/thread/thr_sig.c	(revision 238970)
+++ lib/libthr/thread/thr_sig.c	(working copy)
@@ -519,8 +519,16 @@
 		return (-1);
 	}
 
-	if (act)
+	if (act) {
 		newact = *act;
+		/*
+		 * Short-circuit cases where we do not touch _thr_sigact.
+		 * This allows performing these safely in a vforked child.
+		 */
+		if ((newact.sa_handler == SIG_DFL ||
+		    newact.sa_handler == SIG_IGN) && oact == NULL)
+			return (__sys_sigaction(sig, &newact, NULL));
+	}
 
 	__sys_sigprocmask(SIG_SETMASK, &_thr_maskset, &oldset);
 	_thr_rwl_wrlock(&_thr_sigact[sig-1].lock);
Your patch is better than nothing; I don't object. The problem is visible to you, but there is also an invisible user: rtld. If a symbol is never used in the parent process but is used for the first time in a vforked child, rtld will be involved; if the child is then killed, rtld's data structures, such as its locks or linked lists, may be left in an inconsistent state.
I think this problem might be a non-fixable problem. Hmm. Rtld cannot be fixed like libthr because its data structures are inherently in userland. Perhaps signal handling should be different for a vforked child, like the default action of a signal sent to a thread affects the entire process and not just the thread. This cannot be implemented in the calling code because resolving execve() itself also needs rtld (ugly hacks like performing an execve() call that is guaranteed to fail aside). The rtld problem can be avoided specifically by linking with '-z now'. This might be acceptable for sh and csh; most applications can use posix_spawn() which would have to become a system call. Or the libc's posix_spawn() should use each system call directly, it should not call
Re: system() using vfork() or posix_spawn() and libthr
On 2012/08/13 19:50, Konstantin Belousov wrote: On Sun, Aug 12, 2012 at 08:11:29AM +0800, David Xu wrote: On 2012/08/10 18:13, Konstantin Belousov wrote: On Thu, Aug 09, 2012 at 02:08:50PM +0300, Konstantin Belousov wrote: Third alternative, which seems to be even better, is to restore single-threading of the parent for vfork(). single-threading is slow for large threaded process, don't know if it is necessary for vfork(), POSIX says nothing about threaded process. I agree that with both of your statements. But, being fast but allowing silent process corruption is not good behaviour. Either we need to actually support vfork() for threaded processes, or disable it with some error code. I prefer to support it. I believe that vfork() should be wrapped by libthr in the same way as fork() is wrapped. Not sure should we call atfork handlers, for now I decided not to call, since the handlers assume separate address spaces for parent/child. But we could only call parent handler in child, however weird this sounds. The real complication with wrapping is the fact that we cannot return from wrapper in child without destroying parent state. So I tried to prototype the code to handle the wrapping in the same frame, thus neccessity of using asm. Below is WIP, only for amd64 ATM. diff --git a/lib/libthr/arch/amd64/Makefile.inc b/lib/libthr/arch/amd64/Makefile.inc index e6d99ec..476d26a 100644 --- a/lib/libthr/arch/amd64/Makefile.inc +++ b/lib/libthr/arch/amd64/Makefile.inc @@ -1,3 +1,4 @@ #$FreeBSD$ -SRCS+= pthread_md.c _umtx_op_err.S +CFLAGS+=-I${.CURDIR}/../libc/${MACHINE_CPUARCH} +SRCS+= pthread_md.c _umtx_op_err.S vfork.S diff --git a/lib/libthr/arch/amd64/amd64/vfork.S b/lib/libthr/arch/amd64/amd64/vfork.S new file mode 100644 index 000..07d813d --- /dev/null +++ b/lib/libthr/arch/amd64/amd64/vfork.S @@ -0,0 +1,74 @@ +/*- + * Copyright (c) 2012 Konstantin Belousov k...@freebsd.org + * Copyright (c) 1990 The Regents of the University of California. + * All rights reserved. 
+ * + * This code is derived from software contributed to Berkeley by + * William Jolitz. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + *notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + *notice, this list of conditions and the following disclaimer in the + *documentation and/or other materials provided with the distribution. + * 4. Neither the name of the University nor the names of its contributors + *may be used to endorse or promote products derived from this software + *without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND + * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE + * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE + * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE + * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL + * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS + * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) + * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT + * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY + * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF + * SUCH DAMAGE. 
+ */ + +#if defined(SYSLIBC_SCCS) !defined(lint) + .asciz @(#)Ovfork.s 5.1 (Berkeley) 4/23/90 +#endif /* SYSLIBC_SCCS and not lint */ +#include machine/asm.h +__FBSDID($FreeBSD$); + +#include SYS.h + + .weak _vfork + .set_vfork,__sys_vfork + .weak vfork + .setvfork,__sys_vfork +ENTRY(__sys_vfork) + call_thr_vfork_pre + popq%rsi/* fetch return address (%rsi preserved) */ + mov $SYS_vfork,%rax + KERNCALL + jb 2f + cmpl$0,%eax + jne 1f + pushq %rsi + pushq %rsi /* twice for stack alignment */ + call_thr_vfork_post + popq%rsi + popq%rsi + xorl%eax,%eax +1: + jmp *%rsi +2: + pushq %rsi + pushq %rax + call_thr_vfork_post + callPIC_PLT(CNAME(__error)) + popq%rdx + movq%rdx,(%rax) + movq$-1,%rax + movq$-1,%rdx + retq +END(__sys_vfork) + + .section .note.GNU-stack,,%progbits diff --git a/lib/libthr/pthread.map b/lib/libthr/pthread.map index 355edea..40d14b4 100644 --- a/lib/libthr/pthread.map +++ b/lib/libthr/pthread.map @@ -157,6 +157,7 @@ FBSD_1.0 { system; tcdrain; usleep; + vfork; wait; wait3; wait4; diff --git a/lib/libthr
Re: system() using vfork() or posix_spawn() and libthr
On 2012/08/09 18:56, Jilles Tjoelker wrote: On Mon, Aug 06, 2012 at 11:25:35AM +0300, Konstantin Belousov wrote: On Sun, Aug 05, 2012 at 11:54:32PM +0200, Jilles Tjoelker wrote: On Mon, Jul 30, 2012 at 01:53:03PM +0300, Konstantin Belousov wrote: On Mon, Jul 30, 2012 at 12:24:08PM +0200, Jilles Tjoelker wrote: People sometimes use system() from large address spaces where it would improve performance greatly to use vfork() instead of fork(). A simple approach is to change fork() to vfork(), although I have not tried this. It seems safe enough to use sigaction and sigprocmask system calls in the vforked process. Alternatively, we can have posix_spawn() do the vfork() with signal changes. This avoids possible whining from compilers and static analyzers about using vfork() in system.c. However, I do not like the tricky code for signals and that it adds lines of code. This is lightly tested. It is interesting to note that for some time our vfork(2) no longer stops the whole forked process (parent), only the forking thread is waiting for the child exit or exec. I am not sure is this point important for system(3), but determined code can notice the difference from the fork-vfork switch. Neither fork nor vfork call thread_single(SINGLE_BOUNDARY), so this is not a difference. It is the difference, because vforked child shares parent address space. Thread singling may be noticeable from a failing execve() (but only in the process doing execve()) and in the rare case of rfork() without RFPROC. No, other running threads in parent affect vforked child till exec or exit. In fact, I would classify this as bug, but not a serious one. There are some ugly ways this parallel execution is depended on. If the vforked child calls sigaction() while another thread is also in sigaction() for that signal, the vforked child needs to wait for the other thread to release the lock. This uses a per-process lock to synchronize threads in different processes, which may not work properly. 
If the vforked child is killed (such as by SIGKILL) while holding the lock, the parent is not killed but its _thr_sigact is damaged. These problems could be avoided in libthr by skipping the lock in _sigaction() if a signal action is being set to SIG_DFL or SIG_IGN and the old action is not queried. In those cases, _thr_sigact is not touched so no lock is required. This change also helps applications, provided they call sigaction() and not signal(). Alternatively, posix_spawn() and system() could use the sigaction system call directly, bypassing libthr (if present). However, this will not help applications that call vfork() and sigaction() themselves (such as a shell that wants to implement ... using vfork()). posix_spawn() likely still needs some adjustment so that having it reset all signals (sigfillset()) to the default action will not cause it to fail with [EINVAL] because libthr does not allow changing SIGTHR's disposition.
Index: lib/libthr/thread/thr_sig.c
===================================================================
--- lib/libthr/thread/thr_sig.c	(revision 238970)
+++ lib/libthr/thread/thr_sig.c	(working copy)
@@ -519,8 +519,16 @@
 		return (-1);
 	}
 
-	if (act)
+	if (act) {
 		newact = *act;
+		/*
+		 * Short-circuit cases where we do not touch _thr_sigact.
+		 * This allows performing these safely in a vforked child.
+		 */
+		if ((newact.sa_handler == SIG_DFL ||
+		    newact.sa_handler == SIG_IGN) && oact == NULL)
+			return (__sys_sigaction(sig, &newact, NULL));
+	}
 
 	__sys_sigprocmask(SIG_SETMASK, &_thr_maskset, &oldset);
 	_thr_rwl_wrlock(&_thr_sigact[sig-1].lock);
I simply borrowed the idea from OpenSolaris. Here is my patch, which has a similar feature to yours and also tries to prevent a vforked child from corrupting the parent's data: http://people.freebsd.org/~davidxu/patch/libthr-vfork.diff
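For readers unfamiliar with the posix_spawn() route mentioned above, the following is a small sketch of a system()-like helper that asks posix_spawn() to reset selected signals to their default action in the child. It is an illustration only, not the proposed libc change: spawn_shell_command is a hypothetical name, and only SIGINT and SIGQUIT are reset rather than the full sigfillset() case discussed in the thread, which sidesteps the SIGTHR question entirely.

```c
#include <signal.h>
#include <spawn.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/*
 * Run "sh -c command" via posix_spawn() with SIGINT and SIGQUIT reset
 * to SIG_DFL in the child.  Returns the exit status, or -1 on error.
 */
int
spawn_shell_command(const char *command)
{
	posix_spawnattr_t attr;
	sigset_t defaults;
	pid_t pid;
	int error, status;
	char *argv[] = { "sh", "-c", (char *)command, NULL };

	if (posix_spawnattr_init(&attr) != 0)
		return (-1);

	/* Signals whose disposition the child should get as SIG_DFL. */
	sigemptyset(&defaults);
	sigaddset(&defaults, SIGINT);
	sigaddset(&defaults, SIGQUIT);
	error = posix_spawnattr_setsigdefault(&attr, &defaults);
	if (error == 0)
		error = posix_spawnattr_setflags(&attr, POSIX_SPAWN_SETSIGDEF);
	if (error == 0)
		error = posix_spawn(&pid, "/bin/sh", NULL, &attr, argv,
		    environ);
	posix_spawnattr_destroy(&attr);
	if (error != 0)
		return (-1);

	if (waitpid(pid, &status, 0) == -1)
		return (-1);
	return (WIFEXITED(status) ? WEXITSTATUS(status) : -1);
}
```

Whether the library implements this with fork(), vfork() or a dedicated system call is invisible to the caller, which is why the thread treats posix_spawn() as the place to hide the vfork() complications.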
Re: system() using vfork() or posix_spawn() and libthr
On 2012/08/10 18:13, Konstantin Belousov wrote: On Thu, Aug 09, 2012 at 02:08:50PM +0300, Konstantin Belousov wrote: Third alternative, which seems to be even better, is to restore single-threading of the parent for vfork(). single-threading is slow for large threaded process, don't know if it is necessary for vfork(), POSIX says nothing about threaded process. I mean this patch. diff --git a/sys/kern/kern_fork.c b/sys/kern/kern_fork.c index 6cb95cd..e59ee21 100644 --- a/sys/kern/kern_fork.c +++ b/sys/kern/kern_fork.c @@ -756,7 +756,7 @@ fork1(struct thread *td, int flags, int pages, struct proc **procp, struct thread *td2; struct vmspace *vm2; vm_ooffset_t mem_charged; - int error; + int error, single_threaded; static int curfail; static struct timeval lastfail; #ifdef PROCDESC @@ -815,6 +815,19 @@ fork1(struct thread *td, int flags, int pages, struct proc **procp, } #endif + if (((p1-p_flag (P_HADTHREADS | P_SYSTEM)) == P_HADTHREADS) + (flags RFPPWAIT) != 0) { + PROC_LOCK(p1); + if (thread_single(SINGLE_BOUNDARY)) { + PROC_UNLOCK(p1); + error = ERESTART; + goto fail2; + } + PROC_UNLOCK(p1); + single_threaded = 1; + } else + single_threaded = 0; + mem_charged = 0; vm2 = NULL; if (pages == 0) @@ -945,6 +958,12 @@ fail1: if (vm2 != NULL) vmspace_free(vm2); uma_zfree(proc_zone, newproc); + if (single_threaded) { + PROC_LOCK(p1); + thread_single_end(); + PROC_UNLOCK(p1); + } +fail2: #ifdef PROCDESC if (((flags RFPROCDESC) != 0) (fp_procdesc != NULL)) { fdclose(td-td_proc-p_fd, fp_procdesc, *procdescp, td); diff --git a/tools/test/pthread_vfork/pthread_vfork_test.c b/tools/test/pthread_vfork/pthread_vfork_test.c index e004727..88956c2 100644 --- a/tools/test/pthread_vfork/pthread_vfork_test.c +++ b/tools/test/pthread_vfork/pthread_vfork_test.c @@ -29,6 +29,8 @@ #include sys/cdefs.h __FBSDID($FreeBSD$); +#include sys/types.h +#include sys/wait.h #include err.h #include pthread.h #include signal.h @@ -39,10 +41,11 @@ __FBSDID($FreeBSD$); #define NUM_THREADS 100 -void * 
-vfork_test(void *threadid) +static void * +vfork_test(void *threadid __unused) { - pid_t pid; + pid_t pid, wpid; + int status; for (;;) { pid = vfork(); @@ -50,10 +53,20 @@ vfork_test(void *threadid) _exit(0); else if (pid == -1) err(1, Failed to vfork); + else { + wpid = waitpid(pid, status, 0); + if (wpid == -1) + err(1, waitpid); + } } return (NULL); } +static void +sighandler(int signo __unused) +{ +} + /* * This program invokes multiple threads and each thread calls * vfork() system call. @@ -63,19 +76,24 @@ main(void) { pthread_t threads[NUM_THREADS]; struct sigaction reapchildren; + sigset_t sigchld_mask; int rc, t; memset(reapchildren, 0, sizeof(reapchildren)); - reapchildren.sa_handler = SIG_IGN; - - /* Automatically reap zombies. */ + reapchildren.sa_handler = sighandler; if (sigaction(SIGCHLD, reapchildren, NULL) == -1) err(1, Could not sigaction(SIGCHLD)); + sigemptyset(sigchld_mask); + sigaddset(sigchld_mask, SIGCHLD); + if (sigprocmask(SIG_BLOCK, sigchld_mask, NULL) == -1) + err(1, sigprocmask); + for (t = 0; t NUM_THREADS; t++) { rc = pthread_create(threads[t], NULL, vfork_test, (void *)t); if (rc) errc(1, rc, pthread_create); } + pause(); return (0); } ___ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org
Re: system() using vfork() or posix_spawn() and libthr
On 2012/08/09 18:56, Jilles Tjoelker wrote: On Mon, Aug 06, 2012 at 11:25:35AM +0300, Konstantin Belousov wrote: On Sun, Aug 05, 2012 at 11:54:32PM +0200, Jilles Tjoelker wrote: On Mon, Jul 30, 2012 at 01:53:03PM +0300, Konstantin Belousov wrote: On Mon, Jul 30, 2012 at 12:24:08PM +0200, Jilles Tjoelker wrote: People sometimes use system() from large address spaces where it would improve performance greatly to use vfork() instead of fork(). A simple approach is to change fork() to vfork(), although I have not tried this. It seems safe enough to use sigaction and sigprocmask system calls in the vforked process. Alternatively, we can have posix_spawn() do the vfork() with signal changes. This avoids possible whining from compilers and static analyzers about using vfork() in system.c. However, I do not like the tricky code for signals and that it adds lines of code. This is lightly tested. It is interesting to note that for some time our vfork(2) no longer stops the whole forked process (parent), only the forking thread is waiting for the child exit or exec. I am not sure is this point important for system(3), but determined code can notice the difference from the fork-vfork switch. Neither fork nor vfork call thread_single(SINGLE_BOUNDARY), so this is not a difference. It is the difference, because vforked child shares parent address space. Thread singling may be noticeable from a failing execve() (but only in the process doing execve()) and in the rare case of rfork() without RFPROC. No, other running threads in parent affect vforked child till exec or exit. In fact, I would classify this as bug, but not a serious one. There are some ugly ways this parallel execution is depended on. If the vforked child calls sigaction() while another thread is also in sigaction() for that signal, the vforked child needs to wait for the other thread to release the lock. This uses a per-process lock to synchronize threads in different processes, which may not work properly. 
If the vforked child is killed (such as by SIGKILL) while holding the lock, the parent is not killed but its _thr_sigact is damaged. These problems could be avoided in libthr by skipping the lock in _sigaction() if a signal action is being set to SIG_DFL or SIG_IGN and the old action is not queried. In those cases, _thr_sigact is not touched so no lock is required. This change also helps applications, provided they call sigaction() and not signal(). Alternatively, posix_spawn() and system() could use the sigaction system call directly, bypassing libthr (if present). However, this will not help applications that call vfork() and sigaction() themselves (such as a shell that wants to implement ... using vfork()). posix_spawn() likely still needs some adjustment so that having it reset all signals (sigfillset()) to the default action will not cause it to fail with [EINVAL] because libthr does not allow changing SIGTHR's disposition.
Index: lib/libthr/thread/thr_sig.c
===================================================================
--- lib/libthr/thread/thr_sig.c	(revision 238970)
+++ lib/libthr/thread/thr_sig.c	(working copy)
@@ -519,8 +519,16 @@
 		return (-1);
 	}
 
-	if (act)
+	if (act) {
 		newact = *act;
+		/*
+		 * Short-circuit cases where we do not touch _thr_sigact.
+		 * This allows performing these safely in a vforked child.
+		 */
+		if ((newact.sa_handler == SIG_DFL ||
+		    newact.sa_handler == SIG_IGN) && oact == NULL)
+			return (__sys_sigaction(sig, &newact, NULL));
+	}
 
 	__sys_sigprocmask(SIG_SETMASK, &_thr_maskset, &oldset);
 	_thr_rwl_wrlock(&_thr_sigact[sig-1].lock);
Your patch is better than nothing; I don't object. The problem is visible to you, but there is also an invisible user: rtld. If a symbol is never used in the parent process but is used for the first time in a vforked child, rtld will be involved; if the child is then killed, rtld's data structures, such as its locks or linked lists, may be left in an inconsistent state. I think this problem might not be fixable.
Re: Enumerating sleeping threads
On 2012/8/2 10:12, Daniel Rudy wrote:

Hello, What is the best way to enumerate the sleeping threads via sleepqueue(9)? Furthermore, when enumerating the threads that are on the run queue, what locks are needed, if any?

The sleepqueue hash bucket is a private data structure in subr_sleepqueue.c; I think you cannot access it from outside that file. One way to enumerate the sleeping threads is to iterate over all threads in the system and check their states. proc.h contains two macros for this: FOREACH_PROC_IN_SYSTEM and FOREACH_THREAD_IN_PROC. To access a thread's state you should hold the thread lock, using thread_lock() and thread_unlock(). The thread lock is not a fixed lock: it might be the sleep queue's spinlock or a per-CPU run-queue lock, and there is even a blocked spin lock for intermediate state changes.

Thank you.
Re: port devel/doxygen failing to test on -CURRENT and -STABLE
On 2012/07/08 18:21, Chris Rees wrote:

Hi all / David, doxygen has been failing for a while now on -CURRENT and apparently -STABLE too. The current fix is disabling one of the tests in the build, but obviously it points to a problem with our base system. I've trussed [1] the failing code [2], and it looks as though it's hanging on a _umtx call. I'm gratuitously ignorant of what goes on there... but the timings of recent commits to umtx.h [3] could indicate a link (hope it's not bogus...). Any pointers on what I should do next? Chris

[1] http://www.bayofrum.net/~crees/scratch/doxygen-truss

_umtx_op(0x8012b0280,0x16,0x0,0x0,0x0,0x1) ERR#22 'Invalid argument'

Can you run it in gdb and print these values?

print/x *(int *)0x8012b0280
print/x *(int *)(0x8012b0280+4)

[2] http://www.bayofrum.net/tb/index.php?action=display_markup_logbuild=10-localid=1037
[3] http://svnweb.freebsd.org/base/head/sys/sys/umtx.h?view=log
Re: Fast syscalls via sysenter
On 2012/06/21 20:11, John Baldwin wrote:

On Monday, June 18, 2012 2:56:30 pm Daniil Cherednik wrote:

Hi! I am trying to continue the work started by David Xu on the implementation of fast syscalls via sysenter/sysexit. http://people.freebsd.org/~davidxu/sysenter/kernel/ I have ported it to FreeBSD 9. It looks like it works. Unfortunately I am a beginner in the kernel, so I have some questions:

1. See http://people.freebsd.org/~davidxu/sysenter/kernel/kernel.patch

	/*
	 * If %edx was changed, we can not use sysexit, because it
	 * needs %edx to restore userland %eip.
	 */
	if (orig_edx != frame.tf_edx)
		td->td_pcb->pcb_flags |= PCB_FULLCTX;

What is the reason why we have to do this additional check? In http://people.freebsd.org/~davidxu/sysenter/kernel/sysenter.s we store %edx to the stack in

	pushl %edx	/* ring 3 next %eip */

and we restore the register in

	popl %edx	/* ring 3 %eip */

Some system calls return two return values (pipe(2)) or return a 64-bit off_t (lseek(2)). Those system calls change %edx's value and need that changed value to make it out to userland.

2. See http://people.freebsd.org/~davidxu/sysenter/kernel/sysenter.s

	movl PCPU(CURPCB),%esi
	call syscall

Why do we movl PCPU(CURPCB),%esi before calling syscall? syscall is just a C function.

No clue on this one, looks like it is not needed. [kib@ is cc'ed]

I implemented the sysenter syscall a long time ago; it can indeed reduce system call overhead on i386. I think it might be time to implement a Linux-like vdso syscall now, based on the work kib@ has recently done, though I don't know how to hook it into kib's code. A quick search shows that Linux puts some data into the aux vector:

http://www.trilithium.com/johan/2005/08/linux-gate/
http://www.takatan.net/lxr/source/arch/um/os-Linux/elf_aux.c?a=x86_64#L40

Regards, David Xu
Re: High-res Timers
On 2012/5/17 4:59, Brandon Falk wrote:

Does anyone have a quick list of high-resolution timer functions? Both user-land and kernel-land? It would be greatly appreciated (doing some performance timing for applications). -Brandon

AFAIK, there is no high-resolution timer available.
Re: Startvation of realtime piority threads
On 2012/4/5 9:54, Sushanth Rai wrote:

I have a multithreaded user space program that basically runs at realtime priority. Synchronization between threads is done using a spinlock. When running this program on an SMP system under heavy memory pressure I see that the thread holding the spinlock is starved out of CPU. The CPUs are effectively consumed by other threads that are spinning for the lock to become available. After instrumenting the kernel a little bit, what I found was that under memory pressure, when the user thread holding the spinlock traps into the kernel due to a page fault, that thread sleeps until free pages are available. The thread sleeps at PUSER priority (within vm_waitpfault()). When it is ready to run, it is queued at PUSER priority even though its base priority is realtime. The other sibling threads that are spinning at realtime priority to acquire the spinlock starve the owner of the spinlock. I was wondering if the sleep in vm_waitpfault() should be at MAX(td_user_pri, PUSER) instead of just PUSER. I'm running on 7.2 and it looks like this logic is the same in the trunk. Thanks, Sushanth

I think 7.2 still has libkse, which supports static priority scheduling. If correctness matters more than performance, you may try libkse with process-scope threads and use priority-inherit mutexes for locking. The kernel is known to be weak at supporting userland realtime threads: not every locking primitive supports priority propagation, and that is an issue. In userland, the library's internal mutexes are not priority-inherit either, so starvation may happen there too. If you know what you are doing, you can avoid calling functions that use internal mutexes, but this is rather difficult.

Regards, David Xu
Re: Startvation of realtime piority threads
On 2012/4/5 11:56, Konstantin Belousov wrote:

On Wed, Apr 04, 2012 at 06:54:06PM -0700, Sushanth Rai wrote:

I have a multithreaded user space program that basically runs at realtime priority. Synchronization between threads is done using a spinlock. When running this program on an SMP system under heavy memory pressure I see that the thread holding the spinlock is starved out of CPU. The CPUs are effectively consumed by other threads that are spinning for the lock to become available. After instrumenting the kernel a little bit, what I found was that under memory pressure, when the user thread holding the spinlock traps into the kernel due to a page fault, that thread sleeps until free pages are available. The thread sleeps at PUSER priority (within vm_waitpfault()). When it is ready to run, it is queued at PUSER priority even though its base priority is realtime. The other sibling threads that are spinning at realtime priority to acquire the spinlock starve the owner of the spinlock. I was wondering if the sleep in vm_waitpfault() should be at MAX(td_user_pri, PUSER) instead of just PUSER. I'm running on 7.2 and it looks like this logic is the same in the trunk.

It just so happens that your program stumbles upon a single sleep point in the kernel. If for whatever reason the thread in the kernel is put off CPU due to failure to acquire any resource without priority propagation, you would get the same effect. Only blockable primitives do priority propagation, that is, mutexes and rwlocks, AFAIR. In other words, any sx/lockmgr/sleep points are vulnerable to the same issue.

This is why I suggested that POSIX realtime priority should not be boosted: it should only be higher than PRI_MIN_TIMESHARE but lower than any priority that msleep() callers provide. The problem is that a userland realtime thread's busy-looping code can starve a thread in the kernel that is holding a critical resource. In the kernel we can avoid writing endless-loop code, but userland code cannot be trusted.
If you search for "Realtime thread priorities" in the December 2010 archive of the arch@ list, you may find the argument.

Speaking of exactly your problem, did you consider wiring the memory of your realtime process? This is a common practice, used e.g. by ntpd.
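Wiring a process's memory, as suggested above, is normally done with mlockall(2). A minimal sketch, assuming the caller has the privilege and resource limits that mlockall() requires; the helper name wire_process_memory() is hypothetical.

```c
#include <errno.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: wire all current and future pages of the calling
 * process so that later accesses do not have to wait for free pages.
 * Returns 0 on success or the errno value reported by mlockall().
 */
int
wire_process_memory(void)
{
	if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
		return (errno);
	return (0);
}
```

With its pages wired, the spinlock holder in the scenario above would not page-fault into vm_waitpfault() while holding the lock, although other kernel sleep points remain.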
Re: [RFT][patch] Scheduling for HTT and not only
On 2012/2/6 15:44, Alexander Motin wrote: On 06.02.2012 09:40, David Xu wrote: On 2012/2/6 15:04, Alexander Motin wrote: Hi. I've analyzed scheduler behavior and think found the problem with HTT. SCHED_ULE knows about HTT and when doing load balancing once a second, it does right things. Unluckily, if some other thread gets in the way, process can be easily pushed out to another CPU, where it will stay for another second because of CPU affinity, possibly sharing physical core with something else without need. I've made a patch, reworking SCHED_ULE affinity code, to fix that: http://people.freebsd.org/~mav/sched.htt.patch This patch does three things: - Disables strict affinity optimization when HTT detected to let more sophisticated code to take into account load of other logical core(s). Yes, the HTT should first be skipped, looking up in upper layer to find a more idling physical core. At least, if system is a dual-core, 4-thread CPU, and if there are two busy threads, they should be run on different physical cores. - Adds affinity support to the sched_lowest() function to prefer specified (last used) CPU (and CPU groups it belongs to) in case of equal load. Previous code always selected first valid CPU of evens. It caused threads migration to lower CPUs without need. Even some level of imbalance can be borne, until it exceeds a threshold, this at least does not trash other cpu's cache, pushing a new thread to another cpu trashes its cache. The cpus and groups can be arranged in a circle list, so searching a lowest load cpu always starts from right neighborhood to tail, then circles from head to left neighborhood. - If current CPU group has no CPU where the process with its priority can run now, sequentially check parent CPU groups before doing global search. That should improve affinity for the next cache levels. 
I've made several different benchmarks to test it, and so far results look promising:

- On Atom D525 (2 physical cores + HTT) I've tested HTTP receive with fetch and FTP transmit with ftpd. On receive I've got 103MB/s on interface; on transmit somewhat less -- about 85MB/s. In both cases scheduler kept interrupt thread and application on different physical cores. Without patch speed fluctuates about 103-80MB/s on receive and is about 85MB/s on transmit.

- On the same Atom I've tested TCP speed with iperf and got mostly the same results:
  - receive to Atom with patch -- 755-765Mbit/s, without patch -- 531-765Mbit/s.
  - transmit from Atom in both cases 679Mbit/s.
  Fluctuating receive behavior in both tests I think can be explained by some heavy callout handled by the swi4:clock process, called on receive (seen in top and schedgraph), but not on transmit. Maybe it is a specific of the Realtek NIC driver.

- On the same Atom tested number of 512 byte reads from SSD with dd in 1 and 32 streams. Found no regressions, but no benefits either, as with one stream there is no congestion and with multiple streams all cores are congested.

- On Core i7-2600K (4 physical cores + HTT) I've run more than 20 `make buildworld`s with different -j values (1,2,4,6,8,12,16) for both original and patched kernel.
I've found no performance regressions, while for -j4 I've got 10% improvement:

# ministat -w 65 res4A res4B
x res4A
+ res4B
(ministat distribution plot omitted)
    N           Min           Max        Median           Avg        Stddev
x   3       1554.86       1617.43       1571.62     1581.3033     32.389449
+   3       1420.69        1423.1       1421.36     1421.7167     1.2439587
Difference at 95.0% confidence
	-159.587 ± 51.9496
	-10.0921% ± 3.28524%
	(Student's t, pooled s = 22.9197)

and for -j6 -- 3.6% improvement:

# ministat -w 65 res6A res6B
x res6A
+ res6B
(ministat distribution plot omitted)
    N           Min           Max        Median           Avg        Stddev
x   3       1381.17       1402.94        1400.3     1394.8033     11.880372
+   3        1340.4       1349.34       1341.23     1343.6567     4.9393758
Difference at 95.0% confidence
	-51.1467 ± 20.6211
	-3.66694% ± 1.47842%
	(Student's t, pooled s = 9.09782)

Who wants to do independent testing to verify my results or do some more interesting benchmarks? :)

PS: Sponsored by iXsystems, Inc.

The benchmark is incomplete; a complete benchmark should at least include CPU-intensive applications. Testing with real-world databases, web servers and other important applications is needed.

I plan to do this, but you may help. ;)

Thanks, I need to find time. I have cc'ed hackers@; my first mail seems to have forgotten to include it. I think designing an SMP scheduler is dirty work: it takes a lot of testing and refining, and you may still get an imperfect result. ;-)

Regards, David Xu
Re: sem(4) lockup in python?
On 2012/2/5 20:02, Ivan Voras wrote:

On 5 February 2012 11:44, Garrett Cooper yaneg...@gmail.com wrote:

'make MAKE_JOBS_NUMBER=1' is the workaround used right now..

David Xu suggested that it is a bug in Python - it doesn't set the process-shared attribute when it calls sem_init(), but I've tried patching it (replacing the port patch file with the one I've attached) and I still get the hang.

I don't know where the Python bug is, since I don't know that piece of source code. But the general rule for using an anonymous shared semaphore between forked processes is that the semaphore must be placed in a shared memory page and initialized by sem_init() with pshared set to 1, such as:

sem_ptr = mmap(..., MAP_SHARED, ...);
sem_init(sem_ptr, 1 /* pshared */, init_value);

Regards, David Xu
Re: Is pthread_cond_signal(3) man page correct?
On 2011/03/16 23:23, Yuri wrote:

On 02/27/2011 18:00, David Xu wrote:

I think that in the normal case pthread_cond_signal will wake up one thread, but other events, for example a UNIX signal or fork(), may interrupt a thread sleeping in the kernel and cause pthread_cond_wait to return to userland. This is called a spurious wakeup. There may be other such events that I cannot think of yet, but I believe they exist.

Does this mean that pthread_cond_signal can also return EINTR? This isn't in pthread_cond_signal(3) either.

No, it will return zero; returning EINTR is not allowed.

Is this the case that all system calls should be assumed to be able to return EINTR or only those that have EINTR in their man pages? Yuri
Re: Is pthread_cond_signal(3) man page correct?
On 2011/02/28 05:26, Yuri wrote:

Forwarding to standards@ and davidxu@ per Garrett Cooper's suggestion. Also I want to add that I came to this question while observing behavior consistent with multiple wakeup on FreeBSD-8.1. The heavily multi-threaded code that assumes that only one thread can be woken up by one pthread_cond_signal call crashes, and the only reasonable explanation so far is that more than one thread is actually being woken up. Yuri

On 02/27/2011 12:54, Yuri wrote:

On FreeBSD-8.1 this page says: The pthread_cond_signal() function unblocks one thread waiting for the condition variable cond.

On Linux it says: The pthread_cond_signal() function shall unblock at least one of the threads that are blocked on the specified condition variable cond (if any threads are blocked on cond).

Also the HP page (http://docs.hp.com/en/B2355-90130/pthread_cond_signal.3T.html) says: If there are no threads blocked on cond, this function has no effect. And later it says: It is possible that more than one thread can be unblocked due to a spurious wakeup.

This is quite confusing: in case nobody is waiting, does it block or not? In case other threads are waiting, is it really any arbitrary number of threads that are woken up? Or on FreeBSD is it strictly 1? Shouldn't this be defined in one and only one way by POSIX, and all POSIX-compliant systems should work exactly the same? I think the man page should be expanded to give a more comprehensive explanation. Yuri

It is really not important whether pthread_cond_signal wakes up one thread or multiple threads in a corner case, because POSIX says:

---
When using condition variables there is always a Boolean predicate involving shared variables associated with each condition wait that is true if the thread should proceed. Spurious wakeups from the pthread_cond_timedwait() or pthread_cond_wait() functions may occur. Since the return from pthread_cond_timedwait() or pthread_cond_wait() does not imply anything about the value of this predicate, the predicate should be re-evaluated upon such return.
---

I think that in the normal case pthread_cond_signal will wake up one thread, but other events, for example a UNIX signal or fork(), may interrupt a thread sleeping in the kernel and cause pthread_cond_wait to return to userland. This is called a spurious wakeup. There may be other such events that I cannot think of yet, but I believe they exist.

Regards, David Xu
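The POSIX wording quoted above translates into the usual "wait in a loop on the predicate" pattern. A minimal sketch follows; the names ready, ready_lock, ready_cond and wait_for_ready() are illustrative and not taken from the thread.

```c
#include <pthread.h>

static pthread_mutex_t ready_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t ready_cond = PTHREAD_COND_INITIALIZER;
static int ready;	/* the predicate, protected by ready_lock */

/* Sets the predicate and signals the condition variable. */
static void *
producer(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&ready_lock);
	ready = 1;
	pthread_cond_signal(&ready_cond);
	pthread_mutex_unlock(&ready_lock);
	return (NULL);
}

/*
 * Starts a producer thread and waits until the predicate is true.
 * Returns the predicate value seen after the wait loop, or -1 if the
 * thread could not be created.
 */
int
wait_for_ready(void)
{
	pthread_t tid;
	int seen;

	pthread_mutex_lock(&ready_lock);
	ready = 0;
	pthread_mutex_unlock(&ready_lock);

	if (pthread_create(&tid, NULL, producer, NULL) != 0)
		return (-1);

	pthread_mutex_lock(&ready_lock);
	/* Re-check the predicate after every return from the wait. */
	while (!ready)
		pthread_cond_wait(&ready_cond, &ready_lock);
	seen = ready;
	pthread_mutex_unlock(&ready_lock);

	pthread_join(tid, NULL);
	return (seen);
}
```

Code written this way does not care whether a signal wakes one waiter, several waiters, or a waiter wakes for no reason at all: a thread proceeds only when the predicate it owns under the mutex is true.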
Re: Tracking down a problem with php on FreeBSD
Ivan Voras wrote:

On 5 February 2011 19:43, Ruslan Mahmatkhanov cvs-...@yandex.ru wrote:

Hi, Ivan! Thank you much for the response and sorry for the late answer. We were able to collect some data about the issue to make the discussion more objective. See below.

Simple php-fpm restart solves the problem, but I need to track it down to the cause of this situation and ask for your assistance and instructions on how to debug it. Some facts about this:

On one hand, FPM is said to be very experimental... Personally, I've been using apache22-worker or apache22-event + mod_fcgid for years without trouble.

We prefer to avoid using apache at all, because in this case it just adds yet another unneeded link and complexity.

I guess it's about tradeoffs between complexity and stability :)

- `top -mio` shows very high (8-9 for VCSW) VCSW/IVCSW values for php-fpm processes and LA is more than 120

I think this is significant, especially with this:

When attaching to any hanging php-fpm process with truss, I see a lot of these calls:

sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0x808bfd80,0x7fffa078) = 0 (0x0)
sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0x808bfd80,0x7fffa078) = 0 (0x0)
sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0x808bfd80,0x7fffa078) = 0 (0x0)
sched_yield(0x80516c000,0x1,0x4d4828b6,0x8012ef45c,0x808bfd80,0x7fffa078) = 0 (0x0)

Normal processes of the type PHP is have no need to call sched_yield() arbitrarily, unless they are implementing something they shouldn't, like a synchronization primitive (semaphore/lock). If "a lot" means the same order of magnitude as your VCSW rate, this is the reason for it. I've analyzed my php-cgi binary and modules and they don't use sched_yield. And yes, grepping for it in the source finds it only in FPM:

sapi/fpm/fpm/fpm_atomic.h:140: sched_yield();

It seems they are trying to implement a spinlock by hand, instead of using what the OS provides (on the other hand, the implementation might be correct but they may be using it wrong). In any case, this points to bugs in FPM. If so, unfortunately I can't help you further. If you really want to continue using FPM, I guess you should probably replace this hand-made lock implementation with sem(4) or see if robust pthread mutexes can be committed and MFCed (maybe with David Xu).

Yes, there is a p4 branch that implements pthread robust mutexes; it requires an ABI change. The documentation for POSIX robust mutexes is here: http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutexattr_getrobust.html
Re: Distributed SSH attack
Jeremy Lea wrote:

Hi, This is off topic to this list, but I don't want to subscribe to -chat just to post there... Someone is currently running a distributed SSH attack against one of my boxes - one attempted login for root every minute or so for the last 48 hours. They won't get anywhere, since the box in question has no root password, and doesn't allow root logins via SSH anyway... But I was wondering if there were any security researchers out there that might be interested in the +-800 IPs I've collected from the botnet? The resolvable hostnames mostly appear to be in Eastern Europe and South America - I haven't spotted any that might be 'findable' to get the botnet software. I could switch out the machine for a honeypot in a VM or a jail, by moving the host to a new IP, and if you can think of a way of allowing the next login to succeed with any password, then you could try to see what they delivered... But I don't have a lot of time to help. Regards, -Jeremy

Try changing the SSH port to something other than the default port 22. I always do this for my machines, e.g. changing it to 13579 :-)

Regards, David Xu
Re: pthread_{mutex,cond} fifo/starvation/scheduling policy
Bernard van Gastel wrote:

But the descheduling of threads if the mutex is not available is done by the library. And especially the order of rescheduling of the threads (that's what I'm interested in). Or am I missing something in the sys/kern/sched files (btw I don't have the umtx file). Regards, Bernard

The libthr mutex wait queue is FIFO in the kernel. However, the decision whether a thread should be preempted is made by the scheduler. When the mutex owner unlocks the mutex, it just wakes up a waiter thread from the head of the queue. If that thread has been blocked for long enough, the kernel scheduler may decide to raise its priority, and if the mutex owner has used too much CPU time, the scheduler may decide to lower its priority. So when the owner unlocks the mutex, preemption may happen if the waiter has the higher priority, and the waiter can then lock the mutex; otherwise the waiter still has to wait for some time to gain a higher priority. It looks like a round-robin fashion in which the quantum is chosen by the kernel scheduler. In theory this gives the best performance; direct hand-off seems to hurt performance drastically.
Re: Improving the kernel/i386 timecounter performance (GSoC proposal)
Prashant Vaibhav wrote:

...and that is _exactly_ what I propose(d) in the beginning and what OSX already does. Further, keeping the shared page and functions fixed at the end of the memory space has advantages like not needing any special linking, being easily accessible for code jumps or data reads, and so on [1]. The TSC issues are but one part of the puzzle. After this week-long discussion I still can't decide whether this was something that's desirable at all: keeping in mind that it's among the few project ideas tagged as Suggested for Google Summer of Code 2009 on the FreeBSD website. :-\ Though I've been reading mailing list archives, and the various handbooks, I'm not familiar enough with other parts of the FreeBSD kernel to draft another concrete proposal on my own at this time.

[1] Mac OS X Internals: A Systems Approach, p 595, Amit Singh, ISBN 0321278542

Not using ELF, and instead using signal-like trampoline code as we currently do, makes it very difficult for some languages to do asynchronous stack unwinding, e.g. pthread async cancellation and C++ object destruction. See my recent work on pthread cancellation and stack unwinding: http://people.freebsd.org/~davidxu/patch/unwind.patch Check x86_64_fallback_frame_state() to see what kind of hack has to be written.

Regards, David Xu
Re: Improving the kernel/i386 timecounter performance (GSoC proposal)
Julian Elischer wrote:

depends on the hardware. anyhow I was only saying it was possible, not necessarily good or even useful.

I did some work on a thread-private page shared by the kernel and userland when I was working on userland spinlocks: if userland asks for a page, the kernel allocates it and the scheduler and other code put some interesting data in it. That code may be useful.
Re: Improving the kernel/i386 timecounter performance (GSoC proposal)
David Xu wrote:

Julian Elischer wrote:

depends on the hardware. anyhow I was only saying it was possible, not necessarily good or even useful.

I did some work on a thread-private page shared by the kernel and userland when I was working on userland spinlocks: if userland asks for a page, the kernel allocates it and the scheduler and other code put some interesting data in it. That code may be useful.

FYI: http://people.freebsd.org/~davidxu/schedctl/
Re: Improving the kernel/i386 timecounter performance (GSoC proposal)
Julian Elischer wrote:

David Xu wrote:

David Xu wrote:

Julian Elischer wrote:

depends on the hardware. anyhow I was only saying it was possible, not necessarily good or even useful.

I did some work on a thread-private page shared by the kernel and userland when I was working on userland spinlocks: if userland asks for a page, the kernel allocates it and the scheduler and other code put some interesting data in it. That code may be useful.

FYI: http://people.freebsd.org/~davidxu/schedctl/

reading this quickly, you allocate a separately addressed page for each thread, but, how do you use it?

I store the address in the userland TLS area, then fetch it when I want to check some scheduling information.
Re: Improving the kernel/i386 timecounter performance (GSoC proposal)
Julian Elischer wrote:

David Xu wrote:

Julian Elischer wrote:

David Xu wrote:

David Xu wrote:

Julian Elischer wrote:

depends on the hardware. anyhow I was only saying it was possible, not necessarily good or even useful.

I did some work on a thread-private page shared by the kernel and userland when I was working on userland spinlocks: if userland asks for a page, the kernel allocates it and the scheduler and other code put some interesting data in it. That code may be useful.

FYI: http://people.freebsd.org/~davidxu/schedctl/

reading this quickly, you allocate a separately addressed page for each thread, but, how do you use it?

I store the address in the userland TLS area, then fetch it when I want to check some scheduling information.

and the scheduler writes out interesting information to that location?...

Yes.
Re: Improving the kernel/i386 timecounter performance (GSoC proposal)
Julian Elischer wrote:

Scott Long wrote:

I've been talking about this for years. All I need is help with the VM magic to create the page on fork. I also want two pages, one global for gettimeofday (and any other global data we can think of) and one per-process for static data like getpid/getgid.

interestingly it is even feasible to have a per-thread page.. it requires that the scheduler change a page table entry though.

I will knock on his door at midnight if he adds such a heavyweight task to the scheduler. A TLB shootdown is horrible, and a big code size that squeezes data out of the CPU cache is not an ideal model. The scheduler should be as simple as just a context switching routine. :-)

David Xu
Re: threaded, forked, rethreaded processes will deadlock
Kostik Belousov wrote:

I looked at the issue once more recently, and I propose the following much less intrusive patch. It is somewhat hackish, but I think that it would be good to have this working. Most other Unixes do have a working thread library after the fork. Any objections?

diff --git a/lib/libthr/thread/thr_fork.c b/lib/libthr/thread/thr_fork.c
index bc410d1..ae6b9ad 100644
--- a/lib/libthr/thread/thr_fork.c
+++ b/lib/libthr/thread/thr_fork.c
@@ -173,14 +173,19 @@ _fork(void)
 		/* Ready to continue, unblock signals. */
 		_thr_signal_unblock(curthread);
 
-		if (unlock_malloc)
+		if (unlock_malloc) {
+			__isthreaded = 1;
 			_malloc_postfork();
+			__isthreaded = 0;
+		}
 
 		/* Run down atfork child handlers. */
 		TAILQ_FOREACH(af, &_thr_atfork_list, qe) {
 			if (af->child != NULL)
 				af->child();
 		}
+
+		THR_UMUTEX_UNLOCK(curthread, &_thr_atfork_lock);
		^^^ This line is not needed.
 	} else {
 		/* Parent process */
 		errsave = errno;
Re: td_critnest
Ravi Murty wrote:

Hello All, The implementation of critical_enter and critical_exit changed between FreeBSD 5 and FreeBSD 6. In the newer implementation, the code checks if td_critnest is 1 and if it is, sets it to zero, then checks if the thread owes a preempt. If so, it increments td_critnest by 1 before grabbing a lock and then decrements it back to zero. I can't figure out why it does this. The FreeBSD 5 implementation seems straightforward, where we check if the thread owes a preempt and if so we switch to the new thread. Can anyone help me with this? Thanks Ravi

I guess this is because thread_lock() also calls critical_exit(); this code avoids the recursion.
Re: MFC TO 6.X (6.3?) to fix aio_return() ?
Julian Elischer wrote:

This diff is a partial MFC (picking parts out of -current) that makes aio_return() return the error return of a completed AIO request (as it does on other OS's and in 7.x). The man pages for 6.x and other OS's indicate that aio_return should return all the same results as a returning read() or write(), including setting errno on error. In 6.x this does not happen; on 7.0 it does. The included test program can show the result when using gnop() to simulate IO errors. BTW the test program could be used as a start for sample code on how to use kqueue and aio together. If people agree this is worth fixing, it would be nice to get it into 6.3.

Looks OK to me.

Regards, David Xu
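The contract described above (aio_return() reporting what read() or write() would have reported) can be illustrated with a small wrapper. This is a minimal sketch, assuming plain POSIX AIO with aio_suspend() rather than the kqueue-based test program mentioned in the message; the helper name aio_read_once() is hypothetical, and errno is set explicitly from aio_error() so the result does not depend on the behaviour under discussion.

```c
#include <aio.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: read up to len bytes from offset 0 of fd with
 * POSIX AIO and wait for completion. Returns the byte count, or -1 with
 * errno set, the same way a plain read() would.
 */
ssize_t
aio_read_once(int fd, void *buf, size_t len)
{
	struct aiocb cb;
	const struct aiocb *const list[1] = { &cb };
	ssize_t n;
	int error;

	memset(&cb, 0, sizeof(cb));
	cb.aio_fildes = fd;
	cb.aio_buf = buf;
	cb.aio_nbytes = len;
	cb.aio_offset = 0;
	cb.aio_sigevent.sigev_notify = SIGEV_NONE;	/* no completion signal */

	if (aio_read(&cb) == -1)
		return (-1);	/* request was never queued; errno is set */

	/* Wait until the request is no longer in progress. */
	while ((error = aio_error(&cb)) == EINPROGRESS)
		(void)aio_suspend(list, 1, NULL);
	if (error == -1)
		return (-1);	/* aio_error() itself failed; errno is set */

	/* Collect the final status exactly once. */
	n = aio_return(&cb);
	if (error != 0) {
		errno = error;
		return (-1);
	}
	return (n);
}
```

Depending on the implementation, a bad descriptor is rejected either by aio_read() itself or later through aio_error(); the wrapper returns -1 with errno set in both cases.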
Re: SIGFPE with libthr and gdb
mal content wrote:
Hello. When using this libmap.conf:

libpthread.so.2 libthr.so.2
libpthread.so libthr.so
libc_r.so.6 libthr.so.2
libc_r.so libthr.so

...nearly every program receives SIGFPE when calling various functions, when running under gdb. strtol() is one example. Is this a well known problem, before I file a bug report? I'm on FreeBSD 6.2-RELEASE. MC

Try updating fbsd-threads.c to revision 1.13.2.3; this may resolve the problem. http://www.freebsd.org/cgi/cvsweb.cgi/src/gnu/usr.bin/gdb/libgdb/fbsd-threads.c Regards, David Xu
Re: scheduler CORE for RELENG_6
On Tuesday 24 October 2006 17:31, Stepan A. Baranov wrote:
Hello, I prepared a patch to port the scheduler from HEAD to RELENG_6. I applied this patch on my workstation and scheduler CORE works fine. After applying the patch you can put: options SCHED_CORE # CORE scheduler in your kernel configuration file and rebuild your kernel. Please test it and fix it. Stepan Baranov.

Thanks, but I feel the scheduler is still very experimental and there is a lot of work left to do. I would not like to see it in RELENG_6, where it would cause lots of questions which I am not ready to answer. I am trying to find spare time to work on it again. ;-) Regards, David Xu
Re: TLS - implementing linux one in fbsd
Divacky Roman wrote:

The M:N and 1:1 threading in FreeBSD use different mechanisms to implement TLS: M:N implements it in userland, while 1:1 implements it in the kernel. thr_new or thr_create are used for 1:1 threading. Right now libthr uses thr_new to atomically set up a thread; this includes storing the TID, setting TLS, and maybe the signal mask (not implemented), CPU affinity mask etc. (not implemented), and scheduling scope. In one word, it is intended to map most of pthread_attr into the kernel world.

but on the kernel level the implementation must be the same.. I mean the mangling of %gs. right?

There is no standard saying that a kernel must implement it that way; we happen to implement it in the kernel with the GDT. Before this, thread libraries were using the LDT. The official TLS standard only defines the ABI in userspace: http://people.redhat.com/drepper/tls.pdf The M:N thread library only sets the GDT entry once; for Variant II TLS (x86), the userland scheduler just replaces some pointers in the TCB and does not have to set TLS via a syscall later. The 1:1 thread library just lets the kernel context switch code update it for the next thread.

well.. in linux the thread creation and setting up the tls is done using separate syscalls. I plan to extend the clone() syscall to use thr_create() or thr_new() (if the flags tell me it's a thread) but I am afraid I'll have to modify those syscalls to not set up TLS (some flag) because linux wants to set it separately.

You can try, but the thr_xx syscalls were not designed to implement the Linux clone() syscall; they are only used by libthr to implement 1:1 threading.

I think it is used for futex, and the childtid is used to implement pthread_join and garbage collection in the thread library. The parent tid pointer (if I recall correctly) is used by the parent thread to retrieve the child tid.

this is the next step... I think all the magic is done in their libc (or somewhere) and I basically just need to malloc some space for this info and clear/set it on process creation/exit

We don't save the childtid pointer and clear it at thread exit time like Linux does. We use thr_exit(), which passes a pointer to let the kernel write a value into that address; this lets libthr's garbage collection code work. The thr syscalls may be extended, by adding another flag, to save the childtid pointer somewhere in the kernel and clear it to zero when the thread is exiting, like Linux does.

the cpu_set_user_tls() is then what I need I think... maybe some modifications needed but it should be basically the thing. the linux syscall set_thread_area() just loads the GDT with that info.. that's the same as our cpu_set_user_tls(), right?

Right.

thnx for your information! roman
Re: TLS - implementing linux one in fbsd
On Tuesday 20 June 2006 20:09, Divacky Roman wrote:

Hi, I am a student working on a SoC project, extending the linuxolator; now I am working on implementing Linux TLS in FreeBSD. Here is what I think/know and I would like you to comment on this, thnx. Roman

- Linux and FreeBSD TLS implementation - comparison

Both systems use a per-thread setting of where the TLS area is stored. This setting is loaded into the active thread's GDT and can be accessed via the %gs register. This GDT setup is done on every context switch.

Yes.

Linux uses strict 1:1 threading so every thread is in fact a process, so thread creation is done using plain clone()/fork(). FreeBSD uses M:N (including 1:1) threading. Threads are created via a pthread_create() call to the threading library. In the kernel there's the thr_new() syscall or the thread_create() syscall. I didn't find the connection between the threading library and the kernel but I assume it's using one of the syscalls.

The M:N and 1:1 threading in FreeBSD use different mechanisms to implement TLS: M:N implements it in userland, while 1:1 implements it in the kernel. thr_new or thr_create are used for 1:1 threading. Right now libthr uses thr_new to atomically set up a thread; this includes storing the TID, setting TLS, and maybe the signal mask (not implemented), CPU affinity mask etc. (not implemented), and scheduling scope. In one word, it is intended to map most of pthread_attr into the kernel world.

For setting up the GDT for the thread Linux uses the syscall set_thread_area() (TODO - how exactly? it's unclear what it does). I don't know how FreeBSD does it but I think it might be done via params to the syscalls (TODO - how is it done?)

If you use thr_new, it is not necessary to use set_thread_area. I am not sure you need to change the TLS pointer again after the thread is created; I think only the main thread may need this feature. In FreeBSD, a thread's TLS pointer is set via the libc function _set_tp(void *tp).

Remaining questions: clone() - the 2.6.x glibc fork() implementation uses the clone() syscall. Is it supposed to create a thread or just a process? I think it's a process, but why is the binary (ls, date and probably some others) linked to the pthread library? Is it just Linux strangeness? I don't see a reason for ls to be threaded... does anyone?

Dunno.

set/get tid - does it relate to TLS at all? I don't think so but you never know. The tid thing is unclear to me. The clone() syscall is passed CLONE_CHILD_SETTID and CLONE_CHILD_CLEARTID, which should be mutually exclusive. I don't much believe it's a mistake.. but the code is clear:

p->set_child_tid = (clone_flags & CLONE_CHILD_SETTID) ? child_tidptr : NULL;
p->clear_child_tid = (clone_flags & CLONE_CHILD_CLEARTID) ? child_tidptr : NULL;

kostik belousov pointed out that this is used for futexes, so not interesting for this.

I think it is used for futex, and the childtid is used to implement pthread_join and garbage collection in the thread library. The parent tid pointer (if I recall correctly) is used by the parent thread to retrieve the child tid.

Possible mapping from Linux to FreeBSD: To me it seems that the set_thread_area() syscall is used in the process of thread creation to set where the TLS is stored. In FreeBSD we use cpu_set_user_tls() for this. So it might be enough to just wrap a call to cpu_set_user_tls() into the syscall.

cpu_set_user_tls is used by the thr_new syscall internally to set up the TLS pointer before executing user code; the thr_new syscall's only user is libthr.
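The two child-tid flags discussed above need not be mutually exclusive, because they act at different moments of a thread's life. The following is a small userland model of that idea, not kernel code: `struct task`, `task_setup()`, `task_start()` and `task_exit()` are hypothetical names, the flag constants are local stand-ins (only the fact that they are distinct bits matters here), and the futex wake-up that the real kernel performs on exit is left out.

```c
#include <stddef.h>

/* Local stand-ins for the two clone() flags; the values are illustrative. */
#define DEMO_CLONE_CHILD_CLEARTID 0x00200000UL
#define DEMO_CLONE_CHILD_SETTID   0x01000000UL

/* Hypothetical per-thread record holding the two pointers from the quote. */
struct task {
	int  tid;
	int *set_child_tid;    /* written when the child starts running */
	int *clear_child_tid;  /* zeroed when the child exits */
};

/* Mirrors the two assignments quoted from the clone() path. */
void task_setup(struct task *p, unsigned long clone_flags, int *child_tidptr)
{
	p->set_child_tid =
	    (clone_flags & DEMO_CLONE_CHILD_SETTID) ? child_tidptr : NULL;
	p->clear_child_tid =
	    (clone_flags & DEMO_CLONE_CHILD_CLEARTID) ? child_tidptr : NULL;
}

/* Child begins execution: publish its tid in the user-visible word. */
void task_start(struct task *p)
{
	if (p->set_child_tid != NULL)
		*p->set_child_tid = p->tid;
}

/* Child exits: clear the word so a joiner can see the thread is gone. */
void task_exit(struct task *p)
{
	if (p->clear_child_tid != NULL)
		*p->clear_child_tid = 0;
}
```

With both flags set, the same word first receives the tid and later drops to zero, which is the kind of state change a join or garbage collection routine in a thread library could wait for.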
Re: [patch] Re: dlopen() and dlclose() are not MT-safe? YES, esp. for libthr
On Saturday 25 March 2006 23:07, Alexander Kabaev wrote:
The thread mask only makes sense when flags are per-thread. I meant to use it to detect PLT recursions from locking primitives exported to rtld by the threads library, as those are not allowed and threads implementations are required to take special care to provide only self-contained locks. The 'default' lock implementation will not work with any library other than libc_r, and even that holds true only for some definition of work. The dynamic loader never had reliable locking and furthermore, there was no way to make it work better without threading library cooperation. This is why we came up with a set of callbacks rtld expects every threading library to provide. libpthread was the first where these callbacks were implemented. It comes as a surprise that libthr did not have them, because David Xu was the one who did most of the work on rtld locking callbacks in libpthread. The def_thread_set_flag function use is racy and should be fixed. -- Alexander Kabaev

I fixed it in libthr yesterday; I think I must have forgotten to do it in the past. Regards, David Xu
Re: [patch] Re: dlopen() and dlclose() are not MT-safe? YES, esp. for libthr
On Friday 24 March 2006 16:48, Kostik Belousov wrote:
I did understand the purpose of the thread mask code in libexec/rtld/rtld_lock.c, or, more precisely, the condition where this code works (for the context, see the mails with the same subject on freebsd-hackers). Look, that code assumes that blocking async signals would stop the thread scheduler from preempting the current thread. This works for libc_r, but fails in the libpthread and libthr cases. libpthread provides an implementation of the locks for rtld. But libthr does not! As a result, rtld exhibits races when used with libthr. In other words, libthr needs code to do proper locking. Do you agree? Has somebody already planned to do this work? Best regards, Kostik Belousov

I will check the libthr source code to see if I can fix it at the weekend. David Xu
Re: -lthr vs. -pthread
libpthread defaults to the M:N threading model: kernel thread entities are allocated on demand, and things like sleep() only block the thread in userland, so no kernel thread is allocated. In your example you therefore won't see 5 kernel threads; only two threads are shown, and the extra one is a signal thread (there is only one signal thread in the process life cycle). libthr is 1:1: when you allocate a thread in userland, it creates a kernel thread too. David Xu

Cyrille Lefevre wrote:
Hi, I'm currently working on enhancements to ps w/ Garance A Drosehn. I've just added some thread related stuff and to see it, I'm using the following program:

#define _REENTRANT
#include <pthread.h>

#define NUM_THREADS 5
#define SLEEP_TIME 10

void *sleeping(void *);
pthread_t tid[NUM_THREADS];

int
main(int argc, char *argv[])
{
	int i;

	for (i = 0; i < NUM_THREADS; i++)
		pthread_create(&tid[i], NULL, sleeping, (void *)SLEEP_TIME);
	for (i = 0; i < NUM_THREADS; i++)
		pthread_join(tid[i], NULL);
	printf("main() reporting that all %d threads have terminated\n", i);
	return (0);
}

void *
sleeping(void *arg)
{
	int sleep_time = (int)arg;

	printf("thread %d sleeping %d seconds ...\n", thr_self(), sleep_time);
	sleep(sleep_time);
	printf("\nthread %d awakening\n", thr_self());
	return (NULL);
}

then, I compile this one in 2 ways:

# cc -o thread thread.c -lthr
and
# cc -pthread -o pthread thread.c

here are some of the new ps outputs: lwp is the thread id and nlwp the number of threads. -q switches to posix mode (aka SystemV) and -C selects processes by name (a la pgrep).

# ./thread & sleep 1; ps -H -O lwp,nlwp -qC thread (thread, using -H)
  PID   LWP NLWP TTY       TIME COMMAND
85146    15    6 ttyp0 00:00:00 thread
85146    14    6 ttyp0 00:00:00 thread
85146    13    6 ttyp0 00:00:00 thread
85146    12    6 ttyp0 00:00:00 thread
85146    11    6 ttyp0 00:00:00 thread
85146 85146    6 ttyp0 00:00:00 thread

# ./pthread & sleep 1; ps -H -O lwp,nlwp -qC thread (pthread, using -H)
  PID   LWP NLWP TTY       TIME COMMAND
96689    12    2 ttyp0 00:00:00 pthread
96689 96689    2 ttyp0 00:00:00 pthread

is it normal that -pthread forks only 1 thread where -lthr forks 5 of them?

# ./thread & sleep 1; ps -O lwp,nlwp -qC thread (thread or pthread, not using -H)
  PID   LWP NLWP TTY       TIME COMMAND
73718    15    6 ttyp0 00:00:00 thread

is it normal that the selected process is the last forked thread and not the thread owner (father)?

PS: using -lc_r, there is no thread at all, but I suppose this is expected behaviour. CC -current and -hackers Cyrille Lefevre.
Re: FreeBSD 5.1-p10 reproducible crash with Apache2
Branko F. Gračnar wrote:
Thanks. I already sent a PR on 29.10.2003, which is identified by id 'kern/58677'. The PR can be viewed at the following URL: http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/58677 I think that this is a really serious issue concerning operating system stability. best regards, Brane

Please tell us your Apache configuration: are you using prefork, worker or perchild mode? If you are using worker mode, which thread library are you using? This would help us to narrow down the problem scope. --- David Xu
Re: Improving GNU make compatibility in BSD make (+ patch)
The reason that autoconf and automake output can only run under gmake is that gmake will re-read an included file if the included file itself is modified by some rule. BSD make does not, so they do not work with BSD make. For example:

DEP_FILES = aaa.d bbb.d ccc.d

$(DEP_FILES): %.d : %.cpp
	@echo Generating auto dependency for $<
	@$(SHELL) -ec '$(GXX) -MM $(CXXFLAGS) $< \
	| sed '\''s/\($*\)\.o[ :]*/\1.o $@ : /g'\'' > $@; \
	[ -s $@ ] || rm -f $@'

-include $(DEP_FILES)

When aaa.d, bbb.d and ccc.d are updated, gmake will re-read them while BSD make will not. In this example, if I use BSD make, I must first create a depend target in the Makefile and run the stupid make depend command every time I modify my source code. With gmake, I can always run make without additional steps and it will generate the auto-depend rules before compiling the real source code. David Xu

- Original Message - From: Terry Lambert [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Friday, May 31, 2002 4:38 PM Subject: Re: Improving GNU make compatibility in BSD make (+ patch)

Jos Backus wrote:
So BSD make interpreting either `$^' or `$+' as its own `$>' would improve compatibility with GNU make Makefiles. I am just not sure which of the two GNU make variables maps better to our `$>'. This patch implements the former:

The biggest problem with GNU make that I've seen is re-expansion of variable variables. The suggested fix doesn't address that, so it won't fix the most common compatibility problem. If we are going to evolve make into gmake (not a good idea, IMO), then probably the place to start is any port that requires gmake, to get it working under make. I'm not sure, but I believe the other BSDs and OpenPorts have modified make syntax somewhat. It's probably a good idea to keep compatibility with options in the OpenPorts camp, more than any other, if it ever evolves to fulfill its potential properly. I really hate that many autoconf/automake scripts generate code that can only be run by gmake, and sometimes also require that /bin/sh actually be bash instead of sh. It also occurs to me (from experience with perl) that in any language where it's possible to do something in more than one way, it is then impossible to differentiate the right way from the wrong way. 8-(. -- Terry
is it safe to temporary replace thread pcb pointer?
I am working on the vm86 BIOS call crash bug in CURRENT and already have a working patch on my machine. I have tested the patch under heavy load and it seems to be very stable. In the patch, I replace the current thread's pcb pointer with a temporary pcb; this pcb is of course not created by pmap, so I am not certain whether it is safe. Any help will be appreciated. David Xu
fix wrong PNP ID comment
In the current branch, pci_bus.c has a wrong PNP ID comment.

--- /sys/i386/pci/pci_bus.c.orig	Mon Apr 22 16:13:02 2002
+++ /sys/i386/pci/pci_bus.c	Mon Apr 22 16:13:29 2002
@@ -554,7 +554,7 @@
  * people.
  */
 static struct isa_pnp_id pcibus_pnp_ids[] = {
-	{ 0x030ad041 /* PNP030A */, "PCI Bus" },
+	{ 0x030ad041 /* PNP0A03 */, "PCI Bus" },
 	{ 0 }
 };
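The corrected comment can be checked by decoding the constant by hand. The sketch below assumes the usual compressed EISA/PnP ID layout (the 32-bit value holds the four ID bytes in little-endian order, the first two bytes pack three 5-bit letters with 'A' encoded as 1, and the last two bytes are the four hex digits); `pnp_id_to_string()` is a made-up helper for illustration and is not part of the FreeBSD sources.

```c
#include <stdint.h>

/*
 * Decode a compressed PnP ID, as it appears in a table such as
 * pcibus_pnp_ids[], into a 7-character string plus terminator.
 */
void pnp_id_to_string(uint32_t id, char out[8])
{
	static const char hex[] = "0123456789ABCDEF";
	uint8_t b0 = (uint8_t)(id & 0xff);          /* first ID byte */
	uint8_t b1 = (uint8_t)((id >> 8) & 0xff);
	uint8_t b2 = (uint8_t)((id >> 16) & 0xff);
	uint8_t b3 = (uint8_t)((id >> 24) & 0xff);  /* last ID byte */
	uint16_t vendor = (uint16_t)((b0 << 8) | b1);

	/* Three letters, 5 bits each, with 'A' stored as 1. */
	out[0] = (char)('@' + ((vendor >> 10) & 0x1f));
	out[1] = (char)('@' + ((vendor >> 5) & 0x1f));
	out[2] = (char)('@' + (vendor & 0x1f));

	/* Product number: four hex digits from the remaining two bytes. */
	out[3] = hex[b2 >> 4];
	out[4] = hex[b2 & 0x0f];
	out[5] = hex[b3 >> 4];
	out[6] = hex[b3 & 0x0f];
	out[7] = '\0';
}
```

Under that layout, 0x030ad041 has the ID bytes 41 d0 0a 03, which decode to the letters PNP followed by the digits 0A03, matching the corrected comment.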
Re: Swapping performance
I have done some tests on my machine, which has both Linux and FreeBSD installed. The following is the data:

MALLOC_SIZE = 1024*1024*400, with bzero

Red Linux 6.2 (kernel 2.2.14)
5.09user 5.62system 1:15.33elapsed 14%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (115298major+206560minor)pagefaults 161471swaps
4.70user 5.73system 1:17.13elapsed 13%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (116402major+206554minor)pagefaults 160731swaps
4.88user 5.68system 1:17.04elapsed 13%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (117309major+206550minor)pagefaults 161273swaps

FreeBSD 4.5-STABLE
5.489u 6.815s 1:25.96 14.2% 4+425738k 0+0io 12937pf+0w
5.342u 6.728s 1:24.40 14.2% 4+414152k 0+0io 12929pf+0w
5.073u 6.815s 1:28.58 13.4% 3+408011k 1+0io 12920pf+0w

---
MALLOC_SIZE = 1024*1024*400, no bzero

Red Linux 6.2 (kernel 2.2.14)
2.01user 4.16system 0:24.79elapsed 24%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (5127major+103353minor)pagefaults 59369swaps
1.82user 4.31system 0:24.90elapsed 24%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (4897major+103339minor)pagefaults 59250swaps
1.76user 4.29system 0:24.51elapsed 24%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (5814major+103343minor)pagefaults 59360swaps

FreeBSD 4.5-STABLE
2.802u 3.604s 0:23.20 27.5% 4+415497k 0+0io 81pf+0w
2.975u 3.434s 0:23.58 27.1% 4+412937k 0+0io 83pf+0w
2.871u 3.480s 0:23.91 26.5% 4+413607k 0+0io 83pf+0w

/*
 * vmstress.c
 */
#include <stdlib.h>
#include <string.h>
#include <stdio.h>

#define MALLOC_SIZE 1024*1024*400

int main(int argc, char **argv)
{
	char *ptr;
	int i, i_count;
	int j;

	ptr = (char *) malloc(MALLOC_SIZE);
	bzero(ptr, MALLOC_SIZE);
	i_count = MALLOC_SIZE / 16;
	fprintf(stderr, "*");
	for (i = 0; i < i_count; i ++) {
		ptr[i << 4] = ptr[(i << 3) + 1]++;
	}
	fprintf(stderr, "#");
	for (j = 0; j < i_count; j ++) {
		ptr[j << 4] = ptr[(j << 3) + 1]++;
	}
	free(ptr);
	return 0;
}

Unfortunately, I don't have Linux kernel 2.4.17 installed. Is Linux kernel 2.4.17 faster?
-- David Xu
Re: Clearcase and FreeBSD
Terry Lambert [EMAIL PROTECTED] types:

Danny J. Zerkel wrote:
Maybe cvs is an academic toy. Most real development requires a real configuration management system. Why do you think there is work being done on FreeBSD in Perforce?

Frankly, it's because CVS only permits a single line of concurrent development, and it's a limiting tool; but CVSup is CVS-centric and fails with P4, and P4 costs money as a barrier to adoption for FreeBSD if the project were to cut over to it,

Not necessarily. The client is free, and in the ports tree. That includes the server with an evaluation license, which limits it to two clients and two users. Perforce offers Open Source software projects free multiuser licenses, which means unlimited clients. See http://www.perforce.com/perforce/price.html and search for open source on the page. They even point to the FreeBSD license as a good choice for a candidate.

so there's understandable backpressure against using it for the main repository.

I think the real pressure is that none of the sources are available. Last time I checked, they didn't even publish a description of the protocol between the server and the client, so you can't independently develop a client. They *do* provide a link library for all their build platforms that you can use to build custom clients, but you don't get source to that. I admit to being biased, but I think that switching would be an incredible win for everyone concerned. mike -- Mike Meyer [EMAIL PROTECTED] http://www.mired.org/home/mwm/ Independent WWW/Perforce/FreeBSD/Unix consultant, email for more information.

Does Perforce support replication like FreeBSD's current CVSup setup? If not, how does it support a large number of users or connections? Is there a trend for the FreeBSD community to migrate to Perforce instead of CVS? I ask because I feel some people are trying to create two FreeBSD source repositories :( Am I wrong?
Thanks, -- David Xu
Re: Linking libc before libc_r into application causes weird problems
sound like cl, I like the libpthread hook. hahaha. -- David Xu Jason Evans wrote: On Fri, Feb 08, 2002 at 07:46:34AM +0200, Maxim Sobolev wrote: Hi, When working on updating port of Ximian Evolution to the latest released version I have stuck to the problem - the new version of application just hanged on startup on my 5-CURRENT box. After lot of digging and debugging I found that the source of the problem is that the resulting application had libc linked in before libc_r, which caused waitpid() in the ORBit library just hang forever, even though child process died almost instantly (I see zombie in the ps(1) output). When program was relinked with -pthread flag, which seemingly forcing right order of libc/libc_r (libc_r first) the problem disappeared. Based on the problematic code in the ORBit I had prepared short testcase illustrating the problem and attaching it with this message. The problem could be exposed by compiling the test.c using the following command: $ cc test.c -o test -lc -lc_r When either of -lc or -lc_r is omitted, or their order is reversed the problem disappears. The problem doesn't exist on 4-STABLE. Any ideas, comments and suggestions are welcome. IIRC, Dan changed things in -current about six months ago so that -lc_r would do the right thing. Previously (and still in -stable), -pthread was necessary in order to prevent libc from being implicitly linked in. There's some magic in the compiler front end that prevents libc from being implicitly linked in if libc_r is specified. It may re-order things as well, but I'd have to look at the code to verify that. In any case, don't manually specify both, or Bad Things will happen, as you've discovered. It's my hope that we'll be able to use -lpthread by the 5.0 release, which is what the standards say should work. We could have that right now, but we've been holding off, since threads may be KSE-based by the 5.0 release. 
Jason To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-hackers in the body of the message
Re: kld VM pager
FreeBSD does not have a fault hook available; all faults are processed in vm_fault. I know Linux supports that idea: you can insert a fault hook to monitor some address range where faults occur, and then a graphics frame buffer can be supported. -- David Xu

Nicolas Souchu wrote:
Hi VM developers, Does anyone already have some useful utilities for developing a VM pager for FreeBSD? The KGI port project is progressing and is now at the point where I have to handle the VM events as done in Linux. http://www.freebsd.org/~nsouch/ggiport.html Reading the 4.4BSD Internals, I've understood that I should look in the pager direction, and the code shows that the device pager is not very far from what I want... More precisely, does anybody have some way to load/unload a pager? Is it really possible? What advice could you give me before I start this? Thanks in advance! Nicholas
Re: kld VM pager
You mean we can use OBJT_PHYS? Do we fully support it? Although we have phys_pager, I suspect it is not enough, because along with fault handling a graphics driver will need other operations, for example unmapping other pages and setting up hardware registers. Think about a frame buffer driver that needs to map a 32K window into a 1024K address space: when a fault occurs in the second 32K window, it should unmap the first 32K of addresses, activate the second 32K, and set up the graphics card registers so that later accesses go to the card's internal second 32K memory bank. How can you handle such an issue after vm_fault? Regards, -- David Xu

Alfred Perlstein wrote:
* David Xu [EMAIL PROTECTED] [011206 22:15] wrote:
FreeBSD does not have fault hook available, all faults are processed in vm_fault. I know Linux supports that idea, you can insert a fault hook to monitor some address range where fault occurs, and then graphics frame buffer can be supported.

I'm sure one could add a callback in vm_fault without much issue. However this isn't the point of the hook. One can insert the vm object into the vm map; when a fault occurs the vm_fault code will call the pager handler to service it. The only good part for having a fault hook in linux is most likely for debug purposes.
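The banked frame buffer case described above can be sketched as a small simulation. Everything here is hypothetical and lives in userland: `struct fb_state`, `fb_fault()` and `fb_touch()` are illustrative names, the sizes are the ones from the message, and a plain integer stands in for the card's bank select register.

```c
#include <stddef.h>

#define FB_SIZE     (1024 * 1024)  /* address range exposed to the process */
#define WINDOW_SIZE (32 * 1024)    /* bytes the card can expose at once */

/* Hypothetical driver state. */
struct fb_state {
	int  mapped_bank;    /* window currently mapped, -1 if none */
	int  bank_register;  /* stand-in for the card's bank select register */
	long faults;         /* number of simulated page faults */
};

/* What a per-range fault handler would have to do in this model. */
static void fb_fault(struct fb_state *fb, size_t offset)
{
	int bank = (int)(offset / WINDOW_SIZE);

	fb->mapped_bank = -1;      /* unmap the pages of the old window */
	fb->bank_register = bank;  /* tell the card which bank to expose */
	fb->mapped_bank = bank;    /* map the pages of the new window */
	fb->faults++;
}

/*
 * Simulated access at a byte offset. Returns the bank selected after
 * the access, or -1 if the offset is outside the frame buffer.
 */
int fb_touch(struct fb_state *fb, size_t offset)
{
	if (offset >= FB_SIZE)
		return -1;
	if (fb->mapped_bank != (int)(offset / WINDOW_SIZE))
		fb_fault(fb, offset);
	return fb->bank_register;
}
```

The point of the model is that servicing the fault involves more than supplying a page: the old window has to be invalidated and the hardware reprogrammed before the access can proceed.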
Re: Can TCP changes be put in RELENG_4?
It will still be the de facto situation, because Linux distributions always install a tuned Linux kernel by default while FreeBSD does not. The default GENERIC FreeBSD kernel's performance sucks, and ordinary users will find FreeBSD slower. Could we let the user select which kernel to install at install time? -- David Xu

Terry Lambert wrote:
Matthew Dillon wrote:
These changes are performance fixes, not security fixes. I consider them fairly significant performance fixes, but these bugs have been in the TCP stack for literally a whole year without an outcry so I don't see much justification for putting them into the security branch.

I think the main question is whether or not Linux should continue to kick FreeBSD's ass after 4.5 is released. -- Terry
Re: add some constraints in cpufunc.h
According to the GCC manual on inline assembler instructions, if your instruction changes the condition code register (on x86 that is the CPU flags register, and a simple addl instruction can affect it), you should list "cc" as a clobber. I have reviewed some of the bus management header files; they all have the "cc" constraint, but others do not, and some lines lack the __volatile__ keyword. GCC is then free to optimize, reorder, or delete these statements when it thinks that is the right decision, which could be dangerous when a high optimization level is turned on. -- David Xu

- Original Message - From: John Baldwin [EMAIL PROTECTED] To: David Xu [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Thursday, November 22, 2001 3:38 AM Subject: RE: add some constraints in cpufunc.h

On 21-Nov-01 David Xu wrote:
4.4-stable, file sys/i386/include/cpufunc.h,

--- cpufunc.h.orig	Wed Nov 21 13:35:36 2001
+++ cpufunc.h	Wed Nov 21 15:00:12 2001
@@ -72,7 +72,7 @@
 {
 	u_int result;
-	__asm __volatile("bsfl %0,%0" : "=r" (result) : "0" (mask));
+	__asm __volatile("bsfl %0,%0" : "=r" (result) : "0" (mask) : "cc");
 	return (result);
 }
@@ -81,7 +81,7 @@
 {
 	u_int result;
-	__asm __volatile("bsrl %0,%0" : "=r" (result) : "0" (mask));
+	__asm __volatile("bsrl %0,%0" : "=r" (result) : "0" (mask) : "cc");
 	return (result);
 }
@@ -305,7 +305,7 @@
 	u_int result;
 	__asm __volatile("xorl %0,%0; xchgl %1,%0"
-	    : "=r" (result) : "m" (*addr));
+	    : "=r" (result) : "m" (*addr) : "cc");
 	return (result);
 }

Have you had actual bugs as a result of cc not being in the constraints? If so, there's a _lot_ more places that need this. All the atomic ops, for example. -- John Baldwin [EMAIL PROTECTED] http://www.FreeBSD.org/~jhb/ Power Users Use the Power to Serve! - http://www.FreeBSD.org/
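For readers unfamiliar with the syntax, here is a minimal standalone sketch of an extended asm statement that names "cc" in its clobber list. It assumes a GCC-compatible compiler on x86; `lowest_set_bit()` is a made-up name rather than anything from cpufunc.h, and a plain C loop is used on other targets so the example still builds there.

```c
/*
 * Return the index of the lowest set bit in mask.
 * The caller must pass a non-zero mask.
 */
static inline unsigned int
lowest_set_bit(unsigned int mask)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
	unsigned int result;

	/*
	 * bsfl modifies the flags register, so "cc" is named in the
	 * clobber list (the third colon-separated section).
	 */
	__asm__ __volatile__("bsfl %1,%0"
	    : "=r" (result)
	    : "rm" (mask)
	    : "cc");
	return (result);
#else
	unsigned int n = 0;

	/* Portable fallback: shift until the low bit is set. */
	while ((mask & 1u) == 0) {
		mask >>= 1;
		n++;
	}
	return (n);
#endif
}
```

The three colon-separated sections are outputs, inputs and clobbers, in that order; the patch under discussion adds the third section to statements that previously had only the first two.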
add some constraints in cpufunc.h
4.4-stable, file sys/i386/include/cpufunc.h:

--- cpufunc.h.orig	Wed Nov 21 13:35:36 2001
+++ cpufunc.h	Wed Nov 21 15:00:12 2001
@@ -72,7 +72,7 @@
 {
 	u_int result;
-	__asm __volatile("bsfl %0,%0" : "=r" (result) : "0" (mask));
+	__asm __volatile("bsfl %0,%0" : "=r" (result) : "0" (mask) : "cc");
 	return (result);
 }
@@ -81,7 +81,7 @@
 {
 	u_int result;
-	__asm __volatile("bsrl %0,%0" : "=r" (result) : "0" (mask));
+	__asm __volatile("bsrl %0,%0" : "=r" (result) : "0" (mask) : "cc");
 	return (result);
 }
@@ -305,7 +305,7 @@
 	u_int result;
 	__asm __volatile("xorl %0,%0; xchgl %1,%0"
-	    : "=r" (result) : "m" (*addr));
+	    : "=r" (result) : "m" (*addr) : "cc");
 	return (result);
 }

-- David Xu
vm_map_protect()
In FreeBSD 4.4-stable, file /sys/vm/vm_map.c, the function vm_map_protect() returns leaving an entry split when the first pass fails. Here is a patch to avoid that.

%diff -u vm_map.c.orig vm_map.c
--- vm_map.c.orig	Thu Nov 15 08:27:19 2001
+++ vm_map.c	Thu Nov 15 09:08:47 2001
@@ -999,13 +999,14 @@
 {
 	vm_map_entry_t current;
 	vm_map_entry_t entry;
+	int clip_start = 0;
 	vm_map_lock(map);
 	VM_MAP_RANGE_CHECK(map, start, end);
 	if (vm_map_lookup_entry(map, start, &entry)) {
-		vm_map_clip_start(map, entry, start);
+		clip_start = 1;
 	} else {
 		entry = entry->next;
 	}
@@ -1026,6 +1027,8 @@
 		}
 		current = current->next;
 	}
+	if (clip_start)
+		vm_map_clip_start(map, entry, start);
 	/*
 	 * Go back and fix up protections. [Note that clipping is not

-- David Xu
Re: vm_map_protect()
I like fast code. I want to avoid all object and entry splitting and merging costs. Besides, I think the code is still very clear. -- David Xu

- Original Message - From: Matthew Dillon [EMAIL PROTECTED] To: David Xu [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Thursday, November 15, 2001 1:41 PM Subject: Re: vm_map_protect()

:In FreeBSD 4.4-stable file /sys/vm/vm_map.c, function vm_map_protect()
:will leave and have an entry splitted when first pass fails.
:here is the patch to avoid such issue.

Hmm... don't you want to vm_map_simplify_entry() at the end instead of vm_map_clip_start()? -Matt Matthew Dillon [EMAIL PROTECTED]

:%diff -u vm_map.c.orig vm_map.c
:--- vm_map.c.orig	Thu Nov 15 08:27:19 2001
:+++ vm_map.c	Thu Nov 15 09:08:47 2001
:@@ -999,13 +999,14 @@
: {
: 	vm_map_entry_t current;
: 	vm_map_entry_t entry;
:+	int clip_start = 0;
:
: 	vm_map_lock(map);
:
: 	VM_MAP_RANGE_CHECK(map, start, end);
:
: 	if (vm_map_lookup_entry(map, start, &entry)) {
:-		vm_map_clip_start(map, entry, start);
:+		clip_start = 1;
: 	} else {
: 		entry = entry->next;
: 	}
:@@ -1026,6 +1027,8 @@
: 		}
: 		current = current->next;
: 	}
:+	if (clip_start)
:+		vm_map_clip_start(map, entry, start);
:
: 	/*
: 	 * Go back and fix up protections. [Note that clipping is not
:
:--
:David Xu
pmap_collect() and PG_UNMANAGED
Hi, is there any reason why pmap_collect() in /sys/i386/i386/pmap.c does not check the PG_UNMANAGED flag? An unmanaged page has no pv_entry associated with it, so calling pmap_remove_all() on it has a side effect: PG_MAPPED and PG_WRITEABLE are crudely cleared. -- David Xu
Re: pmap_collect() and PG_UNMANAGED
- Original Message -
From: Peter Wemm [EMAIL PROTECTED]
To: David Xu [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Thursday, November 01, 2001 9:43 AM
Subject: Re: pmap_collect() and PG_UNMANAGED

> David Xu wrote:
> > Hi, is there any reason why pmap_collect() in /sys/i386/i386/pmap.c does
> > not check PG_UNMANAGED flag? unmanaged page does not have pv_entry
> > associated, so call pmap_remove_all() has side effect, PG_MAPPED and
> > PG_WRITEABLE are roughly cleared.
> > -- David Xu
>
> Did you have something like this in mind?
>
> @@ -1729,7 +1729,7 @@
>         for(i = 0; i < vm_page_array_size; i++) {
>                 m = &vm_page_array[i];
>                 if (m->wire_count || m->hold_count || m->busy ||
> -                   (m->flags & PG_BUSY))
> +                   (m->flags & (PG_BUSY | PG_UNMANAGED)))
>                         continue;
>                 pmap_remove_all(m);
>         }

Yes, I made that change and tested it yesterday. I ran a memory hog program on the system for about half an hour; almost every program was swapped out and it still did not cause any problem, so it seems safe to add this flag.

> BTW; please stop posting MIME/HTML onto mailing lists.

Sorry, my MS Outlook Express sucks. I will replace it with a better one!

> Cheers,
> -Peter
> --
> Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
> All of this is for nothing if we don't go to the stars - JMS/B5

--
David Xu
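[Editor's sketch] The one-line change discussed above widens a flag mask so that unmanaged pages are skipped along with busy ones. The stand-alone C below shows the same test outside the kernel. struct page, should_skip() and count_collectable() are hypothetical names, and the flag values are assumptions chosen for illustration, not copied from vm_page.h.

```c
#include <stddef.h>

/* Assumed flag values for illustration only. */
#define PG_BUSY       0x0001u
#define PG_UNMANAGED  0x0800u

/* Hypothetical cut-down page descriptor with only the fields tested here. */
struct page {
    int wire_count;
    int hold_count;
    int busy;
    unsigned flags;
};

/* Nonzero if the collector should leave this page alone. */
static int should_skip(const struct page *m)
{
    return m->wire_count || m->hold_count || m->busy ||
        (m->flags & (PG_BUSY | PG_UNMANAGED));
}

/* Count the pages that would still be handed to pmap_remove_all(). */
static size_t count_collectable(const struct page *pages, size_t n)
{
    size_t i, hits = 0;

    for (i = 0; i < n; i++)
        if (!should_skip(&pages[i]))
            hits++;
    return hits;
}
```

Or-ing the two flags into one mask keeps the check a single AND, which is why the patch only needs to touch one line.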
Re: Version of XFree86 in FreeBSD Release 4.4
I'm confused about why OpenBSD and NetBSD both have XFree86 4 installed while we can't. Why? David Xu - Original Message - From: Daniel O'Connor [EMAIL PROTECTED] To: Alexander Langer [EMAIL PROTECTED] Cc: Robert Withrow [EMAIL PROTECTED]; Robert Withrow [EMAIL PROTECTED]; Jordan Hubbard [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Wednesday, September 19, 2001 8:14 AM Subject: Re: Version of XFree86 in FreeBSD Release 4.4 On 18-Sep-2001 Alexander Langer wrote: The previous suggestion (have a generic XFree86 port) is a) hacky, but b) workable in the current package framework I suspect.. Yes, would be a nice workaround. I don't use packages, though :) I'm forced not to if I'm using a machine which needs X4.. Usually for a fresh install I use packages since it gets a working machine quicker :)
Re: Vinum Panic (was Re: HPT370 RAID or Vinum?)
- Original Message -
From: [EMAIL PROTECTED]
To: Greg Lehey [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Friday, September 14, 2001 9:28 AM
Subject: Vinum Panic (was Re: HPT370 RAID or Vinum?)

> Well, I have to say that Vinum feels a lot faster than HPT RAID... I will
> quantify this statement when someone tells me how to turn off ATA write
> caching, because sysctl -w hw.ata.wc=0 doesn't work, says
> sysctl: oid 'hw.ata.wc' is read only

hw.ata.wc is read-only at runtime, so set it at boot instead: put hw.ata.wc=0 into /boot/loader.conf, then reboot.

--
David Xu
Re: vfs.vmiodirenable undocumented
But why isn't there a complete sysctl manual? I see OpenBSD has a better one. Our sysctl(8) is poor: apart from the command usage, which is useful, the rest is uninformative and a waste of disk space.

Regards,
David Xu

- Original Message -
From: Sheldon Hearn [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Tuesday, July 10, 2001 8:21 PM
Subject: Re: vfs.vmiodirenable undocumented

On Tue, 10 Jul 2001 14:13:23 +0200, Sheldon Hearn wrote:

> Someone recently suggested that I tune vfs.vmiodirenable on a system with
> lots of memory. The CVS commit logs and the source tell me absolutely
> nothing about what this tunable does. Is anyone in a position to
> document it?

Someone mailed me privately and pointed out that the sysctl is documented in tuning(7).

Thanks,
Sheldon.
Re[2]: import NetBSD rc system
Hello Robert,

Thursday, June 14, 2001, 11:39:37 PM, you wrote:

RW On Thu, 14 Jun 2001, Koster, K.J. wrote:
RW > > To do some of the hierarchical start/stop at runtime stuff, you really need a stateful rc system that stores its start/stop state in /var/run/rc.d or the like. In this way, the system could track various activities and know which dependencies were already started.
RW > How about /var/run/${daemon}.pid?

RW So, one of the things I've always hated (and loved) about UNIX is the pid system. One of the problems I have with (foo).pid is that pids are rapidly recycled, so if a daemon dies, there's no way to track that unless you're a parent process (wherein you can reliably get the exiting information via SIGCHLD and wait()). The same goes for using killall as the superuser to find and kill processes such as inetd by name: you can easily kill other things if there are user processes with the same name, etc. In my view, the only really reliable way to manage daemon processes is as the parent of the process. Unfortunately, changing to that model would be a time-consuming, compatibility-limiting process which will probably not prove feasible.

RW Just as an example of some potential suffering: suppose your system creates and destroys about 200 processes a second, as it's a fairly heavily loaded user and web server. Such a system takes about five and a half minutes to recycle the pid space. If your sendmail daemon has pid 238 (or some other low pid), and dies, it takes about five minutes for some other process to adopt pid 238. However, /var/run/sendmail.pid will still contain 238, as it wasn't cleaned up due to untimely death. Sending a HUP signal to 238 could do a number of nasty things, including logging you out if it's your SSH daemon :-). Using IPC to manage the daemon, in the style of newer named versions, works well as long as you know the daemon is still functional--certainly much better than signals, with the exception of forceful termination.

RW Robert N M Watson FreeBSD Core Team, TrustedBSD Project
RW [EMAIL PROTECTED] NAI Labs, Safeport Network Services

My advice is: every script in rc.d should have a status check function, for example "nfsd.sh status". If this command exits with status 0, the daemon is running; otherwise it is not running. Let every script implement its own method to detect whether its daemon is up or down. Crudely checking for a /var/run/${daemon}.pid file is just one typical way. So every script should contain three functions: start, stop and status.

--
David Xu
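[Editor's sketch] The pid-file style of status check mentioned above, together with the weakness Robert describes, can be sketched in portable C. The helper names write_pidfile(), read_pidfile(), pid_alive() and daemon_status() are hypothetical. The check uses kill(pid, 0), which only says that some process currently has that pid, so a recycled pid is reported as "running"; that is exactly the failure mode from the sendmail example.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Record the calling process's pid, as a daemon would at startup. */
static int write_pidfile(const char *path)
{
    FILE *fp = fopen(path, "w");

    if (fp == NULL)
        return -1;
    fprintf(fp, "%ld\n", (long)getpid());
    return fclose(fp) == 0 ? 0 : -1;
}

/* Returns the recorded pid, or -1 if the file is missing or malformed. */
static pid_t read_pidfile(const char *path)
{
    FILE *fp = fopen(path, "r");
    long pid;

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%ld", &pid) != 1 || pid <= 0)
        pid = -1;
    fclose(fp);
    return (pid_t)pid;
}

/*
 * Returns 1 if some process with this pid exists, 0 otherwise.
 * Caveat: this cannot tell the original daemon from an unrelated
 * process that was later given the same pid.
 */
static int pid_alive(pid_t pid)
{
    if (pid <= 0)
        return 0;
    if (kill(pid, 0) == 0)     /* signal 0: existence check only */
        return 1;
    return errno == EPERM;     /* exists, but owned by another user */
}

/* Mirrors a script's "status" exit code: 0 if running, 1 if not. */
static int daemon_status(const char *pidfile)
{
    return pid_alive(read_pidfile(pidfile)) ? 0 : 1;
}
```

A per-script status function could use a stronger test than this one, for example asking the daemon over IPC, which is the reason for letting each script choose its own method.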
import NetBSD rc system
Hello, is there any plan to import the NetBSD rc system? I would like to see it appear in FreeBSD 5.0. -- David Xu
Re[2]: import NetBSD rc system
Hello Sergey,

Tuesday, June 12, 2001, 7:24:13 AM, you wrote:

SB Warner Losh wrote:
SB > In message [EMAIL PROTECTED] Mark Santcroos writes:
SB > : Can it be called SysV style? Or not separated in that way?
SB > : (I must say, the big ugly rc thing is the only thing I don't like about
SB > : FreeBSD, I'm very much in favor of the SysV style init. But that's another
SB > : war ;)
SB > It specifically isn't SysV style. It works. SysV style encodes the startup order in the file NAMES. The NetBSD rc system encodes it in the files themselves. A big improvement.

SB Or a drawback. Encoding the order in the names makes changing the order or disabling some files easy, without any necessity to edit the contents of the files.
SB Though I haven't seen the NetBSD approach, maybe it actually is better.
SB -SB

It is based on one concept: dependency. It is a more advanced idea than SysV. For example, NFS relies on the network. If the network is not started and you start NFS, it will automatically start the network, and if the network fails to start, the NFS start is aborted. Of course it also has the SysV capability: you can start and stop an individual daemon or subsystem, for example:

$ /etc/rc.d/nfs stop
$ /etc/rc.d/nfs start

--
David Xu
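[Editor's sketch] The behaviour described above (starting a service first starts what it depends on, and a failed dependency aborts the start) can be modelled in a few lines of C. This is an illustration of the idea only, not how the NetBSD scripts are implemented. struct service, find_service(), start_service() and the two start hooks are hypothetical, and there is no cycle detection.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEPS 4

enum state { STOPPED, RUNNING, FAILED };

struct service {
    const char *name;
    const char *deps[MAX_DEPS];  /* required services, NULL-terminated */
    int (*start)(void);          /* returns 0 on success */
    enum state state;
};

/* Look a service up by name in a small table. */
static struct service *find_service(struct service *tab, size_t n,
    const char *name)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (strcmp(tab[i].name, name) == 0)
            return &tab[i];
    return NULL;
}

/*
 * Start svc after everything it depends on. If a dependency is unknown
 * or fails to start, svc is left stopped and -1 is returned.
 */
static int start_service(struct service *tab, size_t n, struct service *svc)
{
    int i;

    if (svc->state == RUNNING)
        return 0;
    if (svc->state == FAILED)
        return -1;
    for (i = 0; i < MAX_DEPS && svc->deps[i] != NULL; i++) {
        struct service *dep = find_service(tab, n, svc->deps[i]);

        if (dep == NULL || start_service(tab, n, dep) != 0) {
            printf("%s: not started, dependency %s failed\n",
                svc->name, svc->deps[i]);
            return -1;
        }
    }
    if (svc->start() != 0) {
        svc->state = FAILED;
        return -1;
    }
    svc->state = RUNNING;
    printf("%s: started\n", svc->name);
    return 0;
}

/* Example start hooks: one that always succeeds, one that always fails. */
static int start_ok(void) { return 0; }
static int start_fail(void) { return -1; }
```

Given a table in which "nfs" lists "network" as a dependency, starting "nfs" brings up "network" first; if the network hook is start_fail, "nfs" stays stopped.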
Re[2]: import NetBSD rc system
Hello Matt,

Tuesday, June 12, 2001, 11:05:27 AM, you wrote:

MD :On Mon, Jun 11, 2001 at 08:37:49PM -0500, Andrew Hesford ([EMAIL PROTECTED]) wrote:
MD :: New modules? Isn't that just the same as /usr/local/etc/rc.d/ ? I side
MD :: with Mr. Dillon, I hope things stay the way they are.
MD :
MD :You acted rashly. It's like /usr/local/etc/rc.d, only it becomes
MD :extended to the base system, so that we can have /etc/rc.d/*
MD :{stop,start,restart,*} for all the daemons. It makes taking care
MD :of our base system daemons easier.
MD :
MD :--
MD :wca

MD All I care about is /etc/rc.conf ... I like the idea of splitting the various other rc files into pieces as long as I can control them all from /etc/rc.conf. If it's extensible that's even better!

MD What I really hate is the SysV/Linux/Solaris style of rc.d configuration directories where you create/maintain softlinks in specially named directories (named after the run level) to a master set of startup files. Blech. Yuch. Ptooey!

MD -Matt

It seems to fit your needs: it has an rc.conf file to control the whole rc system and an rc.d directory, but you don't have to maintain symbolic links. SysV has several run levels, so it needs many symbolic links; BSD does not.

--
David Xu
Re: sysctl to disable reboot
Hello Jon,

Tuesday, May 22, 2001, 12:30:44 AM, you wrote:

JP I thought it would be useful to have a sysctl for disabling the keyboard reboot sequence. This functionality is currently available through the SC_DISABLE_REBOOT config option, but it's convenient to have this capability available at runtime, too.

JP I can't say I'm much of a kernel hacker, but the attached patch works fine for me. It applies against 4.3-STABLE, but the same logic applied to 5.0-CURRENT (which I don't have available for testing).

JP Note that investigation revealed that OpenBSD has a similar sysctl named machdep.kbdreset. I prefer machdep.disable_reboot_key, but I'm against changing it for feel-good compatibility reasons.

JP If someone with clue feels there is merit in this, feel free to commit it.

I have already failed many times to persuade them to add a sysctl for the keyboard reboot; they prefer that you change a keymap file and load it into the kernel.

--
Good luck,
David Xu
Re: KSD
AFAIK, work on KSE has not started yet. You might have a look at NetBSD: there is a branch for the SA (scheduler activations) project, and it seems they changed proc to lwp. A lot of code has already been committed.

Regards,
David Xu

- Original Message -
From: Tim Wiess [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, May 10, 2001 2:31 AM
Subject: KSD

> Hello,
> Lately I have been reading about the work that is being done to support
> kernel scheduled entities in FreeBSD. I am very anxious to look at the
> code for this and see if I might be able to contribute anything to the
> project. Although, it doesn't look like the code has been merged into
> CURRENT yet. Is this true? If so, does anyone know where the current
> development for this is being done?
> Thanks.
> tim
vm balance
I heard NetBSD has implemented a FreeBSD-like VM, and that recent versions of NetBSD also implement VM balancing: some parameters, such as the TEXT, DATA and anonymous memory space, can be tuned. Is anyone doing such work on FreeBSD, or has FreeBSD already implemented it? -- David Xu
Re[2]: status of KSE?
Hello Julian,

Friday, March 16, 2001, 12:18:15 PM, you wrote:

JE David Xu wrote:
JE > I wonder status of KSE, I am dreaming rewrite our application server using kqueue+pthread(KSE), current, we use poll()+pthread because pthread does not work with kqueue at present.
JE > --
JE > Best regards, David Xu

JE KSE is not into coding yet.
JE We have a basic design and have some documents but have been waiting for the SMPng stuff to settle a bit before we hit the kernel with a second huge change.
JE It will not be ready for a long time. Do not assume that it will be ready for when you need it because it will not.

I know KSE is not related to SMP and will run on UP. My primary idea is to run parallel I/O tasks in the same process with pthreads, simply because FreeBSD's pthread library does not let me do multiple I/O tasks on disk files at the same time. Of course, it also conflicts with SYSV IPC, so I thought of KSE.

I don't care about SMP; CPUs are fast enough now. I have already seen a 1.3 GHz CPU, how fast! I think Intel and AMD can very easily double their CPU clocks, and I hope to see a 3 GHz CPU next year. I really do think KSE should be working before SMP, but obviously it is not. Think about Apache 2.0: it is already multi-threaded, and FreeBSD pthreads will block on disk I/O, which is very bad for Apache 2.0.

--
Best regards,
David Xu
Re: Missing support in FreeBSD for large file sizes?
Hello sthaug,

Tuesday, March 06, 2001, 1:24:24 AM, you wrote:

snn According to the "Maxtor picks Windows, dumps open source" article at
snn http://news.cnet.com/news/0-1003-200-5009496.html?tag=lh
snn FreeBSD "did not support large file sizes, Macintosh and newer Novell file systems, or backup and management software from companies such as OpenView, Tivoli and Microsoft".

snn Now I can understand what they say about missing Tivoli support - we're using Tivoli backup here ourselves, and the SCO ADSM/TSM client that we currently use to backup FreeBSD is passable, but nothing more. A native FreeBSD client would be much preferable.

snn What I can't understand is the reference to missing support for large file sizes - as far as I know, that's one of FreeBSD's strengths! Anybody care to guess what they mean here?

snn Steinar Haug, Nethelp consulting, [EMAIL PROTECTED]

This is a stupid decision; let them go. AFAIK, Windows 2000 does not support dumping a file system to tape or other media. I was a Windows NT system manager, and I don't trust any of the backup software for Windows NT, simply because a Windows NT system cannot be fully backed up. FreeBSD can be; that is a strength of the Unix File System.

--
Best regards,
David Xu
today cvsup and netstat broken
I cvsupped 4.2-STABLE today and ran make buildworld and mergemaster. After a reboot, netstat no longer shows TCP connections:

%netstat -n -a -finet
Active Internet connections (including servers)
Proto Recv-Q Send-Q  Local Address          Foreign Address        (state)
udp4       0      0  *.3130                 *.*
udp4       0      0  *.*                    *.*
udp4       0      0  192.168.1.97.138       *.*
udp4       0      0  192.168.1.97.137       *.*
udp4       0      0  *.138                  *.*
udp4       0      0  *.137                  *.*
udp4       0      0  *.518                  *.*
udp4       0      0  *.512                  *.*
udp4       0      0  *.111                  *.*
udp4       0      0  *.514                  *.*
%

Note that I was logged in to the FreeBSD machine over telnet, yet the command did not show my telnet connection.

--
David Xu