Re: svn commit: r220433 - in head/sys: amd64/amd64 cddl/dev/dtrace/amd64 cddl/dev/dtrace/i386 compat/linprocfs contrib/altq/altq dev/acpica i386/i386 i386/isa pc98/pc98 x86/cpufreq x86/isa x86/x86

Bruce Evans Fri, 08 Apr 2011 12:17:59 -0700

On Fri, 8 Apr 2011, Jung-uk Kim wrote:

On Friday 08 April 2011 10:54 am, Bruce Evans wrote:

On Thu, 7 Apr 2011, Jung-uk Kim wrote:

Log:
 Use atomic load & store for TSC frequency.  It may be overkill
for amd64 but safer for i386 because it can be easily over 4 GHz
now.  More worse, it can be easily changed by user with
'machdep.tsc_freq' tunable (directly) or cpufreq(4) (indirectly).
...


I disagree with this complexity and churn, and tried to avoid it --
see previous mails for more details.


I don't like it myself.  It's sort of my way of expressing "do not use
64-bit global variables on 32-bit platforms unless the architecture
guarantees coherency". ;-)

Seriously, using variant TSC is strongly discouraged for timekeeping.
Messing up with it from user land is worse.  If it is really
necessary, it means kernel is broken and we need to fix it.


No, changing it from userland is the correct method.  It is to fix up
the kernel's miscalibration of the TSC (variant or otherwise).  Userland
can use long-running test where the kernel can only afford to do a
short sloppy test at boot time.  The kernel can use long-running tests
after boot time, but this gives complexity in the kernel.

There are probably many 64-bit variables that need to be accessed
more atomically, and tsc_freq is one of the least in need of atomic
accesses, since the race windows are especially tiny for it.  Even
when the race is lost, this is harmless except for frequencies that
are almost exactly 2^32 Hz, since only sysctls that change the
frequency from above this to below, or vice versa, can give an
inconsistent value. Thus there is a race window of a few nsec which
only matters with a probability of about 1 in 2^32, only for
user[s] of the sysctl to change the TSC frequency.


powerd(8) via cpufreq(4) can change tsc_freq frequently if the CPU is
not P-state invariant.  Well, the chances of getting ~4.3GHz CPU
without P-state invariant TSC is little low, I guess.


:-).

Other changes to tsc_freq also break tuning by the sysctl.  My tuning
from userland actually determines the frequency by applying a constant
scale factor to a variable frequency from a table, but I only use this
on old machine which don't have cpufreq(4) etc., so I only do it once
at boot time and don't have to worry about cpufreq(4) messing it up.
(The frequency can be changed across boots by either reconfiguring the
BIOS, or on one machine, by using a CPU with a different frequency.
The scale factor for all this is constant because all clocks are derived
from a single clock in hardware.)  Applying a constant scale factor
to the frequencies set by cpufreq(4) should work in the same way, but
you have to know these frequencies accurately and could just use the
actual freqencies.

The complexity of this shows that you should try hard not to use
64-bit variables on 32-bit systems if accesses to them are not
locked by a mutex.  64-bit systems like amd64 don't need to do
anything special, any more than 32-bit systems accessing 32-bit
variables.  We assume that accesses of <= the bus width are atomic
in a certain weak sense which is enough for variables like
tsc_freq.


Actually, Pentium and above guarantees the "weak" atomic memory
read/write when it is aligned to quad-word boundary.  CMPXCHG8B is
the only one instruction (except for FPU) useful for that feature,
though.  I considered implementing weaker version initially but I am
not sure if it has any benefit.


I think this is a bit x86-specific, and having to do anything that
the compiler won't do automatically makes it too hard to use for an
average variable.  Does just using aligned FPU/MMX/XMM instructions
give atomic operations?

I thought of the following KPI-transparent method that can be used
for any struct if updates need not be seen immediately.  It is a
refined version of timehands method used for timecounters:

        #define tsc_freq        _tsc_freq[_tsc_gen]


It just hides the implementation.  How about this, then?

#define tsc_freq        atomic_load_acq_64(&_tsc_freq)

:-P


Not quite the same, since the atomic op is heavyweight.  You used
lots of extra code to avoid multiple accesses to it.  The array
access is only slightly heavier weight than a variable access.
If there are multiple accesses, then _tsc_gen if not the whole
_tsc_freq[_tsc_gen] should be cached in a register.  Even extra

writes to these variables, and sfences, aren't a problem withmy version, since they are only done in the unusual case where

the high bits are changed.

where the everything is updated sufficiently atomically and the
object data is guaranteed not to change for a suitably long time
after the generation that indexes it is read:

        if ((new_tsc_freq >> 32) == (tsc_freq >> 32)) {
                /*
                 * Usual case.  Below 2**32, or upper bits just not
                 * changing.  No need for complications.
                 */
                tsc_freq = new_tsc_freq;
        else if ((new_tsc_freq & 0xFFFFFFFF) == (tsc_freq & 0xFFFFFFFF))
{ /*
                 * Upper bits changing but lower bits not.  Very unusual,
                 * and optimized here to give a place for this comment.
                 * Again, no need for complications.
                 */
                tsc_freq = new_tsc_freq;
        } else {


Perhaps your version can avoid the atomic op for the usual case of writing,
as above, but this isn't very useful since most accesses are reads which
need it.

...


You see the complexity is still there and I don't like it.


Yes, all this is a waste of time for just tsc_freq.  It may be needed
for other variables.

Bruce
_______________________________________________
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"

Re: svn commit: r220433 - in head/sys: amd64/amd64 cddl/dev/dtrace/amd64 cddl/dev/dtrace/i386 compat/linprocfs contrib/altq/altq dev/acpica i386/i386 i386/isa pc98/pc98 x86/cpufreq x86/isa x86/x86

Reply via email to