Re: [GSoc] Timeconter Performance Improvements

2011-03-27 Thread Warner Losh

On Mar 27, 2011, at 10:29 PM, Julian Elischer wrote:

> On 3/27/11 3:32 PM, Warner Losh wrote:
>> On Mar 26, 2011, at 8:43 AM, Jing Huang wrote:
>> 
>>> Hi,
>>> 
>>> Sincere thanks to you all. Under your guidance, I read the TSC
>>> specification in the Intel manual and learned how the TSC behaves in
>>> hardware:
>>> 
>>> Processor families increment the time-stamp counter differently:
>>>   • For Pentium M processors (family [06H], models [09H, 0DH]); for
>>> Pentium 4 processors, Intel Xeon processors (family [0FH], models [00H,
>>> 01H, or 02H]); and for P6 family processors: the time-stamp counter
>>> increments with every internal processor clock cycle.
>>> 
>>>   • For Pentium 4 processors, Intel Xeon processors (family [0FH], models
>>> [03H and higher]); for Intel Core Solo and Intel Core Duo processors
>>> (family [06H], model [0EH]); for the Intel Xeon processor 5100 series and
>>> Intel Core 2 Duo processors (family [06H], model [0FH]); for Intel Core 2
>>> and Intel Xeon processors (family [06H], display_model [17H]); for Intel
>>> Atom processors (family [06H], display_model [1CH]): the time-stamp
>>> counter increments at a constant rate.
>>> 
>>> Maybe we could implement gettimeofday as follows. First, use cpuid
>>> to find the family and model of the current CPU. If the CPU supports
>>> a constant TSC, we look up the shared page and calculate the precise
>>> time in user mode. If the platform's TSC is not invariant, we just
>>> fall back to a syscall. So I think a single global shared page may be
>>> appropriate.
>> I think that the userspace portion should be more like:
>> 
>> int kernel_time_type SECTION(shared);
>> struct tsc_goo tsc_time_data SECTION(shared);
>> 
>> switch (kernel_time_type) {
>> case 1:
>>  /* code to use tsc_time_data to return time */
>>  break;
>> default:
>>  /* call the kernel */
>> }
>> 
>> I think we should avoid hard-coding lists of CPU families in userland.  The 
>> kernel init routines will decide, based on the CPU type and other stuff if 
>> this optimization can be done.  This would allow the kernel to update to 
>> support new CPU types without needing to churn libc.
>> 
>> Warner
>> 
>> P.S.  The SECTION(shared) notation above just means that the variables are 
>> in the shared page.
> 
> As has been mentioned here and there, the gold-standard way of doing this is 
> for the kernel to export a special memory region
> in ELF format that programs can link against, containing exported,
> kernel-sanctioned code snippets specially tailored for the
> cpu/OS/binary-format in question. There is no real security risk to this,
> but the potential upsides are great.

You'll have to map multiple pages if you do this: one for the data that has to 
be exported from the kernel and one that has to be the executable code.  I 
don't think this is necessarily the "gold standard" at all.  I think it is 
overkill that we'll grow to regret.

With my method, you'll have the code 100% in userland, where it belongs.  If you
want to map CPU-type-specific code, add it to ld.so.

Warner

>>> 
>>> On Sat, Mar 26, 2011 at 10:12 PM, John Baldwin  wrote:
>>>> On Saturday, March 26, 2011 08:16:46 am Peter Jeremy wrote:
>>>>> On 2011-Mar-25 08:18:38 -0400, John Baldwin  wrote:
>>>>>> For modern Intel CPUs you can just assume that the TSCs are in sync
>>>>>> across packages.  They also have invariant TSC's meaning that the
>>>>>> frequency doesn't change.
>>>>> Synchronised P-state invariant TSCs vastly simplify the problem but
>>>>> not everyone has them.  Should the fallback be more complexity to
>>>>> support per-CPU TSC counts and varying frequencies or a fallback to
>>>>> reading the time via a syscall?
>>>> I think we should just fallback to a syscall in that case.  We will also
>>>> need to do that if the TSC is not used as the timecounter (or always
>>>> duplicate the ntp_adjtime() work we do for the current timecounter for
>>>> the TSC timecounter).
>>>> 
>>>> Doing this easy case may give us the most bang for the buck, and it is
>>>> also a good first milestone.  Once that is in place we can decide what
>>>> the value is in extending it to support harder variations.
>>>> 
>>>> One thing we do need to think about is if the shared page should just
>>>> export a fixed set of global data, or if it should export routines.  The
>>>> latter approach is more complex, but it makes the ABI boundary between
>>>> userland and the kernel more friendly to future changes.  I believe
>>>> Linux does the latter approach?
>>>> 
>>>> --
>>>> John Baldwin
>>>> 
>>> ___
>>> freebsd-hackers@freebsd.org mailing list
>>> http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
>>> To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
>>> 
>>> 

Re: [GSoc] Timeconter Performance Improvements

2011-03-27 Thread Julian Elischer

On 3/27/11 3:32 PM, Warner Losh wrote:

On Mar 26, 2011, at 8:43 AM, Jing Huang wrote:


Hi,

Sincere thanks to you all. Under your guidance, I read the TSC
specification in the Intel manual and learned how the TSC behaves in
hardware:

Processor families increment the time-stamp counter differently:
   • For Pentium M processors (family [06H], models [09H, 0DH]); for Pentium 4
processors, Intel Xeon processors (family [0FH], models [00H, 01H, or 02H]);
and for P6 family processors: the time-stamp counter increments with every
internal processor clock cycle.

   • For Pentium 4 processors, Intel Xeon processors (family [0FH],
models [03H and
higher]); for Intel Core Solo and Intel Core Duo processors (family [06H], model
[0EH]); for the Intel Xeon processor 5100 series and Intel Core 2 Duo processors
(family [06H], model [0FH]); for Intel Core 2 and Intel Xeon processors (family
[06H], display_model [17H]); for Intel Atom processors (family [06H],
display_model [1CH]): the time-stamp counter increments at a constant rate.

Maybe we could implement gettimeofday as follows. First, use cpuid
to find the family and model of the current CPU. If the CPU supports
a constant TSC, we look up the shared page and calculate the precise
time in user mode. If the platform's TSC is not invariant, we just
fall back to a syscall. So I think a single global shared page may be
appropriate.

I think that the userspace portion should be more like:

int kernel_time_type SECTION(shared);
struct tsc_goo tsc_time_data SECTION(shared);

switch (kernel_time_type) {
case 1:
/* code to use tsc_time_data to return time */
break;
default:
/* call the kernel */
}

I think we should avoid hard-coding lists of CPU families in userland.  The 
kernel init routines will decide, based on the CPU type and other stuff if this 
optimization can be done.  This would allow the kernel to update to support new 
CPU types without needing to churn libc.

Warner

P.S.  The SECTION(shared) notation above just means that the variables are in 
the shared page.


As has been mentioned here and there, the gold-standard way of doing 
this is for the kernel to export a special memory region
in ELF format that programs can link against, containing exported,
kernel-sanctioned code snippets specially tailored for the
cpu/OS/binary-format in question. There is no real security risk to this,
but the potential upsides are great.


On Sat, Mar 26, 2011 at 10:12 PM, John Baldwin  wrote:

On Saturday, March 26, 2011 08:16:46 am Peter Jeremy wrote:

On 2011-Mar-25 08:18:38 -0400, John Baldwin  wrote:

For modern Intel CPUs you can just assume that the TSCs are in sync across
packages.  They also have invariant TSC's meaning that the frequency
doesn't change.

Synchronised P-state invariant TSCs vastly simplify the problem but
not everyone has them.  Should the fallback be more complexity to
support per-CPU TSC counts and varying frequencies or a fallback to
reading the time via a syscall?

I think we should just fallback to a syscall in that case.  We will also need
to do that if the TSC is not used as the timecounter (or always duplicate the
ntp_adjtime() work we do for the current timecounter for the TSC timecounter).

Doing this easy case may give us the most bang for the buck, and it is also a
good first milestone.  Once that is in place we can decide what the value is
in extending it to support harder variations.

One thing we do need to think about is if the shared page should just export a
fixed set of global data, or if it should export routines.  The latter
approach is more complex, but it makes the ABI boundary between userland and
the kernel more friendly to future changes.  I believe Linux does the latter
approach?

--
John Baldwin




Re: [GSoc] Timeconter Performance Improvements

2011-03-27 Thread Mark Tinguely

On 3/27/2011 5:32 PM, Warner Losh wrote:

On Mar 26, 2011, at 8:43 AM, Jing Huang wrote:


Hi,

Sincere thanks to you all. Under your guidance, I read the TSC
specification in the Intel manual and learned how the TSC behaves in
hardware:

Processor families increment the time-stamp counter differently:
   • For Pentium M processors (family [06H], models [09H, 0DH]); for Pentium 4
processors, Intel Xeon processors (family [0FH], models [00H, 01H, or 02H]);
and for P6 family processors: the time-stamp counter increments with every
internal processor clock cycle.

   • For Pentium 4 processors, Intel Xeon processors (family [0FH],
models [03H and
higher]); for Intel Core Solo and Intel Core Duo processors (family [06H], model
[0EH]); for the Intel Xeon processor 5100 series and Intel Core 2 Duo processors
(family [06H], model [0FH]); for Intel Core 2 and Intel Xeon processors (family
[06H], display_model [17H]); for Intel Atom processors (family [06H],
display_model [1CH]): the time-stamp counter increments at a constant rate.

Maybe we could implement gettimeofday as follows. First, use cpuid
to find the family and model of the current CPU. If the CPU supports
a constant TSC, we look up the shared page and calculate the precise
time in user mode. If the platform's TSC is not invariant, we just
fall back to a syscall. So I think a single global shared page may be
appropriate.

I think that the userspace portion should be more like:

int kernel_time_type SECTION(shared);
struct tsc_goo tsc_time_data SECTION(shared);

switch (kernel_time_type) {
case 1:
/* code to use tsc_time_data to return time */
break;
default:
/* call the kernel */
}

I think we should avoid hard-coding lists of CPU families in userland.  The 
kernel init routines will decide, based on the CPU type and other stuff if this 
optimization can be done.  This would allow the kernel to update to support new 
CPU types without needing to churn libc.

Warner

P.S.  The SECTION(shared) notation above just means that the variables are in 
the shared page.



On Sat, Mar 26, 2011 at 10:12 PM, John Baldwin  wrote:

On Saturday, March 26, 2011 08:16:46 am Peter Jeremy wrote:

On 2011-Mar-25 08:18:38 -0400, John Baldwin  wrote:

For modern Intel CPUs you can just assume that the TSCs are in sync across
packages.  They also have invariant TSC's meaning that the frequency
doesn't change.

Synchronised P-state invariant TSCs vastly simplify the problem but
not everyone has them.  Should the fallback be more complexity to
support per-CPU TSC counts and varying frequencies or a fallback to
reading the time via a syscall?

I think we should just fallback to a syscall in that case.  We will also need
to do that if the TSC is not used as the timecounter (or always duplicate the
ntp_adjtime() work we do for the current timecounter for the TSC timecounter).

Doing this easy case may give us the most bang for the buck, and it is also a
good first milestone.  Once that is in place we can decide what the value is
in extending it to support harder variations.

One thing we do need to think about is if the shared page should just export a
fixed set of global data, or if it should export routines.  The latter
approach is more complex, but it makes the ABI boundary between userland and
the kernel more friendly to future changes.  I believe Linux does the latter
approach?

--
John Baldwin





If a user process can perform an rfork(2) or rfork_thread(3) with the RFMEM 
option, then can't the same page table be active on multiple processors? 
Mapping per-CPU page(s) at fixed user address(es) would only hold the 
last-switched CPU's information.


x86 architectures use a segment pointer to keep the kernel per cpu 
information current.



--Mark Tinguely.




Re: [GSoc] Timeconter Performance Improvements

2011-03-27 Thread Warner Losh

On Mar 26, 2011, at 8:43 AM, Jing Huang wrote:

> Hi,
> 
> Sincere thanks to you all. Under your guidance, I read the TSC
> specification in the Intel manual and learned how the TSC behaves in
> hardware:
> 
> Processor families increment the time-stamp counter differently:
>   • For Pentium M processors (family [06H], models [09H, 0DH]); for Pentium 4
> processors, Intel Xeon processors (family [0FH], models [00H, 01H, or 02H]);
> and for P6 family processors: the time-stamp counter increments with every
> internal processor clock cycle.
> 
>   • For Pentium 4 processors, Intel Xeon processors (family [0FH], models
> [03H and higher]); for Intel Core Solo and Intel Core Duo processors
> (family [06H], model [0EH]); for the Intel Xeon processor 5100 series and
> Intel Core 2 Duo processors (family [06H], model [0FH]); for Intel Core 2
> and Intel Xeon processors (family [06H], display_model [17H]); for Intel
> Atom processors (family [06H], display_model [1CH]): the time-stamp
> counter increments at a constant rate.
> 
> Maybe we could implement gettimeofday as follows. First, use cpuid
> to find the family and model of the current CPU. If the CPU supports
> a constant TSC, we look up the shared page and calculate the precise
> time in user mode. If the platform's TSC is not invariant, we just
> fall back to a syscall. So I think a single global shared page may be
> appropriate.

I think that the userspace portion should be more like:

int kernel_time_type SECTION(shared);
struct tsc_goo tsc_time_data SECTION(shared);

switch (kernel_time_type) {
case 1:
/* code to use tsc_time_data to return time */
break;
default:
/* call the kernel */
}

I think we should avoid hard-coding lists of CPU families in userland.  The 
kernel init routines will decide, based on the CPU type and other stuff if this 
optimization can be done.  This would allow the kernel to update to support new 
CPU types without needing to churn libc.

Warner

P.S.  The SECTION(shared) notation above just means that the variables are in 
the shared page.
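
The dispatch above could be fleshed out roughly as follows. This is only a sketch under assumptions that are not in the thread: the struct layout, the field names (base_ns, base_tsc, tsc_hz), the TIME_TYPE_* values and stub_kernel_time() are all hypothetical, and the TSC reading is passed in as a parameter so the logic stays portable. It also assumes the kernel refreshes base_tsc often enough that the multiplication cannot overflow.

```c
#include <stdint.h>

/* Hypothetical values the kernel would store in the shared page. */
enum { TIME_TYPE_SYSCALL = 0, TIME_TYPE_TSC = 1 };

/* Hypothetical layout of the time data in the shared page. */
struct shared_time_page {
	int      kernel_time_type;  /* chosen by the kernel at init */
	uint64_t base_ns;           /* time at the last kernel update, in ns */
	uint64_t base_tsc;          /* TSC value at the last kernel update */
	uint64_t tsc_hz;            /* TSC frequency as measured by the kernel */
};

/* Stand-in for the real system call; returns a fixed value here. */
static uint64_t
stub_kernel_time(void)
{
	return (42);
}

/* Pick the fast path only when the kernel has advertised it. */
static uint64_t
get_time_ns(const struct shared_time_page *sp, uint64_t tsc_now,
    uint64_t (*kernel_call)(void))
{
	switch (sp->kernel_time_type) {
	case TIME_TYPE_TSC: {
		/* Assumes a small delta so delta * 1e9 fits in 64 bits. */
		uint64_t delta = tsc_now - sp->base_tsc;
		return (sp->base_ns + delta * 1000000000ULL / sp->tsc_hz);
	}
	default:
		/* Unknown or unsupported type: call the kernel. */
		return (kernel_call());
	}
}
```

Userland never inspects the CPU family here. A newer kernel could advertise a type value that an older libc does not know, and the default case would still return a correct time through the kernel.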

> 
> 
> On Sat, Mar 26, 2011 at 10:12 PM, John Baldwin  wrote:
>> On Saturday, March 26, 2011 08:16:46 am Peter Jeremy wrote:
>>> On 2011-Mar-25 08:18:38 -0400, John Baldwin  wrote:
>>>> For modern Intel CPUs you can just assume that the TSCs are in sync across
>>>> packages.  They also have invariant TSC's meaning that the frequency
>>>> doesn't change.
>>> 
>>> Synchronised P-state invariant TSCs vastly simplify the problem but
>>> not everyone has them.  Should the fallback be more complexity to
>>> support per-CPU TSC counts and varying frequencies or a fallback to
>>> reading the time via a syscall?
>> 
>> I think we should just fallback to a syscall in that case.  We will also need
>> to do that if the TSC is not used as the timecounter (or always duplicate the
>> ntp_adjtime() work we do for the current timecounter for the TSC 
>> timecounter).
>> 
>> Doing this easy case may give us the most bang for the buck, and it is also a
>> good first milestone.  Once that is in place we can decide what the value is
>> in extending it to support harder variations.
>> 
>> One thing we do need to think about is if the shared page should just export 
>> a
>> fixed set of global data, or if it should export routines.  The latter
>> approach is more complex, but it makes the ABI boundary between userland and
>> the kernel more friendly to future changes.  I believe Linux does the latter
>> approach?
>> 
>> --
>> John Baldwin
>> 


Re: [GSoc] Timeconter Performance Improvements

2011-03-26 Thread Julian Elischer

On 3/25/11 1:24 AM, Peter Jeremy wrote:

On 2011-Mar-24 17:00:02 +0800, Jing Huang  wrote:

 In this scenario, I plan to use both tsc and shared memory to
calculate precise time in user mode. The shared memory includes
system_time, tsc_system_time and factor_tsc-system_time.

This sounds like a reasonable approach to me.  Note that once we
implement a shared page, there is probably a variety of other
information we could usefully place on that page.

SunOS 4.x included a page of shared memory per CPU.  This was mapped
as an array (indexed by CPU number) at one address and the page
reflecting the current CPU was additionally mapped at another fixed
address.  This allowed a process to refer to data on its own CPU
as well as on any other CPU in the system.


 We also need to consider the CPU frequency, because the TSC counter is
tied to it. When the kernel changes the CPU frequency, the shared memory
should be updated accordingly.

Two issues with this, particularly on x86 without invariant TSC:
- looking up the current CPU frequency may not be a cheap operation
- the reported CPU frequency appears to be just an approximate value,
   rather than the actual TSC frequency.

On 2011-Mar-24 21:34:35 +0800, Jing Huang  wrote:

As far as I know, the TSC is CPU-specific. If the process is running on
a multi-core platform, we must consider the CPU-switching problem. One
way is to let the kernel take care of this: when switching to another
CPU, the kernel resets the shared memory according to the new CPU.

I'm not sure what the cost of managing this page mapping will be.


The second way is to use the CPUID instruction, which can also be executed
in user mode, to get information about the current CPU. Meanwhile,
the kernel maintains shared memory for each CPU. When gettimeofday is
invoked, the function calculates the precise time using the current
CPU's shared memory.

This approach suffers from a race condition between the CPUID
instruction and accessing the appropriate shared page - there is the
potential for an interrupt causing the process to be switched to a
different CPU, resulting in an incorrect page being accessed.



The shared page(s) can be in the form of an ELF module that is linked 
with the process at load time.

That way you can put CPU-specific code snippets there as well.
When using a shared page to adjust the TSC value read, one also needs to
temporarily lock the CPU you are on between the time you read the
calibration value and the time you read the TSC. A user process has only
limited ability to do that.
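
Pinning is one option. Another commonly used pattern is an optimistic retry loop, sketched below. This is an assumption, not something proposed in this thread: it supposes the kernel bumps a generation counter (gen, hypothetical) whenever it rewrites the calibration data, and sets it to 0 while an update is in progress. The loop detects calibration data that changed during the read, but it does not by itself prove which CPU the TSC was read on. The TSC reader is passed in as a function pointer, and fixed_tsc() is only a stand-in.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical calibration data exported by the kernel. */
struct calib_page {
	_Atomic uint32_t gen;       /* 0 means an update is in progress */
	_Atomic uint64_t scale;     /* calibration factor */
	_Atomic uint64_t base_tsc;  /* TSC value at the last update */
};

struct calib_snapshot {
	uint64_t scale;
	uint64_t base_tsc;
	uint64_t tsc;
};

/* Stand-in for a real TSC read; returns a fixed value here. */
static uint64_t
fixed_tsc(void)
{
	return (1000);
}

/* Read calibration data and the TSC, retrying if the data changed. */
static struct calib_snapshot
read_calibrated(struct calib_page *p, uint64_t (*read_tsc)(void))
{
	struct calib_snapshot s;
	uint32_t gen;

	do {
		gen = atomic_load_explicit(&p->gen, memory_order_acquire);
		s.scale = atomic_load_explicit(&p->scale,
		    memory_order_relaxed);
		s.base_tsc = atomic_load_explicit(&p->base_tsc,
		    memory_order_relaxed);
		s.tsc = read_tsc();
		/* Order the data loads before re-checking the generation. */
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&p->gen, memory_order_relaxed));
	return (s);
}
```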






Re: [GSoc] Timeconter Performance Improvements

2011-03-26 Thread Warner Losh

On Mar 26, 2011, at 8:12 AM, John Baldwin wrote:

> On Saturday, March 26, 2011 08:16:46 am Peter Jeremy wrote:
>> On 2011-Mar-25 08:18:38 -0400, John Baldwin  wrote:
>>> For modern Intel CPUs you can just assume that the TSCs are in sync across
>>> packages.  They also have invariant TSC's meaning that the frequency
>>> doesn't change.
>> 
>> Synchronised P-state invariant TSCs vastly simplify the problem but
>> not everyone has them.  Should the fallback be more complexity to
>> support per-CPU TSC counts and varying frequencies or a fallback to
>> reading the time via a syscall?
> 
> I think we should just fallback to a syscall in that case.  We will also need 
> to do that if the TSC is not used as the timecounter (or always duplicate the 
> ntp_adjtime() work we do for the current timecounter for the TSC timecounter).

Logically, the code should look like:
	if (can_do_fast_time)
		do_the_fast_time();
	else
		call_the_kernel();

We can expand what can or can't do the fast time later once we get the basics 
working.

> Doing this easy case may give us the most bang for the buck, and it is also a 
> good first milestone.  Once that is in place we can decide what the value is 
> in extending it to support harder variations.

Agreed.

> One thing we do need to think about is if the shared page should just export a
> fixed set of global data, or if it should export routines.  The latter 
> approach is more complex, but it makes the ABI boundary between userland and 
> the kernel more friendly to future changes.  I believe Linux does the latter 
> approach?

There's nothing that says we can't couple this with loading a cpu-specific 
shared library, which would also insulate things.

Having a single page of both data and code strikes me as unwise.  Having one of 
each wouldn't be too bad.

Warner


Re: [GSoc] Timeconter Performance Improvements

2011-03-26 Thread Kostik Belousov
On Sat, Mar 26, 2011 at 10:12:32AM -0400, John Baldwin wrote:
> On Saturday, March 26, 2011 08:16:46 am Peter Jeremy wrote:
> > On 2011-Mar-25 08:18:38 -0400, John Baldwin  wrote:
> > >For modern Intel CPUs you can just assume that the TSCs are in sync across
> > >packages.  They also have invariant TSC's meaning that the frequency
> > >doesn't change.
> > 
> > Synchronised P-state invariant TSCs vastly simplify the problem but
> > not everyone has them.  Should the fallback be more complexity to
> > support per-CPU TSC counts and varying frequencies or a fallback to
> > reading the time via a syscall?
> 
> I think we should just fallback to a syscall in that case.  We will also need 
> to do that if the TSC is not used as the timecounter (or always duplicate the 
> ntp_adjtime() work we do for the current timecounter for the TSC timecounter).
> 
> Doing this easy case may give us the most bang for the buck, and it is also a 
> good first milestone.  Once that is in place we can decide what the value is 
> in extending it to support harder variations.
> 
> One thing we do need to think about is if the shared page should just export a
> fixed set of global data, or if it should export routines.  The latter 
> approach is more complex, but it makes the ABI boundary between userland and 
> the kernel more friendly to future changes.  I believe Linux does the latter 
> approach?
Linux uses a so-called vdso, which is linked into the process.

I think that the effort to implement a vdso is approximately equal to the
effort required to implement timecounters in user mode. On the
other hand, with a vdso we could properly annotate signal trampolines
with unwind info, which is also a big win.




Re: [GSoc] Timeconter Performance Improvements

2011-03-26 Thread Jing Huang
Hi,

 Sincere thanks to you all. Under your guidance, I read the TSC
specification in the Intel manual and learned how the TSC behaves in
hardware:

Processor families increment the time-stamp counter differently:
   • For Pentium M processors (family [06H], models [09H, 0DH]); for Pentium 4
processors, Intel Xeon processors (family [0FH], models [00H, 01H, or 02H]);
and for P6 family processors: the time-stamp counter increments with every
internal processor clock cycle.

   • For Pentium 4 processors, Intel Xeon processors (family [0FH],
models [03H and
higher]); for Intel Core Solo and Intel Core Duo processors (family [06H], model
[0EH]); for the Intel Xeon processor 5100 series and Intel Core 2 Duo processors
(family [06H], model [0FH]); for Intel Core 2 and Intel Xeon processors (family
[06H], display_model [17H]); for Intel Atom processors (family [06H],
display_model [1CH]): the time-stamp counter increments at a constant rate.

Maybe we could implement gettimeofday as follows. First, use cpuid
to find the family and model of the current CPU. If the CPU supports
a constant TSC, we look up the shared page and calculate the precise
time in user mode. If the platform's TSC is not invariant, we just
fall back to a syscall. So I think a single global shared page may be
appropriate.
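
For reference, decoding the family and model from CPUID leaf 1 could look like the sketch below. It takes the raw EAX value as a parameter (how it is obtained, for example through a compiler intrinsic, is left out), and the function names are hypothetical. The invariant-TSC check uses bit 8 of EDX from CPUID leaf 80000007H.

```c
#include <stdint.h>

/* Display family from CPUID.01H:EAX; extended family is added for 0FH. */
static unsigned
cpu_display_family(uint32_t eax)
{
	unsigned family = (eax >> 8) & 0xf;
	unsigned ext_family = (eax >> 20) & 0xff;

	return (family == 0xf ? family + ext_family : family);
}

/* Display model; extended model applies to families 06H and 0FH. */
static unsigned
cpu_display_model(uint32_t eax)
{
	unsigned family = (eax >> 8) & 0xf;
	unsigned model = (eax >> 4) & 0xf;
	unsigned ext_model = (eax >> 16) & 0xf;

	if (family == 0x6 || family == 0xf)
		return ((ext_model << 4) | model);
	return (model);
}

/* Invariant TSC flag: CPUID.80000007H:EDX bit 8. */
static int
has_invariant_tsc(uint32_t edx_80000007)
{
	return ((edx_80000007 >> 8) & 1);
}
```

With EAX = 0x00010676, this yields family 06H and display model 17H, one of the constant-rate entries in the list above.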


On Sat, Mar 26, 2011 at 10:12 PM, John Baldwin  wrote:
> On Saturday, March 26, 2011 08:16:46 am Peter Jeremy wrote:
>> On 2011-Mar-25 08:18:38 -0400, John Baldwin  wrote:
>> >For modern Intel CPUs you can just assume that the TSCs are in sync across
>> >packages.  They also have invariant TSC's meaning that the frequency
>> >doesn't change.
>>
>> Synchronised P-state invariant TSCs vastly simplify the problem but
>> not everyone has them.  Should the fallback be more complexity to
>> support per-CPU TSC counts and varying frequencies or a fallback to
>> reading the time via a syscall?
>
> I think we should just fallback to a syscall in that case.  We will also need
> to do that if the TSC is not used as the timecounter (or always duplicate the
> ntp_adjtime() work we do for the current timecounter for the TSC timecounter).
>
> Doing this easy case may give us the most bang for the buck, and it is also a
> good first milestone.  Once that is in place we can decide what the value is
> in extending it to support harder variations.
>
> One thing we do need to think about is if the shared page should just export a
> fixed set of global data, or if it should export routines.  The latter
> approach is more complex, but it makes the ABI boundary between userland and
> the kernel more friendly to future changes.  I believe Linux does the latter
> approach?
>
> --
> John Baldwin
>


Re: [GSoc] Timeconter Performance Improvements

2011-03-26 Thread John Baldwin
On Saturday, March 26, 2011 08:16:46 am Peter Jeremy wrote:
> On 2011-Mar-25 08:18:38 -0400, John Baldwin  wrote:
> >For modern Intel CPUs you can just assume that the TSCs are in sync across
> >packages.  They also have invariant TSC's meaning that the frequency
> >doesn't change.
> 
> Synchronised P-state invariant TSCs vastly simplify the problem but
> not everyone has them.  Should the fallback be more complexity to
> support per-CPU TSC counts and varying frequencies or a fallback to
> reading the time via a syscall?

I think we should just fallback to a syscall in that case.  We will also need 
to do that if the TSC is not used as the timecounter (or always duplicate the 
ntp_adjtime() work we do for the current timecounter for the TSC timecounter).

Doing this easy case may give us the most bang for the buck, and it is also a 
good first milestone.  Once that is in place we can decide what the value is 
in extending it to support harder variations.

One thing we do need to think about is if the shared page should just export a
fixed set of global data, or if it should export routines.  The latter 
approach is more complex, but it makes the ABI boundary between userland and 
the kernel more friendly to future changes.  I believe Linux does the latter 
approach?

-- 
John Baldwin


Re: [GSoc] Timeconter Performance Improvements

2011-03-26 Thread Kostik Belousov
On Sat, Mar 26, 2011 at 11:16:46PM +1100, Peter Jeremy wrote:
> On 2011-Mar-25 08:18:38 -0400, John Baldwin  wrote:
> >For modern Intel CPUs you can just assume that the TSCs are in sync across 
> >packages.  They also have invariant TSC's meaning that the frequency doesn't 
> >change.
> 
> Synchronised P-state invariant TSCs vastly simplify the problem but
> not everyone has them.  Should the fallback be more complexity to
> support per-CPU TSC counts and varying frequencies or a fallback to
> reading the time via a syscall?
> 
> >I believe we already have a shared page (it holds the signal trampoline now)
> >for at least the x86 platform (probably some others as well).
> 
> r217151 for amd64 and r217400 for ppc.  It doesn't appear to be
> supported on other platforms.  My reading of the code is that there is
> a single shared page used by all processes/CPUs.  In order to support
> non-synchronised TSCs, this would need to be changed to per-CPU.
Not necessary. If you have a reliable way to access the proper private
per-CPU page from the array, then you could use the same method
to access an array in the single page.

IMO, a per-CPU page in the process address space at the same address
for all pages is too costly. I think we can target modern hardware
for user-mode TSC; this is the kind of machine that is used for
benchmarks anyway.




Re: [GSoc] Timeconter Performance Improvements

2011-03-26 Thread Peter Jeremy
On 2011-Mar-25 08:18:38 -0400, John Baldwin  wrote:
>For modern Intel CPUs you can just assume that the TSCs are in sync across 
>packages.  They also have invariant TSC's meaning that the frequency doesn't 
>change.

Synchronised P-state invariant TSCs vastly simplify the problem but
not everyone has them.  Should the fallback be more complexity to
support per-CPU TSC counts and varying frequencies or a fallback to
reading the time via a syscall?

>I believe we already have a shared page (it holds the signal trampoline now)
>for at least the x86 platform (probably some others as well).

r217151 for amd64 and r217400 for ppc.  It doesn't appear to be
supported on other platforms.  My reading of the code is that there is
a single shared page used by all processes/CPUs.  In order to support
non-synchronised TSCs, this would need to be changed to per-CPU.

-- 
Peter Jeremy




Re: [GSoc] Timeconter Performance Improvements

2011-03-25 Thread John Baldwin
On Thursday, March 24, 2011 9:34:35 am Jing Huang wrote:
> Hi,
> 
> Thanks for your reply. That was just my self-introduction:) I want
> to borrow the shared memory idea from KVM; I do not want to port the
> whole of KVM:)  But for this project, there are some basic problems.
> 
> As far as I know, the TSC is CPU-specific. If the process is running on
> a multi-core platform, we must consider the CPU-switching problem. One
> way is to let the kernel take care of this: when switching to another
> CPU, the kernel resets the shared memory according to the new CPU.
> The second way is to use the CPUID instruction, which can also be executed
> in user mode, to get information about the current CPU. Meanwhile,
> the kernel maintains shared memory for each CPU. When gettimeofday is
> invoked, the function calculates the precise time using the current
> CPU's shared memory.
> 
> I don't know which is better. Are there other problems I need to deal with?

For modern Intel CPUs you can just assume that the TSCs are in sync across
packages.  They also have invariant TSCs, meaning that the frequency doesn't
change.  You can easily export a copy of the current 'timehands' structure
when the TSC is used as the timecounter and then just reimplement bintime() in
userland.  This assumes you use the TSC as the kernel's timecounter, but you
really need to do that so that ntp_adjtime() is taken into account, etc.

That will give a very fast and very cheap timecounter.
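
A minimal sketch of what such a userland bintime() might look like, assuming
the kernel exports a timehands-like record guarded by a generation counter.
The struct layout, the field names and user_bintime_read() are hypothetical
and are not the real FreeBSD definitions; the TSC value is passed in as a
parameter so that the arithmetic can be followed without inline assembly.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userland view of an exported timehands record. */
struct shared_timehands {
    _Atomic uint32_t gen;   /* 0 while the kernel is rewriting the record */
    uint64_t offset_count;  /* TSC value at the last kernel update */
    uint64_t scale;         /* 2^64 / TSC frequency */
    uint64_t base_sec;      /* seconds at the last kernel update */
    uint64_t base_frac;     /* fraction of a second, in units of 2^-64 s */
};

struct user_bintime {
    uint64_t sec;
    uint64_t frac;
};

/*
 * Returns 0 on success, or -1 if the record is being updated, in which
 * case the caller would fall back to the syscall.  Assumes the kernel
 * refreshes the record more often than once per second, so that
 * scale * delta does not overflow 64 bits.
 */
static int
user_bintime_read(struct shared_timehands *th, uint64_t tsc_now,
    struct user_bintime *bt)
{
    uint32_t gen;
    uint64_t delta, add;

    assert(th != 0 && bt != 0);
    do {
        gen = atomic_load_explicit(&th->gen, memory_order_acquire);
        if (gen == 0)
            return -1;
        delta = tsc_now - th->offset_count;
        add = th->scale * delta;
        bt->sec = th->base_sec;
        bt->frac = th->base_frac + add;
        if (bt->frac < add)     /* the fraction wrapped: carry one second */
            bt->sec++;
        atomic_thread_fence(memory_order_acquire);
        /* Retry if the kernel changed the record while we were reading. */
    } while (gen != atomic_load_explicit(&th->gen, memory_order_relaxed));
    return 0;
}
```

A real caller would obtain tsc_now from RDTSC and would find the record at
whatever address the shared page is mapped at.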

I believe we already have a shared page (it holds the signal trampoline now)
for at least the x86 platform (probably some others as well).

-- 
John Baldwin
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"


Re: [GSoc] Timeconter Performance Improvements

2011-03-25 Thread Peter Jeremy
On 2011-Mar-24 17:00:02 +0800, Jing Huang  wrote:
> In this scenario, I plan to use both tsc and shared memory to
>calculate precise time in user mode. The shared memory includes
>system_time, tsc_system_time and factor_tsc-system_time.

This sounds like a reasonable approach to me.  Note that once we
implement a shared page, there is probably a variety of other
information we could usefully place on that page.
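
As a rough illustration of the three shared fields proposed above, the
calculation could look like the sketch below. The struct, the mult/shift
encoding of the factor and shared_clock_ns() are assumptions made for the
example, not an existing interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout for system_time, tsc_system_time and the factor. */
struct shared_clock {
    uint64_t system_time_ns;   /* system time at the last kernel update */
    uint64_t tsc_system_time;  /* TSC value sampled at that same moment */
    uint32_t tsc_to_ns_mult;   /* ns per tick, scaled by 2^shift */
    uint32_t tsc_to_ns_shift;
};

/*
 * Converts a TSC reading to nanoseconds.  Assumes the kernel refreshes
 * the record often enough that delta * mult fits in 64 bits.
 */
static uint64_t
shared_clock_ns(const struct shared_clock *sc, uint64_t tsc_now)
{
    uint64_t delta = tsc_now - sc->tsc_system_time;

    assert(sc->tsc_to_ns_shift < 64);
    return sc->system_time_ns +
        ((delta * sc->tsc_to_ns_mult) >> sc->tsc_to_ns_shift);
}
```

For a 2 GHz TSC (0.5 ns per tick) the factor could be stored as
mult = 2^31 with shift = 32, so a delta of 2000 ticks adds 1000 ns.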

SunOS 4.x included a page of shared memory per CPU.  This was mapped
as an array (indexed by CPU number) at one address and the page
reflecting the current CPU was additionally mapped at another fixed
address.  This allowed a process to refer to data on its own CPU
as well as on any other CPU on the system.

> We also consider the CPU frequency, because the TSC is
>related to it. When the kernel changes the CPU frequency, the shared
>memory should be updated accordingly.

Two issues with this, particularly on x86 without invariant TSC:
- looking up the current CPU frequency may not be a cheap operation
- the reported CPU frequency appears to be just an approximate value,
  rather than the actual TSC frequency.

On 2011-Mar-24 21:34:35 +0800, Jing Huang  wrote:
>As far as I know, the TSC is CPU specific. If the process is running on
>a multi-core platform, we must consider the switching problem. One
>way is to let the kernel take care of this: when switching to another
>CPU, the kernel will reset the shared memory according to the new CPU.

I'm not sure what the cost of managing this page mapping will be.

>The second way is to use the CPUID instruction to get the info of the
>current CPU, which can be executed in user mode as well. At the same
>time, the kernel maintains shared memory for each CPU. When
>gettimeofday is invoked, the function will calculate the precise time
>with the current CPU's shared memory.

This approach suffers from a race condition between the CPUID
instruction and accessing the appropriate shared page - there is the
potential for an interrupt causing the process to be switched to a
different CPU, resulting in an incorrect page being accessed.
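
One common way to narrow this window is to re-check the CPU number after
the read and retry on a mismatch. The sketch below illustrates that idea
only. The per-CPU record, the hook types and the fake_* stand-ins are all
hypothetical; on real hardware the hooks would wrap CPUID and RDTSC. One
TSC tick per nanosecond is assumed to keep the arithmetic simple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NCPU 2

/* Hypothetical per-CPU record the kernel would keep in shared memory. */
struct percpu_time {
    uint64_t tsc_base;  /* this CPU's TSC at the last kernel update */
    uint64_t ns_base;   /* system time in ns at that moment */
};

/* Hypothetical hooks standing in for CPUID and RDTSC. */
typedef unsigned (*cpu_id_fn)(void);
typedef uint64_t (*tsc_fn)(void);

/*
 * Returns 0 and stores the time in *ns_out, or -1 if the thread kept
 * migrating, in which case the caller would fall back to the syscall.
 */
static int
percpu_now_ns(const struct percpu_time pages[NCPU], cpu_id_fn cur_cpu,
    tsc_fn read_tsc, int max_tries, uint64_t *ns_out)
{
    assert(max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        unsigned before = cur_cpu();
        if (before >= NCPU)
            return -1;
        uint64_t tsc = read_tsc();
        uint64_t ns = pages[before].ns_base + (tsc - pages[before].tsc_base);
        /* Accept the result only if we are still on the same CPU. */
        if (cur_cpu() == before) {
            *ns_out = ns;
            return 0;
        }
    }
    return -1;
}

/* Test stand-ins: a scripted sequence of CPU numbers and fixed TSCs. */
static unsigned fake_script[8];
static size_t fake_pos;
static unsigned fake_cur;
static uint64_t fake_tsc_val[NCPU];

static unsigned
fake_cpu_id(void)
{
    if (fake_pos < 8)
        fake_cur = fake_script[fake_pos++];
    return fake_cur;
}

static uint64_t
fake_rdtsc(void)
{
    return fake_tsc_val[fake_cur];
}
```

The re-check does not remove the race completely: a thread that moves to
another CPU and back between the two checks would still pass. On CPUs that
support RDTSCP, and where the OS stores the CPU number in IA32_TSC_AUX,
the counter and the CPU number can be read in a single instruction.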

-- 
Peter Jeremy




[GSoc] Timeconter Performance Improvements

2011-03-24 Thread Jing Huang
Hi,

   Thanks for your reply. That was just my self-introduction :) I want
to borrow the shared memory idea from KVM; I do not want to port the
whole of KVM :)  But for this project, there are some basic problems.

As far as I know, the TSC is CPU specific. If the process is running on
a multi-core platform, we must consider the switching problem. One
way is to let the kernel take care of this: when switching to another
CPU, the kernel will reset the shared memory according to the new CPU.
The second way is to use the CPUID instruction to get the info of the
current CPU, which can be executed in user mode as well. At the same
time, the kernel maintains shared memory for each CPU. When
gettimeofday is invoked, the function will calculate the precise time
with the current CPU's shared memory.

   I don't know which is better. Are there other problems I need to deal with?


Jing.