This case has timed out and has a +1.  I'm closing as approved.

Thanks,
Sherry

On Wed, Aug 04, 2010 at 11:16:18AM -0700, Sherry Moore wrote:
> I am sponsoring the following fast-track for Lejun Zhu and Kuriakose
> Kuruvilla.  It requests a patch/micro binding.  Man pages with change
> bars are available in the materials directory.  Timeout is set to
> 8/11/2010.
> 
> Thanks,
> Sherry
> 
> Template Version: @(#)sac_nextcase 1.70 03/30/10 SMI
> This information is Copyright (c) 2010, Oracle and/or its affiliates. All 
> rights reserved.
> 1. Introduction
>    1.1. Project/Component Working Name:
>         Support Intel Advanced Vector Extensions (AVX) in Solaris
> 
>    1.2. Name of Document Author/Supplier:
>         Lejun Zhu
>         Kuriakose Kuruvilla
> 
>    1.3. Date of This Document:
>         Jul 14th, 2010
> 
>    1.4. Name of Major Document Customer(s)/Consumer(s):
>         1.4.1. The Community you expect to review your project:
>         1.4.2. The ARC(s) you expect to review your project:
>                 // Leave blank if you don't have any preference
>                 // This item is advisory only
> 
>    1.5. Email Aliases:
>         1.5.2. Responsible Engineer:
>               <lejun....@intel.com>
>               <kuriakose.kuruvi...@oracle.com>
>         1.5.4. Interest List: intel-c...@sun.com
> 
> 2. Project Summary
>    2.1. Project Description:
>         Intel Advanced Vector Extensions (AVX) introduces new instructions 
> that accelerate vector floating point operations. AVX uses 256-bit 
> registers, which requires extension of current Solaris interfaces that 
> manipulate FPU registers, such as signal stack layout, "setcontext" 
> syscall and /proc interface.
> 
>    2.2. Risks and Assumptions:
>         When extending Solaris interfaces and/or data structures to 
> support AVX, it is very important to provide binary compatibility for 
> existing applications. All application binaries that exist today will 
> continue to run on the new Solaris kernel without having to be recompiled. The 
> only restriction for existing binaries is that they have enough space on 
> the signal stack to hold the extra state (see 4.1.2 for details).
> 
> 3. Business Summary
>    3.1. Problem Area:
>         Intel AVX is a new 256-bit SIMD FP vector extension of Intel 
> Architecture. Its introduction is targeted for the next Intel 
> Microarchitecture (code named: Sandy Bridge). Intel AVX accelerates the 
> trends towards FP intensive computation in general purpose applications 
> like image, video, and audio processing, engineering applications such as 
> 3D modeling and analysis, scientific simulation, and financial analytics.
> 
>    3.2. Market/Requester:
> 
>    3.3. Business Justification:
>         Customers who use Solaris x86 will expect to run optimized 
> applications on Sandy Bridge and future generations of Intel CPU, and many 
> optimizations will use AVX instructions, such as Basic Linear Algebra 
> Subprograms (BLAS) with DGEMM Routine, or sequential and cluster FFTs. 
> Also, the amd64 ABI already supports YMM registers. The latest GCC can 
> generate AVX instructions, and an AVX-enabled Sun Studio compiler is being 
> developed. All of these will require kernel changes to support AVX.
> 
>    3.4. Competitive Analysis:
>         Support for XSAVE and YMM has already been implemented in Linux 
> kernel.
> 
>    3.5. Opportunity Window/Exposure:
>         Intel will support AVX instructions in the next generation Intel 
> Microarchitecture (code-named: Sandy Bridge). Applications optimized for 
> Sandy Bridge will emerge soon. In order to enable these optimizations on 
> Solaris, we need to get the OS support into ON as soon as possible.
> 
>    3.6. How will you know when you are done?:
>       Applications can run correctly and use YMM registers on Intel machines
> that support AVX/YMM registers.
> 
> 4. Technical Description:
>     4.1. Details:
>         4.1.1 Extending ucontext_t
>             Structure ucontext_t will have the same size as its previous 
> version and all existing fields will be at the same byte offset, except 
> part of its filler is used for xregs extension. A new flag UC_XREGS (0x10) 
> for the uc_flags field will be added. Any ucontext_t with this flag set is 
> considered to have the new layout described in this PSARC case. Any 
> ucontext_t with this flag not set in its uc_flags is considered to have 
> the original layout and its uc_xrs field will be ignored.
> 
>             A data structure will be defined as follows for both 32-bit 
> and 64-bit applications:
> 
>             #define XRS_ID  0x00737278 /* the string "xrs" */
> 
>             typedef struct {
>                 unsigned long xrs_id;
>                 caddr_t xrs_ptr;
>             } xrs_t;
> 
>             Field xrs_id must have the value XRS_ID (little endian), and 
> xrs_ptr will point to a prxregset_t data structure.
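
The little-endian remark can be checked with a few lines of C. This is only an illustrative sketch: the helper names xrs_tag_from_bytes and xrs_id_is_valid are hypothetical and not part of this case; only the XRS_ID value is taken from the text above.

```c
#include <assert.h>	/* assert(), for exercising the helpers */
#include <stdint.h>

#define XRS_ID 0x00737278u	/* value given in this case */

/*
 * Build the 32-bit value a little-endian CPU would load from the
 * four bytes 'x', 'r', 's', '\0' (hypothetical helper).
 */
static uint32_t
xrs_tag_from_bytes(const char tag[4])
{
    const unsigned char *b = (const unsigned char *)tag;

    return ((uint32_t)b[0] | ((uint32_t)b[1] << 8) |
        ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24));
}

/* Return 1 if an xrs_id field carries the expected tag, else 0. */
static int
xrs_id_is_valid(unsigned long xrs_id)
{
    return (xrs_id == XRS_ID);
}
```

For example, xrs_tag_from_bytes("xrs") yields the same value as XRS_ID, which is why a memory dump of the field reads as the string "xrs" on x86.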
> 
>             Part of uc_filler in current ucontext_t definition will be 
> used to store xrs_t. The new definition of ucontext_t is:
> 
>             typedef struct  ucontext {
>                 unsigned long   uc_flags;
>                 ucontext_t      *uc_link;
>                 sigset_t        uc_sigmask;
>                 stack_t         uc_stack;
>                 mcontext_t      uc_mcontext;
>                 xrs_t           uc_xrs;
>                 long            uc_filler[3];
>             } ucontext_t;
> 
>             For a 64-bit kernel to work with 32-bit applications, the 
> following definitions will be used:
> 
>             typedef struct {
>                 uint32_t xrs_id;
>                 caddr32_t xrs_ptr;
>             } xrs32_t;
> 
>             typedef struct ucontext32 {
>                 uint32_t        uc_flags;
>                 caddr32_t       uc_link;
>                 sigset32_t      uc_sigmask;
>                 stack32_t       uc_stack;
>                 mcontext32_t    uc_mcontext;
>                 xrs32_t         uc_xrs;
>                 int32_t         uc_filler[3];
>             } ucontext32_t;
> 
>             Only the kernel components that are specified in this PSARC 
> case will use the extended form of ucontext_t. The rest of kernel code 
> that uses ucontext_t, for example getcontext() calls, will remain 
> unchanged, which means flag UC_XREGS will always be cleared in uc_flags, 
> and uc_xrs will be filled with 0 (if it is a kernel created ucontext_t) in 
> these unchanged cases.
> 
>             There is an alternative way to store xrs_t in ucontext_t, 
> which is putting xrs_t into fpregset_t of mcontext_t. But there is no 
> trailing padding bytes in the amd64 definition of fpregset_t, therefore we 
> will have to use the software available bytes (defined in table "XSAVE 
> Save Area Layout for x87 FPU and SSE State" of Intel Software Developer's 
> Manual Volume 2B) and put xrs_t in the middle of fpregset_t (before 
> "status" and "xstatus"). In the i386 definition, the layout of fpregset_t 
> is different - there are no software available bytes, but there is 
> trailing space because fp_emul is larger than fpchip_state. Putting the 
> same field at different positions in the amd64 and i386 definitions would 
> make the C declaration less straightforward. Also, since XSAVE is designed 
> to be a generic mechanism capable of saving more than FPU state, putting 
> xrs_t in fpregset_t will look strange if we have non-FPU state in the 
> future. In the Solaris implementation, wherever we do a selective copy-in 
> of fpregset_t we would need to change the code to always copy in xrs_t in 
> fpregset_t, which makes it even more confusing. So using uc_filler is the 
> better way to extend ucontext_t without changing the size of any existing 
> data structure.
> 
>             Data type prxregset_t is defined as:
> 
>             #define XR_TYPE_XSAVE  0x101
> 
>             typedef struct prxregset {
>                 uint32_t pr_type;
>                 uint32_t pr_align;
>                 uint32_t pr_xsize;
>                 uint32_t pr_pad;
>                 union {
>                     struct pr_xsave {
>                         uint16_t pr_fcw;
>                         uint16_t pr_fsw;
>                         uint16_t pr_fctw;
>                         uint16_t pr_fop;
>             #if defined(__amd64)
>                         uint64_t pr_rip;
>                         uint64_t pr_rdp;
>             #else
>                         uint32_t pr_eip;
>                         uint16_t pr_cs;
>                         uint16_t __pr_ign0;
>                         uint32_t pr_dp;
>                         uint16_t pr_ds;
>                         uint16_t __pr_ign1;
>             #endif
>                         uint32_t pr_mxcsr;
>                         uint32_t pr_mxcsr_mask;
>                         union {
>                             uint16_t pr_fpr_16[5];
>                             u_longlong_t pr_fpr_mmx;
>                             uint32_t __pr_fpr_pad[4];
>                         } pr_st[8];
>             #if defined(__amd64)
>                         upad128_t pr_xmm[16];
>                         upad128_t __pr_ign2[3];
>             #else
>                         upad128_t pr_xmm[8];
>                         upad128_t __pr_ign2[11];
>             #endif
>                         union {
>                             struct {
>                                 uint64_t pr_xcr0;
>                                 uint64_t pr_mbz[2];
>                             } pr_xsave_info;
>                             upad128_t __pr_pad[3];
>                         } pr_sw_avail;
>                         uint64_t pr_xstate_bv;
>                         uint64_t pr_rsv_mbz[2];
>                         uint64_t pr_reserved[5];
>             #if defined(__amd64)
>                         upad128_t pr_ymm[16];
>             #else
>                         upad128_t pr_ymm[8];
>                         upad128_t __pr_ign3[8];
>             #endif
>                     } pr_xsave;
>                 } pr_un;
>             } prxregset_t;
> 
>             Fields pr_type and pr_align are derived from the SPARC 
> prxregset_t definition. Field pr_type will have the value XR_TYPE_XSAVE, 
> indicating that this data structure is defined as in this PSARC case. 
> Field pr_align is currently unused and should be set to 0. The value of 
> field pr_xsize will be equal to the size of the union member selected by 
> pr_type, in this case sizeof (struct pr_xsave). Field pr_pad makes the 
> layout of prxregset_t identical under 32-bit and 64-bit compilers; its 
> value is ignored by the kernel and should be set to 0.
> 
>             Field pr_xsave is used to store XSAVE/AVX specific state. The 
> first 512 byte part is the same as FXSAVE layout (the same as the amd64 
> definition of fpregset_t, see also the FXSAVE instruction in Intel 
> Software Developer's Manual Volume 2A), followed by 64 byte XSAVE header 
> and 256 byte YMM state. See table "General Layout of XSAVE/XRSTOR Save 
> Area" in Intel Software Developer's Manual Volume 2B for detailed meaning 
> of each new field in XSAVE layout. Field pr_sw_avail represents the 
> software available bytes defined in table "XSAVE Save Area Layout for x87 
> FPU and SSE State" of Intel Software Developer's Manual Volume 2B, and is 
> used to store additional information. Its field pr_xcr0 contains the value 
> of XCR0 of the CPU when the state is saved, the rest of the area should be 
> set to 0.
> 
>             The YMM registers are always 256 bits in length for both 
> 32-bit and 64-bit code. The lower part (bits 127-0) of each YMM register 
> is mapped onto the corresponding XMM register. pr_ymm only stores the 
> upper part (bits 255-128); the lower part is stored in pr_xmm, as before. 
> This is consistent with the XSAVE layout used by the CPU. All 16 YMM 
> registers are available in 64-bit code, but 32-bit code can only access 
> the first 8 YMM registers.
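
To make the split storage concrete, here is a minimal sketch of how a consumer might reassemble the full 256-bit image of one register from its two halves. The type half128_t and the function ymm_assemble are hypothetical stand-ins (the real fields use upad128_t); only the low-half/high-half split comes from the text above.

```c
#include <assert.h>	/* assert(), for exercising the helper */
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for one 128-bit pr_xmm[] or pr_ymm[] element. */
typedef struct {
    uint8_t bytes[16];
} half128_t;

/*
 * Assemble the 32-byte little-endian image of %ymmN:
 * bytes 0-15 come from pr_xmm[N] (bits 127-0),
 * bytes 16-31 come from pr_ymm[N] (bits 255-128).
 */
static void
ymm_assemble(const half128_t *xmm_lo, const half128_t *ymm_hi,
    uint8_t out[32])
{
    memcpy(out, xmm_lo->bytes, 16);
    memcpy(out + 16, ymm_hi->bytes, 16);
}
```

A debugger-style tool would call this once per register, passing matching elements of pr_xmm and pr_ymm.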
> 
>         4.1.1.1 Future extensibility
> 
>             The definition of prxregset_t is extendable. In case of a 
> future extension, for example adding a 512 byte state XYZ into the 
> context, the definition will be:
> 
>             typedef struct prxregset {
>                 uint32_t pr_type;
>                 uint32_t pr_align;
>                 uint32_t pr_xsize;
>                 uint32_t pr_pad;
>                 union {
>                     struct pr_xsave {
>                         uint16_t pr_fcw;
>                         uint16_t pr_fsw;
>                         uint16_t pr_fctw;
>                         uint16_t pr_fop;
>             #if defined(__amd64)
>                         uint64_t pr_rip;
>                         uint64_t pr_rdp;
>             #else
>                         uint32_t pr_eip;
>                         uint16_t pr_cs;
>                         uint16_t __pr_ign0;
>                         uint32_t pr_dp;
>                         uint16_t pr_ds;
>                         uint16_t __pr_ign1;
>             #endif
>                         uint32_t pr_mxcsr;
>                         uint32_t pr_mxcsr_mask;
>                         union {
>                             uint16_t pr_fpr_16[5];
>                             u_longlong_t pr_fpr_mmx;
>                             uint32_t __pr_fpr_pad[4];
>                         } pr_st[8];
>             #if defined(__amd64)
>                         upad128_t pr_xmm[16];
>                         upad128_t __pr_ign2[3];
>             #else
>                         upad128_t pr_xmm[8];
>                         upad128_t __pr_ign2[11];
>             #endif
>                         union {
>                             struct {
>                                 uint64_t pr_xcr0;
>                                 uint64_t pr_mbz[2];
>                             } pr_xsave_info;
>                             upad128_t __pr_pad[3];
>                         } pr_sw_avail;
>                         uint64_t pr_xstate_bv;
>                         uint64_t pr_rsv_mbz[2];
>                         uint64_t pr_reserved[5];
>             #if defined(__amd64)
>                         upad128_t pr_ymm[16];
>             #else
>                         upad128_t pr_ymm[8];
>                         upad128_t __pr_ign3[8];
>             #endif
>                         uint8_t pr_xyz[512];
>                     } pr_xsave;
>                 } pr_un;
>             } prxregset_t;
> 
>             As a general rule, when extending prxregset_t as defined in 
> this PSARC case, all existing fields should be kept in the same byte 
> offset within prxregset_t, unless the value of pr_type is changed as well.
> 
>             The kernel will verify the integrity of data structure "pxr" 
> and convert an earlier version to latest version using the following 
> pseudo code:
> 
>             prxregset_t *pxr; /* Possibly earlier version of xregs */
>             prxregset_t kxr; /* Latest definition of xregs in kernel */
>             /* FXSAVE + XSAVE header + YMM */
>             size_t size_avx = 512 + 64 + 256;
>             size_t size_xyz = size_avx + 512; /* AVX + XYZ */
> 
>             if (pxr->pr_type != XR_TYPE_XSAVE) {
>                 /* pxr is invalid */
>             }
> 
>             if (pxr->pr_xsize < size_avx) {
>                 /* pxr is invalid */
>             }
> 
>             if ((pxr->pr_un.pr_xsave.pr_xstate_bv & XFEATURE_XYZ) &&
>                  pxr->pr_xsize < size_xyz) {
>                 /* pxr is invalid */
>             }
> 
>             bcopy(&pxr->pr_un.pr_xsave, &kxr.pr_un.pr_xsave, 512);
> 
>             if (pxr->pr_un.pr_xsave.pr_xstate_bv & XFEATURE_AVX) {
>                 bcopy(&pxr->pr_un.pr_xsave.pr_ymm,
>                     &kxr.pr_un.pr_xsave.pr_ymm,
>                     sizeof (kxr.pr_un.pr_xsave.pr_ymm));
>             }
> 
>             if (pxr->pr_un.pr_xsave.pr_xstate_bv & XFEATURE_XYZ) {
>                 bcopy(&pxr->pr_un.pr_xsave.pr_xyz,
>                     &kxr.pr_un.pr_xsave.pr_xyz,
>                     sizeof (kxr.pr_un.pr_xsave.pr_xyz));
>             }
> 
>             Applications are encouraged to use the value in pr_xsize to 
> work with future prxregset_t extensions. When pr_xyz is added, 
> applications developed before the XYZ extension can still work. For 
> example, they can copy a prxregset_t structure from memory to a file by 
> using pr_xsize to calculate the number of bytes to copy.
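
A rough sketch of such a size-driven copy follows. It assumes that pr_xsize counts only the union member, so the total is the 16-byte fixed header plus pr_xsize; that reading is inferred from the field description above and should be treated as an assumption. The names xr_hdr_t and xr_bytes_to_copy are hypothetical.

```c
#include <assert.h>	/* assert(), for exercising the helper */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of the four fixed fields that open prxregset_t. */
typedef struct {
    uint32_t pr_type;
    uint32_t pr_align;
    uint32_t pr_xsize;
    uint32_t pr_pad;
} xr_hdr_t;

/*
 * Return how many bytes of the buffer make up one complete
 * prxregset_t, or 0 if the buffer is too short to hold it.
 * Works unchanged if a later release grows pr_xsize.
 */
static size_t
xr_bytes_to_copy(const void *buf, size_t buflen)
{
    xr_hdr_t hdr;

    if (buflen < sizeof (hdr))
        return (0);
    memcpy(&hdr, buf, sizeof (hdr));
    if (hdr.pr_xsize > buflen - sizeof (hdr))
        return (0);
    return (sizeof (hdr) + hdr.pr_xsize);
}
```

With today's layout this returns 848; with the hypothetical 512-byte XYZ extension it would return 1360 without recompiling the application.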
> 
>         4.1.2 Signal stack
>             An amd64 signal frame looks like this on the stack:
>             old %rsp:
>                     <128 bytes of untouched stack space>
>                     <a siginfo_t [optional]>
>                     <a prxregset_t [optional]> (added by this PSARC case)
>                     <a ucontext_t>
>                     <siginfo_t *>
>                     <signal number>
>             new %rsp:       <return address (deliberately invalid)>
> 
>             An i386 SVR4/ABI signal frame looks like this on the stack:
>             old %esp:
>                     <a siginfo32_t [optional]>
>                     <a prxregset_t [optional]> (added by this PSARC case)
>                     <a ucontext32_t>
>                     <pointer to that ucontext32_t>
>                     <pointer to that siginfo32_t>
>                     <signo>
>             new %esp:       <return address (deliberately invalid)>
> 
>             User space code will access siginfo_t and ucontext_t through 
> pointers, so the signature of signal handler is not changed. This PSARC 
> case adds a prxregset_t to the signal frame if the system supports AVX. 
> The existence of prxregset_t can be determined from the uc_flags and 
> uc_xrs of ucontext_t.
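
A signal handler could perform that check roughly as sketched below. The mock_ types are cut-down, hypothetical stand-ins for the real ucontext_t and xrs_t so that the fragment compiles anywhere; uc_find_xregs is likewise a hypothetical name. UC_XREGS and XRS_ID use the values given in 4.1.1.

```c
#include <assert.h>	/* assert(), for exercising the helper */
#include <stddef.h>

#define UC_XREGS 0x10		/* from 4.1.1 */
#define XRS_ID 0x00737278ul	/* from 4.1.1 */

/* Hypothetical, reduced stand-ins for xrs_t and ucontext_t. */
typedef struct {
    unsigned long xrs_id;
    void *xrs_ptr;
} mock_xrs_t;

typedef struct {
    unsigned long uc_flags;
    mock_xrs_t uc_xrs;
} mock_ucontext_t;

/*
 * Return the prxregset_t pointer carried by the context, or NULL
 * if the context has the original layout or the tag does not match.
 */
static void *
uc_find_xregs(const mock_ucontext_t *uc)
{
    if ((uc->uc_flags & UC_XREGS) == 0)
        return (NULL);
    if (uc->uc_xrs.xrs_id != XRS_ID)
        return (NULL);
    return (uc->uc_xrs.xrs_ptr);
}
```

Checking the flag before touching uc_xrs matters, because without UC_XREGS those bytes are still plain filler.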
> 
>             On AVX enabled systems, this extension will be present in the 
> signal frame of every application that has its FPU state enabled, even if 
> the application does not use AVX or YMM registers. As a result, some 
> additional space on the signal handler stack will be used (sizeof 
> (prxregset_t), which is 848 bytes for now).
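
The 848-byte figure follows from the layout in 4.1.1. The sketch below only restates that arithmetic; the enum and function names are hypothetical.

```c
#include <assert.h>	/* assert(), for exercising the helpers */
#include <stddef.h>

enum {
    XR_HDR_SIZE = 4 * 4,	/* pr_type, pr_align, pr_xsize, pr_pad */
    XR_FXSAVE_SIZE = 512,	/* legacy FXSAVE area */
    XR_XSAVE_HDR_SIZE = 64,	/* pr_xstate_bv + reserved words */
    XR_YMM_SIZE = 16 * 16	/* 16 upper halves of 16 bytes each */
};

/* Size of struct pr_xsave, that is, the value expected in pr_xsize. */
static size_t
xr_xsave_size(void)
{
    return (XR_FXSAVE_SIZE + XR_XSAVE_HDR_SIZE + XR_YMM_SIZE);
}

/* Size of the whole prxregset_t placed on the signal stack. */
static size_t
xr_total_size(void)
{
    return (XR_HDR_SIZE + xr_xsave_size());
}
```

So pr_xsize is 832 and the full structure is 16 + 832 = 848 bytes. The i386 variant has the same size because the unused register slots are padded.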
> 
>         4.1.3 Getsetcontext syscall
>             Syscall "getsetcontext" (100) will be extended to deal with 
> the new ucontext_t. On AVX enabled machines, YMM is considered part of FPU 
> state. If UC_XREGS and UC_FPU are found in uc_flags, and xrs_id and 
> pr_xsize are valid, SETCONTEXT will update the LWP's FPU state using the 
> content in prxregset_t.
> 
>             GETCONTEXT will not be extended, because all YMM registers are 
> caller saved. Compiler generated code or assembly programmer should 
> restore YMM registers when the called function returns. When the 
> application tries to restore the context saved by GETCONTEXT, the 
> application will continue to execute from the next instruction after 
> setcontext() if UC_CPU is not set, or from the next instruction after 
> previous getcontext() if UC_CPU is set. In both cases, the YMM registers 
> should be restored by compiler generated code or hand written assembly. 
> Therefore it is not necessary for GETCONTEXT to return YMM content.
> 
>             As a result, the kernel code branch to process the extended 
> SETCONTEXT will only be executed when UC_XREGS is set, which happens only 
> when SETCONTEXT is called at the end of libc signal handling routine. 
> Normal calls of libc setcontext() from user application do not have 
> UC_XREGS set, and SETCONTEXT will work the same way as before.
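
The flag test that selects the extended path could look like the following sketch. The value used for UC_FPU (0x08) is an assumption about the x86 header and is not stated in this case; setcontext_restores_xregs is a hypothetical name, and xrs_valid stands for "xrs_id and pr_xsize passed their checks".

```c
#include <assert.h>	/* assert(), for exercising the helper */

#define UC_FPU 0x08	/* ASSUMED x86 value, not given in this case */
#define UC_XREGS 0x10	/* from 4.1.1 */

/*
 * Return 1 if SETCONTEXT should load FPU state from prxregset_t,
 * 0 if it should behave exactly as it did before this case.
 */
static int
setcontext_restores_xregs(unsigned long uc_flags, int xrs_valid)
{
    const unsigned long need = UC_FPU | UC_XREGS;

    return ((uc_flags & need) == need && xrs_valid);
}
```

A context produced by getcontext() never has UC_XREGS set, so it always takes the unchanged path.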
> 
>         4.1.4 /proc
>             File /proc/<pid>/lwp/<id>/xregs will be used to support 
> read/write of extra state (XSTATE_BV and YMM for now) through the procfs 
> interface. The following functions in x86 architecture will be added:
>             PCSXREG (procfs ioctl)
> 
>             The length of /proc/<pid>/lwp/<id>/xregs will be sizeof 
> (prxregset_t) on machines that support AVX, and 0 on other machines.
> 
>             On machines that support AVX, the content of 
> /proc/<pid>/lwp/<id>/xregs will be the same as "<a prxregset_t 
> [optional]>" that is placed on signal stack.
> 
>             When using PCSXREG to set extra state, user space application 
> must provide a prxregset_t that is valid under the integrity check, and is 
> meaningful on the current machine. In this PSARC case, prxregset_t will be 
> considered invalid if pr_type or pr_xsize fails the sanity check defined 
> in 4.1.1.1. Trying to set an invalid 
> prxregset_t or set YMM in a system that does not support AVX will not 
> change the state of the target process. In such situations, EINVAL will be 
> returned. This is different from the behavior in SPARC implementation, 
> which does not verify the content of prxregset_t.
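
The rejection rules in this paragraph can be summarized as a small predicate. This is a sketch under stated assumptions, not the kernel code: xr_check is a hypothetical name, hw_xcr0 stands for the XCR0 value of the running CPU, and XFEATURE_AVX is taken to be bit 2 (0x4), as in the XCR0 layout of the Intel manual.

```c
#include <assert.h>	/* assert(), for exercising the helper */
#include <errno.h>
#include <stdint.h>

#define XR_TYPE_XSAVE 0x101		/* from 4.1.1 */
#define XR_SIZE_AVX (512 + 64 + 256)	/* FXSAVE + XSAVE header + YMM */
#define XFEATURE_AVX 0x4ull		/* XCR0/XSTATE_BV bit 2 */

/*
 * Return 0 if the caller-supplied header fields are acceptable,
 * EINVAL otherwise. The target process must be left untouched
 * whenever this returns non-zero.
 */
static int
xr_check(uint32_t pr_type, uint32_t pr_xsize, uint64_t xstate_bv,
    uint64_t hw_xcr0)
{
    if (pr_type != XR_TYPE_XSAVE)
        return (EINVAL);
    if (pr_xsize < XR_SIZE_AVX)
        return (EINVAL);
    /* Setting YMM on a machine without AVX is rejected. */
    if ((xstate_bv & XFEATURE_AVX) != 0 && (hw_xcr0 & XFEATURE_AVX) == 0)
        return (EINVAL);
    return (0);
}
```

Doing all checks before modifying anything is what guarantees that a failed PCSXREG leaves the target unchanged.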
> 
>             The value of pr_xcr0 is informational and should not be 
> modified when application modifies state through procfs.
> 
>             The bit values in pr_xstate_bv indicate the corresponding area 
> of the FPU state that should be set (bit X = 1) or initialized (bit X = 
> 0). When bit X is set to 0, values in corresponding area will be ignored 
> and initial values will be set into FPU instead. For the meaning of each 
> bit, see the operation section of XRSTOR instruction in Intel Software 
> Developer's Manual Volume 2B.
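
As an illustration of the "set or initialize" rule for one component, the sketch below handles the upper half of a single YMM register. The function name xr_apply_ymm_hi is hypothetical; XFEATURE_AVX is taken to be bit 2 (0x4), and the initial value of the upper YMM half is all zeros, as for XRSTOR.

```c
#include <assert.h>	/* assert(), for exercising the helper */
#include <stdint.h>
#include <string.h>

#define XFEATURE_AVX 0x4ull	/* XSTATE_BV bit 2 */

/*
 * Apply pr_xstate_bv semantics to one upper YMM half:
 * bit set   -> take the supplied bytes,
 * bit clear -> ignore them and load the initial (zero) value.
 */
static void
xr_apply_ymm_hi(uint64_t xstate_bv, const uint8_t supplied[16],
    uint8_t target[16])
{
    if ((xstate_bv & XFEATURE_AVX) != 0)
        memcpy(target, supplied, 16);
    else
        memset(target, 0, 16);
}
```

This is why a writer that wants its YMM values to take effect must also set the AVX bit in pr_xstate_bv.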
> 
>         4.1.4.1 Future extensibility of procfs
>             Considering the general rules to extend prxregset_t in 
> 4.1.1.1, it is safe for applications that are developed today to 
> read/write xregs on a future Solaris version. For example, an application 
> that reads xregs, updates ymm0, and writes it back can do the following:
> 
>             prxregset_t *pxr;
>             struct pr_xsave *pxs;
>             size_t len;
>             /* FXSAVE + XSAVE header + YMM */
>             size_t size_avx = 512 + 64 + 256;
> 
>             len = get_file_size("/proc/123/lwp/1/xregs");
>             if (len < size_avx) {
>                 //The system does not have xregs extension with
>                 //AVX state. Stop.
>             }
> 
>             pxr = (prxregset_t *)malloc(len);
>             read_entire_file("/proc/123/lwp/1/xregs", pxr);
> 
>             // Sanity check.
>             if (pxr->pr_type != XR_TYPE_XSAVE) {
>                 //Not the xregs type we want, stop.
>             }
> 
>             pxs = &pxr->pr_un.pr_xsave;
> 
>             if ((pxs->pr_sw_avail.pr_xsave_info.pr_xcr0 &
>                 XFEATURE_AVX) == 0) {
>                 //This system does not have AVX. Stop.
>             }
> 
>             if ((pxs->pr_xstate_bv & XFEATURE_AVX) == 0) {
>                 //YMM is in initial state, clean and set.
>                 memset(pxs->pr_ymm,
>                     0, sizeof (pxs->pr_ymm));
>                 pxs->pr_xstate_bv |= XFEATURE_AVX;
>             }
> 
>             //Update pxs->pr_ymm[0]
>             ioctl_set_xregs("/proc/123/lwp/1/xregs", pxr);
> 
>         4.1.5 Core dump format
>             We already have prxregset_t as part of the core dump file (see 
> core(4)). On x86 systems that have xregs extension, e.g. the systems that 
> have enabled AVX extension, the core dump will include note sections with 
> prxregset_t as described in the manpage.
> 
>             To support dumping xregs using gcore(1), libproc needs to be 
> extended by adding SPARC specific APIs to x86 definition as well. Because 
> libproc is only used privately by tools such as dtrace and gcore, this 
> will not cause compatibility issues.
> 
>         4.1.6 mdb(1)
>             mdb(1) will support disassembling all the new AVX 
> instructions, as well as XSAVE, XRSTOR, XGETBV and XSETBV. Also, mdb(1) 
> will be able to process YMM values as part of the FPU state in the same 
> way as we have for XMM today. On platforms that support AVX, mdb(1) "print 
> floating point registers" commands ($x and $y) will print the %ymm value 
> for each %xmm that is printed. An example of mdb output is:
> 
>             > $x
>             _fp_hw 0x03 (80387 chip with SSE)
>             < ...omitted >
> 
>             %xmm0  0x5f4d4d585f4d4d585f4d4d585f4d4d58
>             %xmm1  0x00000000000000000000000000000000
>             %xmm2  0x00000000000000000000000000000000
>             %xmm3  0x00000000000000000000000000000000
>             %xmm4  0x00000000000000000000000000000000
>             %xmm5  0x00000000000000000000000000000000
>             %xmm6  0x00000000000000000000000000000000
>             %xmm7  0x00000000000000000000000000000000
>             %ymm0  0x5f4d4d595f4d4d595f4d4d595f4d4d595f4d4d585f4d4d585f4d4d585f4d4d58
>             %ymm1  0x0000000000000000000000000000000000000000000000000000000000000000
>             %ymm2  0x0000000000000000000000000000000000000000000000000000000000000000
>             %ymm3  0x0000000000000000000000000000000000000000000000000000000000000000
>             %ymm4  0x0000000000000000000000000000000000000000000000000000000000000000
>             %ymm5  0x0000000000000000000000000000000000000000000000000000000000000000
>             %ymm6  0x0000000000000000000000000000000000000000000000000000000000000000
>             %ymm7  0x0000000000000000000000000000000000000000000000000000000000000000
> 
>         4.1.7 Hardware Capabilities
>             Two new hardware capability bits, AV_386_XSAVE (0x10000000) 
> and AV_386_AVX (0x20000000) are added. Applications that need to be aware 
> of XSAVE and AVX can test the hardware capabilities on the current system 
> to see if it supports these features.
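
A minimal sketch of such a test is shown below. It works on a capability word that the application has already obtained (on Solaris this would typically come from getisax(3C)); the helper names are hypothetical, and the bit values are the ones given above.

```c
#include <assert.h>	/* assert(), for exercising the helpers */
#include <stdint.h>

#define AV_386_XSAVE 0x10000000u	/* from this case */
#define AV_386_AVX 0x20000000u		/* from this case */

/* Return 1 if the capability word advertises XSAVE, else 0. */
static int
hwcap_has_xsave(uint32_t hwcap)
{
    return ((hwcap & AV_386_XSAVE) != 0);
}

/* Return 1 if the capability word advertises AVX, else 0. */
static int
hwcap_has_avx(uint32_t hwcap)
{
    return ((hwcap & AV_386_AVX) != 0);
}
```

An application would select its AVX code path only when hwcap_has_avx() returns 1, and fall back to its SSE path otherwise.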
> 
>         4.1.8 Linux Brand
>             Solaris 10 has a Lx Brand that supports an earlier version of 
> Linux kernel, which doesn't have AVX or XSAVE support. In Nevada, it has 
> been removed by PSARC/2010/169. So, nothing needs to be changed in Linux 
> Brand for now. However, changes will be required in the Lx Brand in the 
> future if we upgrade the Linux kernel to a version that supports AVX.
>       
>     4.2. Bug/RFE Number(s):
>         6714685 Need to support Intel Advanced Vector Extensions (AVX)
> 
>         Also the following CRs are related to this PSARC:
>         6958308 XSAVE/XRSTOR mechanism to save and restore processor state
>         6970220 Replace use of XSAVE with XSAVEOPT instruction for optimized 
> context saves
> 
>     4.3. In Scope:
>         Kernel changes necessary to support the use of AVX instructions 
> and YMM registers in user space.
> 
>     4.4. Out of Scope:
>         Debuggers and tools other than mdb(1) to manipulate YMM.
> 
>     4.5. Interfaces:
> 
>         Interface                               Stability
>         ---------                               ---------
> 
> Data structure:
>         ucontext_t / ucontext32_t               Evolving
>         xrs_t / xrs32_t                         Evolving
>         prxregset_t                             Evolving
> 
> Procfs ioctl:
>         PCSXREG                                 Evolving
> 
> User space API:
>     proc_service:
>         ps_lgetxregsize                         Evolving
>         ps_lgetxregs                            Evolving
>         ps_lsetxregs                            Evolving
>     thread_db:
>         td_thr_getxregsize                      Evolving
>         td_thr_getxregs                         Evolving
>         td_thr_setxregs                         Evolving
>     libproc:
>         Plwp_getxregs                           Evolving
>         Plwp_setxregs                           Evolving
> 
>     4.6. Doc Impact:
>         As xregs is introduced in the x86 architecture as well, the following
>         man pages need to be updated. All changed man pages can be found in
>         the attachment. Modified versions of these man pages have change bars.
> 
>         ps_lgetregs(3PROC)
>         proc_service(3PROC)
>         td_thr_getgregs(3C_DB)
> 
> 
>     4.7. Admin/Config Impact:
>         N/A
> 
>     4.8. HA Impact:
>         N/A
> 
>     4.9. I18N/L10N Impact:
>         N/A
> 
>     4.10. Packaging & Delivery:
>         N/A
> 
>     4.11. Security Impact:
>         We need to prevent YMM state from being "leaked" between processes, 
> because these registers may contain sensitive information. Currently we 
> always set or initialize all FPU state (legacy FP, XMM and YMM) during 
> context switch when using XRSTOR. In signal stack handling, we keep YMM 
> values untouched if UC_XREGS is not set in ucontext_t. This is the same as 
> how we handle the rest of FPU state today if UC_FPU is not set.
> 
>     4.12. Dependencies:
>         N/A
> 
> 5. Reference Documents:
>         Intel Advanced Vector Extensions Programming Reference
>         Document #319433, www.intel.com
>         Chapter 3 System Programming Model: OS requirement to support AVX.
> 
>         Intel 64 and IA-32 Architectures Software Developer's Manual
>         Document #253667, www.intel.com
>         XSAVE layout.
> 
>         System V Application Binary Interface
>         AMD64 Architecture Processor Supplement
>         Draft Version 0.99, www.x86-64.org
>         Section 3.2: ABI requirement for YMM.
> 
> 6. Resources and Schedule:
>    6.1. Projected Availability:
>         3Q '10
> 
>    6.2. Cost of Effort:
>         The implementation and unit testing will take 1 engineer and 3 
> months. Also it will take 1 engineer and 3 months for integration and back 
> port.
> 
>    6.4. Product Approval Committee requested information:
>         6.4.1. Consolidation or Component Name: ON (OS/Net)
>         6.4.7. Target RTI Date/Release:
>                 OpenSolaris build 147
>         6.4.8. Target Code Design Review Date:
> 
>    6.5. ARC review type:
>         Fast track
> 
>    6.6. ARC Exposure: open
>        6.6.1. Rationale: Part of OpenSolaris
> 
> 7. Prototype Availability:
>    7.1. Prototype Availability:
>       Prototype currently available
> 
>    7.2. Prototype Cost:
>       2 person-weeks required to verify implementation
> 
_______________________________________________
opensolaris-arc mailing list
opensolaris-arc@opensolaris.org