Re: [Dwarf-Discuss] modeling different address spaces

2020-07-20 Thread Michael Eager via Dwarf-Discuss

On 7/20/20 8:24 AM, Metzger, Markus T wrote:

> Hello Michael,
>
> > > We'd also want an unbounded piece operator to describe partially
> > > registerized unbounded arrays, but I have not worked that out in detail,
> > > yet, and we're a bit farther away from an implementation.
>
> > Can you describe this more?
>
> Consider a large array kept in memory and a for loop iterating over the
> array.  If that loop gets vectorized, compilers would load a portion of the
> array into registers at the beginning of the loop body, operate on the
> registers, and write them back at the end of the loop body.
>
> The entire array can be split into three pieces:
> - elements that have already been processed: in memory
> - elements that are currently being processed: in registers
> - elements that will be processed in future iterations: in memory
>
> For unbounded arrays, the size of the third piece is not known.


When would you need to know the third piece?

How is this different from a non-vector processor doing an optimized 
string operation, loading 4 characters into a register at a time?  If 
the string is nul-terminated, the string length might be unknown.


--
Michael Eager


Re: [Dwarf-Discuss] modeling different address spaces

2020-07-20 Thread Metzger, Markus T via Dwarf-Discuss
Hello Michael,

> > I have a write-up ready but wanted to wait until we have a public
> > implementation.  Is that the right order?  Or would you rather review
> > proposals right away?
> 
> I'm not sure what a SIMD lane is.  There are a number of architectures
> which support SIMD, such as the AVX extension in x86.  We try to
> describe functionality in generic terms as much as possible so that it
> can be used with a variety of architectures.
> 
> We'd be happy to see your proposal.  Often an implementation is a good
> proof of concept, but getting feedback on a design early in the process
> can be a guide to that implementation and avoids changes later.

I tried submitting the proposal via the public comment function, but I'm not
sure whether I succeeded.  When I clicked on "Submit Comment", nothing happened,
and I did not get any error.  I did not fill out the Section and Page fields, as
the proposal covers multiple sections on multiple pages.  I'm attaching the
proposal.

The proposal covers SIMD in general.  I have patches for GDB with a test case
using AVX.


> > We'd also want an unbounded piece operator to describe partially
> > registerized unbounded arrays, but I have not worked that out in detail,
> > yet, and we're a bit farther away from an implementation.
> 
> Can you describe this more?

Consider a large array kept in memory and a for loop iterating over the
array.  If that loop gets vectorized, compilers would load a portion of the
array into registers at the beginning of the loop body, operate on the
registers, and write them back at the end of the loop body.

The entire array can be split into three pieces:
- elements that have already been processed: in memory
- elements that are currently being processed: in registers
- elements that will be processed in future iterations: in memory

For unbounded arrays, the size of the third piece is not known.
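
To illustrate the point, the listing below is only a sketch and not part of the
proposal.  It assumes 4-byte elements, a 64-byte vector register, and that the
loop is currently working on elements 16..31; the symbol a and the vector
register number are placeholders:

    DW_OP_addr a          ; elements 0..15: already processed, in memory
    DW_OP_piece 64
    DW_OP_regx <vreg>     ; elements 16..31: currently being processed,
    DW_OP_piece 64        ; in a vector register
    DW_OP_addr a+128      ; elements 32..: to be processed, in memory
    DW_OP_piece <size>    ; <size> is unknown for an unbounded array, hence
                          ; the wish for an unbounded piece operator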

Regards,
Markus.

--- Attached proposal ---

Implicitly vectorized code executes multiple instances of a source code loop or
of a source code kernel function simultaneously in a single sequence of
instructions operating on a vector of data elements (cf. SIMD: Single
Instruction Multiple Data).

The size of this vector is typically referred to as SIMD width or SIMD size.
Individual elements and their control flow are typically referred to as SIMD
lanes or SIMD channels.

The user has written the source code from the point of view of a single SIMD
lane.  The code was later vectorized by the compiler to execute multiple SIMD
lanes simultaneously.
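
To make the terminology concrete, here is a small sketch (not part of the
original mail) of a scalar source loop and the way a vectorizing compiler might
execute it; the function name and the width of 8 lanes are assumptions:

    /* The programmer writes the loop body for a single element,
       i.e. from the point of view of a single SIMD lane. */
    void scale(float *dst, const float *src, int n)
    {
        for (int i = 0; i < n; ++i)
            dst[i] = 2.0f * src[i];
    }

    /* A vectorizing compiler may execute eight iterations at once,
       conceptually:

           for (int i = 0; i + 8 <= n; i += 8)
               dst[i..i+7] = 2.0f * src[i..i+7];   // one vector instruction

       Each of the eight element positions is one SIMD lane; the SIMD width
       of the vectorized loop is 8.  A scalar remainder loop handles the
       last n % 8 elements. */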

Although SIMD code typically works on large vectors or matrices, the compiler
is free to temporarily reorganize the data, e.g. by registerizing some vector
elements while leaving the rest of the vector in memory, or by gathering a
particular structure field of a vector of structures into a register.

Further, the assignment of loop iterations or work items to SIMD lanes may be
done dynamically.

We thus cannot infer the relative location of data objects in SIMD code.

To be able to describe this, we propose the following DWARF extension, which
describes the location of a variable as a function of the SIMD lane.

===

Section 2.2, pg. 17-22.

Add the following entry to Table 2.2:

  ------------------------------------------------------------------
  DW_AT_simd_width         SIMD width of subroutine or lexical block
  ------------------------------------------------------------------


Section 3.3.5, pg. 79-80.

Add

A subprogram or inlined subroutine may have a `DW_AT_simd_width` attribute
whose value is the SIMD width of the code it contains.  A value of zero
means that the subroutine does not contain SIMD code.

If the attribute is not present, the SIMD width is inherited from the parent
DIE.

The SIMD width may be overridden for nested subroutines or for lexical
blocks contained within that subroutine.
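
For illustration, a vectorized kernel and an inlined callee that inherits its
width might be described as in the following sketch; the names and the width
of 8 are assumptions, not part of the proposal text:

    DW_TAG_subprogram
        DW_AT_name              "kernel"
        DW_AT_simd_width        8    ; code in this subprogram runs 8 SIMD lanes

        DW_TAG_inlined_subroutine
            DW_AT_abstract_origin   <helper>
            ; no DW_AT_simd_width: the inlined code inherits width 8
            ; from the enclosing subprogram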


Section 3.5, pg. 92.

Add

A lexical block that contains SIMD code may have a `DW_AT_simd_width`
attribute whose value is the SIMD width of the code it contains.  A value of
zero means that the lexical block does not contain SIMD code.  This can be
used to mark non-SIMD blocks inside a SIMD subroutine.

If the attribute is not present, the SIMD width is inherited from the parent
DIE.

The SIMD width may be overridden for lexical blocks nested within that
block.
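
As a sketch (names and values are illustrative only), a scalar remainder loop
inside a vectorized subroutine could be marked like this:

    DW_TAG_subprogram
        DW_AT_name              "kernel"
        DW_AT_simd_width        8

        DW_TAG_lexical_block          ; e.g. the scalar remainder loop
            DW_AT_simd_width    0     ; this block contains no SIMD code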


Section 2.5.1.3, pg. 29-33.

Add

16. DW_OP_push_simd_lane

The DW_OP_push_simd_lane operation pushes the SIMD lane for which the
expression is to be evaluated.

The operation is only valid in the context of a lexical block for which
the SIMD width is known (see DW_AT_simd_width).
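
As an illustrative sketch (not part of the proposal text; the symbol buf and
the element size of four bytes are placeholders), a per-lane variable kept in a
memory buffer could then be located with an expression such as:

    DW_OP_push_simd_lane    ; lane for which the expression is evaluated
    DW_OP_lit4
    DW_OP_mul               ; byte offset of this lane's slot
    DW_OP_addr buf          ; start of the per-lane buffer
    DW_OP_plus              ; address of the variable for this lane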

 

Re: [Dwarf-Discuss] modeling different address spaces

2020-07-20 Thread Michael Eager via Dwarf-Discuss

On 7/20/20 1:43 AM, Metzger, Markus T via Dwarf-Discuss wrote:


> I also have a small proposal for describing locations as a function of the
> SIMD lane by adding a DW_OP_push_simd_lane operator and introducing stack
> variants for piece operators.
>
> I have a write-up ready but wanted to wait until we have a public
> implementation.  Is that the right order?  Or would you rather review
> proposals right away?


I'm not sure what a SIMD lane is.  There are a number of architectures 
which support SIMD, such as the AVX extension in x86.  We try to 
describe functionality in generic terms as much as possible so that it 
can be used with a variety of architectures.


We'd be happy to see your proposal.  Often an implementation is a good 
proof of concept, but getting feedback on a design early in the process 
can be a guide to that implementation and avoids changes later.



> We'd also want an unbounded piece operator to describe partially
> registerized unbounded arrays, but I have not worked that out in detail,
> yet, and we're a bit farther away from an implementation.


Can you describe this more?

--
Michael Eager


Re: [Dwarf-Discuss] modeling different address spaces

2020-07-20 Thread Metzger, Markus T via Dwarf-Discuss
Hello Michael,

> https://llvm.org/docs/AMDGPUDwarfProposalForHeterogeneousDebugging.html
> >
> > AFAIK, these changes will be made to LLVM and there is interest in adding
> > to the DWARF standard eventually.
> 
> As mentioned in the past, I would be pleased to see proposals submitted
> for these changes.

I also have a small proposal for describing locations as a function of the
SIMD lane by adding a DW_OP_push_simd_lane operator and introducing stack
variants for piece operators.

I have a write-up ready but wanted to wait until we have a public
implementation.  Is that the right order?  Or would you rather review
proposals right away?

We'd also want an unbounded piece operator to describe partially registerized
unbounded arrays, but I have not worked that out in detail, yet, and we're a bit
farther away from an implementation.

Regards,
Markus.
