Re: Doing better than CS ioctl ?

2009-08-13 Thread Jerome Glisse
On Wed, 2009-08-12 at 15:27 +0100, Keith Whitwell wrote:
 Dave,
 
 The big problem with the (second) radeon approach of state objects was
 that we defined those objects statically and encoded them into the kernel
 interface.  That meant that when new hardware functionality was needed
 (or discovered) we had to rev the kernel interface, usually in a fairly
 ugly way.
 
 I think Jerome's approach could be a good improvement if the state
 objects it creates are defined by software at runtime, more like little
 display lists than pre-defined state atoms.  The danger again is that
 you run into cases where you need to expand objects the verifier will
 allow userspace to create, but at least in doing so you won't be
 breaking existing users of the interface.
 
 I think the key is that there should be no pre-defined format for these
 state objects, simply that they should be a sequence of legal
 commands/register writes that the kernel validates once and userspace
 can execute multiple times.
 
 Keith

My idea was to group states together according to what they matter for,
like the GL state of a texture, or an object on NVIDIA hw. The idea is
that most objects can only be validated if we know all of their state. For
a renderbuffer we need to know its format, size and tiling; for a texture
we need to know its format, size, mipmap levels and possibly other state,
and so on and so forth.

If we just take arbitrary packets from userspace we might end up in
situations that are hard to decipher. If one validated cs programs a
renderbuffer and other state like the zbuffer it might be valid on its
own, but if we combine it with another validated cs things might be
completely wrong: that other cs might just change the clipping and
renderbuffer size but not update the zbuffer, so we might end up
rendering to a zbuffer that is either too small or too big (too small
is what we don't want to do ;)).

So in the end you need to enforce a set of registers onto userspace.
Userspace needs to submit a cs which programs at least this set in order
to be validated. We can have different sets, like a renderbuffer set
(clipping, scissor, colorbuffer, zbuffer registers), a vertex set
(vbo, ...), a shader set (shader registers), and then you can combine
different sets to do the rendering. I think splitting states matters
because you often render to the same buffer but with a different vbo,
pixel shader, vertex shader or primitive, so it sounds better to split
states.
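A minimal sketch of that completeness rule, with purely illustrative set names (none of this is real radeon code): each cs is tagged with the register sets it programs, and a submission is only accepted once every required set is covered.

```c
#include <stdint.h>

/* Hypothetical register sets a validated cs can program. */
enum state_set {
    SET_RENDERBUFFER = 1u << 0, /* clipping, scissor, colorbuffer, zbuffer */
    SET_VERTEX       = 1u << 1, /* vbo layout, ... */
    SET_SHADER       = 1u << 2, /* shader registers */
};

/* A combination of validated cs is complete only if, together, they
 * program every set the kernel cares about. */
static int batch_complete(uint32_t programmed_sets)
{
    const uint32_t required = SET_RENDERBUFFER | SET_VERTEX | SET_SHADER;
    return (programmed_sets & required) == required;
}
```

The point is that the expensive per-register checking happens when each set is validated; combining sets is a cheap bitmask test.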

There I think we end up with pretty much what I proposed. Thing is,
I don't think a packet format is the best way to communicate with the
kernel, as the kernel will have to parse the buffer, which is resource
consuming, not to mention that tracking states that way is a bit painful.

I think state objects with a structure defined per asic (r3xx, r5xx, r6xx)
are better: no parsing, a clear split of each value, easy access to
check that taken together they do something allowed, and then it's
easy and quick for the kernel to build the packets out of this.
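To illustrate the "no parsing" path: the kernel receives a fixed per-asic struct, checks it, and emits the packets itself. The register offsets and the packet-header encoding below are placeholders, not real rv515 values.

```c
#include <stdint.h>
#include <stddef.h>

struct rv515_renderbuffer {
    uint32_t width;
    uint32_t height;
    uint32_t format;
};

/* PKT0-style header: write 'count' registers starting at 'reg'.
 * (Illustrative encoding only.) */
static uint32_t pkt0(uint32_t reg, uint32_t count)
{
    return (reg >> 2) | ((count - 1) << 16);
}

/* Build the packet stream from an already-checked struct; the kernel
 * controls every dword, so nothing needs to be parsed back. Returns the
 * number of dwords written. */
static size_t rb_emit(const struct rv515_renderbuffer *rb, uint32_t *out)
{
    size_t n = 0;
    out[n++] = pkt0(0x4E00, 1);            /* hypothetical size register */
    out[n++] = (rb->height << 16) | rb->width;
    out[n++] = pkt0(0x4E04, 1);            /* hypothetical format register */
    out[n++] = rb->format;
    return n;
}
```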

On the backward compatibility side it's no harder to expand those
states:

struct radeon_state {
        u32 state_id;
        u64 state_struct_ptr;
};

version 1: state_id = 0x501
struct rv515_texture {
        u32 width;
        u32 height;
        ...
};

version 2: state_id = 0x502
struct rv515_texture {
        u32 width;
        u32 height;
        ...
        u32 texture_pixel_sampling_center; /* well, anything new */
};

So from the userspace pov it could still use 0x501 and the kernel
will just ignore the end of the structure and set safe default
values for those fields. If userspace submits a 0x502 then it's
assumed that it knows about the new state and the kernel will take
it into account.

I don't think this adds more work or code than adding new packets
to a parser.
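A sketch of how the kernel side of that versioning could work, following the example above (the struct layout is hypothetical and copy_user() stands in for copy_from_user()): the submitted state_id sizes the copy, and fields a v1 userspace doesn't know about get safe defaults.

```c
#include <stdint.h>
#include <string.h>

/* v2 layout of the hypothetical texture state from the example above. */
struct rv515_texture_v2 {
    uint32_t width;
    uint32_t height;
    uint32_t texture_pixel_sampling_center; /* new in v2 (0x502) */
};

/* v1 (0x501) structs stop after 'height'. */
#define V1_SIZE (2 * sizeof(uint32_t))

static void copy_user(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n); /* a real kernel would use copy_from_user() */
}

static void texture_from_user(struct rv515_texture_v2 *k,
                              const void *user, uint32_t state_id)
{
    /* Safe defaults first, so fields absent from old versions are sane. */
    memset(k, 0, sizeof(*k));
    if (state_id == 0x501)
        copy_user(k, user, V1_SIZE);   /* ignore the v2 tail */
    else /* 0x502 */
        copy_user(k, user, sizeof(*k));
}
```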


Anyway the biggest problem with any such approach is that we
need to figure out how to allocate memory to store either the validated
cs or the kernel-built packets on behalf of the program; we don't
want to abuse kernel memory allocation. And we can't allow userspace
to modify those objects after they have been validated :)

Cheers,
Jerome


--
___
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel


Re: Doing better than CS ioctl ?

2009-08-12 Thread Keith Whitwell

Dave,

The big problem with the (second) radeon approach of state objects was
that we defined those objects statically and encoded them into the kernel
interface.  That meant that when new hardware functionality was needed
(or discovered) we had to rev the kernel interface, usually in a fairly
ugly way.

I think Jerome's approach could be a good improvement if the state
objects it creates are defined by software at runtime, more like little
display lists than pre-defined state atoms.  The danger again is that
you run into cases where you need to expand objects the verifier will
allow userspace to create, but at least in doing so you won't be
breaking existing users of the interface.

I think the key is that there should be no pre-defined format for these
state objects, simply that they should be a sequence of legal
commands/register writes that the kernel validates once and userspace
can execute multiple times.
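A minimal sketch of that validate-once scheme, under invented names and an invented register window (nothing here is real radeon code): a state object is just a buffer of register writes with no pre-defined format; the kernel range-checks it a single time and execution afterwards only trusts the flag.

```c
#include <stdint.h>
#include <stddef.h>

struct reg_write { uint32_t reg; uint32_t value; };

struct state_object {
    struct reg_write *cmds;
    size_t ncmds;
    int validated;   /* set once by the kernel, then trusted */
};

/* One-time check: every write must fall in the window of registers
 * userspace is allowed to touch (hypothetical bounds). */
static int validate_once(struct state_object *so)
{
    for (size_t i = 0; i < so->ncmds; i++) {
        if (so->cmds[i].reg < 0x2000 || so->cmds[i].reg > 0x4FFF)
            return -1;
    }
    so->validated = 1;
    return 0;
}

/* Re-execution skips parsing entirely. */
static int execute(const struct state_object *so)
{
    return so->validated ? 0 : -1;
}
```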

Keith



Re: Doing better than CS ioctl ?

2009-08-08 Thread Dave Airlie
On Sat, Aug 8, 2009 at 7:51 AM, Jerome Glissegli...@freedesktop.org wrote:
 Investigating where time is spent in radeon/kms world when doing
 rendering leaded me to question the design of CS ioctl. As i am among
 the people behind it, i think i should give some historical background
 on the choice that were made.

I think this sounds quite like the original radeon interface or maybe
even a bit like the second one. The original one stored the registers
in the sarea, and updated the context under the lock, and had the
kernel emit it. The second one had a bunch of state objects, containing
ranges of registers that were safe to emit.

Maybe Keith Whitwell can point out why these were a good/bad idea,
not sure if anyone else remembers that far back.

Dave.



Doing better than CS ioctl ?

2009-08-07 Thread Jerome Glisse
Investigating where time is spent in the radeon/kms world when doing
rendering led me to question the design of the CS ioctl. As I am among
the people behind it, I think I should give some historical background
on the choices that were made.

The first motivation behind the cs ioctl was to have a common language
between userspace and the kernel, and between the kernel and the device.
Of course, in an ideal world commands submitted through the cs ioctl
could be forwarded directly to the GPU without much overhead. Thing is,
the world we live in isn't that good. There are two things the cs ioctl
does before forwarding commands:

1- First it must rewrite any packet which supplies an offset to the GPU
with the address at which the memory manager validated the buffer object
associated with this packet. We can't get rid of this with the cs
ioctl (we might do something very clever like writing new
microcode for the CP so that the CP can rewrite packets using a
table of validated buffer offsets, but I am not even sure the CP
would be powerful enough to do that).
2- In order to provide more advanced security than what we
had in the past, I added a cs checker facility which is
responsible for analyzing the command stream and making sure that
the GPU won't read or write outside the supplied buffer object
list. DRI1 didn't offer such advanced checking. This feature
was added with GPU sharing in mind, where sensitive applications
might run on the GPU and we might like to protect their memory.
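The relocation step from item 1 above can be sketched as follows. The table layout is illustrative (not the real radeon reloc format), and validated_offset() stands in for the memory manager's placement lookup: every command-stream dword that carries a GPU offset gets the validated base address of its buffer object added in.

```c
#include <stdint.h>
#include <stddef.h>

/* One relocation: which dword of the stream to patch, and against which
 * buffer object. (Hypothetical layout.) */
struct reloc {
    size_t   dword;     /* index into the command stream */
    uint32_t bo_handle; /* buffer object the packet refers to */
};

/* Stand-in for the memory manager's lookup of where a validated bo
 * ended up in the GPU address space. */
static uint32_t validated_offset(const uint32_t *bo_offsets, uint32_t handle)
{
    return bo_offsets[handle];
}

/* Rewrite each flagged dword: add the validated base address to the
 * intra-bo offset userspace emitted. */
static void apply_relocs(uint32_t *cs, const struct reloc *relocs, size_t n,
                         const uint32_t *bo_offsets)
{
    for (size_t i = 0; i < n; i++)
        cs[relocs[i].dword] +=
            validated_offset(bo_offsets, relocs[i].bo_handle);
}
```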

We can obviously skip the second item and things would work,
but userspace would be able to abuse the GPU to access memory outside
its own GPU objects (this doesn't mean it would be able to
access arbitrary system ram, but rather any ram that is mapped to the
GPU, which should for the time being only be pixmaps, textures, vbos
and the like).

Bottom line is that with the cs ioctl we do the same work twice, in
different forms: in userspace we build a command stream understandable
by the GPU, and in kernel space we decode this command stream to check
it. Obviously this sounds wrong.

That being said, the CS ioctl isn't that bad. It doesn't consume much
in the benchmarks I have done, but I expect it might consume more on
older cpus or when many complex 3D apps run at the same time. So
I am not proposing to throw it away, but rather to discuss
a better interface we could add at a later point to slowly replace
cs. CS brings today the features we needed yesterday, so we should
focus our effort on getting the cs ioctl as smooth and good as possible.


So as a pet project I have been thinking these last few days about
what would be a better interface between userspace and the kernel,
and I came up with something in between gallium state objects and
nvidia gpu objects (well, at least as far as I know each of those,
my design sounds close to them).

The idea behind the design is that whenever userspace allocates a bo,
userspace knows the properties of the bo. If it's a texture,
userspace knows the size, the number of mipmap levels, the
border, ... of the texture. If it's a vbo, it knows the layout,
the size, the number of elements, ... Same for a rendering viewport:
it knows the size and associated properties.

The design has 2 ioctls:
        create_object :
                supply :
                        - object type id specific to the asic
                        - object structure associated to the type
                          id, fully describing the object
                return :
                        - object id
                processing :
                        - check that the states provided are
                          correct and that the bo is big
                          enough for the states
                        - translate the states into a packet stream
                        - store the object and packet stream
                          under the associated object id
        batches :
                supply :
                        - table of batches
                process :
                        - check each batch and schedule them

Each batch is a set of object ids, and userspace needs to provide
all the object ids for the batch to be valid. For instance, if the
shader object needs 5 textures, the batch needs to have 5 texture
object ids supplied.
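That completeness rule could look something like the sketch below, with illustrative structs (none of this is a real interface): the objects were already checked at create_object time, so the batch ioctl only has to verify that the set is complete.

```c
#include <stdint.h>

/* A pre-validated shader object declares how many textures it samples. */
struct shader_object {
    uint32_t ntextures;
};

/* A batch is just a set of already-validated object ids. */
struct batch {
    const struct shader_object *shader;
    uint32_t texture_ids[16];
    uint32_t ntexture_ids;
};

/* Cheap check: no parsing, only set completeness. A batch missing some
 * of the textures its shader needs is rejected. */
static int batch_valid(const struct batch *b)
{
    return b->ntexture_ids >= b->shader->ntextures;
}
```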

Checking that a batch is valid is quick, as it's a set of
already-checked objects. You create the objects just after creating
the bo (if it's a pixmap you can create a texture and viewport object
just after, and whenever you want to use this pixmap you just use
the proper object id). This means that for objects which are
used multiple times you do the object property checking once and
then take advantage of quick reuse.

An example of what the objects look like is at:
http://people.freedesktop.org/~glisse/rv515obj.h

So what we win is fast checking and better knowledge in the kernel
of how a bo is used, all of which allows adding many optimizations:
- simple state re-emission optimization (don't re-emit the state
of an object if the object's state is already set on the
GPU)
- clever flushing: if a bo is only associated with a texture
object then the kernel knows that