Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Keith Packard
On Fri, 2007-05-04 at 16:57 +0100, Keith Whitwell wrote:

> That's a special case of the general problem of what to do when a 
> client submits any validation list that can't be satisfied.  Failing to 
> render isn't really an option, either the client or the memory manager 
> has to either prevent it happening in the first place or have some 
> mechanism for chopping up the dma buffer into segments which are 
> satisfiable...  Neither of which I can see an absolutely reliable way to 
> do.

I think we must return an error from the kernel and let user mode sort
it out, potentially by breaking up the operation into smaller pieces or
(ick) simply falling back to software. Eliminating per-submit flushing
will even make this reasonably efficient as we remap the GTT as objects
are used. I don't think we want to support automatic partitioning of the
operation in the kernel; punting that step to user mode seems like a
sensible option.

Certainly presenting all of the objects to the kernel atomically will
permit it to succeed if the device can possibly perform the operation;
ejecting all existing objects and reloading with precisely the objects
proposed by the application can be done, and is even inexpensive on UMA
hardware.
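
A rough sketch of that user-mode fallback path — drmSubmit() and
software_fallback() are stand-ins invented for this example, and the choice
of -ENOMEM as the "cannot fit" error is an assumption, not the actual DRM
interface:

#include <errno.h>
#include <stddef.h>

/* Stand-ins for the real entry points (assumed, not actual DRM API). */
extern int drmSubmit(void *cmds, size_t count);          /* 0 or -errno   */
extern int software_fallback(void *cmds, size_t count);  /* render on CPU */

/*
 * Submit a batch of 'count' commands of 'elem_size' bytes each.  If the
 * kernel reports that the validation list cannot be made resident, split
 * the batch in half and retry each half; once a single command still
 * cannot fit, fall back to software.
 */
static int submit_or_split(void *cmds, size_t count, size_t elem_size)
{
    int ret = drmSubmit(cmds, count);

    if (ret != -ENOMEM)
        return ret;                        /* success or unrelated error */

    if (count <= 1)
        return software_fallback(cmds, count);

    ret = submit_or_split(cmds, count / 2, elem_size);
    if (ret)
        return ret;
    return submit_or_split((char *)cmds + (count / 2) * elem_size,
                           count - count / 2, elem_size);
}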

> The way to get around this is to mandate that hardware supports paged 
> virtual memory...  But that seems to be a difficult trick.

Yeah, especially as we don't currently have any examples in our
environment.

-- 
[EMAIL PROTECTED]




Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Keith Whitwell

Keith Packard wrote:

> > OTOH, letting DRM resolve the deadlock by unmapping and remapping shared 
> > buffers in the correct order might not be the best one either. It will 
> > certainly mean some CPU overhead and what if we have to do the same with 
> > buffer validation? (Yes for some operations with thousands and thousands 
> > of relocations, the user space validation might need to stay).
>
> I do not want to do relocations in user space. I don't see why doing
> thousands of these requires moving this operation out of the kernel.


Agreed.  The original conception for this was to have validation plus 
relocations be a single operation, and by implication in the kernel. 
Although the code as it stands doesn't do this, I think that should 
still be the approach.


The issue with thousands of relocations from my point of view isn't a 
problem - that's just a matter of getting appropriate data structures in 
place.
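
For illustration, such a data structure could be as simple as an array of
fixups handed to the kernel alongside the buffer list — the structure and
field names below are invented for this sketch, not the layout the patch
actually uses:

#include <stdint.h>

/* One relocation: after buffer 'target_handle' has been validated to its
 * final GPU offset, the kernel patches that offset (plus 'delta') into the
 * command buffer at byte 'offset'. */
struct drm_reloc_entry {
    uint32_t target_handle;   /* handle of the buffer being referenced */
    uint32_t offset;          /* byte offset in the command buffer to patch */
    uint32_t delta;           /* value added to the target's GPU address */
};

/* A submission: the command buffer, the buffers it needs resident, and the
 * relocations to apply once placement is known.  Validating and relocating
 * in one kernel operation means user space never needs to know (or guess)
 * final GPU addresses. */
struct drm_submit_args {
    uint32_t cmd_handle;      /* command buffer object */
    uint32_t num_buffers;
    uint64_t buffers_ptr;     /* user pointer to array of buffer handles */
    uint32_t num_relocs;
    uint64_t relocs_ptr;      /* user pointer to array of drm_reloc_entry */
};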


Where things get a bit more interesting is with hardware where you are 
required to submit a whole scene's worth of rendering before the 
hardware will kick off, and with the expectation that the texture 
placement will remain unchanged throughout the scene.  This is a very 
easy way to hit any upper limit on texture memory - the agp aperture 
size in the case of integrated chipsets.


That's a special case of the general problem of what to do when a 
client submits any validation list that can't be satisfied.  Failing to 
render isn't really an option, either the client or the memory manager 
has to either prevent it happening in the first place or have some 
mechanism for chopping up the dma buffer into segments which are 
satisfiable...  Neither of which I can see an absolutely reliable way to 
do.


I think that any memory manager we can propose will have flaws of some 
sort - either it is prone to failures that aren't really allowed by the 
API, is excessively complex, or is somewhat pessimistic.  We've chosen a 
design that is simple, optimistic, but can potentially say "no" 
unexpectedly.  It would then be up to the client to somehow pick up the 
pieces & potentially submit a smaller list.  So far we just haven't 
touched on how that might work.


The way to get around this is to mandate that hardware supports paged 
virtual memory...  But that seems to be a difficult trick.


Keith


Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Keith Packard
On Fri, 2007-05-04 at 14:32 +0200, Thomas Hellström wrote:
>  If there isn't we can at least consider other 
> alternatives that resolve the deadlock issue but that also will help 
> clients synchronize and keep data coherent.

If clients want coherence, they're welcome to implement their own
locking. Let's make sure we separate the semantics required for GPU
operation from semantics required by DRM users.

-- 
[EMAIL PROTECTED]




Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Keith Packard
On Fri, 2007-05-04 at 11:40 +0200, Jerome Glisse wrote:

> On a side note I think this scheme also fits well with GPUs that have
> several contexts and don't need big validation (read:
> nv GPUs).

Yeah, I want to make sure we have a simple model that supports
multi-context hardware while also avoiding failing over badly when we
have more users than hardware contexts.

-- 
[EMAIL PROTECTED]




Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Keith Packard
On Fri, 2007-05-04 at 10:07 +0200, Thomas Hellström wrote:
>  
> It's rare to have two clients access the same buffer at the same time. 
> In what situation will this occur?

Right, what I'm trying to avoid is having any contention for
applications *not* sharing the same objects. 

If there is any locking for mapping, we can either attempt to define a
locking order, or we can have a single global lock. The former leaves us
prone to deadlocks, the latter eliminates the ability for un-contended
parallel access.
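
To make the tradeoff concrete, "defining a locking order" would look roughly
like the sketch below (plain pthread mutexes, names invented for
illustration): per-buffer locks are always taken in increasing id order, so
no cycle of waiters can form. The catch — as Thomas puts it elsewhere in the
thread — is that GL will more or less encourage clients to take buffers in
the "incorrect" order, so enforcing the discipline on every path is the hard
part, while a single global lock sidesteps ordering at the cost of
serializing clients that never share anything.

#include <pthread.h>
#include <stdint.h>

struct buffer {
    uint32_t id;             /* stable, unique per buffer */
    pthread_mutex_t lock;    /* assume initialized elsewhere */
};

/*
 * Take the locks of two buffers without risking deadlock by always locking
 * the lower id first, whatever order the caller passes them in.  Two
 * clients mapping the same pair can then never each hold one lock while
 * waiting on the other.
 */
static void lock_buffer_pair(struct buffer *a, struct buffer *b)
{
    struct buffer *first  = (a->id < b->id) ? a : b;
    struct buffer *second = (a->id < b->id) ? b : a;

    pthread_mutex_lock(&first->lock);
    if (second != first)
        pthread_mutex_lock(&second->lock);
}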

> * It will encourage different DRI clients to simultaneously access
>   the same buffer.

Sure. Separate 'DRI' from 'GL' and this may be a sensible plan. If you
want to prevent this, *that's not DRI's problem*.

> * Inter-client and GPU data coherence can be guaranteed if we issue
>   a mb() / write-combining flush with the unmap operation (which,
>   BTW, I'm not sure is done today). Otherwise it is up to the
>   clients, and very easy to forget.

CPU-GPU coherence is ensured by the mutual exclusion between mapping and
submitting. You may either have data available to the CPU or to the GPU.
I think that's a basic requirement for any solution in this space.
Keying the submit and map as to whether writing will occur means that
appropriate flushing and fencing can be automatically applied within the
kernel.
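
A sketch of what keying map (and submit) on read vs. write could look like
at the interface level — the flag and structure names are hypothetical, not
the TTM ioctls:

#include <stdint.h>

#define BUF_MAP_READ   (1u << 0)
#define BUF_MAP_WRITE  (1u << 1)

/* Hypothetical argument block for a map request. */
struct buf_map_args {
    uint32_t handle;   /* buffer object to map */
    uint32_t flags;    /* BUF_MAP_READ and/or BUF_MAP_WRITE */
    uint64_t offset;   /* returned mmap offset for the object */
};

/*
 * One way the kernel could use these flags (a guess at the policy, not
 * what the patch implements):
 *
 *  - map waits on the fence of the last submit that wrote the buffer
 *    (a read-only map need not wait on GPU reads);
 *  - unmap of a BUF_MAP_WRITE mapping issues the memory barrier /
 *    write-combining flush so the GPU sees the CPU's writes;
 *  - submit attaches a write fence to every buffer the GPU will write,
 *    which later maps then wait on.
 */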

> OTOH, letting DRM resolve the deadlock by unmapping and remapping shared 
> buffers in the correct order might not be the best one either. It will 
> certainly mean some CPU overhead and what if we have to do the same with 
> buffer validation? (Yes for some operations with thousands and thousands 
> of relocations, the user space validation might need to stay).

I do not want to do relocations in user space. I don't see why doing
thousands of these requires moving this operation out of the kernel.

-- 
[EMAIL PROTECTED]




Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Jerome Glisse

On 5/4/07, Thomas Hellström <[EMAIL PROTECTED]> wrote:

I was actually referring to an example where two clients need to have a
buffer mapped and access it at exactly the same time.
If there is such a situation, we have no other choice than to drop the
buffer locking on map. If there isn't we can at least consider other
alternatives that resolve the deadlock issue but that also will help
clients synchronize and keep data coherent.

/Thomas


One might be a texture where a portion is updated by one thread
and another portion updated by another one; I believe the application
will know better than us whether such concurrent access will conflict or not.
If the two threads access different pixels it makes sense to let them
work on the texture at the same time. If they are writing
to the same pixels then they will have to synchronize between themselves so they
don't do something stupid.

My point is that user space will know better whether synchronization is
needed and how to synchronize access to the same buffer. Moreover we
can still add a locking mechanism in user space (in libdrm for
instance).
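
If user space does take this on, the libdrm-level mechanism alluded to here
could be as small as a process-shared mutex kept in memory both clients map
— a hypothetical sketch, not an existing libdrm API:

#include <pthread.h>

/* A lock that cooperating clients agree to take around CPU access to a
 * shared buffer.  Placing it in memory both processes map (and marking it
 * PTHREAD_PROCESS_SHARED) lets them exclude each other without any
 * kernel-side map/map blocking. */
struct shared_buf_lock {
    pthread_mutex_t mutex;
};

static int shared_buf_lock_init(struct shared_buf_lock *l)
{
    pthread_mutexattr_t attr;
    int ret;

    ret = pthread_mutexattr_init(&attr);
    if (ret)
        return ret;
    ret = pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    if (!ret)
        ret = pthread_mutex_init(&l->mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    return ret;
}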

There are very likely other use cases for such concurrent access which
I can't think of right now.

best,
Jerome Glisse


Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Thomas Hellström

Jerome Glisse wrote:

On 5/4/07, Thomas Hellström <[EMAIL PROTECTED]> wrote:

Jerome Glisse wrote:
> On 5/4/07, Thomas Hellström <[EMAIL PROTECTED]> wrote:
>> Keith Packard wrote:
>> > On Thu, 2007-05-03 at 01:01 +0200, Thomas Hellström wrote:
>> >
>> >
>> >> It might be possible to find schemes that work around this. One way
>> >> could possibly be to have a buffer mapping -and validate order for
>> >> shared buffers.
>> >>
>> >
>> > If mapping never blocks on anything other than the fence, then there
>> > isn't any dead lock possibility. What this says is that ordering of
>> > rendering between clients is *not DRMs problem*. I think that's a good
>> > solution though; I want to let multiple apps work on DRM-able memory
>> > with their own CPU without contention.
>> >
>> > I don't recall if Eric layed out the proposed rules, but:
>> >
>> >  1) Map never blocks on map. Clients interested in dealing with this
>> > are on their own.
>> >
>> >  2) Submit blocks on map. You must unmap all buffers before submitting
>> > them. Doing the relocations in the kernel makes this all possible.
>> >
>> >  3) Map blocks on the fence from submit. We can play with pending the
>> > flush until the app asks for the buffer back, or we can play with
>> > figuring out when flushes are useful automatically. Doesn't matter
>> > if the policy is in the kernel.
>> >
>> > I'm interested in making deadlock avoidence trivial and eliminating any
>> > map-map contention.
>> >
>> >
>> It's rare to have two clients access the same buffer at the same time.
>> In what situation will this occur?
>>
>> If we think of map / unmap and validation / fence  as taking a buffer
>> mutex either for the CPU or for the GPU, that's the way implementation
>> is done today. The CPU side of the mutex should IIRC be per-client
>> recursive. OTOH, the TTM implementation won't stop the CPU from
>> accessing the buffer when it is unmapped, but then you're on your own.
>> "Mutexes" need to be taken in the correct order, otherwise a deadlock
>> will occur, and GL will, as outlined in Eric's illustration, more or
>> less encourage us to take buffers in the "incorrect" order.
>>
>> In essence what you propose is to eliminate the deadlock problem by just
>> avoiding taking the buffer mutex unless we know the GPU has it. I see
>> two problems with this:
>>
>> * It will encourage different DRI clients to simultaneously access
>>   the same buffer.
>> * Inter-client and GPU data coherence can be guaranteed if we issue
>>   a mb() / write-combining flush with the unmap operation (which,
>>   BTW, I'm not sure is done today). Otherwise it is up to the
>>   clients, and very easy to forget.
>>
>> I'm a bit afraid we might in the future regret taking the easy way out?
>>
>> OTOH, letting DRM resolve the deadlock by unmapping and remapping shared
>> buffers in the correct order might not be the best one either. It will
>> certainly mean some CPU overhead and what if we have to do the same with
>> buffer validation? (Yes for some operations with thousands and thousands
>> of relocations, the user space validation might need to stay).
>>
>> Personally, I'm slightly biased towards having DRM resolve the deadlock,
>> but I think any solution will do as long as the implications and why we
>> choose a certain solution are totally clear.
>>
>> For item 3) above the kernel must have a way to issue a flush when
>> needed for buffer eviction.
>> The current implementation also requires the buffer to be completely
>> flushed before mapping.
>> Other than that the flushing policy is currently completely up to the
>> DRM drivers.
>>
>> /Thomas
>
> I might say stupid things as i don't think i fully understand all
> the input to this problem. Anyway here is my thought on all this:
>
> 1) First client map never block (as in Keith layout) except on
>    fence from drm side (point 3 in Keith layout)
>
But is there really a need for this except to avoid the above-mentioned
deadlock?
As I'm not too up to date with all the possibilities the servers and GL
clients may be using shared buffers,
I need some enlightenment :). Could we have an example, please?


I think the current main consumer would be compiz or any other
compositor which use TextureFromPixmap, i really think the we
might see further use of sharing graphical data among applications,
i got example here at my work of such use case even thought this
doesn't use GL at all but another indoor protocol. Another possible
case where such buffer sharing might occur is inside same application
with two or more GL context (i am ready to bet that we already have
some where example of such application).

I was actually referring to an example where two clients need to have a 
buffer mapped and access it at exactly the same time.
If there is such a situation, we have no other choice than to drop the 
buffer locking on map. If there isn't we can at least consider other 
alternatives that resolve the deadlock issue but that also will help 
clients synchronize and keep data coherent.

/Thomas

Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Jerome Glisse

On 5/4/07, Thomas Hellström <[EMAIL PROTECTED]> wrote:

Jerome Glisse wrote:
> On 5/4/07, Thomas Hellström <[EMAIL PROTECTED]> wrote:
>> Keith Packard wrote:
>> > On Thu, 2007-05-03 at 01:01 +0200, Thomas Hellström wrote:
>> >
>> >
>> >> It might be possible to find schemes that work around this. One way
>> >> could possibly be to have a buffer mapping -and validate order for
>> >> shared buffers.
>> >>
>> >
>> > If mapping never blocks on anything other than the fence, then there
>> > isn't any dead lock possibility. What this says is that ordering of
>> > rendering between clients is *not DRMs problem*. I think that's a good
>> > solution though; I want to let multiple apps work on DRM-able memory
>> > with their own CPU without contention.
>> >
>> > I don't recall if Eric layed out the proposed rules, but:
>> >
>> >  1) Map never blocks on map. Clients interested in dealing with this
>> > are on their own.
>> >
>> >  2) Submit blocks on map. You must unmap all buffers before submitting
>> > them. Doing the relocations in the kernel makes this all possible.
>> >
>> >  3) Map blocks on the fence from submit. We can play with pending the
>> > flush until the app asks for the buffer back, or we can play with
>> > figuring out when flushes are useful automatically. Doesn't matter
>> > if the policy is in the kernel.
>> >
>> > I'm interested in making deadlock avoidence trivial and eliminating any
>> > map-map contention.
>> >
>> >
>> It's rare to have two clients access the same buffer at the same time.
>> In what situation will this occur?
>>
>> If we think of map / unmap and validation / fence  as taking a buffer
>> mutex either for the CPU or for the GPU, that's the way implementation
>> is done today. The CPU side of the mutex should IIRC be per-client
>> recursive. OTOH, the TTM implementation won't stop the CPU from
>> accessing the buffer when it is unmapped, but then you're on your own.
>> "Mutexes" need to be taken in the correct order, otherwise a deadlock
>> will occur, and GL will, as outlined in Eric's illustration, more or
>> less encourage us to take buffers in the "incorrect" order.
>>
>> In essence what you propose is to eliminate the deadlock problem by just
>> avoiding taking the buffer mutex unless we know the GPU has it. I see
>> two problems with this:
>>
>> * It will encourage different DRI clients to simultaneously access
>>   the same buffer.
>> * Inter-client and GPU data coherence can be guaranteed if we issue
>>   a mb() / write-combining flush with the unmap operation (which,
>>   BTW, I'm not sure is done today). Otherwise it is up to the
>>   clients, and very easy to forget.
>>
>> I'm a bit afraid we might in the future regret taking the easy way out?
>>
>> OTOH, letting DRM resolve the deadlock by unmapping and remapping shared
>> buffers in the correct order might not be the best one either. It will
>> certainly mean some CPU overhead and what if we have to do the same with
>> buffer validation? (Yes for some operations with thousands and thousands
>> of relocations, the user space validation might need to stay).
>>
>> Personally, I'm slightly biased towards having DRM resolve the deadlock,
>> but I think any solution will do as long as the implications and why we
>> choose a certain solution are totally clear.
>>
>> For item 3) above the kernel must have a way to issue a flush when
>> needed for buffer eviction.
>> The current implementation also requires the buffer to be completely
>> flushed before mapping.
>> Other than that the flushing policy is currently completely up to the
>> DRM drivers.
>>
>> /Thomas
>
> I might say stupid things as i don't think i fully understand all
> the input to this problem. Anyway here is my thought on all this:
>
> 1) First client map never block (as in Keith layout) except on
>fence from drm side (point 3 in Keith layout)
>
But is there really a need for this except to avoid the above-mentioned
deadlock?
As I'm not too up to date with all the possibilities the servers and GL
clients may be using shared buffers,
I need some enlightenment :). Could we have an example, please?


I think the current main consumer would be compiz or any other
compositor which uses TextureFromPixmap. I really think we
might see further use of sharing graphical data among applications;
I have an example of such a use case here at my work, even though it
doesn't use GL at all but another, in-house protocol. Another possible
case where such buffer sharing might occur is inside the same application
with two or more GL contexts (I am ready to bet that we already have
an example of such an application somewhere).


> 4) We got 2 gpu queue:
>  - one with pending apps ask in which we do all stuff necessary
>    to be done before submitting (locking buffer, validation, ...)
>    for instance we might wait here for each buffer that are still
>    mapped by some other apps in user 

Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Thomas Hellström

Jerome Glisse wrote:

On 5/4/07, Thomas Hellström <[EMAIL PROTECTED]> wrote:

Keith Packard wrote:
> On Thu, 2007-05-03 at 01:01 +0200, Thomas Hellström wrote:
>
>
>> It might be possible to find schemes that work around this. One way
>> could possibly be to have a buffer mapping -and validate order for
>> shared buffers.
>>
>
> If mapping never blocks on anything other than the fence, then there
> isn't any dead lock possibility. What this says is that ordering of
> rendering between clients is *not DRMs problem*. I think that's a good
> solution though; I want to let multiple apps work on DRM-able memory
> with their own CPU without contention.
>
> I don't recall if Eric layed out the proposed rules, but:
>
>  1) Map never blocks on map. Clients interested in dealing with this
> are on their own.
>
>  2) Submit blocks on map. You must unmap all buffers before submitting
> them. Doing the relocations in the kernel makes this all possible.
>
>  3) Map blocks on the fence from submit. We can play with pending the
> flush until the app asks for the buffer back, or we can play with
> figuring out when flushes are useful automatically. Doesn't matter
> if the policy is in the kernel.
>
> I'm interested in making deadlock avoidence trivial and eliminating any
> map-map contention.
>
>
It's rare to have two clients access the same buffer at the same time.
In what situation will this occur?

If we think of map / unmap and validation / fence  as taking a buffer
mutex either for the CPU or for the GPU, that's the way implementation
is done today. The CPU side of the mutex should IIRC be per-client
recursive. OTOH, the TTM implementation won't stop the CPU from
accessing the buffer when it is unmapped, but then you're on your own.
"Mutexes" need to be taken in the correct order, otherwise a deadlock
will occur, and GL will, as outlined in Eric's illustration, more or
less encourage us to take buffers in the "incorrect" order.

In essence what you propose is to eliminate the deadlock problem by just
avoiding taking the buffer mutex unless we know the GPU has it. I see
two problems with this:

* It will encourage different DRI clients to simultaneously access
  the same buffer.
* Inter-client and GPU data coherence can be guaranteed if we issue
  a mb() / write-combining flush with the unmap operation (which,
  BTW, I'm not sure is done today). Otherwise it is up to the
  clients, and very easy to forget.

I'm a bit afraid we might in the future regret taking the easy way out?

OTOH, letting DRM resolve the deadlock by unmapping and remapping shared
buffers in the correct order might not be the best one either. It will
certainly mean some CPU overhead and what if we have to do the same with
buffer validation? (Yes for some operations with thousands and thousands
of relocations, the user space validation might need to stay).

Personally, I'm slightly biased towards having DRM resolve the deadlock,
but I think any solution will do as long as the implications and why we
choose a certain solution are totally clear.

For item 3) above the kernel must have a way to issue a flush when
needed for buffer eviction.
The current implementation also requires the buffer to be completely
flushed before mapping.
Other than that the flushing policy is currently completely up to the
DRM drivers.

/Thomas


I might say stupid things as i don't think i fully understand all
the input to this problem. Anyway here is my thought on all this:

1) First client map never block (as in Keith layout) except on
   fence from drm side (point 3 in Keith layout)

But is there really a need for this except to avoid the above-mentioned 
deadlock?
As I'm not too up to date with all the possibilities the servers and GL 
clients may be using shared buffers,

I need some enlightenment :). Could we have an example, please?


4) We got 2 gpu queue:
    - one with pending apps ask in which we do all stuff necessary
      to be done before submitting (locking buffer, validation, ...)
      for instance we might wait here for each buffer that are still
      mapped by some other apps in user space
    - one run queue in which we add each apps ask that are now
      ready to be submited to the gpu


This is getting closer and closer to a GPU scheduler, an interesting 
topic indeed.
Perhaps we should have a separate discussion on the needs and 
requirements for such a thing?


Regards,
/Thomas





Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Jerome Glisse

On 5/4/07, Jerome Glisse <[EMAIL PROTECTED]> wrote:

On 5/4/07, Thomas Hellström <[EMAIL PROTECTED]> wrote:
> Keith Packard wrote:
> > On Thu, 2007-05-03 at 01:01 +0200, Thomas Hellström wrote:
> >
> >
> >> It might be possible to find schemes that work around this. One way
> >> could possibly be to have a buffer mapping -and validate order for
> >> shared buffers.
> >>
> >
> > If mapping never blocks on anything other than the fence, then there
> > isn't any dead lock possibility. What this says is that ordering of
> > rendering between clients is *not DRMs problem*. I think that's a good
> > solution though; I want to let multiple apps work on DRM-able memory
> > with their own CPU without contention.
> >
> > I don't recall if Eric layed out the proposed rules, but:
> >
> >  1) Map never blocks on map. Clients interested in dealing with this
> > are on their own.
> >
> >  2) Submit blocks on map. You must unmap all buffers before submitting
> > them. Doing the relocations in the kernel makes this all possible.
> >
> >  3) Map blocks on the fence from submit. We can play with pending the
> > flush until the app asks for the buffer back, or we can play with
> > figuring out when flushes are useful automatically. Doesn't matter
> > if the policy is in the kernel.
> >
> > I'm interested in making deadlock avoidence trivial and eliminating any
> > map-map contention.
> >
> >
> It's rare to have two clients access the same buffer at the same time.
> In what situation will this occur?
>
> If we think of map / unmap and validation / fence  as taking a buffer
> mutex either for the CPU or for the GPU, that's the way implementation
> is done today. The CPU side of the mutex should IIRC be per-client
> recursive. OTOH, the TTM implementation won't stop the CPU from
> accessing the buffer when it is unmapped, but then you're on your own.
> "Mutexes" need to be taken in the correct order, otherwise a deadlock
> will occur, and GL will, as outlined in Eric's illustration, more or
> less encourage us to take buffers in the "incorrect" order.
>
> In essence what you propose is to eliminate the deadlock problem by just
> avoiding taking the buffer mutex unless we know the GPU has it. I see
> two problems with this:
>
> * It will encourage different DRI clients to simultaneously access
>   the same buffer.
> * Inter-client and GPU data coherence can be guaranteed if we issue
>   a mb() / write-combining flush with the unmap operation (which,
>   BTW, I'm not sure is done today). Otherwise it is up to the
>   clients, and very easy to forget.
>
> I'm a bit afraid we might in the future regret taking the easy way out?
>
> OTOH, letting DRM resolve the deadlock by unmapping and remapping shared
> buffers in the correct order might not be the best one either. It will
> certainly mean some CPU overhead and what if we have to do the same with
> buffer validation? (Yes for some operations with thousands and thousands
> of relocations, the user space validation might need to stay).
>
> Personally, I'm slightly biased towards having DRM resolve the deadlock,
> but I think any solution will do as long as the implications and why we
> choose a certain solution are totally clear.
>
> For item 3) above the kernel must have a way to issue a flush when
> needed for buffer eviction.
> The current implementation also requires the buffer to be completely
> flushed before mapping.
> Other than that the flushing policy is currently completely up to the
> DRM drivers.
>
> /Thomas

I might say stupid things as i don't think i fully understand all
the input to this problem. Anyway here is my thought on all this:

1) First client map never block (as in Keith layout) except on
fence from drm side (point 3 in Keith layout)
2) Client should always unmap buffer before submitting (as in Keith layout)
3) In drm side you always acquire buffer lock in a give order for
instance each buffer got and id and you lock from smaller id to bigger
one (with a clever implementation the cost for that will be small)
4) We got 2 gpu queue:
 - one with pending apps ask in which we do all stuff necessary
   to be done before submitting (locking buffer, validation, ...)
   for instance we might wait here for each buffer that are still
   mapped by some other apps in user space
 - one run queue in which we add each apps ask that are now
   ready to be submited to the gpu

Of course in this scheme we keep the fencing stuff so user space can
know when it safe to use previously submited buffer again. The outcome
of having two seperate queue in drm is that if two apps lockup each other
other apps can still use the gpu so only the apps fighting for a buffer will
suffer.

And for user space synchronization i believe it a user space problem i.e.
it's up to user space to add proper synch. For instance as map doesn't
block for any client in user space 

Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Jerome Glisse

On 5/4/07, Thomas Hellström <[EMAIL PROTECTED]> wrote:

Keith Packard wrote:
> On Thu, 2007-05-03 at 01:01 +0200, Thomas Hellström wrote:
>
>
>> It might be possible to find schemes that work around this. One way
>> could possibly be to have a buffer mapping -and validate order for
>> shared buffers.
>>
>
> If mapping never blocks on anything other than the fence, then there
> isn't any dead lock possibility. What this says is that ordering of
> rendering between clients is *not DRMs problem*. I think that's a good
> solution though; I want to let multiple apps work on DRM-able memory
> with their own CPU without contention.
>
> I don't recall if Eric layed out the proposed rules, but:
>
>  1) Map never blocks on map. Clients interested in dealing with this
> are on their own.
>
>  2) Submit blocks on map. You must unmap all buffers before submitting
> them. Doing the relocations in the kernel makes this all possible.
>
>  3) Map blocks on the fence from submit. We can play with pending the
> flush until the app asks for the buffer back, or we can play with
> figuring out when flushes are useful automatically. Doesn't matter
> if the policy is in the kernel.
>
> I'm interested in making deadlock avoidence trivial and eliminating any
> map-map contention.
>
>
It's rare to have two clients access the same buffer at the same time.
In what situation will this occur?

If we think of map / unmap and validation / fence  as taking a buffer
mutex either for the CPU or for the GPU, that's the way implementation
is done today. The CPU side of the mutex should IIRC be per-client
recursive. OTOH, the TTM implementation won't stop the CPU from
accessing the buffer when it is unmapped, but then you're on your own.
"Mutexes" need to be taken in the correct order, otherwise a deadlock
will occur, and GL will, as outlined in Eric's illustration, more or
less encourage us to take buffers in the "incorrect" order.

In essence what you propose is to eliminate the deadlock problem by just
avoiding taking the buffer mutex unless we know the GPU has it. I see
two problems with this:

* It will encourage different DRI clients to simultaneously access
  the same buffer.
* Inter-client and GPU data coherence can be guaranteed if we issue
  a mb() / write-combining flush with the unmap operation (which,
  BTW, I'm not sure is done today). Otherwise it is up to the
  clients, and very easy to forget.

I'm a bit afraid we might in the future regret taking the easy way out?

OTOH, letting DRM resolve the deadlock by unmapping and remapping shared
buffers in the correct order might not be the best one either. It will
certainly mean some CPU overhead and what if we have to do the same with
buffer validation? (Yes for some operations with thousands and thousands
of relocations, the user space validation might need to stay).

Personally, I'm slightly biased towards having DRM resolve the deadlock,
but I think any solution will do as long as the implications and why we
choose a certain solution are totally clear.

For item 3) above the kernel must have a way to issue a flush when
needed for buffer eviction.
The current implementation also requires the buffer to be completely
flushed before mapping.
Other than that the flushing policy is currently completely up to the
DRM drivers.

/Thomas


I might say stupid things as I don't think I fully understand all
the input to this problem. Anyway, here are my thoughts on all this:

1) A client map never blocks (as in Keith's layout) except on a
   fence from the DRM side (point 3 in Keith's layout).
2) A client should always unmap buffers before submitting (as in Keith's layout).
3) On the DRM side you always acquire buffer locks in a given order; for
   instance, each buffer gets an id and you lock from the smaller id to the
   bigger one (with a clever implementation the cost of that will be small).
4) We get two GPU queues (see the sketch after this list):
 - one with pending requests from apps, in which we do all the work that
   needs to be done before submitting (locking buffers, validation, ...);
   for instance we might wait here for each buffer that is still
   mapped by some other app in user space
 - one run queue, to which we add each request that is now
   ready to be submitted to the GPU
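
A rough sketch of the two-queue idea in point 4, in illustrative kernel-style
C — the names and the request_ready() helper are made up for this sketch, not
code from the TTM patch. Requests wait on a pending list until all of their
buffers can be locked (in id order, as in point 3) and validated, then move
to a run list that feeds the GPU, so two apps fighting over a buffer never
stall everyone else:

#include <linux/list.h>
#include <linux/types.h>

struct gpu_request {
    struct list_head head;
    struct buffer_object **buffers;   /* buffers this request needs resident */
    unsigned int num_buffers;
};

static LIST_HEAD(pending_queue);      /* waiting on buffer locks / validation */
static LIST_HEAD(run_queue);          /* ready to be handed to the GPU */

/*
 * Made-up helper: try to lock (in increasing buffer-id order) and validate
 * every buffer of the request.  Returns false if some buffer is still
 * mapped by another client; the request then simply stays on the pending
 * queue without blocking anyone else's submissions.
 */
static bool request_ready(struct gpu_request *req);

static void promote_ready_requests(void)
{
    struct gpu_request *req, *tmp;

    list_for_each_entry_safe(req, tmp, &pending_queue, head) {
        if (request_ready(req))
            list_move_tail(&req->head, &run_queue);
    }
}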

Of course in this scheme we keep the fencing stuff so user space can
know when it is safe to use a previously submitted buffer again. The outcome
of having two separate queues in DRM is that if two apps lock up each other,
other apps can still use the GPU, so only the apps fighting for a buffer will
suffer.

And for user space synchronization, I believe it is a user space problem, i.e.
it's up to user space to add proper synchronization. For instance, as map
doesn't block for any client, two apps in user space can mess with the same
buffer; it's up to the users to have a policy to exclude each other (I believe
this will be a DRI or Xorg problem to synchronize between consumers).

I believe in this scheme you can only 

Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Thomas Hellström

Keith Packard wrote:

On Thu, 2007-05-03 at 01:01 +0200, Thomas Hellström wrote:

  
It might be possible to find schemes that work around this. One way 
could possibly be to have a buffer mapping -and validate order for 
shared buffers.



If mapping never blocks on anything other than the fence, then there
isn't any dead lock possibility. What this says is that ordering of
rendering between clients is *not DRMs problem*. I think that's a good
solution though; I want to let multiple apps work on DRM-able memory
with their own CPU without contention.

I don't recall if Eric layed out the proposed rules, but:

 1) Map never blocks on map. Clients interested in dealing with this 
are on their own.


 2) Submit blocks on map. You must unmap all buffers before submitting
them. Doing the relocations in the kernel makes this all possible.

 3) Map blocks on the fence from submit. We can play with pending the
flush until the app asks for the buffer back, or we can play with
figuring out when flushes are useful automatically. Doesn't matter
if the policy is in the kernel.

I'm interested in making deadlock avoidence trivial and eliminating any
map-map contention.

  
It's rare to have two clients access the same buffer at the same time. 
In what situation will this occur?


If we think of map / unmap and validation / fence  as taking a buffer 
mutex either for the CPU or for the GPU, that's the way implementation 
is done today. The CPU side of the mutex should IIRC be per-client 
recursive. OTOH, the TTM implementation won't stop the CPU from 
accessing the buffer when it is unmapped, but then you're on your own. 
"Mutexes" need to be taken in the correct order, otherwise a deadlock 
will occur, and GL will, as outlined in Eric's illustration, more or 
less encourage us to take buffers in the "incorrect" order.


In essence what you propose is to eliminate the deadlock problem by just 
avoiding taking the buffer mutex unless we know the GPU has it. I see 
two problems with this:


   * It will encourage different DRI clients to simultaneously access
 the same buffer.
   * Inter-client and GPU data coherence can be guaranteed if we issue
 a mb() / write-combining flush with the unmap operation (which,
 BTW, I'm not sure is done today). Otherwise it is up to the
 clients, and very easy to forget.

I'm a bit afraid we might in the future regret taking the easy way out?

OTOH, letting DRM resolve the deadlock by unmapping and remapping shared 
buffers in the correct order might not be the best one either. It will 
certainly mean some CPU overhead and what if we have to do the same with 
buffer validation? (Yes for some operations with thousands and thousands 
of relocations, the user space validation might need to stay).


Personally, I'm slightly biased towards having DRM resolve the deadlock, 
but I think any solution will do as long as the implications and why we 
choose a certain solution are totally clear.


For item 3) above the kernel must have a way to issue a flush when 
needed for buffer eviction.
The current implementation also requires the buffer to be completely 
flushed before mapping.
Other than that the flushing policy is currently completely up to the 
DRM drivers.


/Thomas











-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Thomas Hellström

Keith Packard wrote:

On Thu, 2007-05-03 at 01:01 +0200, Thomas Hellström wrote:

  
It might be possible to find schemes that work around this. One way 
could possibly be to have a buffer mapping -and validate order for 
shared buffers.



If mapping never blocks on anything other than the fence, then there
isn't any dead lock possibility. What this says is that ordering of
rendering between clients is *not DRMs problem*. I think that's a good
solution though; I want to let multiple apps work on DRM-able memory
with their own CPU without contention.

I don't recall if Eric layed out the proposed rules, but:

 1) Map never blocks on map. Clients interested in dealing with this 
are on their own.


 2) Submit blocks on map. You must unmap all buffers before submitting
them. Doing the relocations in the kernel makes this all possible.

 3) Map blocks on the fence from submit. We can play with pending the
flush until the app asks for the buffer back, or we can play with
figuring out when flushes are useful automatically. Doesn't matter
if the policy is in the kernel.

I'm interested in making deadlock avoidence trivial and eliminating any
map-map contention.

  
It's rare to have two clients access the same buffer at the same time. 
In what situation will this occur?


If we think of map / unmap and validation / fence  as taking a buffer 
mutex either for the CPU or for the GPU, that's the way implementation 
is done today. The CPU side of the mutex should IIRC be per-client 
recursive. OTOH, the TTM implementation won't stop the CPU from 
accessing the buffer when it is unmapped, but then you're on your own. 
Mutexes need to be taken in the correct order, otherwise a deadlock 
will occur, and GL will, as outlined in Eric's illustration, more or 
less encourage us to take buffers in the incorrect order.


In essence what you propose is to eliminate the deadlock problem by just 
avoiding taking the buffer mutex unless we know the GPU has it. I see 
two problems with this:


   * It will encourage different DRI clients to simultaneously access
 the same buffer.
   * Inter-client and GPU data coherence can be guaranteed if we issue
 a mb() / write-combining flush with the unmap operation (which,
 BTW, I'm not sure is done today). Otherwise it is up to the
 clients, and very easy to forget.

I'm a bit afraid we might in the future regret taking the easy way out?

OTOH, letting DRM resolve the deadlock by unmapping and remapping shared 
buffers in the correct order might not be the best one either. It will 
certainly mean some CPU overhead and what if we have to do the same with 
buffer validation? (Yes for some operations with thousands and thousands 
of relocations, the user space validation might need to stay).


Personally, I'm slightly biased towards having DRM resolve the deadlock, 
but I think any solution will do as long as the implications and why we 
choose a certain solution are totally clear.


For item 3) above the kernel must have a way to issue a flush when 
needed for buffer eviction.
The current implementation also requires the buffer to be completely 
flushed before mapping.
Other than that the flushing policy is currently completely up to the 
DRM drivers.


/Thomas











-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Jerome Glisse

On 5/4/07, Thomas Hellström [EMAIL PROTECTED] wrote:

Keith Packard wrote:
 On Thu, 2007-05-03 at 01:01 +0200, Thomas Hellström wrote:


 It might be possible to find schemes that work around this. One way
 could possibly be to have a buffer mapping -and validate order for
 shared buffers.


 If mapping never blocks on anything other than the fence, then there
 isn't any dead lock possibility. What this says is that ordering of
 rendering between clients is *not DRMs problem*. I think that's a good
 solution though; I want to let multiple apps work on DRM-able memory
 with their own CPU without contention.

 I don't recall if Eric layed out the proposed rules, but:

  1) Map never blocks on map. Clients interested in dealing with this
 are on their own.

  2) Submit blocks on map. You must unmap all buffers before submitting
 them. Doing the relocations in the kernel makes this all possible.

  3) Map blocks on the fence from submit. We can play with pending the
 flush until the app asks for the buffer back, or we can play with
 figuring out when flushes are useful automatically. Doesn't matter
 if the policy is in the kernel.

 I'm interested in making deadlock avoidence trivial and eliminating any
 map-map contention.


It's rare to have two clients access the same buffer at the same time.
In what situation will this occur?

If we think of map / unmap and validation / fence  as taking a buffer
mutex either for the CPU or for the GPU, that's the way implementation
is done today. The CPU side of the mutex should IIRC be per-client
recursive. OTOH, the TTM implementation won't stop the CPU from
accessing the buffer when it is unmapped, but then you're on your own.
Mutexes need to be taken in the correct order, otherwise a deadlock
will occur, and GL will, as outlined in Eric's illustration, more or
less encourage us to take buffers in the incorrect order.

In essence what you propose is to eliminate the deadlock problem by just
avoiding taking the buffer mutex unless we know the GPU has it. I see
two problems with this:

* It will encourage different DRI clients to simultaneously access
  the same buffer.
* Inter-client and GPU data coherence can be guaranteed if we issue
  a mb() / write-combining flush with the unmap operation (which,
  BTW, I'm not sure is done today). Otherwise it is up to the
  clients, and very easy to forget.

I'm a bit afraid we might in the future regret taking the easy way out?

OTOH, letting DRM resolve the deadlock by unmapping and remapping shared
buffers in the correct order might not be the best one either. It will
certainly mean some CPU overhead and what if we have to do the same with
buffer validation? (Yes for some operations with thousands and thousands
of relocations, the user space validation might need to stay).

Personally, I'm slightly biased towards having DRM resolve the deadlock,
but I think any solution will do as long as the implications and why we
choose a certain solution are totally clear.

For item 3) above the kernel must have a way to issue a flush when
needed for buffer eviction.
The current implementation also requires the buffer to be completely
flushed before mapping.
Other than that the flushing policy is currently completely up to the
DRM drivers.

/Thomas


I might say stupid things as i don't think i fully understand all
the input to this problem. Anyway here is my thought on all this:

1) First client map never block (as in Keith layout) except on
   fence from drm side (point 3 in Keith layout)
2) Client should always unmap buffer before submitting (as in Keith layout)
3) In drm side you always acquire buffer lock in a give order for
   instance each buffer got and id and you lock from smaller id to bigger
   one (with a clever implementation the cost for that will be small)
4) We got 2 gpu queue:
- one with pending apps ask in which we do all stuff necessary
  to be done before submitting (locking buffer, validation, ...)
  for instance we might wait here for each buffer that are still
  mapped by some other apps in user space
- one run queue in which we add each apps ask that are now
  ready to be submited to the gpu

Of course in this scheme we keep the fencing stuff so user space can
know when it safe to use previously submited buffer again. The outcome
of having two seperate queue in drm is that if two apps lockup each other
other apps can still use the gpu so only the apps fighting for a buffer will
suffer.

And for user space synchronization i believe it a user space problem i.e.
it's up to user space to add proper synch. For instance as map doesn't
block for any client in user space two apps can mess with same buffer
it's up to user to have a policy to exclude each other (i believe this will
be a dri or xorg problem to synch btw consumer).

I believe in this scheme you can only have deadlock btw two buggy
apps (like one 

Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Jerome Glisse

On 5/4/07, Jerome Glisse [EMAIL PROTECTED] wrote:

On 5/4/07, Thomas Hellström [EMAIL PROTECTED] wrote:
 Keith Packard wrote:
  On Thu, 2007-05-03 at 01:01 +0200, Thomas Hellström wrote:
 
 
  It might be possible to find schemes that work around this. One way
  could possibly be to have a buffer mapping -and validate order for
  shared buffers.
 
 
  If mapping never blocks on anything other than the fence, then there
  isn't any dead lock possibility. What this says is that ordering of
  rendering between clients is *not DRMs problem*. I think that's a good
  solution though; I want to let multiple apps work on DRM-able memory
  with their own CPU without contention.
 
  I don't recall if Eric layed out the proposed rules, but:
 
   1) Map never blocks on map. Clients interested in dealing with this
  are on their own.
 
   2) Submit blocks on map. You must unmap all buffers before submitting
  them. Doing the relocations in the kernel makes this all possible.
 
   3) Map blocks on the fence from submit. We can play with pending the
  flush until the app asks for the buffer back, or we can play with
  figuring out when flushes are useful automatically. Doesn't matter
  if the policy is in the kernel.
 
  I'm interested in making deadlock avoidence trivial and eliminating any
  map-map contention.
 
 
 It's rare to have two clients access the same buffer at the same time.
 In what situation will this occur?

 If we think of map / unmap and validation / fence  as taking a buffer
 mutex either for the CPU or for the GPU, that's the way implementation
 is done today. The CPU side of the mutex should IIRC be per-client
 recursive. OTOH, the TTM implementation won't stop the CPU from
 accessing the buffer when it is unmapped, but then you're on your own.
 Mutexes need to be taken in the correct order, otherwise a deadlock
 will occur, and GL will, as outlined in Eric's illustration, more or
 less encourage us to take buffers in the incorrect order.

 In essence what you propose is to eliminate the deadlock problem by just
 avoiding taking the buffer mutex unless we know the GPU has it. I see
 two problems with this:

 * It will encourage different DRI clients to simultaneously access
   the same buffer.
 * Inter-client and GPU data coherence can be guaranteed if we issue
   a mb() / write-combining flush with the unmap operation (which,
   BTW, I'm not sure is done today). Otherwise it is up to the
   clients, and very easy to forget.

 I'm a bit afraid we might in the future regret taking the easy way out?

 OTOH, letting DRM resolve the deadlock by unmapping and remapping shared
 buffers in the correct order might not be the best one either. It will
 certainly mean some CPU overhead and what if we have to do the same with
 buffer validation? (Yes for some operations with thousands and thousands
 of relocations, the user space validation might need to stay).

 Personally, I'm slightly biased towards having DRM resolve the deadlock,
 but I think any solution will do as long as the implications and why we
 choose a certain solution are totally clear.

 For item 3) above the kernel must have a way to issue a flush when
 needed for buffer eviction.
 The current implementation also requires the buffer to be completely
 flushed before mapping.
 Other than that the flushing policy is currently completely up to the
 DRM drivers.

 /Thomas

I might say stupid things as i don't think i fully understand all
the input to this problem. Anyway here is my thought on all this:

1) First client map never block (as in Keith layout) except on
fence from drm side (point 3 in Keith layout)
2) Client should always unmap buffer before submitting (as in Keith layout)
3) In drm side you always acquire buffer lock in a give order for
instance each buffer got and id and you lock from smaller id to bigger
one (with a clever implementation the cost for that will be small)
4) We got 2 gpu queue:
 - one with pending apps ask in which we do all stuff necessary
   to be done before submitting (locking buffer, validation, ...)
   for instance we might wait here for each buffer that are still
   mapped by some other apps in user space
 - one run queue in which we add each apps ask that are now
   ready to be submited to the gpu

Of course in this scheme we keep the fencing stuff so user space can
know when it safe to use previously submited buffer again. The outcome
of having two seperate queue in drm is that if two apps lockup each other
other apps can still use the gpu so only the apps fighting for a buffer will
suffer.

And for user space synchronization i believe it a user space problem i.e.
it's up to user space to add proper synch. For instance as map doesn't
block for any client in user space two apps can mess with same buffer
it's up to user to have a policy to exclude each other (i believe this will
be a 

Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Thomas Hellström

Jerome Glisse wrote:

On 5/4/07, Thomas Hellström [EMAIL PROTECTED] wrote:

Keith Packard wrote:
 On Thu, 2007-05-03 at 01:01 +0200, Thomas Hellström wrote:


 It might be possible to find schemes that work around this. One way
 could possibly be to have a buffer mapping -and validate order for
 shared buffers.


 If mapping never blocks on anything other than the fence, then there
 isn't any dead lock possibility. What this says is that ordering of
 rendering between clients is *not DRMs problem*. I think that's a good
 solution though; I want to let multiple apps work on DRM-able memory
 with their own CPU without contention.

 I don't recall if Eric layed out the proposed rules, but:

  1) Map never blocks on map. Clients interested in dealing with this
 are on their own.

  2) Submit blocks on map. You must unmap all buffers before submitting
 them. Doing the relocations in the kernel makes this all possible.

  3) Map blocks on the fence from submit. We can play with pending the
 flush until the app asks for the buffer back, or we can play with
 figuring out when flushes are useful automatically. Doesn't matter
 if the policy is in the kernel.

 I'm interested in making deadlock avoidence trivial and eliminating 
any

 map-map contention.


It's rare to have two clients access the same buffer at the same time.
In what situation will this occur?

If we think of map / unmap and validation / fence  as taking a buffer
mutex either for the CPU or for the GPU, that's the way implementation
is done today. The CPU side of the mutex should IIRC be per-client
recursive. OTOH, the TTM implementation won't stop the CPU from
accessing the buffer when it is unmapped, but then you're on your own.
Mutexes need to be taken in the correct order, otherwise a deadlock
will occur, and GL will, as outlined in Eric's illustration, more or
less encourage us to take buffers in the incorrect order.

In essence what you propose is to eliminate the deadlock problem by just
avoiding taking the buffer mutex unless we know the GPU has it. I see
two problems with this:

* It will encourage different DRI clients to simultaneously access
  the same buffer.
* Inter-client and GPU data coherence can be guaranteed if we issue
  a mb() / write-combining flush with the unmap operation (which,
  BTW, I'm not sure is done today). Otherwise it is up to the
  clients, and very easy to forget.

I'm a bit afraid we might in the future regret taking the easy way out?

OTOH, letting DRM resolve the deadlock by unmapping and remapping shared
buffers in the correct order might not be the best one either. It will
certainly mean some CPU overhead and what if we have to do the same with
buffer validation? (Yes for some operations with thousands and thousands
of relocations, the user space validation might need to stay).

Personally, I'm slightly biased towards having DRM resolve the deadlock,
but I think any solution will do as long as the implications and why we
choose a certain solution are totally clear.

For item 3) above the kernel must have a way to issue a flush when
needed for buffer eviction.
The current implementation also requires the buffer to be completely
flushed before mapping.
Other than that the flushing policy is currently completely up to the
DRM drivers.

/Thomas


I might say stupid things as i don't think i fully understand all
the input to this problem. Anyway here is my thought on all this:

1) First client map never block (as in Keith layout) except on
   fence from drm side (point 3 in Keith layout)

But is there really a need for this except to avoid the above-mentioned 
deadlock?
As I'm not too up to date with all the possibilities the servers and GL 
clients may be using shared buffers,

I need some enlightenment :). Could we have an example, please?


4) We got 2 gpu queue:
- one with pending apps ask in which we do all stuff 
necessary
  to be done before submitting (locking buffer, 
validation, ...)
  for instance we might wait here for each buffer that are 
still

  mapped by some other apps in user space
- one run queue in which we add each apps ask that are now
  ready to be submited to the gpu


This is getting closer and closer to a GPU scheduler, an interesting 
topic indeed.
Perhaps we should have a separate  discussion on the  needs and 
requirements for such a thing?


Regards,
/Thomas



-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Jerome Glisse

On 5/4/07, Thomas Hellström [EMAIL PROTECTED] wrote:

Jerome Glisse wrote:
 On 5/4/07, Thomas Hellström [EMAIL PROTECTED] wrote:
 Keith Packard wrote:
  On Thu, 2007-05-03 at 01:01 +0200, Thomas Hellström wrote:
 
 
  It might be possible to find schemes that work around this. One way
  could possibly be to have a buffer mapping -and validate order for
  shared buffers.
 
 
  If mapping never blocks on anything other than the fence, then there
  isn't any dead lock possibility. What this says is that ordering of
  rendering between clients is *not DRMs problem*. I think that's a good
  solution though; I want to let multiple apps work on DRM-able memory
  with their own CPU without contention.
 
  I don't recall if Eric layed out the proposed rules, but:
 
   1) Map never blocks on map. Clients interested in dealing with this
  are on their own.
 
   2) Submit blocks on map. You must unmap all buffers before submitting
  them. Doing the relocations in the kernel makes this all possible.
 
   3) Map blocks on the fence from submit. We can play with pending the
  flush until the app asks for the buffer back, or we can play with
  figuring out when flushes are useful automatically. Doesn't matter
  if the policy is in the kernel.
 
  I'm interested in making deadlock avoidence trivial and eliminating
 any
  map-map contention.
 
 
 It's rare to have two clients access the same buffer at the same time.
 In what situation will this occur?

 If we think of map / unmap and validation / fence  as taking a buffer
 mutex either for the CPU or for the GPU, that's the way implementation
 is done today. The CPU side of the mutex should IIRC be per-client
 recursive. OTOH, the TTM implementation won't stop the CPU from
 accessing the buffer when it is unmapped, but then you're on your own.
 Mutexes need to be taken in the correct order, otherwise a deadlock
 will occur, and GL will, as outlined in Eric's illustration, more or
 less encourage us to take buffers in the incorrect order.

 In essence what you propose is to eliminate the deadlock problem by just
 avoiding taking the buffer mutex unless we know the GPU has it. I see
 two problems with this:

 * It will encourage different DRI clients to simultaneously access
   the same buffer.
 * Inter-client and GPU data coherence can be guaranteed if we issue
   a mb() / write-combining flush with the unmap operation (which,
   BTW, I'm not sure is done today). Otherwise it is up to the
   clients, and very easy to forget.

 I'm a bit afraid we might in the future regret taking the easy way out?

 OTOH, letting DRM resolve the deadlock by unmapping and remapping shared
 buffers in the correct order might not be the best one either. It will
 certainly mean some CPU overhead and what if we have to do the same with
 buffer validation? (Yes for some operations with thousands and thousands
 of relocations, the user space validation might need to stay).

 Personally, I'm slightly biased towards having DRM resolve the deadlock,
 but I think any solution will do as long as the implications and why we
 choose a certain solution are totally clear.

 For item 3) above the kernel must have a way to issue a flush when
 needed for buffer eviction.
 The current implementation also requires the buffer to be completely
 flushed before mapping.
 Other than that the flushing policy is currently completely up to the
 DRM drivers.

 /Thomas

 I might say stupid things, as I don't think I fully understand all
 the input to this problem. Anyway, here are my thoughts on all this:

 1) First, client map never blocks (as in Keith's layout) except on a
    fence from the drm side (point 3 in Keith's layout)

But is there really a need for this except to avoid the above-mentioned
deadlock?
As I'm not too up to date with all the ways the servers and GL clients
may be using shared buffers, I need some enlightenment :). Could we have
an example, please?


I think the current main consumer would be compiz or any other
compositor which uses TextureFromPixmap. I really think we might see
further use of sharing graphical data among applications; I have an
example of such a use case here at my work, even though it doesn't use
GL at all but another in-house protocol. Another possible case where
such buffer sharing might occur is inside the same application with two
or more GL contexts (I am ready to bet that we already have examples of
such applications somewhere).


 4) We have two GPU queues:
 - one pending queue, holding application requests, in which we do all
   the work that has to be done before submitting (locking buffers,
   validation, ...); for instance, we might wait here for each buffer
   that is still mapped by some other application in user space
 - one run queue, to which we add each request that is now ready to be
   submitted to the GPU

This is getting closer and closer to a GPU scheduler, an interesting
topic indeed.

Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Thomas Hellström

Jerome Glisse wrote:
[...]

I think the current main consumer would be compiz or any other
compositor which uses TextureFromPixmap. I really think we might see
further use of sharing graphical data among applications; I have an
example of such a use case here at my work, even though it doesn't use
GL at all but another in-house protocol. Another possible case where
such buffer sharing might occur is inside the same application with two
or more GL contexts (I am ready to bet that we already have examples of
such applications somewhere).

I was actually referring to an example where two clients need to have a 
buffer mapped and access it at exactly the same time.
If there is such a situation, we have no other choice than to drop the 
buffer locking on map. If there isn't we can at least consider other 
alternatives that resolve the deadlock issue but that also will help 
clients synchronize and keep data coherent.


/Thomas




Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Jerome Glisse

On 5/4/07, Thomas Hellström [EMAIL PROTECTED] wrote:

I was actually referring to an example where two clients need to have a
buffer mapped and access it at exactly the same time.
If there is such a situation, we have no other choice than to drop the
buffer locking on map. If there isn't we can at least consider other
alternatives that resolve the deadlock issue but that also will help
clients synchronize and keep data coherent.

/Thomas


One might be a texture where a portion is updated by one thread
and another portion updated by another one; I believe the application
will know better than us whether such concurrent access will conflict
or not. If the two threads access different pixels it makes sense to
let them work on the texture at the same time. If they are writing to
the same pixels then they will have to synchronize between themselves
so they don't do something stupid.

My point is that user space will know better whether synchronization is
needed and how to synchronize access to the same buffer. Moreover, we
can still add a locking mechanism in user space (in libdrm for
instance).

There are very likely other use cases for such concurrent access which
I can't think of right now.
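
As a toy illustration of that point, two threads can safely fill
disjoint halves of one shared mapping with no kernel-side locking at
all; whatever synchronization policy is needed stays in the clients (or
in libdrm). The plain array below stands in for a mapped buffer object:

#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 4096
static unsigned char shared_buf[BUF_SIZE];  /* stands in for a mapped BO */

struct region { size_t start, len; unsigned char value; };

static void *fill(void *arg)
{
    struct region *r = arg;

    /* No lock needed here: the two regions never overlap. */
    memset(shared_buf + r->start, r->value, r->len);
    return NULL;
}

int main(void)
{
    struct region a = { 0, BUF_SIZE / 2, 0xaa };
    struct region b = { BUF_SIZE / 2, BUF_SIZE / 2, 0xbb };
    pthread_t ta, tb;

    pthread_create(&ta, NULL, fill, &a);
    pthread_create(&tb, NULL, fill, &b);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    printf("first byte %#x, last byte %#x\n",
           shared_buf[0], shared_buf[BUF_SIZE - 1]);
    return 0;
}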

best,
Jerome Glisse


Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Keith Packard
On Fri, 2007-05-04 at 10:07 +0200, Thomas Hellström wrote:
  
 It's rare to have two clients access the same buffer at the same time. 
 In what situation will this occur?

Right, what I'm trying to avoid is having any contention for
applications *not* sharing the same objects. 

If there is any locking for mapping, we can either attempt to define a
locking order, or we can have a single global lock. The former leaves us
prone to deadlocks, the latter eliminates the ability for un-contended
parallel access.
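
For reference, a hedged sketch of what "define a locking order" could
look like from user space: always take per-buffer locks in increasing
handle order, so two clients mapping the same pair of shared buffers
can never deadlock. The bo type and pthread locks are stand-ins, not
the DRM interface:

#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

struct bo {
    uint32_t handle;
    pthread_mutex_t lock;
};

static int cmp_handle(const void *a, const void *b)
{
    const struct bo *x = *(const struct bo * const *)a;
    const struct bo *y = *(const struct bo * const *)b;

    return (x->handle > y->handle) - (x->handle < y->handle);
}

/* Lock a set of buffers without risking an ABBA deadlock. */
static void lock_buffers(struct bo **bos, size_t n)
{
    size_t i;

    qsort(bos, n, sizeof(*bos), cmp_handle);
    for (i = 0; i < n; i++)
        pthread_mutex_lock(&bos[i]->lock);
}

static void unlock_buffers(struct bo **bos, size_t n)
{
    while (n--)
        pthread_mutex_unlock(&bos[n]->lock);
}

int main(void)
{
    struct bo a = { .handle = 7, .lock = PTHREAD_MUTEX_INITIALIZER };
    struct bo b = { .handle = 3, .lock = PTHREAD_MUTEX_INITIALIZER };
    struct bo *set[] = { &a, &b };  /* every client ends up in the same order */

    lock_buffers(set, 2);
    /* ... CPU access to both buffers ... */
    unlock_buffers(set, 2);
    return 0;
}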

 * It will encourage different DRI clients to simultaneously access
   the same buffer.

Sure. Separate 'DRI' from 'GL' and this may be a sensible plan. If you
want to prevent this *that's not DRI's problem*.

 * Inter-client and GPU data coherence can be guaranteed if we issue
   a mb() / write-combining flush with the unmap operation (which,
   BTW, I'm not sure is done today). Otherwise it is up to the
   clients, and very easy to forget.

CPU-GPU coherence is ensured by the mutual exclusion between mapping and
submitting. You may either have data available to the CPU or to the GPU.
I think that's a basic requirement for any solution in this space.
Keying the submit and map as to whether writing will occur means that
appropriate flushing and fencing can be automatically applied within the
kernel.

 OTOH, letting DRM resolve the deadlock by unmapping and remapping shared 
 buffers in the correct order might not be the best one either. It will 
 certainly mean some CPU overhead and what if we have to do the same with 
 buffer validation? (Yes for some operations with thousands and thousands 
 of relocations, the user space validation might need to stay).

I do not want to do relocations in user space. I don't see why doing
thousands of these requires moving this operation out of the kernel.

-- 
[EMAIL PROTECTED]


signature.asc
Description: This is a digitally signed message part


Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Keith Packard
On Fri, 2007-05-04 at 11:40 +0200, Jerome Glisse wrote:

 On a side note, I think this scheme also fits well with GPUs that
 have several contexts and don't need big validation (read: NV GPUs).

Yeah, I want to make sure we have a simple model that supports
multi-context hardware while also avoiding failing over badly when we
have more users than hardware contexts.

-- 
[EMAIL PROTECTED]


signature.asc
Description: This is a digitally signed message part


Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Keith Packard
On Fri, 2007-05-04 at 14:32 +0200, Thomas Hellström wrote:
  If there isn't we can at least consider other 
 alternatives that resolve the deadlock issue but that also will help 
 clients synchronize and keep data coherent.

If clients want coherence, they're welcome to implement their own
locking. Let's make sure we separate the semantics required for GPU
operation from semantics required by DRM users.

-- 
[EMAIL PROTECTED]


signature.asc
Description: This is a digitally signed message part


Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Keith Whitwell

Keith Packard wrote:

OTOH, letting DRM resolve the deadlock by unmapping and remapping shared 
buffers in the correct order might not be the best one either. It will 
certainly mean some CPU overhead and what if we have to do the same with 
buffer validation? (Yes for some operations with thousands and thousands 
of relocations, the user space validation might need to stay).


I do not want to do relocations in user space. I don't see why doing
thousands of these requires moving this operation out of the kernel.


Agreed.  The original conception for this was to have validation plus 
relocations be a single operation, and by implication in the kernel. 
Although the code as it stands doesn't do this, I think that should 
still be the approach.


The issue with thousands of relocations from my point of view isn't a 
problem - that's just a matter of getting appropriate data structures in 
place.
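
For illustration, kernel-side relocation fix-up really is just a table
walk once validation has decided where each buffer lives; the reloc
layout below is invented, not the proposed TTM ioctl format:

#include <stdint.h>
#include <stdio.h>

struct reloc {
    uint32_t batch_offset;   /* byte offset of the dword to patch */
    uint32_t target_handle;  /* which buffer object it points at */
    uint32_t delta;          /* offset inside that buffer */
};

/* Pretend validation placed handle N at gtt_offset[N]. */
static const uint32_t gtt_offset[] = { 0x000000, 0x100000, 0x180000 };

static void apply_relocs(uint32_t *batch, const struct reloc *relocs,
                         size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        batch[relocs[i].batch_offset / 4] =
            gtt_offset[relocs[i].target_handle] + relocs[i].delta;
}

int main(void)
{
    uint32_t batch[4] = { 0 };
    struct reloc relocs[] = {
        { .batch_offset = 4, .target_handle = 2, .delta = 0x40 },
    };

    apply_relocs(batch, relocs, 1);
    printf("batch[1] = 0x%08x\n", batch[1]);  /* prints 0x00180040 */
    return 0;
}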


Where things get a bit more interesting is with hardware where you are 
required to submit a whole scene's worth of rendering before the 
hardware will kick off, and with the expectation that the texture 
placement will remain unchanged throughout the scene.  This is a very 
easy way to hit any upper limit on texture memory - the agp aperture 
size in the case of integrated chipsets.


That's a special case of the general problem of what you do when a 
client submits any validation list that can't be satisfied.  Failing to 
render isn't really an option, either the client or the memory manager 
has to either prevent it happening in the first place or have some 
mechanism for chopping up the dma buffer into segments which are 
satisfiable...  Neither of which I can see an absolutely reliable way to 
do.


I think that any memory manager we can propose will have flaws of some 
sort - either it is prone to failures that aren't really allowed by the 
API, is excessively complex or somewhat pessimistic.  We've chosen a 
design that is simple, optimistic, but can potentially say no 
unexpectedly.  It would then be up to the client to somehow pick up the 
pieces and potentially submit a smaller list.  So far we just haven't 
touched on how that might work.


The way to get around this is to mandate that hardware supports paged 
virtual memory...  But that seems to be a difficult trick.


Keith


Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-04 Thread Keith Packard
On Fri, 2007-05-04 at 16:57 +0100, Keith Whitwell wrote:

 That's a special case of the general problem of what you do when a 
 client submits any validation list that can't be satisfied.  Failing to 
 render isn't really an option, either the client or the memory manager 
 has to either prevent it happening in the first place or have some 
 mechanism for chopping up the dma buffer into segments which are 
 satisfiable...  Neither of which I can see an absolutely reliable way to 
 do.

I think we must return an error from the kernel and let user mode sort
it out; potentially by breaking up the operation into smaller pieces, or
(ick), simply falling back to software. Eliminating per-submit flushing
will even make this reasonably efficient as we remap the GTT as objects
are used. I don't think we want to support automatic partitioning of the
operation in the kernel; punting that step to user mode seems like a
sensible option.
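
A hedged user-space sketch of "let user mode sort it out": if a piece
doesn't fit, split it in half and retry, falling back to software only
when a single buffer's worth of work still fails. submit_piece() and
the GTT_SLOTS limit are made up for illustration:

#include <errno.h>
#include <stdio.h>

#define GTT_SLOTS 8  /* pretend the aperture fits this many buffers */

static int submit_piece(int first, int count)
{
    if (count > GTT_SLOTS)
        return -ENOSPC;                 /* validation list doesn't fit */
    printf("GPU: rendered buffers %d..%d\n", first, first + count - 1);
    return 0;
}

static int fallback_software(int first, int count)
{
    printf("SW fallback for buffers %d..%d\n", first, first + count - 1);
    return 0;
}

static int submit_split(int first, int count)
{
    int ret = submit_piece(first, count);

    if (ret != -ENOSPC)
        return ret;
    if (count <= 1)
        return fallback_software(first, count);  /* can't split further */

    ret = submit_split(first, count / 2);
    if (ret)
        return ret;
    return submit_split(first + count / 2, count - count / 2);
}

int main(void)
{
    return submit_split(0, 20);
}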

Certainly presenting all of the objects to the kernel atomically will
permit it to succeed if the device can possibly perform the operation;
ejecting all existing objects and reloading with precisely the objects
proposed by the application can be done, and is even inexpensive on UMA
hardware.

 The way to get around this is to mandate that hardware supports paged 
 virtual memory...  But that seems to be a difficult trick.

Yeah, especially as we don't currently have any examples in our
environment.

-- 
[EMAIL PROTECTED]


signature.asc
Description: This is a digitally signed message part


Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-03 Thread Keith Packard
On Thu, 2007-05-03 at 01:01 +0200, Thomas Hellström wrote:

> It might be possible to find schemes that work around this. One way 
> could possibly be to have a buffer mapping -and validate order for 
> shared buffers.

If mapping never blocks on anything other than the fence, then there
isn't any dead lock possibility. What this says is that ordering of
rendering between clients is *not DRMs problem*. I think that's a good
solution though; I want to let multiple apps work on DRM-able memory
with their own CPU without contention.

I don't recall if Eric laid out the proposed rules, but:

 1) Map never blocks on map. Clients interested in dealing with this 
are on their own.

 2) Submit blocks on map. You must unmap all buffers before submitting
them. Doing the relocations in the kernel makes this all possible.

 3) Map blocks on the fence from submit. We can play with pending the
flush until the app asks for the buffer back, or we can play with
figuring out when flushes are useful automatically. Doesn't matter
if the policy is in the kernel.

I'm interested in making deadlock avoidance trivial and eliminating any
map-map contention.
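
A user-space sketch of rules 1-3 above, with pthreads standing in for
the kernel's waiting; the types and helpers are invented for
illustration and the locking is deliberately simplified:

#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

struct fence {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool signaled;
};

struct bo {
    pthread_mutex_t lock;
    pthread_cond_t unmapped;
    bool mapped;
    struct fence *last_fence;   /* fence from the last submit using this bo */
};

static void fence_wait(struct fence *f)
{
    pthread_mutex_lock(&f->lock);
    while (!f->signaled)
        pthread_cond_wait(&f->cond, &f->lock);
    pthread_mutex_unlock(&f->lock);
}

/* Rule 3: map blocks only on the fence from submit ...
 * Rule 1: ... and never on another client's map. */
static void bo_map(struct bo *bo)
{
    if (bo->last_fence)
        fence_wait(bo->last_fence);
    pthread_mutex_lock(&bo->lock);
    bo->mapped = true;
    pthread_mutex_unlock(&bo->lock);
}

static void bo_unmap(struct bo *bo)
{
    pthread_mutex_lock(&bo->lock);
    bo->mapped = false;
    pthread_cond_broadcast(&bo->unmapped);
    pthread_mutex_unlock(&bo->lock);
}

/* Rule 2: submit blocks until every buffer in the list is unmapped. */
static void submit(struct bo **bos, size_t n, struct fence *batch_fence)
{
    size_t i;

    for (i = 0; i < n; i++) {
        pthread_mutex_lock(&bos[i]->lock);
        while (bos[i]->mapped)
            pthread_cond_wait(&bos[i]->unmapped, &bos[i]->lock);
        bos[i]->last_fence = batch_fence;  /* later maps wait on this */
        pthread_mutex_unlock(&bos[i]->lock);
    }
    /* ... relocate, validate and hand the batch to the GPU here ... */
}

int main(void)
{
    struct fence f = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
                       true };
    struct bo bo = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
                     false, NULL };
    struct bo *list[] = { &bo };

    bo_map(&bo);         /* no fence yet, so this does not wait */
    bo_unmap(&bo);
    submit(list, 1, &f); /* buffer is unmapped, so this does not block */
    bo_map(&bo);         /* waits on f, which is already signaled here */
    bo_unmap(&bo);
    return 0;
}

Note that map waits only on the fence (rules 1 and 3), while submit
waits for every buffer in its list to be unmapped (rule 2).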

-- 
[EMAIL PROTECTED]


signature.asc
Description: This is a digitally signed message part


Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-02 Thread Thomas Hellström

Eric Anholt wrote:


On Thu, 2007-04-26 at 16:55 +1000, Dave Airlie wrote:
 


Hi,

The patch is too big to fit on the list and I've no idea how we could
break it down further, it just happens to be a lot of new code..

http://people.freedesktop.org/~airlied/ttm/0001-drm-Implement-TTM-Memory-manager-core-functionality.txt

The patch header and diffstat are below,

This isn't for integration yet but we'd like an initial review by
anyone with the spare time and inclination, there is a lot of stuff
relying on getting this code beaten into shape and into the kernel but
any cleanups people can suggest now especially to the user interfaces
would be appreciated as once we set that stuff in stone it'll be a
pain to change... also it doesn't have any driver side code, this is
just the generic pieces. I'll post the intel 915 side code later but
there isn't that much to it..

It applies on top of my drm-2.6 git tree drm-mm branch

-

This patch brings in the TTM (Translation Table Maps) memory
management
system from Thomas Hellstrom at Tungsten Graphics.

This patch only covers the core functionality and changes to the drm
core.

The TTM memory manager enables dynamic mapping of memory objects in
and
out of the graphic card accessible memory (e.g. AGP), this implements
the AGP backend for TTM to be used by the i915 driver.
   




I've been slow responding, but we've been talking a lot on IRC and at
Intel about the TTM interface recently, and trying to come up with a
consensus between us as to what we'd like to see.

1) Multiplexed ioctls hurt
The first issue I have with this version is the userland interface.
You've got two ioctls for buffer management and one for fence
management, yet these 3 ioctls are actually just attempting to be
generic interfaces for around 25 actual functions you want to call
(except for the unimplemented ones, drm_bo_fence and drm_bo_ref_fence).
So there are quasi-generic arguments to these ioctls, where most of the
members are ignored by any given function, but it's not obvious to the
caller which ones.  There's no comments or anything as to what the
arguments to these functions are or what exactly they do.  We've got 100
generic ioctl numbers allocated and unused still, so I don't think we
should be shy about having separate ioctls for separate functions, if
this is the interface we expect to use going forward.

 

Right. This interface was in its infancy when there were only (without
looking too deeply) three generic IOCTLs left.


this is definitely a good point and I agree completely.
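
As a sketch of the "separate ioctls" suggestion, with made-up ioctl
numbers and argument structs rather than the real drm.h additions:

#include <linux/types.h>
#include <sys/ioctl.h>

#define DRM_IOCTL_BASE 'd'

struct example_bo_create {
    __u64 size;
    __u32 flags;
    __u32 handle;    /* returned by the kernel */
};

struct example_bo_map {
    __u32 handle;
    __u32 pad;
    __u64 offset;    /* returned: offset to pass to mmap() */
};

/* One ioctl per operation, each with its own well-defined argument
 * struct, instead of one multiplexed call with mostly-ignored fields. */
#define EXAMPLE_IOCTL_BO_CREATE \
    _IOWR(DRM_IOCTL_BASE, 0x40, struct example_bo_create)
#define EXAMPLE_IOCTL_BO_MAP \
    _IOWR(DRM_IOCTL_BASE, 0x41, struct example_bo_map)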


2) Early microoptimizations
There's also apparently an unused way to chain these function calls in a
single ioctl call for the buffer object ioctl.  This is one of a couple
of microoptimizations at the expense of code clarity which have bothered
me while reviewing the TTM code, when I'm guessing no testing was done
to see if it was actually a bottleneck.
 

Yes. The function chaining is currently only used to validate buffer 
lists. I still believe it is
needed for that functionality, a bit depending on what we want to be 
able to change when a buffer is validated. But I can currently not see 
any other use for it in the future.



3) Fencing and flushing troubles
I'm definitely concerned by the fencing interface.  Right now, the
driver is flushing caches and fencing every buffer with EXE and its
driver-specific FLUSHED flag in dispatching command buffers.  We almost
surely don't want to be flushing for every batch buffer just in case
someone wants to do CPU reads from something.  However, with the current
mechanism, if I fence my operation with just EXE and no flush, then
somebody goes to map that buffer, they'll wait for the fence, but no
flush will be emitted.  The interface we've been imagining wouldn't have
driver-specific fence flags, but instead be only a marker of when
command execution has passed a certain point (which is what fencing is
about, anyway).  In validating buffers, you would pass whether they're
for READ or WRITE as we currently do, and you'd put it on the unfenced
read/write lists as appropriate.  Add one buffer object function for
emitting the flush, which would then determine whether the next
fence-all-unfenced call would cover just the list of unfenced reads or
the list of both unfenced reads and unfenced writes.  Then, in mapping,
check if it's on the unfenced-writes list and emit the flush and fence,
and then wait for a fence on the buffer before continuing with the
mapping.

 

Right. This functionality is actually available in the current code, 
except that we have only one unfenced list and the fence flags indicate 
what type of flushes are needed. There's even an implementation of the 
intel sync flush mechanism in the fencing code. If the batch-buffer 
flush is omitted the driver specific flush flag is not needed and the 
fence mechanism will do a sync flush whenever 

Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-02 Thread Eric Anholt
On Thu, 2007-04-26 at 16:55 +1000, Dave Airlie wrote:
> Hi,
> 
> The patch is too big to fit on the list and I've no idea how we could
> break it down further, it just happens to be a lot of new code..
> 
> http://people.freedesktop.org/~airlied/ttm/0001-drm-Implement-TTM-Memory-manager-core-functionality.txt
> 
> The patch header and diffstat are below,
> 
> This isn't for integration yet but we'd like an initial review by
> anyone with the spare time and inclination, there is a lot of stuff
> relying on getting this code beaten into shape and into the kernel but
> any cleanups people can suggest now especially to the user interfaces
> would be appreciated as once we set that stuff in stone it'll be a
> pain to change... also it doesn't have any driver side code, this is
> just the generic pieces. I'll post the intel 915 side code later but
> there isn't that much to it..
> 
> It applies on top of my drm-2.6 git tree drm-mm branch
> 
> -
> 
> This patch brings in the TTM (Translation Table Maps) memory
> management
> system from Thomas Hellstrom at Tungsten Graphics.
> 
> This patch only covers the core functionality and changes to the drm
> core.
> 
> The TTM memory manager enables dynamic mapping of memory objects in
> and
> out of the graphic card accessible memory (e.g. AGP), this implements
> the AGP backend for TTM to be used by the i915 driver.


I've been slow responding, but we've been talking a lot on IRC and at
Intel about the TTM interface recently, and trying to come up with a
consensus between us as to what we'd like to see.

1) Multiplexed ioctls hurt
The first issue I have with this version is the userland interface.
You've got two ioctls for buffer management and one for fence
management, yet these 3 ioctls are actually just attempting to be
generic interfaces for around 25 actual functions you want to call
(except for the unimplemented ones, drm_bo_fence and drm_bo_ref_fence).
So there are quasi-generic arguments to these ioctls, where most of the
members are ignored by any given function, but it's not obvious to the
caller which ones.  There's no comments or anything as to what the
arguments to these functions are or what exactly they do.  We've got 100
generic ioctl numbers allocated and unused still, so I don't think we
should be shy about having separate ioctls for separate functions, if
this is the interface we expect to use going forward.

2) Early microoptimizations
There's also apparently an unused way to chain these function calls in a
single ioctl call for the buffer object ioctl.  This is one of a couple
of microoptimizations at the expense of code clarity which have bothered
me while reviewing the TTM code, when I'm guessing no testing was done
to see if it was actually a bottleneck.

3) Fencing and flushing troubles
I'm definitely concerned by the fencing interface.  Right now, the
driver is flushing caches and fencing every buffer with EXE and its
driver-specific FLUSHED flag in dispatching command buffers.  We almost
surely don't want to be flushing for every batch buffer just in case
someone wants to do CPU reads from something.  However, with the current
mechanism, if I fence my operation with just EXE and no flush, then
somebody goes to map that buffer, they'll wait for the fence, but no
flush will be emitted.  The interface we've been imagining wouldn't have
driver-specific fence flags, but instead be only a marker of when
command execution has passed a certain point (which is what fencing is
about, anyway).  In validating buffers, you would pass whether they're
for READ or WRITE as we currently do, and you'd put it on the unfenced
read/write lists as appropriate.  Add one buffer object function for
emitting the flush, which would then determine whether the next
fence-all-unfenced call would cover just the list of unfenced reads or
the list of both unfenced reads and unfenced writes.  Then, in mapping,
check if it's on the unfenced-writes list and emit the flush and fence,
and then wait for a fence on the buffer before continuing with the
mapping.
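
A minimal sketch of that unfenced read/write list idea; every type and
helper here is invented for illustration and this is not the TTM
interface. A real implementation would also fence the remaining
unfenced buffers when the batch is submitted; the sketch only shows the
map-side check:

#include <stdbool.h>
#include <stdio.h>

enum access { ACCESS_READ, ACCESS_WRITE };

struct bo {
    const char *name;
    bool unfenced_read;    /* read by a not-yet-fenced batch */
    bool unfenced_write;   /* written by a not-yet-fenced batch */
    int fence;             /* 0 = no fence outstanding */
};

static int last_fence;

static void validate(struct bo *bo, enum access access)
{
    if (access == ACCESS_WRITE)
        bo->unfenced_write = true;
    else
        bo->unfenced_read = true;
}

static void emit_flush_and_fence(struct bo *bo)
{
    printf("flush caches, fence %s with fence %d\n", bo->name, ++last_fence);
    bo->fence = last_fence;
    bo->unfenced_read = bo->unfenced_write = false;
}

static void wait_fence(int fence)
{
    printf("wait for fence %d\n", fence);
}

/* Map only emits a flush when the buffer sits on the unfenced-writes
 * list; then it waits for the resulting fence. */
static void map(struct bo *bo)
{
    if (bo->unfenced_write)
        emit_flush_and_fence(bo);
    if (bo->fence)
        wait_fence(bo->fence);
    printf("map %s\n", bo->name);
}

int main(void)
{
    struct bo tex = { .name = "texture" };
    struct bo target = { .name = "render target" };

    validate(&tex, ACCESS_READ);       /* batch reads the texture */
    validate(&target, ACCESS_WRITE);   /* batch writes the render target */
    map(&target);                      /* forces flush + fence + wait */
    map(&tex);                         /* GPU only read it: no flush needed */
    return 0;
}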

4) Locking and deadlocking
Right now, the main rendering path I'm seeing is something like this:

lock_hardware() # I'm not really sure why this lock is here
map(texture)
write texture data
unmap(texture)
unlock_hardware()

map(some_buffer) # GL lets you have buffers mapped for a long time
write to some_buffer
map(some_buffer_2)
write to some_buffer_2

map(batchbuffer) # start some actual rendering
map(vertices)
write vertices
write commands
unmap(vertices)
unmap(batchbuffer)

unmap(some_buffer) # the GL client called this
unmap(some_buffer_2)

lock_hardware()
validate(batchbuffer)
validate(vertices)
validate(texture)
validate(some_buffer)
validate(some_buffer_2)
validate(backbuffer)
map_while_validated(batchbuffer)
write relocations in batchbuffer
unmap(batchbuffer)
emit(batchbuffer)
fence_unfenced()
unlock_hardware()


Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-01 Thread Greg KH
On Wed, May 02, 2007 at 12:36:36AM +0200, Ingo Oeser wrote:
> On Tuesday 01 May 2007, Dave Airlie wrote:
> > >   - what's with the /proc interface?  Don't add new proc code for
> > > non-process related things.  This should all go into sysfs
> > > somewhere.  And yes, I know /proc/dri/ is there today, but don't add
> > > new stuff please.
> > 
> > Well we should move all that stuff to sysfs, but we have all the
> > infrastructure for publishing this stuff under /proc/dri and adding
> > new files doesn't take a major amount, as much as I appreciate sysfs,
> > it isn't suitable for this sort of information dump, the whole one
> > value per file is quite useless to provide this sort of information
> > which is uni-directional for users to send to us for debugging without
> > have to install some special tool to join all the values into one
> > place.. and I don't think drmfs is the answer either... or maybe it
> > is
> 
> Ok, what about debugfs then? If it is just for debugging blobs -> debugfs,
> if it is crucial for operation -> sysfs and representation of one value 
> per file.

You beat me to it :)

Yes, this is _exactly_ what debugfs is for, in fact, it will probably
make your code a bit smaller to use it instead of /proc.
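
For illustration, a minimal sketch of what such a debugfs dump could
look like, assuming a made-up "dri-example" directory and a dummy
counter rather than the real TTM state:

#include <linux/debugfs.h>
#include <linux/fs.h>
#include <linux/module.h>
#include <linux/seq_file.h>

static struct dentry *example_dir;
static struct dentry *example_file;

/* Dump everything we used to print under /proc/dri/ in one read. */
static int example_bo_state_show(struct seq_file *m, void *unused)
{
    seq_printf(m, "example buffer count: %d\n", 42);   /* dummy state */
    return 0;
}

static int example_bo_state_open(struct inode *inode, struct file *file)
{
    return single_open(file, example_bo_state_show, NULL);
}

static const struct file_operations example_bo_state_fops = {
    .owner   = THIS_MODULE,
    .open    = example_bo_state_open,
    .read    = seq_read,
    .llseek  = seq_lseek,
    .release = single_release,
};

static int __init example_init(void)
{
    example_dir = debugfs_create_dir("dri-example", NULL);
    example_file = debugfs_create_file("bo_state", 0444, example_dir,
                                       NULL, &example_bo_state_fops);
    return 0;
}

static void __exit example_exit(void)
{
    debugfs_remove(example_file);
    debugfs_remove(example_dir);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");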

thanks,

greg k-h


Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-05-01 Thread Ingo Oeser
On Tuesday 01 May 2007, Dave Airlie wrote:
> >   - what's with the /proc interface?  Don't add new proc code for
> > non-process related things.  This should all go into sysfs
> > somewhere.  And yes, I know /proc/dri/ is there today, but don't add
> > new stuff please.
> 
> Well we should move all that stuff to sysfs, but we have all the
> infrastructure for publishing this stuff under /proc/dri and adding
> new files doesn't take a major amount, as much as I appreciate sysfs,
> it isn't suitable for this sort of information dump, the whole one
> value per file is quite useless to provide this sort of information
> which is uni-directional for users to send to us for debugging without
> have to install some special tool to join all the values into one
> place.. and I don't think drmfs is the answer either... or maybe it
> is

Ok, what about debugfs then? If it is just for debugging blobs -> debugfs,
if it is crucial for operation -> sysfs and representation of one value 
per file.


Regards

Ingo Oeser


Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-04-30 Thread Thomas Hellström

Dave Airlie wrote:



Most likely in doxygen as that is what Mesa uses and the intersection
of developers is higher in that area, I'll take it as a task to try
and kerneldoc the drm at some stage..


  - what's with the /proc interface?  Don't add new proc code for
non-process related things.  This should all go into sysfs
somewhere.  And yes, I know /proc/dri/ is there today, but don't add
new stuff please.



Well we should move all that stuff to sysfs, but we have all the
infrastructure for publishing this stuff under /proc/dri and adding
new files doesn't take a major amount, as much as I appreciate sysfs,
it isn't suitable for this sort of information dump, the whole one
value per file is quite useless to provide this sort of information
which is uni-directional for users to send to us for debugging without
have to install some special tool to join all the values into one
place.. and I don't think drmfs is the answer either... or maybe it
is


  - struct drm_bo_arg can't use an int or unsigned, as it crosses the
userspace/kernelspace boundry, use the proper types for all values
in those types of structures (__u32, etc.)



int is defined, unsigned I'm not so sure about, the drm user space
interface is usually specified in non-system specific types so the
drm.h file is consistent across systems, so we would probably have to
use uint32_t which other people have objected to, but I'd rather use
uint32_t than unspecified types..


  - there doesn't seem to be any validity checking for the arguments
passed into this new ioctl.  Possibly that's just the way the rest
of the dri interface is, which is scary, but with the memory stuff,
you really should check things properly...



Okay this needs fixing, we do check most ioctls args, the main thing
passed in are handles and these are all looked up in the hash table,
it may not be so obvious, also most of the ioctls are probably going
to end up root or DRM master only, I'd like do an ioctl fuzzer at some
stage, I'd suspect a lot more then the dri would be oopsable with
permissions...

Thanks,
Dave.


I agree with Dave on most if not all of the above.
The typing, for example unsigned / uint32 vs. u32 / __u32, is very
easily fixable once we decide on a clear way to go to keep (if we want
to keep) compatibility with *BSD.


For the IOCTL checking, as Dave states, most invalid data will be 
trapped in hash lookups and
checks in the buffer validation system, but probably far from all. A 
fuzzer would be a nice tool to trap the exceptions.


/Thomas



Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-04-30 Thread Dave Jones
On Tue, May 01, 2007 at 09:10:00AM +1000, Dave Airlie wrote:
 
 > Okay this needs fixing, we do check most ioctls args, the main thing
 > passed in are handles and these are all looked up in the hash table,
 > it may not be so obvious, also most of the ioctls are probably going
 > to end up root or DRM master only, I'd like do an ioctl fuzzer at some
 > stage, I'd suspect a lot more then the dri would be oopsable with
 > permissions...

http://www.digitaldwarf.be/products/ioctlfuzz.tar.gz
I've not had a chance to play with it, but I wouldn't be surprised if
there's some fallout.
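
For illustration, blind ioctl fuzzing of a DRM node can be as dumb as
the sketch below (the device path and iteration count are arbitrary);
it should obviously only be run on a disposable test box:

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    /* WARNING: blind fuzzing can wedge the GPU or oops the kernel. */
    int fd = open("/dev/dri/card0", O_RDWR);  /* assumed device node */
    unsigned char junk[256];
    int i;

    if (fd < 0) {
        perror("open");
        return 1;
    }
    srand(1234);
    for (i = 0; i < 100000; i++) {
        unsigned long req = (unsigned long)rand();
        size_t j;

        for (j = 0; j < sizeof(junk); j++)
            junk[j] = rand();
        ioctl(fd, req, junk);   /* errors are expected and ignored */
    }
    close(fd);
    return 0;
}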

Dave

-- 
http://www.codemonkey.org.uk


Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-04-30 Thread Dave Airlie


A few easy and simple comments based on looking at this for 5 minutes:
  - drop the typedefs.  Yeah, they might be a drm thing, but we don't
need them here.


Okay I think in-kernel typedefs have to go, but we have a defined DRM
interface like it or not and I'd like to be consistent on the
userspace interface and continue to use typedefs, I don't want to have
a major inconsistency like half the interface in typedefs the other
half in non-typedefs, considering the old interface is frozen...


  - the comment style for the functions is "odd" and not in kerneldoc
form, but something else.  Either use kerneldoc or nothing, don't
invent something new please.


Most likely in doxygen as that is what Mesa uses and the intersection
of developers is higher in that area, I'll take it as a task to try
and kerneldoc the drm at some stage..


  - what's with the /proc interface?  Don't add new proc code for
non-process related things.  This should all go into sysfs
somewhere.  And yes, I know /proc/dri/ is there today, but don't add
new stuff please.


Well we should move all that stuff to sysfs, but we have all the
infrastructure for publishing this stuff under /proc/dri and adding
new files doesn't take a major amount, as much as I appreciate sysfs,
it isn't suitable for this sort of information dump, the whole one
value per file is quite useless to provide this sort of information
which is uni-directional for users to send to us for debugging without
have to install some special tool to join all the values into one
place.. and I don't think drmfs is the answer either... or maybe it
is


  - struct drm_bo_arg can't use an int or unsigned, as it crosses the
userspace/kernelspace boundry, use the proper types for all values
in those types of structures (__u32, etc.)


int is defined, unsigned I'm not so sure about, the drm user space
interface is usually specified in non-system specific types so the
drm.h file is consistent across systems, so we would probably have to
use uint32_t which other people have objected to, but I'd rather use
uint32_t than unspecified types..
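
For illustration only, a hypothetical ioctl argument struct using the
fixed-width kernel types being discussed; this is not the actual
drm_bo_arg layout:

#include <linux/types.h>

struct example_bo_arg {
    __u32 handle;     /* same size on 32- and 64-bit kernels and userspace */
    __u32 flags;
    __u64 size;       /* avoid 'unsigned long', whose size varies */
    __u64 user_ptr;   /* pass pointers as __u64, not as raw pointers */
};

In userspace headers the same layout can be described with
uint32_t/uint64_t; the point is simply that field sizes never depend on
the architecture or on 32- vs 64-bit userspace.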


  - there doesn't seem to be any validity checking for the arguments
passed into this new ioctl.  Possibly that's just the way the rest
of the dri interface is, which is scary, but with the memory stuff,
you really should check things properly...


Okay, this needs fixing. We do check most ioctl args; the main things
passed in are handles and these are all looked up in the hash table,
so it may not be so obvious. Also, most of the ioctls are probably going
to end up root or DRM master only. I'd like to do an ioctl fuzzer at
some stage; I'd suspect a lot more than the dri would be oopsable with
permissions...

Thanks,
Dave.


Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-04-30 Thread Dave Airlie


A few easy and simple comments based on looking at this for 5 minutes:
  - drop the typedefs.  Yeah, they might be a drm thing, but we don't
need them here.


Okay I think in-kernel typedefs have to go, but we have a defined DRM
interface like it or not and I'd like to be consistent on the
userspace interface and continue to use typedefs, I don't want to have
a major inconsistency like half the interface in typedefs the other
half in non-typedefs, considering the old interface is frozen...


  - the comment style for the functions is odd and not in kerneldoc
form, but something else.  Either use kerneldoc or nothing, don't
invent something new please.


Most likely in doxygen as that is what Mesa uses and the intersection
of developers is higher in that area, I'll take it as a task to try
and kerneldoc the drm at some stage..


  - what's with the /proc interface?  Don't add new proc code for
non-process related things.  This should all go into sysfs
somewhere.  And yes, I know /proc/dri/ is there today, but don't add
new stuff please.


Well we should move all that stuff to sysfs, but we have all the
infrastructure for publishing this stuff under /proc/dri and adding
new files doesn't take a major amount, as much as I appreciate sysfs,
it isn't suitable for this sort of information dump, the whole one
value per file is quite useless to provide this sort of information
which is uni-directional for users to send to us for debugging without
have to install some special tool to join all the values into one
place.. and I don't think drmfs is the answer either... or maybe it
is


  - struct drm_bo_arg can't use an int or unsigned, as it crosses the
userspace/kernelspace boundry, use the proper types for all values
in those types of structures (__u32, etc.)


int is defined, unsigned I'm not so sure about, the drm user space
interface is usually specified in non-system specific types so the
drm.h file is consistent across systems, so we would probably have to
use uint32_t which other people have objected to, but I'd rather use
uint32_t than unspecified types..


  - there doesn't seem to be any validity checking for the arguments
passed into this new ioctl.  Possibly that's just the way the rest
of the dri interface is, which is scary, but with the memory stuff,
you really should check things properly...


Okay this needs fixing, we do check most ioctls args, the main thing
passed in are handles and these are all looked up in the hash table,
it may not be so obvious, also most of the ioctls are probably going
to end up root or DRM master only, I'd like do an ioctl fuzzer at some
stage, I'd suspect a lot more then the dri would be oopsable with
permissions...

Thanks,
Dave.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-04-30 Thread Dave Jones
On Tue, May 01, 2007 at 09:10:00AM +1000, Dave Airlie wrote:
 
  Okay this needs fixing, we do check most ioctls args, the main thing
  passed in are handles and these are all looked up in the hash table,
  it may not be so obvious, also most of the ioctls are probably going
  to end up root or DRM master only, I'd like do an ioctl fuzzer at some
  stage, I'd suspect a lot more then the dri would be oopsable with
  permissions...

http://www.digitaldwarf.be/products/ioctlfuzz.tar.gz
Not had chance to play with it, but I'd not be surprised if
there's any fallout.

Dave

-- 
http://www.codemonkey.org.uk
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-04-30 Thread Thomas Hellström

Dave Airlie wrote:



Most likely in doxygen as that is what Mesa uses and the intersection
of developers is higher in that area, I'll take it as a task to try
and kerneldoc the drm at some stage..


  - what's with the /proc interface?  Don't add new proc code for
non-process related things.  This should all go into sysfs
somewhere.  And yes, I know /proc/dri/ is there today, but don't add
new stuff please.



Well, we should move all that stuff to sysfs, but we have all the
infrastructure for publishing this stuff under /proc/dri, and adding
new files doesn't take a major amount of work. As much as I appreciate
sysfs, it isn't suitable for this sort of information dump: the
one-value-per-file model is quite useless for this kind of debugging
information, which is uni-directional (users send it to us) and which
they shouldn't need to install some special tool to join back together
from dozens of files.. and I don't think drmfs is the answer either...
or maybe it is


  - struct drm_bo_arg can't use an int or unsigned, as it crosses the
userspace/kernelspace boundary, use the proper types for all values
in those types of structures (__u32, etc.)



int is well defined; unsigned I'm not so sure about. The drm user space
interface is usually specified in non-system-specific types so that the
drm.h file is consistent across systems, so we would probably have to
use uint32_t, which other people have objected to, but I'd rather use
uint32_t than unspecified types..


  - there doesn't seem to be any validity checking for the arguments
passed into this new ioctl.  Possibly that's just the way the rest
of the dri interface is, which is scary, but with the memory stuff,
you really should check things properly...



Okay, this needs fixing. We do check most ioctl args; the main things
passed in are handles, and these are all looked up in the hash table,
so it may not be so obvious. Also, most of the ioctls are probably going
to end up root or DRM master only. I'd like to do an ioctl fuzzer at some
stage; I'd suspect a lot more than the dri would be oopsable with
permissions...

Thanks,
Dave.


I agree with Dave for most if not all of the above.
The typing, for example unsigned / uint32_t vs u32 / __u32, is very
easily fixable once we decide on a clear way to go to keep (if we want
to keep) compatibility with *BSD.


For the ioctl checking, as Dave states, most invalid data will be
trapped by the hash lookups and the checks in the buffer validation
system, but probably far from all of it. A fuzzer would be a nice tool
to catch the exceptions.


/Thomas



Re: [RFC] [PATCH] DRM TTM Memory Manager patch

2007-04-27 Thread Greg KH
On Thu, Apr 26, 2007 at 04:55:14PM +1000, Dave Airlie wrote:
>  Hi,
> 
>  The patch is too big to fit on the list and I've no idea how we could
>  break it down further, it just happens to be a lot of new code..
> 
>  
> http://people.freedesktop.org/~airlied/ttm/0001-drm-Implement-TTM-Memory-manager-core-functionality.txt
> 
>  The patch header and diffstat are below,

A few easy and simple comments based on looking at this for 5 minutes:
  - drop the typedefs.  Yeah, they might be a drm thing, but we don't
need them here.
  - the comment style for the functions is "odd" and not in kerneldoc
form, but something else.  Either use kerneldoc or nothing, don't
invent something new please (a sketch of the kerneldoc form follows
this list).
  - what's with the /proc interface?  Don't add new proc code for
non-process related things.  This should all go into sysfs
somewhere.  And yes, I know /proc/dri/ is there today, but don't add
new stuff please.
  - struct drm_bo_arg can't use an int or unsigned, as it crosses the
userspace/kernelspace boundary, use the proper types for all values
in those types of structures (__u32, etc.)
  - there doesn't seem to be any validity checking for the arguments
passed into this new ioctl.  Possibly that's just the way the rest
of the dri interface is, which is scary, but with the memory stuff,
you really should check things properly...
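
A sketch of the kerneldoc form referred to in the second point above;
the function, its arguments and its description are invented purely
for illustration:

/**
 * drm_example_bo_pin - pin a buffer object into the aperture
 * @bo: buffer object to pin
 * @flags: placement flags controlling where it may be pinned
 *
 * Pins @bo so that its offset stays valid until drm_example_bo_unpin()
 * is called.  Returns 0 on success or a negative errno on failure.
 */
int drm_example_bo_pin(struct drm_example_bo *bo, unsigned int flags);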

that's enough to start with :)

thanks,

greg k-h


[RFC] [PATCH] DRM TTM Memory Manager patch

2007-04-26 Thread Dave Airlie

Hi,

The patch is too big to fit on the list and I've no idea how we could
break it down further, it just happens to be a lot of new code..

http://people.freedesktop.org/~airlied/ttm/0001-drm-Implement-TTM-Memory-manager-core-functionality.txt

The patch header and diffstat are below,

This isn't for integration yet, but we'd like an initial review by
anyone with the spare time and inclination; there is a lot of stuff
relying on getting this code beaten into shape and into the kernel, so
any cleanups people can suggest now, especially to the user interfaces,
would be appreciated, as once we set that stuff in stone it'll be a
pain to change... Also it doesn't have any driver-side code; this is
just the generic pieces. I'll post the intel 915 side code later, but
there isn't that much to it..

It applies on top of my drm-2.6 git tree drm-mm branch

-

This patch brings in the TTM (Translation Table Maps) memory management
system from Thomas Hellstrom at Tungsten Graphics.

This patch only covers the core functionality and changes to the drm core.

The TTM memory manager enables dynamic mapping of memory objects in and
out of graphics-card-accessible memory (e.g. the AGP aperture); this
implements the AGP backend for TTM, to be used by the i915 driver.
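
As a rough mental model of the dynamic mapping in and out (a toy sketch
only, with invented names, not TTM's actual data structures): objects
get bound into a fixed-size aperture, and when a bind request does not
fit, least-recently-used objects are unbound back to system memory to
make room.

#include <stdbool.h>
#include <stddef.h>

struct toy_bo {
	size_t size;
	bool bound;                 /* currently occupies aperture space */
	struct toy_bo *lru_next;
};

struct toy_aperture {
	size_t total, used;
	struct toy_bo *lru_head;    /* least recently used bound object */
	struct toy_bo *lru_tail;    /* most recently used bound object */
};

/* Unbind least-recently-used objects until 'need' bytes are free. */
static int toy_evict_until(struct toy_aperture *ap, size_t need)
{
	while (ap->total - ap->used < need) {
		struct toy_bo *victim = ap->lru_head;

		if (!victim)
			return -1;          /* nothing left to evict */
		ap->lru_head = victim->lru_next;
		if (!ap->lru_head)
			ap->lru_tail = NULL;
		victim->bound = false;      /* backing pages stay in system RAM */
		ap->used -= victim->size;
	}
	return 0;
}

/* Bind an object, evicting others if necessary to make room. */
static int toy_bind(struct toy_aperture *ap, struct toy_bo *bo)
{
	if (bo->bound)
		return 0;
	if (bo->size > ap->total || toy_evict_until(ap, bo->size))
		return -1;                  /* cannot possibly fit */
	bo->bound = true;
	ap->used += bo->size;
	bo->lru_next = NULL;                /* newly bound == most recently used */
	if (ap->lru_tail)
		ap->lru_tail->lru_next = bo;
	else
		ap->lru_head = bo;
	ap->lru_tail = bo;
	/* real code would also program the GART/AGP entries here */
	return 0;
}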


drivers/char/drm/Makefile |3 +-
drivers/char/drm/drm.h|  238 -
drivers/char/drm/drmP.h   |  153 +++-
drivers/char/drm/drm_agpsupport.c |  160 +++
drivers/char/drm/drm_bo.c | 2329 
drivers/char/drm/drm_bo_move.c|  395 +++
drivers/char/drm/drm_bufs.c   |2 +
drivers/char/drm/drm_drv.c|   46 +-
drivers/char/drm/drm_fence.c  |  661 +++
drivers/char/drm/drm_fops.c   |   75 ++-
drivers/char/drm/drm_memory.c |   69 ++
drivers/char/drm/drm_mm.c |   13 +-
drivers/char/drm/drm_object.c |  293 +
drivers/char/drm/drm_objects.h|  464 
drivers/char/drm/drm_proc.c   |   90 ++
drivers/char/drm/drm_stub.c   |   24 +-
drivers/char/drm/drm_ttm.c|  337 ++
drivers/char/drm/drm_vm.c |  193 +++-
18 files changed, 5503 insertions(+), 42 deletions(-)

