Re: [PATCH 0/3] Add support for Tegra Activity Monitor

2014-11-10 Thread Terje Bergström
On 11.11.2014 06:29, Alexandre Courbot wrote:
> I think (after a quick look at devfreq's source) that you can avoid
> polling altogether if you set polling_ms to 0 in your
> devfreq_dev_profile instance. Then it is up to you to call
> update_devfreq() from your custom governor whenever it sees fit.
> 
> ACTMON support seems to overlap between being a governor (which reacts
> to ACTMON interrupts and calls update_devfreq() when needed) and part of
> a devfreq_dev_profile (get_dev_status() needs to use the actmon
> counters). If we keep your current design where the driver simply
> controls a clock, you could have the ACTMON driver obtain that clock,
> register its governor, create a non-polling devfreq_dev_profile that
> controls that clock, and just call devfreq_add_device() with both. Then
> we will have the benefit of being able to use ACTMON as well as the
> performance and powersave governors on EMC, and switch policies
> dynamically.

Another way to use it is that governor is just a governor. It takes in
load values and generates new target frequencies, and doesn't know
anything about HW.

Device profile is the one that enables threshold interrupts and disables
polling. Device profile receives the interrupt, retrieves new load
number from hardware, and calls update_devfreq().

This way we keep all HW specific code in device profile, and there's
potential to use a generic governor instead of writing your own.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 0/3] Add support for Tegra Activity Monitor

2014-11-10 Thread Terje Bergström
On 11.11.2014 06:29, Alexandre Courbot wrote:
 I think (after a quick look at devfreq's source) that you can avoid
 polling altogether if you set polling_ms to 0 in your
 devfreq_dev_profile instance. Then it is up to you to call
 update_devfreq() from your custom governor whenever it sees fit.
 
 ACTMON support seems to overlap between being a governor (which reacts
 to ACTMON interrupts and calls update_devfreq() when needed) and part of
 a devfreq_dev_profile (get_dev_status() needs to use the actmon
 counters). If we keep your current design where the driver simply
 controls a clock, you could have the ACTMON driver obtain that clock,
 register its governor, create a non-polling devfreq_dev_profile that
 controls that clock, and just call devfreq_add_device() with both. Then
 we will have the benefit of being able to use ACTMON as well as the
 performance and powersave governors on EMC, and switch policies
 dynamically.

Another way to use it is that governor is just a governor. It takes in
load values and generates new target frequencies, and doesn't know
anything about HW.

Device profile is the one that enables threshold interrupts and disables
polling. Device profile receives the interrupt, retrieves new load
number from hardware, and calls update_devfreq().

This way we keep all HW specific code in device profile, and there's
potential to use a generic governor instead of writing your own.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] drm/nouveau: disable caching for VRAM BOs on ARM

2014-05-26 Thread Terje Bergström
On 23.05.2014 17:40, Alex Courbot wrote:
> On 05/23/2014 06:59 PM, Lucas Stach wrote:
> So after checking with more knowledgeable people, it turns out this is 
> the expected behavior on ARM and BAR regions should be mapped uncached 
> on GK20A. All the more reasons to avoid using the BAR at all.

This is actually specific to Tegra.

>> You may want to make yourself aware of all the quirks required for
>> sharing memory between the GPU and CPU on an ARM host. I think there are
>> far more involved than what you see now and writing an replacement for
>> TTM will not be an easy task.
>>
>> Doing away with the concept of two memory areas will not get you to a
>> single unified address space. You would have to deal with things like
>> not being able to change the caching state of pages in the systems
>> lowmem yourself. You will still have to deal with remapping pages that
>> aren't currently visible to the CPU (ok this is not an issue on Jetson
>> right now as it only has 2GB of RAM), because it's in systems highmem,
>> or even in a different LPAE area.
>>
>> You really want to be sure you are aware of all the consequences of
>> this, before considering this task.
> 
> Yep, that's why I am seeking advice here. My first hope is that with a 
> few tweaks we will be able to keep using TTM and the current nouveau_bo 
> implementation. But unless I missed something this is not going to be easy.
> 
> We can also use something like the patch I originally sent to make it 
> work, although not with good performance, on GK20A. Not very graceful, 
> but it will allow applications to run.
> 
> In the long run though, we will want to achieve better performance, and 
> it seems like a BO implementation targeted at UMA devices would also be 
> beneficial to quite a few desktop GPUs. So as tricky as it may be I'm 
> interested in gathering thoughts and why not giving it a first try with 
> GK20A, even if it imposes some limitations like having buffers in lowmem 
> in a first time (we can probably live with this one for a short while, 
> and 64 bits will also be coming to the rescue :))

I don't think lowmem or LPAE is any problem, if the memory manager is
designed with that in mind. Vast majority of the buffers kernel
allocates do not need to be touched in kernel space.

Actually I can't think of any buffers that we allocate on behalf of user
space that would need to be permanently mapped also to kernel. In case
or relocs only push buffer needs to be temporarily mapped to kernel.

Ultimately even relocs are not necessary if we expose GPU virtual
addresses directly to user space. But that's another topic.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC] drm/nouveau: disable caching for VRAM BOs on ARM

2014-05-26 Thread Terje Bergström
On 23.05.2014 17:40, Alex Courbot wrote:
 On 05/23/2014 06:59 PM, Lucas Stach wrote:
 So after checking with more knowledgeable people, it turns out this is 
 the expected behavior on ARM and BAR regions should be mapped uncached 
 on GK20A. All the more reasons to avoid using the BAR at all.

This is actually specific to Tegra.

 You may want to make yourself aware of all the quirks required for
 sharing memory between the GPU and CPU on an ARM host. I think there are
 far more involved than what you see now and writing an replacement for
 TTM will not be an easy task.

 Doing away with the concept of two memory areas will not get you to a
 single unified address space. You would have to deal with things like
 not being able to change the caching state of pages in the systems
 lowmem yourself. You will still have to deal with remapping pages that
 aren't currently visible to the CPU (ok this is not an issue on Jetson
 right now as it only has 2GB of RAM), because it's in systems highmem,
 or even in a different LPAE area.

 You really want to be sure you are aware of all the consequences of
 this, before considering this task.
 
 Yep, that's why I am seeking advice here. My first hope is that with a 
 few tweaks we will be able to keep using TTM and the current nouveau_bo 
 implementation. But unless I missed something this is not going to be easy.
 
 We can also use something like the patch I originally sent to make it 
 work, although not with good performance, on GK20A. Not very graceful, 
 but it will allow applications to run.
 
 In the long run though, we will want to achieve better performance, and 
 it seems like a BO implementation targeted at UMA devices would also be 
 beneficial to quite a few desktop GPUs. So as tricky as it may be I'm 
 interested in gathering thoughts and why not giving it a first try with 
 GK20A, even if it imposes some limitations like having buffers in lowmem 
 in a first time (we can probably live with this one for a short while, 
 and 64 bits will also be coming to the rescue :))

I don't think lowmem or LPAE is any problem, if the memory manager is
designed with that in mind. Vast majority of the buffers kernel
allocates do not need to be touched in kernel space.

Actually I can't think of any buffers that we allocate on behalf of user
space that would need to be permanently mapped also to kernel. In case
or relocs only push buffer needs to be temporarily mapped to kernel.

Ultimately even relocs are not necessary if we expose GPU virtual
addresses directly to user space. But that's another topic.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 04/10] drm/nouveau/fb: add GK20A support

2014-05-02 Thread Terje Bergström
On 01.05.2014 07:49, Alexandre Courbot wrote:
> Agreed. Besides I hope the day won't come where we have to go through
> 2 layers of memory translation for the GPU...

That's actually the method of operation that gives us the best
performance. GPU likes big pages, and without IOMMU translation you'd
have a hard time finding enough contiguous physical memory.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 04/10] drm/nouveau/fb: add GK20A support

2014-05-02 Thread Terje Bergström
On 01.05.2014 07:49, Alexandre Courbot wrote:
 Agreed. Besides I hope the day won't come where we have to go through
 2 layers of memory translation for the GPU...

That's actually the method of operation that gives us the best
performance. GPU likes big pages, and without IOMMU translation you'd
have a hard time finding enough contiguous physical memory.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] gpu: host1x: do not check previously handled gathers

2014-01-30 Thread Terje Bergström
On 07.01.2014 22:03, Erik Faye-Lund wrote:
> When patching gathers, we don't need to check against
> gathers with lower indices than the current one, as
> they are guaranteed to already have been handled.
> 
> Signed-off-by: Erik Faye-Lund 
> ---
> 
> Here's a trivial optimization I have been running with for a while.
> 
>  drivers/gpu/host1x/job.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
> index de5ec33..e965805 100644
> --- a/drivers/gpu/host1x/job.c
> +++ b/drivers/gpu/host1x/job.c
> @@ -534,7 +534,7 @@ int host1x_job_pin(struct host1x_job *job, struct device 
> *dev)
>  
>   g->base = job->gather_addr_phys[i];
>  
> - for (j = 0; j < job->num_gathers; j++)
> + for (j = i + 1; j < job->num_gathers; j++)
>   if (job->gathers[j].bo == g->bo)
>   job->gathers[j].handled = true;
>  

Hi,

Thanks. This looks good logically, and I ran some tests that agree.

Acked-By: Terje Bergstrom 

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] gpu: host1x: do not check previously handled gathers

2014-01-30 Thread Terje Bergström
On 07.01.2014 22:03, Erik Faye-Lund wrote:
 When patching gathers, we don't need to check against
 gathers with lower indices than the current one, as
 they are guaranteed to already have been handled.
 
 Signed-off-by: Erik Faye-Lund kusmab...@gmail.com
 ---
 
 Here's a trivial optimization I have been running with for a while.
 
  drivers/gpu/host1x/job.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)
 
 diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
 index de5ec33..e965805 100644
 --- a/drivers/gpu/host1x/job.c
 +++ b/drivers/gpu/host1x/job.c
 @@ -534,7 +534,7 @@ int host1x_job_pin(struct host1x_job *job, struct device 
 *dev)
  
   g-base = job-gather_addr_phys[i];
  
 - for (j = 0; j  job-num_gathers; j++)
 + for (j = i + 1; j  job-num_gathers; j++)
   if (job-gathers[j].bo == g-bo)
   job-gathers[j].handled = true;
  

Hi,

Thanks. This looks good logically, and I ran some tests that agree.

Acked-By: Terje Bergstrom tbergst...@nvidia.com

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why have another variable deciding a tracepoint?

2013-11-17 Thread Terje Bergström
On 15.11.2013 16:52, Steven Rostedt wrote:
> So let me get this straight. The recording of the cdma_push() data is
> slow, correct? What mapping are you talking about?
> 
>   trace_host1x_cdma_push(dev_name(cdma_to_channel(cdma)->dev),
>  op1, op2);
> 
> #define cdma_to_channel(cdma) container_of(cdma, struct host1x_channel, cdma)
> 
> That just references the cdma field of struct host1x_channel, to get
> the dev field, which does dev_name() that just gets the dev->init_name
> (if defined, or kobject_name() if not). And the two u32 fields that are
> passed in.
> 
> The tracepoint assigns with:
> 
>   TP_fast_assign(
>   __entry->name = name;
>   __entry->op1 = op1;
>   __entry->op2 = op2;
>   ),
> 
> So I still have to ask; what do you have to set up and take down here?

Oops, there's a piece missing that I didn't mention. What we want is a
full trace of what we're sending to the push buffer. See
hw/channel_hw.c/trace_write_gather(). It maps the buffer handed from
user space, dumps it to ftrace and unmaps it. That costs a lot of time
that we don't want to spend in normal operation and hence the special
condition.

trace_host1x_cdma_push() in the code you refer to just makes the dump
complete by adding some overhead opcodes that kernel needs to add. It
doesn't make sense to output that if we don't get the bulk from
trace_write_gather() and that's why it's also checking the same
conditional. It'd just make the trace look corrupted.

> It's documented in the code ;-)  (kidding)
> 
> Yeah, I need to get that out a bit more. That's actually how I found
> this code, I'm doing searches for all the locations that can benefit
> from TRACE_EVENT_CONDITION(), and I'm trying to fix them up. I'm still
> not convinced that this extra variable checking is required for this
> tracepoint.
> 
> As you state though, I'll have to get time to document CONDITION() and
> how and when to use it.

Sounds good. :-)

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why have another variable deciding a tracepoint?

2013-11-17 Thread Terje Bergström
On 15.11.2013 16:52, Steven Rostedt wrote:
 So let me get this straight. The recording of the cdma_push() data is
 slow, correct? What mapping are you talking about?
 
   trace_host1x_cdma_push(dev_name(cdma_to_channel(cdma)-dev),
  op1, op2);
 
 #define cdma_to_channel(cdma) container_of(cdma, struct host1x_channel, cdma)
 
 That just references the cdma field of struct host1x_channel, to get
 the dev field, which does dev_name() that just gets the dev-init_name
 (if defined, or kobject_name() if not). And the two u32 fields that are
 passed in.
 
 The tracepoint assigns with:
 
   TP_fast_assign(
   __entry-name = name;
   __entry-op1 = op1;
   __entry-op2 = op2;
   ),
 
 So I still have to ask; what do you have to set up and take down here?

Oops, there's a piece missing that I didn't mention. What we want is a
full trace of what we're sending to the push buffer. See
hw/channel_hw.c/trace_write_gather(). It maps the buffer handed from
user space, dumps it to ftrace and unmaps it. That costs a lot of time
that we don't want to spend in normal operation and hence the special
condition.

trace_host1x_cdma_push() in the code you refer to just makes the dump
complete by adding some overhead opcodes that kernel needs to add. It
doesn't make sense to output that if we don't get the bulk from
trace_write_gather() and that's why it's also checking the same
conditional. It'd just make the trace look corrupted.

 It's documented in the code ;-)  (kidding)
 
 Yeah, I need to get that out a bit more. That's actually how I found
 this code, I'm doing searches for all the locations that can benefit
 from TRACE_EVENT_CONDITION(), and I'm trying to fix them up. I'm still
 not convinced that this extra variable checking is required for this
 tracepoint.
 
 As you state though, I'll have to get time to document CONDITION() and
 how and when to use it.

Sounds good. :-)

Terje

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why have another variable deciding a tracepoint?

2013-11-15 Thread Terje Bergström
On 15.11.2013 06:48, Steven Rostedt wrote:
> I've been reviewing different users of tracepoints and I stumbled
> across this:
> 
> drivers/gpu/host1x/cdma.c: host1x_cdma_push()
> 
>   if (host1x_debug_trace_cmdbuf)
>   trace_host1x_cdma_push(dev_name(cdma_to_channel(cdma)->dev),
>  op1, op2);
> 
> That host1x_debug_trace_cmdbuf is a variable that gets set by another
> debugfs file "trace_cmdbuf" that is custom to this driver.
> 
> Why?

This is because it takes a lot of time to prepare for dumping the cmdbuf
data, like mapping buffer and unmapping after tracing. We want to avoid
all that preparation time.

> The tracepoint host1x_cdma_push is already controlled by either ftrace
> or perf. If it gets enabled by perf or ftrace, it still wont be traced
> unless we also enable this trace_cmdbuf. Is there some reason for this?
> I can't figure it out from the change log: 6236451d83a720 ("gpu:
> host1x: Add debug support").
> 
> As tracepoints uses jump labels, there is no branch cost associated
> with them. That is, they are either a direct jump, or a nop (in most
> cases a nop). But here you added the overhead of a conditional branch
> depending on this variable.
> 
> If this is truly needed, then use TRACE_EVENT_CONDITION() for that
> tracepoint.

TRACE_EVENT_CONDITION() isn't documented, so I don't know how that would
help.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Why have another variable deciding a tracepoint?

2013-11-15 Thread Terje Bergström
On 15.11.2013 06:48, Steven Rostedt wrote:
 I've been reviewing different users of tracepoints and I stumbled
 across this:
 
 drivers/gpu/host1x/cdma.c: host1x_cdma_push()
 
   if (host1x_debug_trace_cmdbuf)
   trace_host1x_cdma_push(dev_name(cdma_to_channel(cdma)-dev),
  op1, op2);
 
 That host1x_debug_trace_cmdbuf is a variable that gets set by another
 debugfs file trace_cmdbuf that is custom to this driver.
 
 Why?

This is because it takes a lot of time to prepare for dumping the cmdbuf
data, like mapping buffer and unmapping after tracing. We want to avoid
all that preparation time.

 The tracepoint host1x_cdma_push is already controlled by either ftrace
 or perf. If it gets enabled by perf or ftrace, it still wont be traced
 unless we also enable this trace_cmdbuf. Is there some reason for this?
 I can't figure it out from the change log: 6236451d83a720 (gpu:
 host1x: Add debug support).
 
 As tracepoints uses jump labels, there is no branch cost associated
 with them. That is, they are either a direct jump, or a nop (in most
 cases a nop). But here you added the overhead of a conditional branch
 depending on this variable.
 
 If this is truly needed, then use TRACE_EVENT_CONDITION() for that
 tracepoint.

TRACE_EVENT_CONDITION() isn't documented, so I don't know how that would
help.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2 0/4] gpu: host1x: Add syncpoint base support

2013-10-28 Thread Terje Bergström
On 14.10.2013 15:21, Arto Merilainen wrote:
> The host1x driver uses currently syncpoints statically from host1x point of
> view. If we do a wait inside a job, it always has a constant value to wait.
> host1x supports also doing relative syncpoint waits with respect to syncpoint
> bases. This allows doing multiple operations inside a single submit and
> waiting an operation to complete before moving to next one.
> 
> This set of patches adds support for syncpoint bases to host1x driver and
> enables the support for gr2d client.
> 
> I have tested the series using the host1x test application (available at [0],
> function test_wait_base() in tests/tegra/host1x/tegra_host1x_test.c) on 
> cardhu.
> I would appreciate help in reviewing the series and testing the patches
> on other boards.
> 
> Changes in v2:
> - Reordered various code blocks to improve code consistency
> - Functions host1x_syncpt_alloc() and host1x_syncpt_request() take now a 
> single
> bitfield argument instead of separate boolean arguments
> - Added a separate ioctl call for querying the base associated with some
> syncpoint
> 
> [0] https://gitorious.org/linux-host1x/libdrm-host1x
> 
> Arto Merilainen (4):
>   gpu: host1x: Add 'flags' field to syncpt request
>   gpu: host1x: Add syncpoint base support
>   drm/tegra: Deliver syncpoint base to user space
>   drm/tegra: Reserve base for gr2d
> 
>  drivers/gpu/host1x/dev.h   |  2 ++
>  drivers/gpu/host1x/drm/drm.c   | 25 +
>  drivers/gpu/host1x/drm/gr2d.c  |  2 +-
>  drivers/gpu/host1x/hw/channel_hw.c | 19 ++
>  drivers/gpu/host1x/hw/hw_host1x01_uclass.h |  6 
>  drivers/gpu/host1x/syncpt.c| 58 
> +-
>  drivers/gpu/host1x/syncpt.h| 10 +-
>  include/uapi/drm/tegra_drm.h   | 26 +-
>  8 files changed, 128 insertions(+), 20 deletions(-)
> 

The series,

Reviewed-by: Terje Bergstrom 

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2 0/4] gpu: host1x: Add syncpoint base support

2013-10-28 Thread Terje Bergström
On 14.10.2013 15:21, Arto Merilainen wrote:
 The host1x driver uses currently syncpoints statically from host1x point of
 view. If we do a wait inside a job, it always has a constant value to wait.
 host1x supports also doing relative syncpoint waits with respect to syncpoint
 bases. This allows doing multiple operations inside a single submit and
 waiting an operation to complete before moving to next one.
 
 This set of patches adds support for syncpoint bases to host1x driver and
 enables the support for gr2d client.
 
 I have tested the series using the host1x test application (available at [0],
 function test_wait_base() in tests/tegra/host1x/tegra_host1x_test.c) on 
 cardhu.
 I would appreciate help in reviewing the series and testing the patches
 on other boards.
 
 Changes in v2:
 - Reordered various code blocks to improve code consistency
 - Functions host1x_syncpt_alloc() and host1x_syncpt_request() take now a 
 single
 bitfield argument instead of separate boolean arguments
 - Added a separate ioctl call for querying the base associated with some
 syncpoint
 
 [0] https://gitorious.org/linux-host1x/libdrm-host1x
 
 Arto Merilainen (4):
   gpu: host1x: Add 'flags' field to syncpt request
   gpu: host1x: Add syncpoint base support
   drm/tegra: Deliver syncpoint base to user space
   drm/tegra: Reserve base for gr2d
 
  drivers/gpu/host1x/dev.h   |  2 ++
  drivers/gpu/host1x/drm/drm.c   | 25 +
  drivers/gpu/host1x/drm/gr2d.c  |  2 +-
  drivers/gpu/host1x/hw/channel_hw.c | 19 ++
  drivers/gpu/host1x/hw/hw_host1x01_uclass.h |  6 
  drivers/gpu/host1x/syncpt.c| 58 
 +-
  drivers/gpu/host1x/syncpt.h| 10 +-
  include/uapi/drm/tegra_drm.h   | 26 +-
  8 files changed, 128 insertions(+), 20 deletions(-)
 

The series,

Reviewed-by: Terje Bergstrom tbergst...@nvidia.com

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 0/5] gpu: host1x: Add runtime pm support

2013-10-09 Thread Terje Bergström
On 08.10.2013 09:27, Arto Merilainen wrote:
> This series adds runtime pm support for host1x, gr2d and dc. It retains the
> current behaviour if CONFIG_PM_RUNTIME is not enabled.
> 
> The gr2d clock is enabled when a new job is submitted and disabled when
> the work is done. Due to parent->child relations between host1x->gr2d, this
> scheme enables and disables host1x clock.
> 
> For dc, the clocks are enabled in .probe and disabled in .remove via runtime
> pm instead of direct clock APIs.
> 
> Mayuresh is unfortunately not available to continue with the series and hence
> I will continue working on the patches.

The series,

Acked-By: Terje Bergstrom 

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 0/5] gpu: host1x: Add runtime pm support

2013-10-09 Thread Terje Bergström
On 08.10.2013 09:27, Arto Merilainen wrote:
 This series adds runtime pm support for host1x, gr2d and dc. It retains the
 current behaviour if CONFIG_PM_RUNTIME is not enabled.
 
 The gr2d clock is enabled when a new job is submitted and disabled when
 the work is done. Due to parent-child relations between host1x-gr2d, this
 scheme enables and disables host1x clock.
 
 For dc, the clocks are enabled in .probe and disabled in .remove via runtime
 pm instead of direct clock APIs.
 
 Mayuresh is unfortunately not available to continue with the series and hence
 I will continue working on the patches.

The series,

Acked-By: Terje Bergstrom tbergst...@nvidia.com

Terje

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] gpu: host1x: check relocs after all gathers are consumed

2013-10-08 Thread Terje Bergström
On 04.10.2013 23:18, Erik Faye-Lund wrote:
> The num_relocs count are passed to the kernel per job, not per gather.
> 
> For multi-gather jobs, we would previously fail if there were relocs in
> other gathers aside from the first one.
> 
> Fix this by simply moving the check until all gathers have been
> consumed.
> 
> Signed-off-by: Erik Faye-Lund 
> ---
>  drivers/gpu/host1x/job.c |8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
> index c4e1050..c9ddff8 100644
> --- a/drivers/gpu/host1x/job.c
> +++ b/drivers/gpu/host1x/job.c
> @@ -436,10 +436,6 @@ static int validate(struct host1x_firewall *fw, struct 
> host1x_job_gather *g)
>   }
>   }
>  
> - /* No relocs should remain at this point */
> - if (fw->num_relocs)
> - err = -EINVAL;
> -
>  out:
>   return err;
>  }
> @@ -493,6 +489,10 @@ static inline int copy_gathers(struct host1x_job *job, 
> struct device *dev)
>   offset += g->words * sizeof(u32);
>   }
>  
> + /* No relocs should remain at this point */
> + if (fw.num_relocs)
> + return -EINVAL;
> +
>   return 0;
>  }

Good catch.

Acked-By: Terje Bergstrom 

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] gpu: host1x: check relocs after all gathers are consumed

2013-10-08 Thread Terje Bergström
On 04.10.2013 23:18, Erik Faye-Lund wrote:
 The num_relocs count are passed to the kernel per job, not per gather.
 
 For multi-gather jobs, we would previously fail if there were relocs in
 other gathers aside from the first one.
 
 Fix this by simply moving the check until all gathers have been
 consumed.
 
 Signed-off-by: Erik Faye-Lund kusmab...@gmail.com
 ---
  drivers/gpu/host1x/job.c |8 
  1 file changed, 4 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
 index c4e1050..c9ddff8 100644
 --- a/drivers/gpu/host1x/job.c
 +++ b/drivers/gpu/host1x/job.c
 @@ -436,10 +436,6 @@ static int validate(struct host1x_firewall *fw, struct 
 host1x_job_gather *g)
   }
   }
  
 - /* No relocs should remain at this point */
 - if (fw-num_relocs)
 - err = -EINVAL;
 -
  out:
   return err;
  }
 @@ -493,6 +489,10 @@ static inline int copy_gathers(struct host1x_job *job, 
 struct device *dev)
   offset += g-words * sizeof(u32);
   }
  
 + /* No relocs should remain at this point */
 + if (fw.num_relocs)
 + return -EINVAL;
 +
   return 0;
  }

Good catch.

Acked-By: Terje Bergstrom tbergst...@nvidia.com

Terje

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/5] ARM: tegra: Add host1x, dc and hdmi to Tegra114 device tree

2013-08-29 Thread Terje Bergström
On 28.08.2013 16:18, Thierry Reding wrote:
> I think that's not all. I have local patches that also introduce a v2 of
> host1x, because the number of syncpoints is different. There may also be
> other differences, but Terje might be more qualified to answer that.

Tegra4 host1x has an extra channel(totals 9), which caused bitfields in
a couple of registers to shift. The registers are mainly used in the
debug code to dump the channel FIFO. Same number of sync points as
Tegra3, but 12 wait bases.

Other changes are minor and driver already deals with them, for example
32-bit versus 16-bit sync point value comparison.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/5] ARM: tegra: Add host1x, dc and hdmi to Tegra114 device tree

2013-08-29 Thread Terje Bergström
On 28.08.2013 16:18, Thierry Reding wrote:
 I think that's not all. I have local patches that also introduce a v2 of
 host1x, because the number of syncpoints is different. There may also be
 other differences, but Terje might be more qualified to answer that.

Tegra4 host1x has an extra channel(totals 9), which caused bitfields in
a couple of registers to shift. The registers are mainly used in the
debug code to dump the channel FIFO. Same number of sync points as
Tegra3, but 12 wait bases.

Other changes are minor and driver already deals with them, for example
32-bit versus 16-bit sync point value comparison.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 05/27] drivers/gpu/host1x/drm: don't check resource with devm_ioremap_resource

2013-08-01 Thread Terje Bergström
On 23.07.2013 21:01, Wolfram Sang wrote:
> diff --git a/drivers/gpu/host1x/drm/hdmi.c b/drivers/gpu/host1x/drm/hdmi.c
> index 01097da..9ffece6 100644
> --- a/drivers/gpu/host1x/drm/hdmi.c
> +++ b/drivers/gpu/host1x/drm/hdmi.c
> @@ -1248,9 +1248,6 @@ static int tegra_hdmi_probe(struct platform_device 
> *pdev)
>   return err;
>  
>   regs = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> - if (!regs)
> - return -ENXIO;
> -
>   hdmi->regs = devm_ioremap_resource(>dev, regs);
>   if (IS_ERR(hdmi->regs))
>   return PTR_ERR(hdmi->regs);
> 

Looks good to me.

Reviewed-By: Terje Bergstrom 

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 05/27] drivers/gpu/host1x/drm: don't check resource with devm_ioremap_resource

2013-08-01 Thread Terje Bergström
On 23.07.2013 21:01, Wolfram Sang wrote:
 diff --git a/drivers/gpu/host1x/drm/hdmi.c b/drivers/gpu/host1x/drm/hdmi.c
 index 01097da..9ffece6 100644
 --- a/drivers/gpu/host1x/drm/hdmi.c
 +++ b/drivers/gpu/host1x/drm/hdmi.c
 @@ -1248,9 +1248,6 @@ static int tegra_hdmi_probe(struct platform_device 
 *pdev)
   return err;
  
   regs = platform_get_resource(pdev, IORESOURCE_MEM, 0);
 - if (!regs)
 - return -ENXIO;
 -
   hdmi-regs = devm_ioremap_resource(pdev-dev, regs);
   if (IS_ERR(hdmi-regs))
   return PTR_ERR(hdmi-regs);
 

Looks good to me.

Reviewed-By: Terje Bergstrom tbergst...@nvidia.com

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] drm/tegra: add support for runtime pm

2013-06-11 Thread Terje Bergström
On 10.06.2013 23:43, Thierry Reding wrote:
> Can you post the corresponding wrappers to make it easier to discuss
> them? If they just wrap runtime PM calls then they don't solve the
> locality problems that Terje brought up.

I don't think the wrappers are applicable here. They're there in
downstream to fix a problem with runtime PM core: some host1x clients
require host1x to be active when they're processing, and some (dc) need
it only when CPU is programming DC or when a signal towards host1x is
pending. Runtime PM wants to keep parent enabled when any of the clients
are enabled.

We solved this downstream by enabling ignore_children for host1x. As a
side effect, we need to explicitly enable host1x when we're enabling one
of its clients that need host1x, and that's what the wrapper does.

We can fix this particular problem by moving calls to runtime PM to job.c.

Mayuresh, once you've managed to test your patches, please send the v2.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] drm/tegra: add support for runtime pm

2013-06-11 Thread Terje Bergström
On 10.06.2013 23:43, Thierry Reding wrote:
 Can you post the corresponding wrappers to make it easier to discuss
 them? If they just wrap runtime PM calls then they don't solve the
 locality problems that Terje brought up.

I don't think the wrappers are applicable here. They're there in
downstream to fix a problem with runtime PM core: some host1x clients
require host1x to be active when they're processing, and some (dc) need
it only when CPU is programming DC or when a signal towards host1x is
pending. Runtime PM wants to keep parent enabled when any of the clients
are enabled.

We solved this downstream by enabling ignore_children for host1x. As a
side effect, we need to explicitly enable host1x when we're enabling one
of its clients that need host1x, and that's what the wrapper does.

We can fix this particular problem by moving calls to runtime PM to job.c.

Mayuresh, once you've managed to test your patches, please send the v2.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2 0/7] Miscellaneous fixes to host1x

2013-05-31 Thread Terje Bergström
On 29.05.2013 13:26, Arto Merilainen wrote:
> This patch series fixes two issues in the host1x driver: First, the
> command buffer validation routine had vulnerabilities that were not
> detected in earlier testing. Second, the return codes of some
> functions were misleading or completely missing. This caused the
> driver to give wrong return codes also to the userspace.
> 
> The series is based on top of 3.10rc3. I have tested the patch series
> on cardhu by running host1x and gr2d test cases (available at [0]).
> I would appreciate any help in testing/reviewing these patches.
> 
> Changes from v1:
>  * Rebased on top of 3.10rc3
>  * Split firewall fixes to smaller patches
>  * Reworked no-reloc case in firewall code. Fix in v1 was not
>sufficient in all cases
>  * Dropped patch "Fix syncpoint wait return value" as it is not
>critical and discussion on it has not yet settled.
>  * Fixed style and whitespace issues
> 
> [0] https://gitorious.org/linux-host1x/libdrm-host1x
> 
> Arto Merilainen (5):
>   gpu: host1x: Check reloc table before usage
>   gpu: host1x: Copy gathers before verification
>   gpu: host1x: Fix memory access in syncpt request
>   gpu: host1x: Fix client_managed type
>   gpu: host1x: Rework CPU syncpoint increment
> 
> Terje Bergstrom (2):
>   gpu: host1x: Check INCR opcode correctly
>   gpu: host1x: Don't reset firewall between gathers
> 
>  drivers/gpu/host1x/dev.h  |   8 +--
>  drivers/gpu/host1x/drm/drm.c  |   3 +-
>  drivers/gpu/host1x/drm/gr2d.c |   2 +-
>  drivers/gpu/host1x/hw/cdma_hw.c   |   2 +-
>  drivers/gpu/host1x/hw/syncpt_hw.c |  12 ++--
>  drivers/gpu/host1x/job.c  | 135 
> +-
>  drivers/gpu/host1x/syncpt.c   |  26 +++-
>  drivers/gpu/host1x/syncpt.h   |  13 ++--
>  8 files changed, 85 insertions(+), 116 deletions(-)
> 

Arto's patches above,

Acked-By: Terje Bergstrom 

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv2 0/7] Miscellaneous fixes to host1x

2013-05-31 Thread Terje Bergström
On 29.05.2013 13:26, Arto Merilainen wrote:
 This patch series fixes two issues in the host1x driver: First, the
 command buffer validation routine had vulnerabilities that were not
 detected in earlier testing. Second, the return codes of some
 functions were misleading or completely missing. This caused the
 driver to give wrong return codes also to the userspace.
 
 The series is based on top of 3.10rc3. I have tested the patch series
 on cardhu by running host1x and gr2d test cases (available at [0]).
 I would appreciate any help in testing/reviewing these patches.
 
 Changes from v1:
  * Rebased on top of 3.10rc3
  * Split firewall fixes to smaller patches
  * Reworked no-reloc case in firewall code. Fix in v1 was not
sufficient in all cases
  * Dropped patch Fix syncpoint wait return value as it is not
critical and discussion on it has not yet settled.
  * Fixed style and whitespace issues
 
 [0] https://gitorious.org/linux-host1x/libdrm-host1x
 
 Arto Merilainen (5):
   gpu: host1x: Check reloc table before usage
   gpu: host1x: Copy gathers before verification
   gpu: host1x: Fix memory access in syncpt request
   gpu: host1x: Fix client_managed type
   gpu: host1x: Rework CPU syncpoint increment
 
 Terje Bergstrom (2):
   gpu: host1x: Check INCR opcode correctly
   gpu: host1x: Don't reset firewall between gathers
 
  drivers/gpu/host1x/dev.h  |   8 +--
  drivers/gpu/host1x/drm/drm.c  |   3 +-
  drivers/gpu/host1x/drm/gr2d.c |   2 +-
  drivers/gpu/host1x/hw/cdma_hw.c   |   2 +-
  drivers/gpu/host1x/hw/syncpt_hw.c |  12 ++--
  drivers/gpu/host1x/job.c  | 135 
 +-
  drivers/gpu/host1x/syncpt.c   |  26 +++-
  drivers/gpu/host1x/syncpt.h   |  13 ++--
  8 files changed, 85 insertions(+), 116 deletions(-)
 

Arto's patches above,

Acked-By: Terje Bergstrom tbergst...@nvidia.com

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] drm/tegra: add support for runtime pm

2013-05-27 Thread Terje Bergström
On 27.05.2013 18:45, Thierry Reding wrote:
> On Mon, May 27, 2013 at 07:19:28PM +0530, Mayuresh Kulkarni wrote:
>> +#ifdef CONFIG_PM_RUNTIME
>> +static int host1x_runtime_suspend(struct device *dev)
>> +{
>> +struct host1x *host;
>> +
>> +host = dev_get_drvdata(dev);
>> +if (IS_ERR_OR_NULL(host))
> 
> I think a simple
> 
>   if (!host)
>   return -EINVAL;
> 
> would be enough here. The driver-data of the device should never be an
> ERR_PTR()-encoded value, but either a valid pointer to a host1x object
> or NULL.

True, we should avoid IS_ERR_OR_NULL() like plague. We always know if
the called API returns a NULL on error or an error code. In case of
error code we should just propagate that.

> Same comments apply here. Also I think it might be a good idea to split
> the host1x and gr2d changes into separate patches.

That's a bit tricky, but doable. We just need to enable it for 2D first,
and then host1x to keep bisectability.

>>  static void action_submit_complete(struct host1x_waitlist *waiter)
>>  {
>> +int completed = waiter->count;
>>  struct host1x_channel *channel = waiter->data;
>>  
>> +/* disable clocks for all the submits that got completed in this lot */
>> +while (completed--)
>> +pm_runtime_put(channel->dev);
>> +
>>  host1x_cdma_update(>cdma);
>>  
>> -/*  Add nr_completed to trace */
>> +/* Add nr_completed to trace */
>>  trace_host1x_channel_submit_complete(dev_name(channel->dev),
>>   waiter->count, waiter->thresh);
>> -
>>  }
> 
> This feels hackish. But I can't see any better place to do this. Terje,
> Arto: any ideas how we can do this in a cleaner way? If there's nothing
> better then maybe moving the code into a separate function, say
> host1x_waitlist_complete(), might make this less awkward?

Yeah, it's a bit awkward. action_submit_complete() actually does handle
completion of multiple jobs, and we do one pm_runtime_get() per job.

We could do pm_runtime_put() in host1x_cdma_update(). It anyway goes
through each job that is completed, so while freeing the job it could as
well call runtime PM. That way we could even remove the waiter->count
variable altogether as it's not needed anymore.

The not-so-beautiful aspect is that we do pm_runtime_get() in
host1x_channel.c and pm_runtime_put() in host1x_cdma.c. For code
readability it's be great to have them in the same file. I actually get
questions every now and then because in downstream because of doing
these operations in different files.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] drm/tegra: add support for runtime pm

2013-05-27 Thread Terje Bergström
On 27.05.2013 18:45, Thierry Reding wrote:
 On Mon, May 27, 2013 at 07:19:28PM +0530, Mayuresh Kulkarni wrote:
 +#ifdef CONFIG_PM_RUNTIME
 +static int host1x_runtime_suspend(struct device *dev)
 +{
 +struct host1x *host;
 +
 +host = dev_get_drvdata(dev);
 +if (IS_ERR_OR_NULL(host))
 
 I think a simple
 
   if (!host)
   return -EINVAL;
 
 would be enough here. The driver-data of the device should never be an
 ERR_PTR()-encoded value, but either a valid pointer to a host1x object
 or NULL.

True, we should avoid IS_ERR_OR_NULL() like plague. We always know if
the called API returns a NULL on error or an error code. In case of
error code we should just propagate that.

 Same comments apply here. Also I think it might be a good idea to split
 the host1x and gr2d changes into separate patches.

That's a bit tricky, but doable. We just need to enable it for 2D first,
and then host1x to keep bisectability.

  static void action_submit_complete(struct host1x_waitlist *waiter)
  {
 +int completed = waiter-count;
  struct host1x_channel *channel = waiter-data;
  
 +/* disable clocks for all the submits that got completed in this lot */
 +while (completed--)
 +pm_runtime_put(channel-dev);
 +
  host1x_cdma_update(channel-cdma);
  
 -/*  Add nr_completed to trace */
 +/* Add nr_completed to trace */
  trace_host1x_channel_submit_complete(dev_name(channel-dev),
   waiter-count, waiter-thresh);
 -
  }
 
 This feels hackish. But I can't see any better place to do this. Terje,
 Arto: any ideas how we can do this in a cleaner way? If there's nothing
 better then maybe moving the code into a separate function, say
 host1x_waitlist_complete(), might make this less awkward?

Yeah, it's a bit awkward. action_submit_complete() actually does handle
completion of multiple jobs, and we do one pm_runtime_get() per job.

We could do pm_runtime_put() in host1x_cdma_update(). It anyway goes
through each job that is completed, so while freeing the job it could as
well call runtime PM. That way we could even remove the waiter-count
variable altogether as it's not needed anymore.

The not-so-beautiful aspect is that we do pm_runtime_get() in
host1x_channel.c and pm_runtime_put() in host1x_cdma.c. For code
readability it's be great to have them in the same file. I actually get
questions every now and then because in downstream because of doing
these operations in different files.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv9 0/9] Support Tegra 2D hardware

2013-04-04 Thread Terje Bergström
On 22.03.2013 16:54, Thierry Reding wrote:
> On Fri, Mar 22, 2013 at 04:34:00PM +0200, Terje Bergstrom wrote:
>> This set of patches adds support for Tegra20 and Tegra30 host1x and
>> 2D. It is based on linux-next-20130322 with RTC fixes applied.
(...)
> For the series:
> 
> Reviewed-by: Thierry Reding 
> Tested-by: Thierry Reding 
> 
> For the Tegra DRM bits:
> 
> Acked-by: Thierry Reding 

Dave,

What's the status of this patch? The patch set is not yet merged in your
tree. Did I forget to do some step?

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv9 0/9] Support Tegra 2D hardware

2013-04-04 Thread Terje Bergström
On 22.03.2013 16:54, Thierry Reding wrote:
 On Fri, Mar 22, 2013 at 04:34:00PM +0200, Terje Bergstrom wrote:
 This set of patches adds support for Tegra20 and Tegra30 host1x and
 2D. It is based on linux-next-20130322 with RTC fixes applied.
(...)
 For the series:
 
 Reviewed-by: Thierry Reding thierry.red...@avionic-design.de
 Tested-by: Thierry Reding thierry.red...@avionic-design.de
 
 For the Tegra DRM bits:
 
 Acked-by: Thierry Reding thierry.red...@avionic-design.de

Dave,

What's the status of this patch? The patch set is not yet merged in your
tree. Did I forget to do some step?

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv7 10/10] drm: tegra: Add gr2d device

2013-03-19 Thread Terje Bergström
On 15.03.2013 14:13, Thierry Reding wrote:
> Also you now create a lookup table (or bitfield actually) as we
> discussed but you still pass around a function to check the lookup table
> against. What I originally intended was to not pass a function around at
> all but directly use the lookup table/bitfield. However I think we can
> leave this as-is for now and change it later if required.

Yeah, I think it's better if we leave this as is now. We should actually
have one table for host1x and one for 2D. The host1x one should be
shared between the drivers, but the table for client unit should be
local to the driver.

Let's take this up again when we have another driver.

> 
>> +static int gr2d_probe(struct platform_device *pdev)
>> +{
>> + struct device *dev = >dev;
>> + struct host1x_drm *host1x = host1x_get_drm_data(dev->parent);
>> + int err;
>> + struct gr2d *gr2d = NULL;
>> + struct host1x_syncpt **syncpts;
> 
> I don't think there's a need for this temporary variable. You could just
> use gr2d->client.syncpts directly.

I actually ended up with pretty long lines when I tried this, so I hope
it's ok to leave as is.

> 
>> + gr2d = devm_kzalloc(dev, sizeof(*gr2d), GFP_KERNEL);
>> + if (!gr2d)
>> + return -ENOMEM;
>> +
>> + syncpts = devm_kzalloc(dev, sizeof(*syncpts), GFP_KERNEL);

F.ex. this line would split two two lines.

Otherwise the changes should be now pretty much ok. You sent a bunch of
changes (thanks) that I merged to my tree. I'll just need to do some
testing and re-split the patches and send v8.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv7 10/10] drm: tegra: Add gr2d device

2013-03-19 Thread Terje Bergström
On 15.03.2013 14:13, Thierry Reding wrote:
 Also you now create a lookup table (or bitfield actually) as we
 discussed but you still pass around a function to check the lookup table
 against. What I originally intended was to not pass a function around at
 all but directly use the lookup table/bitfield. However I think we can
 leave this as-is for now and change it later if required.

Yeah, I think it's better if we leave this as is now. We should actually
have one table for host1x and one for 2D. The host1x one should be
shared between the drivers, but the table for client unit should be
local to the driver.

Let's take this up again when we have another driver.

 
 +static int gr2d_probe(struct platform_device *pdev)
 +{
 + struct device *dev = pdev-dev;
 + struct host1x_drm *host1x = host1x_get_drm_data(dev-parent);
 + int err;
 + struct gr2d *gr2d = NULL;
 + struct host1x_syncpt **syncpts;
 
 I don't think there's a need for this temporary variable. You could just
 use gr2d-client.syncpts directly.

I actually ended up with pretty long lines when I tried this, so I hope
it's ok to leave as is.

 
 + gr2d = devm_kzalloc(dev, sizeof(*gr2d), GFP_KERNEL);
 + if (!gr2d)
 + return -ENOMEM;
 +
 + syncpts = devm_kzalloc(dev, sizeof(*syncpts), GFP_KERNEL);

F.ex. this line would split two two lines.

Otherwise the changes should be now pretty much ok. You sent a bunch of
changes (thanks) that I merged to my tree. I'll just need to do some
testing and re-split the patches and send v8.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv7 10/10] drm: tegra: Add gr2d device

2013-03-15 Thread Terje Bergström
On 15.03.2013 14:13, Thierry Reding wrote:
> On Wed, Mar 13, 2013 at 02:36:26PM +0200, Terje Bergstrom wrote:
> [...]
>> diff --git a/drivers/gpu/host1x/drm/drm.c b/drivers/gpu/host1x/drm/drm.c
> [...]
>> +static inline struct host1x_drm_context *tegra_drm_get_context(
>> + struct list_head *contexts,
>> + struct host1x_drm_context *_ctx)
>> +{
>> + struct host1x_drm_context *ctx;
>> +
>> + list_for_each_entry(ctx, contexts, list)
>> + if (ctx == _ctx)
>> + return ctx;
>> + return NULL;
>> +}
> 
> Maybe make this a little more high-level, such as:
> 
> static bool host1x_drm_file_owns_context(struct host1x_drm_file *file,
>  struct host1x_drm_context *context)
> 
> ?

We only need the true/false value, so changing signature makes sense.

> 
>> +static int tegra_drm_ioctl_open_channel(struct drm_device *drm, void *data,
>> + struct drm_file *file_priv)
>> +{
>> + struct tegra_drm_open_channel_args *args = data;
>> + struct host1x_client *client;
>> + struct host1x_drm_context *context;
>> + struct host1x_drm_file *fpriv = file_priv->driver_priv;
>> + struct host1x_drm *host1x = drm->dev_private;
>> + int err = -ENODEV;
>> +
>> + context = kzalloc(sizeof(*context), GFP_KERNEL);
>> + if (!context)
>> + return -ENOMEM;
>> +
>> + list_for_each_entry(client, >clients, list)
>> + if (client->class == args->class) {
>> + context->client = client;
> 
> Why do you assign this here? Should it perhaps be assigned only when the
> opening of the channel succeeds? The .open_channel() already receives a
> pointer to the client as parameter, so it doesn't have to be associated
> with the context.

True, we can move the assignment to happen after checking the
open_channel result.

> 
>> +static int tegra_drm_ioctl_get_syncpoint(struct drm_device *drm, void *data,
>> +  struct drm_file *file_priv)
>> +{
>> + struct tegra_drm_get_syncpoint *args = data;
>> + struct host1x_drm_file *fpriv = file_priv->driver_priv;
>> + struct host1x_drm_context *context =
>> + (struct host1x_drm_context *)(uintptr_t)args->context;
>> +
>> + if (!tegra_drm_get_context(>contexts, context))
>> + return -ENODEV;
>> +
>> + if (context->client->num_syncpts < args->param)
>> + return -ENODEV;
> 
> I think this might be easier to read as:
> 
> if (args->param >= context->client->num_syncpts)
> return -ENODEV;

Ok, will do.

> 
>> + args->value = host1x_syncpt_id(context->client->syncpts[args->param]);
> 
> This could use a temporary variable "syncpt" to make it easier to read.

Ok.

> 
>> +static int tegra_drm_gem_create_ioctl(struct drm_device *drm, void *data,
>> +   struct drm_file *file_priv)
> 
> tegra_drm_ioctl_gem_create()?

Sure.

> 
>> +{
>> + struct tegra_drm_create *args = data;
>> + struct drm_gem_cma_object *cma_obj;
>> + struct tegra_drm_bo *cma_bo;
> 
> These can probably just be named "cma"/"gem" and "bo" instead.

Ok.

> 
>> +static int tegra_drm_ioctl_get_offset(struct drm_device *drm, void *data,
>> +   struct drm_file *file_priv)
> 
> I think this might be more generically named tegra_drm_ioctl_mmap()
> which might come in handy if we ever need to do something more than just
> retrieve the offset.

Yeah, that changes the API semantics a bit, but in general there
shouldn't be a need to just get the offset without doing the actual mmap.

> 
>> +{
>> + struct tegra_drm_get_offset *args = data;
>> + struct drm_gem_cma_object *cma_obj;
>> + struct drm_gem_object *obj;
>> +
>> + obj = drm_gem_object_lookup(drm, file_priv, args->handle);
>> + if (!obj)
>> + return -EINVAL;
>> + cma_obj = to_drm_gem_cma_obj(obj);
> 
> The above two lines should be separated by a blank line.

Ok.

> 
>> +
>> + args->offset = cma_obj->base.map_list.hash.key << PAGE_SHIFT;
> 
> Perhaps a better way would be to export the get_gem_mmap_offset() from
> drivers/gpu/drm/drm_gem_cma_helper.c and reuse that.

Ok, we'll add that as a patch to the series.

> 
>>  static struct drm_ioctl_desc tegra_drm_ioctls[] = {
>> + DRM_IOCTL_DEF_DRV(TEGRA_DRM_GEM_CREATE,
>> + tegra_drm_gem_create_ioctl, DRM_UNLOCKED | DRM_AUTH),
>> + DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_READ,
>> + tegra_drm_ioctl_syncpt_read, DRM_UNLOCKED),
>> + DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_INCR,
>> + tegra_drm_ioctl_syncpt_incr, DRM_UNLOCKED),
>> + DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_WAIT,
>> + tegra_drm_ioctl_syncpt_wait, DRM_UNLOCKED),
>> + DRM_IOCTL_DEF_DRV(TEGRA_DRM_OPEN_CHANNEL,
>> + 

Re: [PATCHv7 10/10] drm: tegra: Add gr2d device

2013-03-15 Thread Terje Bergström
On 15.03.2013 14:13, Thierry Reding wrote:
 On Wed, Mar 13, 2013 at 02:36:26PM +0200, Terje Bergstrom wrote:
 [...]
 diff --git a/drivers/gpu/host1x/drm/drm.c b/drivers/gpu/host1x/drm/drm.c
 [...]
 +static inline struct host1x_drm_context *tegra_drm_get_context(
 + struct list_head *contexts,
 + struct host1x_drm_context *_ctx)
 +{
 + struct host1x_drm_context *ctx;
 +
 + list_for_each_entry(ctx, contexts, list)
 + if (ctx == _ctx)
 + return ctx;
 + return NULL;
 +}
 
 Maybe make this a little more high-level, such as:
 
 static bool host1x_drm_file_owns_context(struct host1x_drm_file *file,
  struct host1x_drm_context *context)
 
 ?

We only need the true/false value, so changing signature makes sense.

 
 +static int tegra_drm_ioctl_open_channel(struct drm_device *drm, void *data,
 + struct drm_file *file_priv)
 +{
 + struct tegra_drm_open_channel_args *args = data;
 + struct host1x_client *client;
 + struct host1x_drm_context *context;
 + struct host1x_drm_file *fpriv = file_priv-driver_priv;
 + struct host1x_drm *host1x = drm-dev_private;
 + int err = -ENODEV;
 +
 + context = kzalloc(sizeof(*context), GFP_KERNEL);
 + if (!context)
 + return -ENOMEM;
 +
 + list_for_each_entry(client, host1x-clients, list)
 + if (client-class == args-class) {
 + context-client = client;
 
 Why do you assign this here? Should it perhaps be assigned only when the
 opening of the channel succeeds? The .open_channel() already receives a
 pointer to the client as parameter, so it doesn't have to be associated
 with the context.

True, we can move the assignment to happen after checking the
open_channel result.

 
 +static int tegra_drm_ioctl_get_syncpoint(struct drm_device *drm, void *data,
 +  struct drm_file *file_priv)
 +{
 + struct tegra_drm_get_syncpoint *args = data;
 + struct host1x_drm_file *fpriv = file_priv-driver_priv;
 + struct host1x_drm_context *context =
 + (struct host1x_drm_context *)(uintptr_t)args-context;
 +
 + if (!tegra_drm_get_context(fpriv-contexts, context))
 + return -ENODEV;
 +
 + if (context-client-num_syncpts  args-param)
 + return -ENODEV;
 
 I think this might be easier to read as:
 
 if (args-param = context-client-num_syncpts)
 return -ENODEV;

Ok, will do.

 
 + args-value = host1x_syncpt_id(context-client-syncpts[args-param]);
 
 This could use a temporary variable syncpt to make it easier to read.

Ok.

 
 +static int tegra_drm_gem_create_ioctl(struct drm_device *drm, void *data,
 +   struct drm_file *file_priv)
 
 tegra_drm_ioctl_gem_create()?

Sure.

 
 +{
 + struct tegra_drm_create *args = data;
 + struct drm_gem_cma_object *cma_obj;
 + struct tegra_drm_bo *cma_bo;
 
 These can probably just be named cma/gem and bo instead.

Ok.

 
 +static int tegra_drm_ioctl_get_offset(struct drm_device *drm, void *data,
 +   struct drm_file *file_priv)
 
 I think this might be more generically named tegra_drm_ioctl_mmap()
 which might come in handy if we ever need to do something more than just
 retrieve the offset.

Yeah, that changes the API semantics a bit, but in general there
shouldn't be a need to just get the offset without doing the actual mmap.

 
 +{
 + struct tegra_drm_get_offset *args = data;
 + struct drm_gem_cma_object *cma_obj;
 + struct drm_gem_object *obj;
 +
 + obj = drm_gem_object_lookup(drm, file_priv, args-handle);
 + if (!obj)
 + return -EINVAL;
 + cma_obj = to_drm_gem_cma_obj(obj);
 
 The above two lines should be separated by a blank line.

Ok.

 
 +
 + args-offset = cma_obj-base.map_list.hash.key  PAGE_SHIFT;
 
 Perhaps a better way would be to export the get_gem_mmap_offset() from
 drivers/gpu/drm/drm_gem_cma_helper.c and reuse that.

Ok, we'll add that as a patch to the series.

 
  static struct drm_ioctl_desc tegra_drm_ioctls[] = {
 + DRM_IOCTL_DEF_DRV(TEGRA_DRM_GEM_CREATE,
 + tegra_drm_gem_create_ioctl, DRM_UNLOCKED | DRM_AUTH),
 + DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_READ,
 + tegra_drm_ioctl_syncpt_read, DRM_UNLOCKED),
 + DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_INCR,
 + tegra_drm_ioctl_syncpt_incr, DRM_UNLOCKED),
 + DRM_IOCTL_DEF_DRV(TEGRA_DRM_SYNCPT_WAIT,
 + tegra_drm_ioctl_syncpt_wait, DRM_UNLOCKED),
 + DRM_IOCTL_DEF_DRV(TEGRA_DRM_OPEN_CHANNEL,
 + tegra_drm_ioctl_open_channel, DRM_UNLOCKED),
 + DRM_IOCTL_DEF_DRV(TEGRA_DRM_CLOSE_CHANNEL,
 + tegra_drm_ioctl_close_channel, DRM_UNLOCKED),
 + DRM_IOCTL_DEF_DRV(TEGRA_DRM_GET_SYNCPOINT,
 

Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support

2013-03-11 Thread Terje Bergström
On 11.03.2013 09:18, Thierry Reding wrote:
> This sound a bit over-engineered at this point in time. DRM is currently
> the only user. Is anybody working on any non-DRM drivers that would use
> this?

Well, this contains beginning of that:

http://nv-tegra.nvidia.com/gitweb/?p=linux-2.6.git;a=blob;f=drivers/media/video/tegra_v4l2_camera.c;h=644d0be5380367aca4c826c49724c03aad08387c;hb=l4t/l4t-r16-r2

I don't want to give these guys any excuse not to port it over to host1x
code base. :-)

> Even that aside, I don't think host1x_mem_handle is a good choice of
> name here. The objects are much more than handles. They are in fact
> buffer objects, which can optionally be attached to a handle. I also
> think that using a void * to store the handle specific data isn't such a
> good idea.

Naming if not an issue for me - we can easily agree on using _bo.

> So how about the following proposal, which I think might satisfy both of
> us:
> 
>   struct host1x_bo;
> 
>   struct host1x_bo_ops {
>   struct host1x_bo *(*get)(struct host1x_bo *bo);
>   void (*put)(struct host1x_bo *bo);
>   dma_addr_t (*pin)(struct host1x_bo *bo, struct sg_table **sgt);
>   ...
>   };
> 
>   struct host1x_bo *host1x_bo_get(struct host1x_bo *bo);
>   void host1x_bo_put(struct host1x_bo *bo);
>   dma_addr_t host1x_bo_pin(struct host1x_bo *bo, struct sg_table **sgt);
>   ...
> 
>   struct host1x_bo {
>   const struct host1x_bo_ops *ops;
>   ...
>   };
> 
>   struct tegra_drm_bo {
>   struct host1x_bo base;
>   ...
>   };
> 
> That way you can get rid of the host1x_memmgr_create_handle() helper and
> instead embed host1x_bo into driver-/framework-specific structures with
> the necessary initialization.

This would make sense. We'll get back when we have enough of
implementation done to understand it all. One consequence is that we
cannot use drm_gem_cma_create() anymore. We'll have to introduce a
function that does the same as drm_gem_cma_create(), but it takes a
pre-allocated drm_gem_cma_object pointer. That way we can allocate the
struct, and use DRM CMA just to initialize the drm_gem_cma_object.

Other way would be just taking a copy of DRM CMA helper, but I'd like to
defer that to the next step when we implement IOMMU aware allocator.

> It also allows you to interact directly with the objects instead of
> having to go through the memmgr API. The memory manager doesn't really
> exist anymore so keeping the name in the API is only confusing. Your
> current proposal deals with memory handles directly already so it's
> really just making the naming more consistent.

The memmgr APIs are currently just a shortcut wrapper to the ops, so in
that sense the memmgr does not really exist. I think it might still make
sense to keep static inline wrappers for calling the ops within, but we
could rename them to host1x_bo_somethingandother. Then it'd follow the
pattern we are using for the hw ops in the latest set.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support

2013-03-11 Thread Terje Bergström
On 08.03.2013 22:43, Thierry Reding wrote:
> A bo is just a buffer object, so I don't see why the name shouldn't
> be used. The name is in no way specific to DRM or GEM. But the point
> that I was trying to make was that there is nothing to suggest that
> we couldn't use drm_gem_object as the underlying scaffold to base all
> host1x buffer objects on.
> 
> Furthermore I don't understand why you've chosen this approach. It
> is completely different from what other drivers do and therefore
> makes it more difficult to comprehend. That alone I could live with
> if there were any advantages to that approach, but as far as I can
> tell there are none.

I was following the plan we agreed on earlier in email discussion with
you and Lucas:

On 29.11.2012 11:09, Lucas Stach wrote:
> We should aim for a clean split here. GEM handles are something which
> is really specific to how DRM works and as such should be constructed
> by tegradrm. nvhost should really just manage allocations/virtual
> address space and provide something that is able to back all the GEM
> handle operations.
> 
> nvhost has really no reason at all to even know about GEM handles.
> If you back a GEM object by a nvhost object you can just peel out
> the nvhost handles from the GEM wrapper in the tegradrm submit ioctl
> handler and queue the job to nvhost using it's native handles.
> 
> This way you would also be able to construct different handles (like
> GEM obj or V4L2 buffers) from the same backing nvhost object. Note
> that I'm not sure how useful this would be, but it seems like a
> reasonable design to me being able to do so.

With this structure, we are already prepared for non-DRM APIs. Tt's a
matter of familiarity of code versus future expansion. Code paths for
both are as simple/complex, so neither has a direct technical
superiority in performance.

I know other DRM drivers have opted to hard code GEM dependency
throughout the code. Then again, host1x hardware is managing much more
than graphics, so we need to think outside the DRM box, too.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support

2013-03-11 Thread Terje Bergström
On 08.03.2013 22:43, Thierry Reding wrote:
 A bo is just a buffer object, so I don't see why the name shouldn't
 be used. The name is in no way specific to DRM or GEM. But the point
 that I was trying to make was that there is nothing to suggest that
 we couldn't use drm_gem_object as the underlying scaffold to base all
 host1x buffer objects on.
 
 Furthermore I don't understand why you've chosen this approach. It
 is completely different from what other drivers do and therefore
 makes it more difficult to comprehend. That alone I could live with
 if there were any advantages to that approach, but as far as I can
 tell there are none.

I was following the plan we agreed on earlier in email discussion with
you and Lucas:

On 29.11.2012 11:09, Lucas Stach wrote:
 We should aim for a clean split here. GEM handles are something which
 is really specific to how DRM works and as such should be constructed
 by tegradrm. nvhost should really just manage allocations/virtual
 address space and provide something that is able to back all the GEM
 handle operations.
 
 nvhost has really no reason at all to even know about GEM handles.
 If you back a GEM object by a nvhost object you can just peel out
 the nvhost handles from the GEM wrapper in the tegradrm submit ioctl
 handler and queue the job to nvhost using it's native handles.
 
 This way you would also be able to construct different handles (like
 GEM obj or V4L2 buffers) from the same backing nvhost object. Note
 that I'm not sure how useful this would be, but it seems like a
 reasonable design to me being able to do so.

With this structure, we are already prepared for non-DRM APIs. Tt's a
matter of familiarity of code versus future expansion. Code paths for
both are as simple/complex, so neither has a direct technical
superiority in performance.

I know other DRM drivers have opted to hard code GEM dependency
throughout the code. Then again, host1x hardware is managing much more
than graphics, so we need to think outside the DRM box, too.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support

2013-03-11 Thread Terje Bergström
On 11.03.2013 09:18, Thierry Reding wrote:
 This sound a bit over-engineered at this point in time. DRM is currently
 the only user. Is anybody working on any non-DRM drivers that would use
 this?

Well, this contains beginning of that:

http://nv-tegra.nvidia.com/gitweb/?p=linux-2.6.git;a=blob;f=drivers/media/video/tegra_v4l2_camera.c;h=644d0be5380367aca4c826c49724c03aad08387c;hb=l4t/l4t-r16-r2

I don't want to give these guys any excuse not to port it over to host1x
code base. :-)

 Even that aside, I don't think host1x_mem_handle is a good choice of
 name here. The objects are much more than handles. They are in fact
 buffer objects, which can optionally be attached to a handle. I also
 think that using a void * to store the handle specific data isn't such a
 good idea.

Naming if not an issue for me - we can easily agree on using _bo.

 So how about the following proposal, which I think might satisfy both of
 us:
 
   struct host1x_bo;
 
   struct host1x_bo_ops {
   struct host1x_bo *(*get)(struct host1x_bo *bo);
   void (*put)(struct host1x_bo *bo);
   dma_addr_t (*pin)(struct host1x_bo *bo, struct sg_table **sgt);
   ...
   };
 
   struct host1x_bo *host1x_bo_get(struct host1x_bo *bo);
   void host1x_bo_put(struct host1x_bo *bo);
   dma_addr_t host1x_bo_pin(struct host1x_bo *bo, struct sg_table **sgt);
   ...
 
   struct host1x_bo {
   const struct host1x_bo_ops *ops;
   ...
   };
 
   struct tegra_drm_bo {
   struct host1x_bo base;
   ...
   };
 
 That way you can get rid of the host1x_memmgr_create_handle() helper and
 instead embed host1x_bo into driver-/framework-specific structures with
 the necessary initialization.

This would make sense. We'll get back when we have enough of
implementation done to understand it all. One consequence is that we
cannot use drm_gem_cma_create() anymore. We'll have to introduce a
function that does the same as drm_gem_cma_create(), but it takes a
pre-allocated drm_gem_cma_object pointer. That way we can allocate the
struct, and use DRM CMA just to initialize the drm_gem_cma_object.

Other way would be just taking a copy of DRM CMA helper, but I'd like to
defer that to the next step when we implement IOMMU aware allocator.

 It also allows you to interact directly with the objects instead of
 having to go through the memmgr API. The memory manager doesn't really
 exist anymore so keeping the name in the API is only confusing. Your
 current proposal deals with memory handles directly already so it's
 really just making the naming more consistent.

The memmgr APIs are currently just a shortcut wrapper to the ops, so in
that sense the memmgr does not really exist. I think it might still make
sense to keep static inline wrappers for calling the ops within, but we
could rename them to host1x_bo_somethingandother. Then it'd follow the
pattern we are using for the hw ops in the latest set.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support

2013-03-08 Thread Terje Bergström
On 26.02.2013 11:48, Terje Bergström wrote:
> On 25.02.2013 17:24, Thierry Reding wrote:
>> You use two different styles to indent the function parameters. You
>> might want to stick to one, preferably aligning them with the first
>> parameter on the first line.
> 
> I've generally favored "two tabs" indenting, but we'll anyway
> standardize on one.

We standardized on the convention used in tegradrm, i.e. aligning with
first parameter.

>> There's nothing in this function that requires a platform_device, so
>> passing struct device should be enough. Or maybe host1x_cdma should get
>> a struct device * field?
> 
> I think we'll just start using struct device * in general in code.
> Arto's been already fixing a lot of these, so he might've already fixed
> this.

We did a sweep in the code and now I hope everything that can, uses
struct device *. The side effect was getting rid of a lot of casting,
which is good.

>> Why don't you use any of the kernel's reference counting mechanisms?
>>
>>> +void host1x_channel_put(struct host1x_channel *ch)
>>> +{
>>> + mutex_lock(>reflock);
>>> + if (ch->refcount == 1) {
>>> + host1x_get_host(ch->dev)->cdma_op.stop(>cdma);
>>> + host1x_cdma_deinit(>cdma);
>>> + }
>>> + ch->refcount--;
>>> + mutex_unlock(>reflock);
>>> +}
>>
>> I think you can do all of this using a kref.
> 
> I think the original reason was that there's no reason to use atomic
> kref, as we anyway have to do mutual exclusion via mutex. But, using
> kref won't be any problem, so we could use that.

Actually, we ended up with a problem with this. kref assumes that once
refcount goes to zero, it gets destroyed. In ch->refcount, going to zero
is just fine and just indicates that we need to initialize. And, we
anyway need to do locking, so we didn't do the conversion to kref.

>>> +struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev)
>>> +{
(...)
>>> +}
>>
>> I think the critical section could be shorter here. It's probably not
>> worth the extra trouble, though, given that channels are not often
>> allocated.
> 
> Yeah, boot time isn't measured in microseconds. :-) But, if we just make
> allocated_channels an atomic, we should be able to drop chlist_mutex
> altogether and it could simplify the code.

There wasn't much we could have moved outside the critical section, so
we didn't touch this area.

>> Also, is it really necessary to abstract these into an ops structure? I
>> get that newer hardware revisions might require different ops for sync-
>> point handling because the register layout or number of syncpoints may
>> be different, but the CDMA and push buffer (below) concepts are pretty
>> much a software abstraction, and as such its implementation is unlikely
>> to change with some future hardware revision.
> 
> Pushbuffer ops can become generic. There's only one catch - init uses
> the restart opcode. But the opcode is not going to change, so we can
> generalize that.

We ended up keeping the init as an operation, but rest of push buffer
ops became generic.

>>
>>> +/*
>>> + * Push two words to the push buffer
>>> + * Caller must ensure push buffer is not full
>>> + */
>>> +static void push_buffer_push_to(struct push_buffer *pb,
>>> + struct mem_handle *handle,
>>> + u32 op1, u32 op2)
>>> +{
>>> + u32 cur = pb->cur;
>>> + u32 *p = (u32 *)((u32)pb->mapped + cur);
>>
>> You do all this extra casting to make sure to increment by bytes and not
>> 32-bit words. How about you change pb->cur to contain the word index, so
>> that you don't have to go through hoops each time around.

When we changed DMASTART and DMAEND to actually denote the push buffer
area, we noticed that DMAGET and DMAPUT are actually relative to
DMASTART and DMAEND. This and the need to access both CPU and device
virtual addresses coupled with changing to word indexes didn't actually
simplify the code, so we kept still using byte indexes.

>>
>>> +/*
>>> + * Return the number of two word slots free in the push buffer
>>> + */
>>> +static u32 push_buffer_space(struct push_buffer *pb)
>>> +{
>>> + return ((pb->fence - pb->cur) & (PUSH_BUFFER_SIZE - 1)) / 8;
>>> +}
>>
>> Why & (PUSH_BUFFER_SIZE - 1) here? fence - cur can never be larger than
>> PUSH_BUFFER_SIZE, can it?
> 
> You're right, this function doesn't need to worry about wrapping.

Arto noticed this, but actually I was wro

Re: [PATCHv5,RESEND 3/8] gpu: host1x: Add channel support

2013-03-08 Thread Terje Bergström
On 26.02.2013 11:48, Terje Bergström wrote:
 On 25.02.2013 17:24, Thierry Reding wrote:
 You use two different styles to indent the function parameters. You
 might want to stick to one, preferably aligning them with the first
 parameter on the first line.
 
 I've generally favored two tabs indenting, but we'll anyway
 standardize on one.

We standardized on the convention used in tegradrm, i.e. aligning with
first parameter.

 There's nothing in this function that requires a platform_device, so
 passing struct device should be enough. Or maybe host1x_cdma should get
 a struct device * field?
 
 I think we'll just start using struct device * in general in code.
 Arto's been already fixing a lot of these, so he might've already fixed
 this.

We did a sweep in the code and now I hope everything that can, uses
struct device *. The side effect was getting rid of a lot of casting,
which is good.

 Why don't you use any of the kernel's reference counting mechanisms?

 +void host1x_channel_put(struct host1x_channel *ch)
 +{
 + mutex_lock(ch-reflock);
 + if (ch-refcount == 1) {
 + host1x_get_host(ch-dev)-cdma_op.stop(ch-cdma);
 + host1x_cdma_deinit(ch-cdma);
 + }
 + ch-refcount--;
 + mutex_unlock(ch-reflock);
 +}

 I think you can do all of this using a kref.
 
 I think the original reason was that there's no reason to use atomic
 kref, as we anyway have to do mutual exclusion via mutex. But, using
 kref won't be any problem, so we could use that.

Actually, we ended up with a problem with this. kref assumes that once
refcount goes to zero, it gets destroyed. In ch-refcount, going to zero
is just fine and just indicates that we need to initialize. And, we
anyway need to do locking, so we didn't do the conversion to kref.

 +struct host1x_channel *host1x_channel_alloc(struct platform_device *pdev)
 +{
(...)
 +}

 I think the critical section could be shorter here. It's probably not
 worth the extra trouble, though, given that channels are not often
 allocated.
 
 Yeah, boot time isn't measured in microseconds. :-) But, if we just make
 allocated_channels an atomic, we should be able to drop chlist_mutex
 altogether and it could simplify the code.

There wasn't much we could have moved outside the critical section, so
we didn't touch this area.

 Also, is it really necessary to abstract these into an ops structure? I
 get that newer hardware revisions might require different ops for sync-
 point handling because the register layout or number of syncpoints may
 be different, but the CDMA and push buffer (below) concepts are pretty
 much a software abstraction, and as such its implementation is unlikely
 to change with some future hardware revision.
 
 Pushbuffer ops can become generic. There's only one catch - init uses
 the restart opcode. But the opcode is not going to change, so we can
 generalize that.

We ended up keeping the init as an operation, but rest of push buffer
ops became generic.


 +/*
 + * Push two words to the push buffer
 + * Caller must ensure push buffer is not full
 + */
 +static void push_buffer_push_to(struct push_buffer *pb,
 + struct mem_handle *handle,
 + u32 op1, u32 op2)
 +{
 + u32 cur = pb-cur;
 + u32 *p = (u32 *)((u32)pb-mapped + cur);

 You do all this extra casting to make sure to increment by bytes and not
 32-bit words. How about you change pb-cur to contain the word index, so
 that you don't have to go through hoops each time around.

When we changed DMASTART and DMAEND to actually denote the push buffer
area, we noticed that DMAGET and DMAPUT are actually relative to
DMASTART and DMAEND. This and the need to access both CPU and device
virtual addresses coupled with changing to word indexes didn't actually
simplify the code, so we kept still using byte indexes.


 +/*
 + * Return the number of two word slots free in the push buffer
 + */
 +static u32 push_buffer_space(struct push_buffer *pb)
 +{
 + return ((pb-fence - pb-cur)  (PUSH_BUFFER_SIZE - 1)) / 8;
 +}

 Why  (PUSH_BUFFER_SIZE - 1) here? fence - cur can never be larger than
 PUSH_BUFFER_SIZE, can it?
 
 You're right, this function doesn't need to worry about wrapping.

Arto noticed this, but actually I was wrong - the wrapping is very
possible. We just have to remember that if we're processing something at
the end of push buffer, cur might be in the end, and fence in the beginning.

 diff --git a/drivers/gpu/host1x/memmgr.h b/drivers/gpu/host1x/memmgr.h
 [...]
 +struct mem_handle;
 +struct platform_device;
 +
 +struct host1x_job_unpin_data {
 + struct mem_handle *h;
 + struct sg_table *mem;
 +};
 +
 +enum mem_mgr_flag {
 + mem_mgr_flag_uncacheable = 0,
 + mem_mgr_flag_write_combine = 1,
 +};

 I'd like to see this use a more object-oriented approach and more common
 terminology. All of these handles are essentially buffer objects, so
 maybe something like host1x_bo would be a nice and short name.

We did this a bit differently

Re: [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device

2013-02-11 Thread Terje Bergström
On 10.02.2013 22:44, Thierry Reding wrote:
> On Sun, Feb 10, 2013 at 04:42:53PM -0800, Terje Bergström wrote:
>> You're right about performance. We already saw quite a bad performance
>> hit with the current firewall, so we'll need to worry about performance
>> later.
> 
> I guess the additional overhead of looking up in a table vs. an actual
> function being run will be rather small compared to the total overhead
> incurred by having the firewall in the first place.

Yeah, I'll just implement a simple linear table lookup and let's see
what happens. I'll optimize with bitfield if needed.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device

2013-02-11 Thread Terje Bergström
On 10.02.2013 22:44, Thierry Reding wrote:
 On Sun, Feb 10, 2013 at 04:42:53PM -0800, Terje Bergström wrote:
 You're right about performance. We already saw quite a bad performance
 hit with the current firewall, so we'll need to worry about performance
 later.
 
 I guess the additional overhead of looking up in a table vs. an actual
 function being run will be rather small compared to the total overhead
 incurred by having the firewall in the first place.

Yeah, I'll just implement a simple linear table lookup and let's see
what happens. I'll optimize with bitfield if needed.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device

2013-02-10 Thread Terje Bergström
On 07.02.2013 23:07, Thierry Reding wrote:
> On Wed, Feb 06, 2013 at 01:23:17PM -0800, Terje Bergström wrote:
>>>> That's the security firewall. It walks through each submit, and ensures
>>>> that each register write that writes an address, goes through the host1x
>>>> reloc mechanism. This way user space cannot ask 2D to write to arbitrary
>>>> memory locations.
>>> I see. Can this be made more generic? Perhaps adding a table of valid
>>> registers to the device and use a generic function to iterate over that
>>> instead of having to provide the same function for each client.
>> For which one does gcc generate more efficient code? I've thought a
>> switch-case statement might get compiled into something more efficient
>> than a table lookup.
>> But the rest of the code is generic - just the one function which
>> compares against known address registers is specific to 2D.
> Table lookup should be pretty fast. I wouldn't worry too much about
> performance at this stage, though. Readability is more important in my
> opinion. A lookup table is a lot more readable and reusable I think. If
> it turns out that using a function is actually faster we can always
> optimize later.

You're right about performance. We already saw quite a bad performance
hit with the current firewall, so we'll need to worry about performance
later.

I'll take a look at converting the register list to a table. Instead of
always doing a linear search of a table, a bitfield might be more
appropriate.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device

2013-02-10 Thread Terje Bergström
On 07.02.2013 23:07, Thierry Reding wrote:
 On Wed, Feb 06, 2013 at 01:23:17PM -0800, Terje Bergström wrote:
 That's the security firewall. It walks through each submit, and ensures
 that each register write that writes an address, goes through the host1x
 reloc mechanism. This way user space cannot ask 2D to write to arbitrary
 memory locations.
 I see. Can this be made more generic? Perhaps adding a table of valid
 registers to the device and use a generic function to iterate over that
 instead of having to provide the same function for each client.
 For which one does gcc generate more efficient code? I've thought a
 switch-case statement might get compiled into something more efficient
 than a table lookup.
 But the rest of the code is generic - just the one function which
 compares against known address registers is specific to 2D.
 Table lookup should be pretty fast. I wouldn't worry too much about
 performance at this stage, though. Readability is more important in my
 opinion. A lookup table is a lot more readable and reusable I think. If
 it turns out that using a function is actually faster we can always
 optimize later.

You're right about performance. We already saw quite a bad performance
hit with the current firewall, so we'll need to worry about performance
later.

I'll take a look at converting the register list to a table. Instead of
always doing a linear search of a table, a bitfield might be more
appropriate.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device

2013-02-06 Thread Terje Bergström
On 05.02.2013 01:54, Thierry Reding wrote:
> On Mon, Feb 04, 2013 at 09:17:45PM -0800, Terje Bergström wrote:
>> Yeah, it's actually working around the host1x duplicate naming.
>> host1x_syncpt_get takes struct host1x as parameter, but that's different
>> host1x than in this code.
> 
> So maybe a better way would be to rename the DRM host1x after all. If it
> avoids the need for workarounds such as this I think it justifies the
> additional churn.

Ok, I'll include that. Do you have a preference for the name? Something
like "host1x_drm" might work?

>>> Also, how useful is it to create a context? Looking at the gr2d
>>> implementation for .open_channel(), it will return the same channel to
>>> whichever userspace process requests them. Can you explain why it is
>>> necessary at all? From the name I would have expected some kind of
>>> context switching to take place when different applications submit
>>> requests to the same client, but that doesn't seem to be the case.
>>
>> Hardware context switching will be a later submit, and it'll actually
>> create a new structure. Hardware context might live longer than the
>> process that created it, so they'll need to be separate.
> 
> Why would it live longer than the process? Isn't the whole purpose of
> the context to keep per-process state? What use is that state if the
> process dies?

Hardware context has to be kept alive for as long as there's a job
running from that process. If an app sends 10 jobs to 2D channel, and
dies immediately, there's no sane way for host1x to remove the jobs from
queue. The jobs will keep on running and kernel will need to track them.

>> Perhaps the justification is that this way we can keep the kernel API
>> stable even when we add support for hardware contexts and other clients.
> 
> We don't need a stable kernel API. But I guess it is fine to keep it if
> for no other reason to fill the context returned in the ioctl() with
> meaningful data.

Sorry, I meant stable IOCTL API, so we agree on this.

>> host1x_drm_file sounds a bit odd, because it's not really a file, but a
>> private data pointer stored in driver_priv.
> 
> The same is true for struct drm_file, which is stored in struct file's
> .private_data field. I find it to be very intuitive if the inheritance
> is reflected in the structure name. struct host1x_drm_file is host1x'
> driver-specific part of struct drm_file.

Ok, makes sense. I'll do that.

> I fail to see how dma_buf would require a separate mem_mgr_type. Can we
> perhaps postpone this to a later point and just go with CMA as the only
> alternative for now until we have an actual working implementation that
> we can use this for?

Each submit refers to a number of buffers. Some of them are the streams,
some are textures or other input/output buffers. Each of these buffers
might be passed as a GEM handle, or (when implemented) as a dma_buf fd.
Thus we need a field to tell host1x which API to call to handle that handle.

I think we can leave out the code for managing the type until we
actually have separate memory managers. That'd make GEM handles
effectively of type 0, as we don't set it.

> 
>>>> +static int gr2d_submit(struct host1x_drm_context *context,
>>>> + struct tegra_drm_submit_args *args,
>>>> + struct drm_device *drm,
>>>> + struct drm_file *file_priv)
>>>> +{
>>>> + struct host1x_job *job;
>>>> + int num_cmdbufs = args->num_cmdbufs;
>>>> + int num_relocs = args->num_relocs;
>>>> + int num_waitchks = args->num_waitchks;
>>>> + struct tegra_drm_cmdbuf __user *cmdbufs =
>>>> + (void * __user)(uintptr_t)args->cmdbufs;
>>>> + struct tegra_drm_reloc __user *relocs =
>>>> + (void * __user)(uintptr_t)args->relocs;
>>>> + struct tegra_drm_waitchk __user *waitchks =
>>>> + (void * __user)(uintptr_t)args->waitchks;
>>>
>>> No need for all the uintptr_t casts.
>>
>> Will try to remove - but I do remember getting compiler warnings without
>> them.
> 
> I think you shouldn't even have to cast to void * first. Just cast to
> the target type directly. I don't see why the compiler should complain.

This is what I get without them:

drivers/gpu/host1x/drm/gr2d.c:108:3: warning: cast to pointer from
integer of different size [-Wint-to-pointer-cast]
drivers/gpu/host1x/drm/gr2d.c:110:3: warning: cast to pointer from
integer of different size [-Wint-to-pointer-cast]
drivers/gpu/host1x/drm/gr2d.c:112:3: warning: cast to pointer from
integer of different size [-Wint-to-pointer-c

The pro

Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support

2013-02-06 Thread Terje Bergström
On 05.02.2013 01:15, Thierry Reding wrote:
> On Mon, Feb 04, 2013 at 08:41:25PM -0800, Terje Bergström wrote:
>> This is used by debugfs code to direct to debugfs, and
>> nvhost_debug_dump() to send via printk.
> 
> Yes, that was precisely my point. Why bother providing the same data via
> several output methods. debugfs is good for showing large amounts of
> data such as register dumps or a tabular representation of syncpoints
> for instance.
> 
> If, however, you want to interactively show debug information using
> printk the same format isn't very useful and something more reduced is
> often better.

debugfs is there to be able to get a reliable dump of host1x state
(f.ex. no lines intermixed with other output).

printk output is there because often we get just UART logs from failure
cases, and having as much information as possible in the logs speeds up
debugging.

Both of them need to output the values of sync points, and the channel
state. Dumping all of that consists of a lot of code, and I wouldn't
want to duplicate that for two output formats.

>> I have another suggestion. In downstream I removed the decoding part and
>> I just print out a string of hex. That removes quite a bit bunch of code
>> from kernel. It makes the debug output also more compact.
> I don't know. I think if you use in-kernel debugging facilities such as
> debugfs or printk, then the output should be self-explanatory. However I
> do see the usefulness of having a binary dump that can be decoded in
> userspace. But I think if we want to go that way we should make that a
> separate interface. USB provides something like that, which can then be
> fed to libpcap or wireshark to capture and analyze USB traffic. If done
> properly you get replay functionality for free. I don't know what infra-
> structure exists to help with implementing something similar.

It's not actually binary. I think I misrepresented the suggestion.

I'm suggesting that we'd display only the contents of command FIFO and
contents of gathers (i.e. all opcodes) in hex format, not decoded. All
other text would remain as is, so syncpt values, etc would be readable
by a glance.

The user space tool can then take the streams and decode them if needed.

We've noticed that the decoded opcodes format can be very long and
sometimes takes a minute to dump out via a slow console. The hex output
is much more compact and faster to dump.

Actual tracing or wireshark kind of capability would come via decoding
the ftrace log. When enabled, everything that is written to the channel,
is also written to ftrace.

>>>> +static void show_channel_word(struct output *o, int *state, int *count,
>>>> +  u32 addr, u32 val, struct host1x_cdma *cdma)
>>>> +{
>>>> +  static int start_count, dont_print;
>>>
>>> What if two processes read debug information at the same time?
>>
>> show_channels() acquires cdma.lock, so that shouldn't happen.
> 
> Okay. Another solution would be to pass around a debug context which
> keeps track of the variables. But if we opt for a more involved dump
> interface as discussed above this will no longer be relevant.

Actually, debugging process needs cdma.lock, because it goes through the
cdma queue. Also command FIFO dumping is something that must be done by
a single thread at a time.

> Okay. In the context of a channel dump interface this may not be
> relevant anymore. Can you think of any issue that wouldn't be detectable
> or debuggable by analyzing a binary dump of the data within a channel?
> I'm asking because at that point we wouldn't be able to access any of
> the in-kernel data structures but would have to rely on the data itself
> for diagnostics. IOMMU virtual addresses won't be available and so on.

In many cases, looking at syncpt values, and channel state
(active/waiting on a syncpt, etc) gives an indication on what is the
current state of hardware. But, very often problems are ripple effects
on something that happened earlier and the job that caused the problem
has already been freed and is not visible in the dump.

To get a full history, we need often need the ftrace log.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 2/8] gpu: host1x: Add syncpoint wait and interrupts

2013-02-06 Thread Terje Bergström
On 06.02.2013 12:38, Thierry Reding wrote:
> On Wed, Feb 06, 2013 at 12:29:26PM -0800, Terje Bergström wrote:
>> This was done purely, because I'm hiding the struct size from the
>> caller. If the caller needs to allocate, I need to expose the struct in
>> a header, not just a forward declaration.
> 
> I don't think we need to hide the struct from the caller. This is all
> host1x internal. Even if a host1x client uses the struct it makes little
> sense to hide it. They are all part of the same code base so there's not
> much to be gained by hiding the structure definition.

I agree, and will change.

>> Ok, I'll add the wrapper, and I'll check if passing struct host1x *
>> would make sense. In effect that'd render struct host1x_intr mostly
>> unused, so how about if we just merge the contents of host1x_intr to host1x?
> 
> We can probably do that. It might make some sense to keep it in order to
> scope the related fields but struct host1x isn't very large yet, so I
> think omitting host1x_intr should be fine.

Yes, it's not very large, and it'd remove a lot of casting between
host1x and host1x_intr, so I'll just do that.

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 2/8] gpu: host1x: Add syncpoint wait and interrupts

2013-02-06 Thread Terje Bergström
On 05.02.2013 00:42, Thierry Reding wrote:
> On Mon, Feb 04, 2013 at 08:29:08PM -0800, Terje Bergström wrote:
>> host1x_get_host() actually needs that, so this #include should've also
>> been in previous patch.
> 
> No need to if you pass struct device * instead. You might need
> linux/device.h instead, though.

Can do.

> Another variant would be host1x_syncpt_irq() for the top-level handler
> and something host1x_handle_syncpt() to handle individual syncpoints. I
> like this one best, but this is pure bike-shedding and there's nothing
> technically wrong with the names you chose, so I can't really object if
> you want to stick to them.

I could use these names. They sound logical to me,too.

> 
>>>> + queue_work(intr->wq, >work);
>>>
>>> Should the call to queue_work() perhaps be moved into
>>> host1x_intr_syncpt_thresh_isr().
>>
>> I'm not sure, either way would be ok to me. The current structure allows
>> host1x_intr_syncpt_thresh_isr() to only take one parameter
>> (host1x_intr_syncpt). If we move queue_work, we'd also need to pass
>> host1x_intr.
> 
> I think I'd still prefer to have all the code in one function because it
> make subsequent modification easier and less error-prone.

Ok, I'll do that change.

> Yeah, in that case I think we should bail out. It's not like we're
> expecting any failures. If the interrupt cannot be requested, something
> must seriously be wrong and we should tell users about it so that it can
> be fixed. Trying to continue on a best effort basis isn't useful here, I
> think.

Yep, I agree.

>> In patch 3, at submit time we first allocate waiter, then take
>> submit_lock, write submit to channel, and add the waiter while having
>> the lock. I did this so that I host1x_intr_add_action() can always
>> succeed. Otherwise I'd need to write another code path to handle the
>> case where we wrote a job to channel, but we're not able to add a
>> submit_complete action to it.
> 
> Okay. In that case why not allocate it on the stack in the first place
> so you don't have to bother with allocations (and potential failure) at
> all? The variable doesn't leave the function scope, so there shouldn't
> be any issues, right?

The submit code in patch 3 allocates a waiter, and the waiter outlives
the function scope. That waiter will clean up job queue once a job is
complete.

> Or if that doesn't work it would still be preferable to allocate memory
> in host1x_syncpt_wait() directly instead of going through the wrapper.

This was done purely, because I'm hiding the struct size from the
caller. If the caller needs to allocate, I need to expose the struct in
a header, not just a forward declaration.

>>>> +int host1x_intr_init(struct host1x_intr *intr, u32 irq_sync)
>>>
>>> Maybe you should keep the type of the irq_sync here so that it properly
>>> propagates to the call to devm_request_irq().
>>
>> I'm not sure what you mean. Do you mean that I should use unsigned int,
>> as that's the type used in devm_request_irq()?
> 
> Yes.

Ok, will do.

>>>> +void host1x_intr_stop(struct host1x_intr *intr)
>>>> +{
>>>> + unsigned int id;
>>>> + struct host1x *host1x = intr_to_host1x(intr);
>>>> + struct host1x_intr_syncpt *syncpt;
>>>> + u32 nb_pts = host1x_syncpt_nb_pts(intr_to_host1x(intr));
>>>> +
>>>> + mutex_lock(>mutex);
>>>> +
>>>> + host1x->intr_op.disable_all_syncpt_intrs(intr);
>>>
>>> I haven't commented on this everywhere, but I think this could benefit
>>> from a wrapper that forwards this to the intr_op. The same goes for the
>>> sync_op.
>>
>> You mean something like "host1x_disable_all_syncpt_intrs"?
> 
> Yes. I think that'd be useful for each of the op functions. Perhaps you
> could even pass in a struct host1x * to make calls more uniform.

Ok, I'll add the wrapper, and I'll check if passing struct host1x *
would make sense. In effect that'd render struct host1x_intr mostly
unused, so how about if we just merge the contents of host1x_intr to host1x?

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 1/8] gpu: host1x: Add host1x driver

2013-02-06 Thread Terje Bergström
On 04.02.2013 23:43, Thierry Reding wrote:
> My point was that you could include the call to host1x_syncpt_reset()
> within host1x_syncpt_init(). That will keep unneeded code out of the
> host1x_probe() function. Also you don't want to use the syncpoints
> uninitialized, right?

Of course, sorry, I misunderstood. That makes a lot of sense.

 + */
 +static u32 syncpt_load_min(struct host1x_syncpt *sp)
 +{
 + struct host1x *dev = sp->dev;
 + u32 old, live;
 +
 + do {
 + old = host1x_syncpt_read_min(sp);
 + live = host1x_sync_readl(dev,
 + HOST1X_SYNC_SYNCPT_0 + sp->id * 4);
 + } while ((u32)atomic_cmpxchg(>min_val, old, live) != old);
>>>
>>> I think this warrants a comment.
>>
>> Sure. It just loops in case there's a race writing to min_val.
> 
> Oh, I see. That'd make a good comment. Is the cast to (u32) really
> necessary?

I'll add a comment. atomic_cmpxchg returns a signed value, so I think
the cast is needed.

> Save/restore has the disadvantage of the direction not being implicit.
> Save could mean save to hardware or save to software. The same is true
> for restore. However if the direction is clearly defined, save and
> restore work for me.
> 
> Maybe the comment could be changed to be more explicit. Something like:
> 
>   /*
>* Write cached syncpoint and waitbase values to hardware.
>*/
> 
> And for host1x_syncpt_save():
> 
>   /*
>* For client-managed registers, update the cached syncpoint and
>* waitbase values by reading from the registers.
>*/

I was using save in the same way as f.ex. i915 (i915_suspend.c): save
state of hardware to RAM, restore state from RAM. I'll add proper
comments, but save and restore are for all syncpts, not only client managed.

> 
 +/*
 + * Updates the last value read from hardware.
 + */
 +u32 host1x_syncpt_load_min(struct host1x_syncpt *sp)
 +{
 + u32 val;
 + val = sp->dev->syncpt_op.load_min(sp);
 + trace_host1x_syncpt_load_min(sp->id, val);
 +
 + return val;
 +}
> Maybe the function should be called host1x_syncpt_load() if there is no
> equivalent way to load the maximum value (since there is no register to
> read from).

Sounds good. Maximum is just a software concept.

> That's certainly true for interrupts. However, if you look at the DMA
> subsystem for example, you can also request an unnamed resource.
> 
> The difference is sufficiently subtle that host1x_syncpt_allocate()
> would work for me too, though. I just have a slight preference for
> host1x_syncpt_request().

I don't really have a strong preference, so I'll follow your suggestion.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 1/8] gpu: host1x: Add host1x driver

2013-02-06 Thread Terje Bergström
On 04.02.2013 23:43, Thierry Reding wrote:
 My point was that you could include the call to host1x_syncpt_reset()
 within host1x_syncpt_init(). That will keep unneeded code out of the
 host1x_probe() function. Also you don't want to use the syncpoints
 uninitialized, right?

Of course, sorry, I misunderstood. That makes a lot of sense.

 + */
 +static u32 syncpt_load_min(struct host1x_syncpt *sp)
 +{
 + struct host1x *dev = sp-dev;
 + u32 old, live;
 +
 + do {
 + old = host1x_syncpt_read_min(sp);
 + live = host1x_sync_readl(dev,
 + HOST1X_SYNC_SYNCPT_0 + sp-id * 4);
 + } while ((u32)atomic_cmpxchg(sp-min_val, old, live) != old);

 I think this warrants a comment.

 Sure. It just loops in case there's a race writing to min_val.
 
 Oh, I see. That'd make a good comment. Is the cast to (u32) really
 necessary?

I'll add a comment. atomic_cmpxchg returns a signed value, so I think
the cast is needed.

 Save/restore has the disadvantage of the direction not being implicit.
 Save could mean save to hardware or save to software. The same is true
 for restore. However if the direction is clearly defined, save and
 restore work for me.
 
 Maybe the comment could be changed to be more explicit. Something like:
 
   /*
* Write cached syncpoint and waitbase values to hardware.
*/
 
 And for host1x_syncpt_save():
 
   /*
* For client-managed registers, update the cached syncpoint and
* waitbase values by reading from the registers.
*/

I was using save in the same way as f.ex. i915 (i915_suspend.c): save
state of hardware to RAM, restore state from RAM. I'll add proper
comments, but save and restore are for all syncpts, not only client managed.

 
 +/*
 + * Updates the last value read from hardware.
 + */
 +u32 host1x_syncpt_load_min(struct host1x_syncpt *sp)
 +{
 + u32 val;
 + val = sp-dev-syncpt_op.load_min(sp);
 + trace_host1x_syncpt_load_min(sp-id, val);
 +
 + return val;
 +}
 Maybe the function should be called host1x_syncpt_load() if there is no
 equivalent way to load the maximum value (since there is no register to
 read from).

Sounds good. Maximum is just a software concept.

 That's certainly true for interrupts. However, if you look at the DMA
 subsystem for example, you can also request an unnamed resource.
 
 The difference is sufficiently subtle that host1x_syncpt_allocate()
 would work for me too, though. I just have a slight preference for
 host1x_syncpt_request().

I don't really have a strong preference, so I'll follow your suggestion.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 2/8] gpu: host1x: Add syncpoint wait and interrupts

2013-02-06 Thread Terje Bergström
On 05.02.2013 00:42, Thierry Reding wrote:
 On Mon, Feb 04, 2013 at 08:29:08PM -0800, Terje Bergström wrote:
 host1x_get_host() actually needs that, so this #include should've also
 been in previous patch.
 
 No need to if you pass struct device * instead. You might need
 linux/device.h instead, though.

Can do.

 Another variant would be host1x_syncpt_irq() for the top-level handler
 and something host1x_handle_syncpt() to handle individual syncpoints. I
 like this one best, but this is pure bike-shedding and there's nothing
 technically wrong with the names you chose, so I can't really object if
 you want to stick to them.

I could use these names. They sound logical to me,too.

 
 + queue_work(intr-wq, sp-work);

 Should the call to queue_work() perhaps be moved into
 host1x_intr_syncpt_thresh_isr().

 I'm not sure, either way would be ok to me. The current structure allows
 host1x_intr_syncpt_thresh_isr() to only take one parameter
 (host1x_intr_syncpt). If we move queue_work, we'd also need to pass
 host1x_intr.
 
 I think I'd still prefer to have all the code in one function because it
 make subsequent modification easier and less error-prone.

Ok, I'll do that change.

 Yeah, in that case I think we should bail out. It's not like we're
 expecting any failures. If the interrupt cannot be requested, something
 must seriously be wrong and we should tell users about it so that it can
 be fixed. Trying to continue on a best effort basis isn't useful here, I
 think.

Yep, I agree.

 In patch 3, at submit time we first allocate waiter, then take
 submit_lock, write submit to channel, and add the waiter while having
 the lock. I did this so that I host1x_intr_add_action() can always
 succeed. Otherwise I'd need to write another code path to handle the
 case where we wrote a job to channel, but we're not able to add a
 submit_complete action to it.
 
 Okay. In that case why not allocate it on the stack in the first place
 so you don't have to bother with allocations (and potential failure) at
 all? The variable doesn't leave the function scope, so there shouldn't
 be any issues, right?

The submit code in patch 3 allocates a waiter, and the waiter outlives
the function scope. That waiter will clean up job queue once a job is
complete.

 Or if that doesn't work it would still be preferable to allocate memory
 in host1x_syncpt_wait() directly instead of going through the wrapper.

This was done purely, because I'm hiding the struct size from the
caller. If the caller needs to allocate, I need to expose the struct in
a header, not just a forward declaration.

 +int host1x_intr_init(struct host1x_intr *intr, u32 irq_sync)

 Maybe you should keep the type of the irq_sync here so that it properly
 propagates to the call to devm_request_irq().

 I'm not sure what you mean. Do you mean that I should use unsigned int,
 as that's the type used in devm_request_irq()?
 
 Yes.

Ok, will do.

 +void host1x_intr_stop(struct host1x_intr *intr)
 +{
 + unsigned int id;
 + struct host1x *host1x = intr_to_host1x(intr);
 + struct host1x_intr_syncpt *syncpt;
 + u32 nb_pts = host1x_syncpt_nb_pts(intr_to_host1x(intr));
 +
 + mutex_lock(intr-mutex);
 +
 + host1x-intr_op.disable_all_syncpt_intrs(intr);

 I haven't commented on this everywhere, but I think this could benefit
 from a wrapper that forwards this to the intr_op. The same goes for the
 sync_op.

 You mean something like host1x_disable_all_syncpt_intrs?
 
 Yes. I think that'd be useful for each of the op functions. Perhaps you
 could even pass in a struct host1x * to make calls more uniform.

Ok, I'll add the wrapper, and I'll check if passing struct host1x *
would make sense. In effect that'd render struct host1x_intr mostly
unused, so how about if we just merge the contents of host1x_intr to host1x?

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 2/8] gpu: host1x: Add syncpoint wait and interrupts

2013-02-06 Thread Terje Bergström
On 06.02.2013 12:38, Thierry Reding wrote:
 On Wed, Feb 06, 2013 at 12:29:26PM -0800, Terje Bergström wrote:
 This was done purely, because I'm hiding the struct size from the
 caller. If the caller needs to allocate, I need to expose the struct in
 a header, not just a forward declaration.
 
 I don't think we need to hide the struct from the caller. This is all
 host1x internal. Even if a host1x client uses the struct it makes little
 sense to hide it. They are all part of the same code base so there's not
 much to be gained by hiding the structure definition.

I agree, and will change.

 Ok, I'll add the wrapper, and I'll check if passing struct host1x *
 would make sense. In effect that'd render struct host1x_intr mostly
 unused, so how about if we just merge the contents of host1x_intr to host1x?
 
 We can probably do that. It might make some sense to keep it in order to
 scope the related fields but struct host1x isn't very large yet, so I
 think omitting host1x_intr should be fine.

Yes, it's not very large, and it'd remove a lot of casting between
host1x and host1x_intr, so I'll just do that.

Terje

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support

2013-02-06 Thread Terje Bergström
On 05.02.2013 01:15, Thierry Reding wrote:
 On Mon, Feb 04, 2013 at 08:41:25PM -0800, Terje Bergström wrote:
 This is used by debugfs code to direct to debugfs, and
 nvhost_debug_dump() to send via printk.
 
 Yes, that was precisely my point. Why bother providing the same data via
 several output methods. debugfs is good for showing large amounts of
 data such as register dumps or a tabular representation of syncpoints
 for instance.
 
 If, however, you want to interactively show debug information using
 printk the same format isn't very useful and something more reduced is
 often better.

debugfs is there to be able to get a reliable dump of host1x state
(f.ex. no lines intermixed with other output).

printk output is there because often we get just UART logs from failure
cases, and having as much information as possible in the logs speeds up
debugging.

Both of them need to output the values of sync points, and the channel
state. Dumping all of that consists of a lot of code, and I wouldn't
want to duplicate that for two output formats.

 I have another suggestion. In downstream I removed the decoding part and
 I just print out a string of hex. That removes quite a bit bunch of code
 from kernel. It makes the debug output also more compact.
 I don't know. I think if you use in-kernel debugging facilities such as
 debugfs or printk, then the output should be self-explanatory. However I
 do see the usefulness of having a binary dump that can be decoded in
 userspace. But I think if we want to go that way we should make that a
 separate interface. USB provides something like that, which can then be
 fed to libpcap or wireshark to capture and analyze USB traffic. If done
 properly you get replay functionality for free. I don't know what infra-
 structure exists to help with implementing something similar.

It's not actually binary. I think I misrepresented the suggestion.

I'm suggesting that we'd display only the contents of command FIFO and
contents of gathers (i.e. all opcodes) in hex format, not decoded. All
other text would remain as is, so syncpt values, etc would be readable
by a glance.

The user space tool can then take the streams and decode them if needed.

We've noticed that the decoded opcodes format can be very long and
sometimes takes a minute to dump out via a slow console. The hex output
is much more compact and faster to dump.

Actual tracing or wireshark kind of capability would come via decoding
the ftrace log. When enabled, everything that is written to the channel,
is also written to ftrace.

 +static void show_channel_word(struct output *o, int *state, int *count,
 +  u32 addr, u32 val, struct host1x_cdma *cdma)
 +{
 +  static int start_count, dont_print;

 What if two processes read debug information at the same time?

 show_channels() acquires cdma.lock, so that shouldn't happen.
 
 Okay. Another solution would be to pass around a debug context which
 keeps track of the variables. But if we opt for a more involved dump
 interface as discussed above this will no longer be relevant.

Actually, debugging process needs cdma.lock, because it goes through the
cdma queue. Also command FIFO dumping is something that must be done by
a single thread at a time.

 Okay. In the context of a channel dump interface this may not be
 relevant anymore. Can you think of any issue that wouldn't be detectable
 or debuggable by analyzing a binary dump of the data within a channel?
 I'm asking because at that point we wouldn't be able to access any of
 the in-kernel data structures but would have to rely on the data itself
 for diagnostics. IOMMU virtual addresses won't be available and so on.

In many cases, looking at syncpt values, and channel state
(active/waiting on a syncpt, etc) gives an indication on what is the
current state of hardware. But, very often problems are ripple effects
on something that happened earlier and the job that caused the problem
has already been freed and is not visible in the dump.

To get a full history, we need often need the ftrace log.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device

2013-02-06 Thread Terje Bergström
On 05.02.2013 01:54, Thierry Reding wrote:
 On Mon, Feb 04, 2013 at 09:17:45PM -0800, Terje Bergström wrote:
 Yeah, it's actually working around the host1x duplicate naming.
 host1x_syncpt_get takes struct host1x as parameter, but that's different
 host1x than in this code.
 
 So maybe a better way would be to rename the DRM host1x after all. If it
 avoids the need for workarounds such as this I think it justifies the
 additional churn.

Ok, I'll include that. Do you have a preference for the name? Something
like host1x_drm might work?

 Also, how useful is it to create a context? Looking at the gr2d
 implementation for .open_channel(), it will return the same channel to
 whichever userspace process requests them. Can you explain why it is
 necessary at all? From the name I would have expected some kind of
 context switching to take place when different applications submit
 requests to the same client, but that doesn't seem to be the case.

 Hardware context switching will be a later submit, and it'll actually
 create a new structure. Hardware context might live longer than the
 process that created it, so they'll need to be separate.
 
 Why would it live longer than the process? Isn't the whole purpose of
 the context to keep per-process state? What use is that state if the
 process dies?

Hardware context has to be kept alive for as long as there's a job
running from that process. If an app sends 10 jobs to 2D channel, and
dies immediately, there's no sane way for host1x to remove the jobs from
queue. The jobs will keep on running and kernel will need to track them.

 Perhaps the justification is that this way we can keep the kernel API
 stable even when we add support for hardware contexts and other clients.
 
 We don't need a stable kernel API. But I guess it is fine to keep it if
 for no other reason to fill the context returned in the ioctl() with
 meaningful data.

Sorry, I meant stable IOCTL API, so we agree on this.

 host1x_drm_file sounds a bit odd, because it's not really a file, but a
 private data pointer stored in driver_priv.
 
 The same is true for struct drm_file, which is stored in struct file's
 .private_data field. I find it to be very intuitive if the inheritance
 is reflected in the structure name. struct host1x_drm_file is host1x'
 driver-specific part of struct drm_file.

Ok, makes sense. I'll do that.

 I fail to see how dma_buf would require a separate mem_mgr_type. Can we
 perhaps postpone this to a later point and just go with CMA as the only
 alternative for now until we have an actual working implementation that
 we can use this for?

Each submit refers to a number of buffers. Some of them are the streams,
some are textures or other input/output buffers. Each of these buffers
might be passed as a GEM handle, or (when implemented) as a dma_buf fd.
Thus we need a field to tell host1x which API to call to handle that handle.

I think we can leave out the code for managing the type until we
actually have separate memory managers. That'd make GEM handles
effectively of type 0, as we don't set it.

 
 +static int gr2d_submit(struct host1x_drm_context *context,
 + struct tegra_drm_submit_args *args,
 + struct drm_device *drm,
 + struct drm_file *file_priv)
 +{
 + struct host1x_job *job;
 + int num_cmdbufs = args-num_cmdbufs;
 + int num_relocs = args-num_relocs;
 + int num_waitchks = args-num_waitchks;
 + struct tegra_drm_cmdbuf __user *cmdbufs =
 + (void * __user)(uintptr_t)args-cmdbufs;
 + struct tegra_drm_reloc __user *relocs =
 + (void * __user)(uintptr_t)args-relocs;
 + struct tegra_drm_waitchk __user *waitchks =
 + (void * __user)(uintptr_t)args-waitchks;

 No need for all the uintptr_t casts.

 Will try to remove - but I do remember getting compiler warnings without
 them.
 
 I think you shouldn't even have to cast to void * first. Just cast to
 the target type directly. I don't see why the compiler should complain.

This is what I get without them:

drivers/gpu/host1x/drm/gr2d.c:108:3: warning: cast to pointer from
integer of different size [-Wint-to-pointer-cast]
drivers/gpu/host1x/drm/gr2d.c:110:3: warning: cast to pointer from
integer of different size [-Wint-to-pointer-cast]
drivers/gpu/host1x/drm/gr2d.c:112:3: warning: cast to pointer from
integer of different size [-Wint-to-pointer-c

The problem is that the fields are __u64's and can't be cast directly
into 32-bit pointers.

 That's the security firewall. It walks through each submit, and ensures
 that each register write that writes an address, goes through the host1x
 reloc mechanism. This way user space cannot ask 2D to write to arbitrary
 memory locations.
 
 I see. Can this be made more generic? Perhaps adding a table of valid
 registers to the device and use a generic function to iterate over that
 instead of having to provide the same function for each client.

For which one does gcc generate more efficient code

Re: [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device

2013-02-04 Thread Terje Bergström
On 04.02.2013 04:56, Thierry Reding wrote:
> * PGP Signed by an unknown key
> 
> On Tue, Jan 15, 2013 at 01:44:04PM +0200, Terje Bergstrom wrote:
> [...]
>> diff --git a/drivers/gpu/host1x/drm/drm.c b/drivers/gpu/host1x/drm/drm.c
>> @@ -270,7 +274,29 @@ static int tegra_drm_unload(struct drm_device *drm)
>>
>>  static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
>>  {
>> - return 0;
>> + struct host1x_drm_fpriv *fpriv;
>> + int err = 0;
> 
> Can be dropped.
> 
>> +
>> + fpriv = kzalloc(sizeof(*fpriv), GFP_KERNEL);
>> + if (!fpriv)
>> + return -ENOMEM;
>> +
>> + INIT_LIST_HEAD(>contexts);
>> + filp->driver_priv = fpriv;
>> +
>> + return err;
> 
> return 0;

Ok.

> 
>> +static void tegra_drm_close(struct drm_device *drm, struct drm_file *filp)
>> +{
>> + struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(filp);
>> + struct host1x_drm_context *context, *tmp;
>> +
>> + list_for_each_entry_safe(context, tmp, >contexts, list) {
>> + context->client->ops->close_channel(context);
>> + kfree(context);
>> + }
>> + kfree(fpriv);
>>  }
> 
> Maybe you should add host1x_drm_context_free() to wrap the loop
> contents?

Makes sense. Will do.

> 
>> @@ -280,7 +306,204 @@ static void tegra_drm_lastclose(struct drm_device *drm)
>>   drm_fbdev_cma_restore_mode(host1x->fbdev);
>>  }
>>
>> +static int
>> +tegra_drm_ioctl_syncpt_read(struct drm_device *drm, void *data,
>> +  struct drm_file *file_priv)
> 
> static int and function name on one line, please.

Ok, will re-split the lines.

> 
>> +{
>> + struct host1x *host1x = drm->dev_private;
>> + struct tegra_drm_syncpt_read_args *args = data;
>> + struct host1x_syncpt *sp =
>> + host1x_syncpt_get_bydev(host1x->dev, args->id);
> 
> I don't know if we need this, except maybe to work around the problem
> that we have two different structures named host1x. The _bydev() suffix
> is misleading because all you really do here is obtain the syncpt from
> the host1x.

Yeah, it's actually working around the host1x duplicate naming.
host1x_syncpt_get takes struct host1x as parameter, but that's different
host1x than in this code.

> 
>> +static int
>> +tegra_drm_ioctl_open_channel(struct drm_device *drm, void *data,
>> +  struct drm_file *file_priv)
>> +{
>> + struct tegra_drm_open_channel_args *args = data;
>> + struct host1x_client *client;
>> + struct host1x_drm_context *context;
>> + struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
>> + struct host1x *host1x = drm->dev_private;
>> + int err = 0;
> 
> err = -ENODEV; (see below)

Ok, makes sense.

> 
>> +
>> + context = kzalloc(sizeof(*context), GFP_KERNEL);
>> + if (!context)
>> + return -ENOMEM;
>> +
>> + list_for_each_entry(client, >clients, list) {
>> + if (client->class == args->class) {
>> + context->client = client;
>> + err = client->ops->open_channel(client, context);
>> + if (err)
>> + goto out;
>> +
>> + list_add(>list, >contexts);
>> + args->context = (uintptr_t)context;
> 
> Perhaps cast this to __u64 directly instead? There's little sense in
> taking the detour via uintptr_t.

I think compiler complained about a direct cast to __u64, but I'll try
again.

> 
>> + goto out;
> 
> return 0;
> 
>> + }
>> + }
>> + err = -ENODEV;
>> +
>> +out:
>> + if (err)
>> + kfree(context);
>> +
>> + return err;
>> +}
> 
> Then this simply becomes:
> 
> kfree(context);
> return err;

Sounds good.

> 
>> +static int
>> +tegra_drm_ioctl_close_channel(struct drm_device *drm, void *data,
>> +  struct drm_file *file_priv)
>> +{
>> + struct tegra_drm_open_channel_args *args = data;
>> + struct host1x_drm_context *context, *tmp;
>> + struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
>> + int err = 0;
>> +
>> + list_for_each_entry_safe(context, tmp, >contexts, list) {
>> + if ((uintptr_t)context == args->context) {
>> + context->client->ops->close_channel(context);
>> + list_del(>list);
>> + kfree(context);
>> + goto out;
>> + }
>> + }
>> + err = -EINVAL;
>> +
>> +out:
>> + return err;
>> +}
> 
> Same comments as for tegra_drm_ioctl_open_channel().

Ok, will apply.

> 
>> +static int
>> +tegra_drm_ioctl_get_syncpoint(struct drm_device *drm, void *data,
>> +  struct drm_file *file_priv)
>> +{
>> + struct tegra_drm_get_channel_param_args *args = data;
>> + struct host1x_drm_context *context;
>> + struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
>> + int err = 0;
>> +
>> + 

Re: [PATCHv5,RESEND 7/8] ARM: tegra: Add board data and 2D clocks

2013-02-04 Thread Terje Bergström
On 04.02.2013 03:26, Thierry Reding wrote:
> * PGP Signed by an unknown key
> 
> On Tue, Jan 15, 2013 at 01:44:03PM +0200, Terje Bergstrom wrote:
>> Add a driver alias gr2d for Tegra 2D device, and assign a duplicate
>> of 2D clock to that driver alias.
>>
>> Signed-off-by: Terje Bergstrom 
>> ---
>>  arch/arm/mach-tegra/board-dt-tegra20.c|1 +
>>  arch/arm/mach-tegra/board-dt-tegra30.c|1 +
>>  arch/arm/mach-tegra/tegra20_clocks_data.c |2 +-
>>  arch/arm/mach-tegra/tegra30_clocks_data.c |1 +
>>  4 files changed, 4 insertions(+), 1 deletion(-)
> 
> With Prashant's clock rework patches now merged this patch can be
> dropped.

Yes, and I'll also need to start calling 2D clock with name NULL in gr2d.c.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 5/8] drm: tegra: Move drm to live under host1x

2013-02-04 Thread Terje Bergström
On 04.02.2013 03:08, Thierry Reding wrote:
> * PGP Signed by an unknown key
> 
> On Tue, Jan 15, 2013 at 01:44:01PM +0200, Terje Bergstrom wrote:
> [...]
>> diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
>> index 697d49a..ffc8bf1 100644
>> --- a/drivers/gpu/host1x/Makefile
>> +++ b/drivers/gpu/host1x/Makefile
>> @@ -12,4 +12,10 @@ host1x-y = \
>>  hw/host1x01.o
>>  
>>  host1x-$(CONFIG_TEGRA_HOST1X_CMA) += cma.o
>> +
>> +ccflags-y += -Iinclude/drm
>> +ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
>> +
>> +host1x-$(CONFIG_DRM_TEGRA) += drm/drm.o drm/fb.o drm/dc.o drm/host1x.o
>> +host1x-$(CONFIG_DRM_TEGRA) += drm/output.o drm/rgb.o drm/hdmi.o
>>  obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
> 
> Can this be moved into a separate Makefile in the drm subdirectory?

I tried, and kernel build helpfully created two .ko files. As having
cyclic dependencies between two modules isn't nice, I merged them to
same module and that seemed to force merging Makefile.

If anybody has an idea on how to do it otherwise, I'd be happy to keep
the Makefiles separate.

> 
>> diff --git a/drivers/gpu/host1x/host1x_client.h 
>> b/drivers/gpu/host1x/host1x_client.h
> [...]
>> new file mode 100644
>> index 000..fdd2920
>> --- /dev/null
>> +++ b/drivers/gpu/host1x/host1x_client.h
>> @@ -0,0 +1,25 @@
>> +/*
>> + * Copyright (c) 2013, NVIDIA Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see .
>> + */
>> +
>> +#ifndef HOST1X_CLIENT_H
>> +#define HOST1X_CLIENT_H
>> +
>> +struct platform_device;
>> +
>> +void host1x_set_drm_data(struct platform_device *pdev, void *data);
>> +void *host1x_get_drm_data(struct platform_device *pdev);
>> +
>> +#endif
> 
> These aren't defined or used yet.

Hmm, right, they would go to patch 7.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support

2013-02-04 Thread Terje Bergström
On 04.02.2013 03:03, Thierry Reding wrote:
> * PGP Signed by an unknown key
> 
> On Tue, Jan 15, 2013 at 01:44:00PM +0200, Terje Bergstrom wrote:
>> diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
> [...]
>> +static pid_t host1x_debug_null_kickoff_pid;
>> +unsigned int host1x_debug_trace_cmdbuf;
>> +
>> +static pid_t host1x_debug_force_timeout_pid;
>> +static u32 host1x_debug_force_timeout_val;
>> +static u32 host1x_debug_force_timeout_channel;
> 
> Please group static and non-static variables.

Will do.

> 
>> diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
> [...]
>> +struct output {
>> +void (*fn)(void *ctx, const char *str, size_t len);
>> +void *ctx;
>> +char buf[256];
>> +};
> 
> Do we really need this kind of abstraction? There really should be only
> one location where debug information is obtained, so I don't see a need
> for this.

This is used by debugfs code to direct to debugfs, and
nvhost_debug_dump() to send via printk.

> 
>> diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
> [...]
>>  struct host1x_syncpt_ops {
>>  void (*reset)(struct host1x_syncpt *);
>>  void (*reset_wait_base)(struct host1x_syncpt *);
>> @@ -117,6 +133,7 @@ struct host1x {
>>  struct host1x_channel_ops channel_op;
>>  struct host1x_cdma_ops cdma_op;
>>  struct host1x_pushbuffer_ops cdma_pb_op;
>> +struct host1x_debug_ops debug_op;
>>  struct host1x_syncpt_ops syncpt_op;
>>  struct host1x_intr_ops intr_op;
> 
> Again, better to pass in a const pointer to the ops structure.

Ok.

> 
>> diff --git a/drivers/gpu/host1x/hw/debug_hw.c 
>> b/drivers/gpu/host1x/hw/debug_hw.c
> 
>> +static int show_channel_command(struct output *o, u32 addr, u32 val, int 
>> *count)
>> +{
>> +unsigned mask;
>> +unsigned subop;
>> +
>> +switch (val >> 28) {
>> +case 0x0:
> 
> These can easily be derived by looking at the debug output, but it may
> still make sense to assign symbolic names to them.

I have another suggestion. In downstream I removed the decoding part and
I just print out a string of hex. That removes quite a bit bunch of code
from kernel. It makes the debug output also more compact.

It's much easier to write a user space program to decode than maintain
it in kernel.

> 
>> +static void show_channel_word(struct output *o, int *state, int *count,
>> +u32 addr, u32 val, struct host1x_cdma *cdma)
>> +{
>> +static int start_count, dont_print;
> 
> What if two processes read debug information at the same time?

show_channels() acquires cdma.lock, so that shouldn't happen.

> 
>> +static void do_show_channel_gather(struct output *o,
>> +phys_addr_t phys_addr,
>> +u32 words, struct host1x_cdma *cdma,
>> +phys_addr_t pin_addr, u32 *map_addr)
>> +{
>> +/* Map dmaget cursor to corresponding mem handle */
>> +u32 offset;
>> +int state, count, i;
>> +
>> +offset = phys_addr - pin_addr;
>> +/*
>> + * Sometimes we're given different hardware address to the same
>> + * page - in these cases the offset will get an invalid number and
>> + * we just have to bail out.
>> + */
> 
> Why's that?

Because of a race - memory might've been unpinned and unmapped from
IOMMU and when we re-map (pin), we are given a new address.

But, I think this comment is a bit stale - we used to dump also old
gathers. The latest code only dumps jobs in sync queue, so the race
shouldn't happen.

> 
>> +map_addr = host1x_memmgr_mmap(mem);
>> +if (!map_addr) {
>> +host1x_debug_output(o, "[could not mmap]\n");
>> +return;
>> +}
>> +
>> +/* Get base address from mem */
>> +sgt = host1x_memmgr_pin(mem);
>> +if (IS_ERR(sgt)) {
>> +host1x_debug_output(o, "[couldn't pin]\n");
>> +host1x_memmgr_munmap(mem, map_addr);
>> +return;
>> +}
> 
> Maybe you should stick with one of "could not" or "couldn't". Makes it
> easier to search for.

I prefer "could not", so I'll use that.

> 
>> +static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
>> +{
>> +struct host1x_job *job;
>> +
>> +list_for_each_entry(job, >sync_queue, list) {
>> +int i;
>> +host1x_debug_output(o,
>> +"\n%p: JOB, syncpt_id=%d, syncpt_val=%d,"
>> +" first_get=%08x, timeout=%d"
>> +" num_slots=%d, num_handles=%d\n",
>> +job,
>> +job->syncpt_id,
>> +job->syncpt_end,
>> +job->first_get,
>> +job->timeout,
>> +job->num_slots,
>> +job->num_unpins);
> 
> This could go on fewer lines.

Yes, will merge.

> 
>> +static void host1x_debug_show_channel_cdma(struct host1x *m,
>> +struct host1x_channel *ch, struct 

Re: [PATCHv5,RESEND 2/8] gpu: host1x: Add syncpoint wait and interrupts

2013-02-04 Thread Terje Bergström
On 04.02.2013 02:30, Thierry Reding wrote:
> On Tue, Jan 15, 2013 at 01:43:58PM +0200, Terje Bergstrom wrote:
> [...]
>> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
> [...]
>> @@ -95,7 +96,6 @@ static int host1x_probe(struct platform_device *dev)
>>
>>   /* set common host1x device data */
>>   platform_set_drvdata(dev, host);
>> -
>>   host->regs = devm_request_and_ioremap(>dev, regs);
>>   if (!host->regs) {
>>   dev_err(>dev, "failed to remap host registers\n");
> 
> This seems an unrelated (and actually undesirable) change.
> 
>> @@ -109,28 +109,40 @@ static int host1x_probe(struct platform_device *dev)
>>   }
>>
>>   err = host1x_syncpt_init(host);
>> - if (err)
>> + if (err) {
>> + dev_err(>dev, "failed to init sync points");
>>   return err;
>> + }
> 
> This error message should probably have gone in the previous patch as
> well.

Oops, will move these to previous patch. I'm pretty sure I already fixed
this once. :-(

> 
>> diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
>> index d8f5979..8376092 100644
>> --- a/drivers/gpu/host1x/dev.h
>> +++ b/drivers/gpu/host1x/dev.h
>> @@ -17,11 +17,12 @@
>>  #ifndef HOST1X_DEV_H
>>  #define HOST1X_DEV_H
>>
>> +#include 
>>  #include "syncpt.h"
>> +#include "intr.h"
>>
>>  struct host1x;
>>  struct host1x_syncpt;
>> -struct platform_device;
> 
> Why include platform_device.h here?

host1x_get_host() actually needs that, so this #include should've also
been in previous patch.

> 
>> @@ -34,6 +35,18 @@ struct host1x_syncpt_ops {
>>   const char * (*name)(struct host1x_syncpt *);
>>  };
>>
>> +struct host1x_intr_ops {
>> + void (*init_host_sync)(struct host1x_intr *);
>> + void (*set_host_clocks_per_usec)(
>> + struct host1x_intr *, u32 clocks);
> 
> Could the above two not be combined? The only reason to keep them
> separate would be if the host1x clock was dynamically changed, but I
> don't think we support that, right?

I've left this as a placeholder to at some point start supporting host1x
clock scaling. But I don't think we're going to do that for a while, so
I could merge them.

> 
>> + void (*set_syncpt_threshold)(
>> + struct host1x_intr *, u32 id, u32 thresh);
>> + void (*enable_syncpt_intr)(struct host1x_intr *, u32 id);
>> + void (*disable_syncpt_intr)(struct host1x_intr *, u32 id);
>> + void (*disable_all_syncpt_intrs)(struct host1x_intr *);
> 
> Can disable_all_syncpt_intrs() not be implemented generically using the
> number of syncpoints as exposed by host1x_device_info and the
> .disable_syncpt_intr() function?

disable_all_syncpt_intrs() disables all interrupts in one write (or one
per 32 sync points), so it's more efficient.

> 
>> @@ -46,11 +59,13 @@ struct host1x_device_info {
>>  struct host1x {
>>   void __iomem *regs;
>>   struct host1x_syncpt *syncpt;
>> + struct host1x_intr intr;
>>   struct platform_device *dev;
>>   struct host1x_device_info info;
>>   struct clk *clk;
>>
>>   struct host1x_syncpt_ops syncpt_op;
>> + struct host1x_intr_ops intr_op;
> 
> I think carrying a const pointer to the interrupt operations structure
> is a better option here.

Ok.

> 
>> diff --git a/drivers/gpu/host1x/hw/intr_hw.c 
>> b/drivers/gpu/host1x/hw/intr_hw.c
> [...]
>> +static void host1x_intr_syncpt_thresh_isr(struct host1x_intr_syncpt 
>> *syncpt);
> 
> Can we avoid this forward declaration?

I think we can, if I just move the isr to top of file.

> 
>> +static void syncpt_thresh_cascade_fn(struct work_struct *work)
> 
> syncpt_thresh_work()?

Sounds good.

> 
>> +{
>> + struct host1x_intr_syncpt *sp =
>> + container_of(work, struct host1x_intr_syncpt, work);
>> + host1x_syncpt_thresh_fn(sp);
> 
> Couldn't we inline the host1x_syncpt_thresh_fn() implementation here?
> Why do we need to go through an external function declaration?

If I move syncpt_thresh_work() to intr.c from intr_hw.c, I could do
that. That'd simplify the interrupt path.

> 
>> +static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id)
>> +{
>> + struct host1x *host1x = dev_id;
>> + struct host1x_intr *intr = >intr;
>> + unsigned long reg;
>> + int i, id;
>> +
>> + for (i = 0; i < host1x->info.nb_pts / BITS_PER_LONG; i++) {
>> + reg = host1x_sync_readl(host1x,
>> + HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS +
>> + i * REGISTER_STRIDE);
>> + for_each_set_bit(id, , BITS_PER_LONG) {
>> + struct host1x_intr_syncpt *sp =
>> + intr->syncpt + (i * BITS_PER_LONG + id);
>> + host1x_intr_syncpt_thresh_isr(sp);
> 
> Have you considered mimicking the IRQ API and name this something like
> host1x_intr_syncpt_thresh_handle() and name the actual ISR just
> syncpt_thresh_isr()? Not so important but it makes things a 

Re: [PATCHv5,RESEND 1/8] gpu: host1x: Add host1x driver

2013-02-04 Thread Terje Bergström
On 04.02.2013 01:09, Thierry Reding wrote:
> On Tue, Jan 15, 2013 at 01:43:57PM +0200, Terje Bergstrom wrote:
>> Add host1x, the driver for host1x and its client unit 2D.
> 
> Maybe this could be a bit more verbose. Perhaps describe what host1x is.

Sure. I could just steal the paragraph from Stephen:

The Tegra host1x module is the DMA engine for register access to Tegra's
graphics- and multimedia-related modules. The modules served by host1x
are referred to as clients. host1x includes some other  functionality,
such as synchronization.

>> diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig
> [...]
>> @@ -0,0 +1,6 @@
>> +config TEGRA_HOST1X
>> + tristate "Tegra host1x driver"
>> + help
>> +   Driver for the Tegra host1x hardware.
> 
> Maybe s/Tegra/NVIDIA Tegra/?

Sounds good.

>> +
>> +   Required for enabling tegradrm.
> 
> This should probably be dropped. Either encode such knowledge as
> explicit dependencies or in this case just remove it altogether since we
> will probably merge both drivers anyway.

I think this was left from previous versions. Now it just doesn't make
sense. I'll just drop it.

> 
>> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
> [...]
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include "dev.h"
> 
> Maybe add a blank line between the previous two lines to visually
> separate standard Linux includes from driver-specific ones.

Ok. You commented in quite few places in a similar way. I'll fix all of
them to first include system includes, then driver's own includes, and
add a blank line in between.

> 
>> +#include "hw/host1x01.h"
>> +
>> +#define CREATE_TRACE_POINTS
>> +#include 
>> +
>> +#define DRIVER_NAME  "tegra-host1x"
> 
> You only ever use this once, so maybe it can just be dropped?

Yes.

> 
>> +static struct host1x_device_info host1x_info = {
> 
> Perhaps this should be host1x01_info in order to match the hardware
> revision? That'll avoid it having to be renamed later on when other
> revisions start to appear.

Ok, will do. I thought it'd be awkward being alone until the second
version appears, but I'll add it.

> 
>> +static int host1x_probe(struct platform_device *dev)
>> +{
> [...]
>> + syncpt_irq = platform_get_irq(dev, 0);
>> + if (IS_ERR_VALUE(syncpt_irq)) {
> 
> This is somewhat unusual. It should be fine to just do:
> 
> if (syncpt_irq < 0)
> 
> but IS_ERR_VALUE() should work fine too.

I'll use the simpler version.

> 
>> + memcpy(>info, devid->data, sizeof(struct host1x_device_info));
> 
> Why not make the .info field a pointer to struct host1x_device_info
> instead? That way you don't have to keep separate copies of the same
> information.

This had something to do with __init data and non-init data. But, we're
not really putting this data into __init, so we should be able to use
just a pointer.

> 
>> + /* set common host1x device data */
>> + platform_set_drvdata(dev, host);
>> +
>> + host->regs = devm_request_and_ioremap(>dev, regs);
>> + if (!host->regs) {
>> + dev_err(>dev, "failed to remap host registers\n");
>> + return -ENXIO;
>> + }
> 
> This should probably be rewritten as:
> 
> host->regs = devm_ioremap_resource(>dev, regs);
> if (IS_ERR(host->regs))
> return PTR_ERR(host->regs);
> 
> Though that function will only be available in 3.9-rc1.

Ok, 3.9-rc1 is fine as a basis.

>> + err = host1x_syncpt_init(host);
>> + if (err)
>> + return err;
> [...]
>> + host1x_syncpt_reset(host);
> 
> Why separate host1x_syncpt_reset() from host1x_syncpt_init()? I see why
> it might be useful to have host1x_syncpt_reset() as a separate function
> but couldn't it be called as part of host1x_syncpt_init()?

host1x_syncpt_init() is used for initializing the syncpt structures, and
is called in probe. host1x_syncpt_reset() should be called whenever we
think hardware state is lost, for example if VDD_CORE was rail gated due
to system suspend.

> 
>> + dev_info(>dev, "initialized\n");
> 
> I don't think this is very useful. We should make sure to tell people
> when things fail. When everything goes as planned we don't need to brag
> about it =)

True. I wish other kernel drivers followed that same philosophy. :-)
I'll remove that. It's mainly useful as debug help, but it's as easy to
check from sysfs the state.

> 
>> diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
> [...]
>> +struct host1x_syncpt_ops {
> [...]
>> + const char * (*name)(struct host1x_syncpt *);
> 
> Why do we need this? Could we not refer to the syncpt name directly
> instead of going through this wrapper? I'd expect the name to be static.

This must be a relic. I'll remove the wrapper.

> 
>> +struct host1x_device_info {
> 
> Maybe this should be called simply host1x_info? _device seems redundant.

Sure.

> 
>> + int nb_channels;   

Re: [PATCHv5,RESEND 1/8] gpu: host1x: Add host1x driver

2013-02-04 Thread Terje Bergström
On 04.02.2013 01:09, Thierry Reding wrote:
 On Tue, Jan 15, 2013 at 01:43:57PM +0200, Terje Bergstrom wrote:
 Add host1x, the driver for host1x and its client unit 2D.
 
 Maybe this could be a bit more verbose. Perhaps describe what host1x is.

Sure. I could just steal the paragraph from Stephen:

The Tegra host1x module is the DMA engine for register access to Tegra's
graphics- and multimedia-related modules. The modules served by host1x
are referred to as clients. host1x includes some other  functionality,
such as synchronization.

 diff --git a/drivers/gpu/host1x/Kconfig b/drivers/gpu/host1x/Kconfig
 [...]
 @@ -0,0 +1,6 @@
 +config TEGRA_HOST1X
 + tristate Tegra host1x driver
 + help
 +   Driver for the Tegra host1x hardware.
 
 Maybe s/Tegra/NVIDIA Tegra/?

Sounds good.

 +
 +   Required for enabling tegradrm.
 
 This should probably be dropped. Either encode such knowledge as
 explicit dependencies or in this case just remove it altogether since we
 will probably merge both drivers anyway.

I think this was left from previous versions. Now it just doesn't make
sense. I'll just drop it.

 
 diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
 [...]
 +#include linux/module.h
 +#include linux/list.h
 +#include linux/slab.h
 +#include linux/of.h
 +#include linux/of_device.h
 +#include linux/clk.h
 +#include linux/io.h
 +#include dev.h
 
 Maybe add a blank line between the previous two lines to visually
 separate standard Linux includes from driver-specific ones.

Ok. You commented in quite few places in a similar way. I'll fix all of
them to first include system includes, then driver's own includes, and
add a blank line in between.

 
 +#include hw/host1x01.h
 +
 +#define CREATE_TRACE_POINTS
 +#include trace/events/host1x.h
 +
 +#define DRIVER_NAME  tegra-host1x
 
 You only ever use this once, so maybe it can just be dropped?

Yes.

 
 +static struct host1x_device_info host1x_info = {
 
 Perhaps this should be host1x01_info in order to match the hardware
 revision? That'll avoid it having to be renamed later on when other
 revisions start to appear.

Ok, will do. I thought it'd be awkward being alone until the second
version appears, but I'll add it.

 
 +static int host1x_probe(struct platform_device *dev)
 +{
 [...]
 + syncpt_irq = platform_get_irq(dev, 0);
 + if (IS_ERR_VALUE(syncpt_irq)) {
 
 This is somewhat unusual. It should be fine to just do:
 
 if (syncpt_irq  0)
 
 but IS_ERR_VALUE() should work fine too.

I'll use the simpler version.

 
 + memcpy(host-info, devid-data, sizeof(struct host1x_device_info));
 
 Why not make the .info field a pointer to struct host1x_device_info
 instead? That way you don't have to keep separate copies of the same
 information.

This had something to do with __init data and non-init data. But, we're
not really putting this data into __init, so we should be able to use
just a pointer.

 
 + /* set common host1x device data */
 + platform_set_drvdata(dev, host);
 +
 + host-regs = devm_request_and_ioremap(dev-dev, regs);
 + if (!host-regs) {
 + dev_err(dev-dev, failed to remap host registers\n);
 + return -ENXIO;
 + }
 
 This should probably be rewritten as:
 
 host-regs = devm_ioremap_resource(dev-dev, regs);
 if (IS_ERR(host-regs))
 return PTR_ERR(host-regs);
 
 Though that function will only be available in 3.9-rc1.

Ok, 3.9-rc1 is fine as a basis.

 + err = host1x_syncpt_init(host);
 + if (err)
 + return err;
 [...]
 + host1x_syncpt_reset(host);
 
 Why separate host1x_syncpt_reset() from host1x_syncpt_init()? I see why
 it might be useful to have host1x_syncpt_reset() as a separate function
 but couldn't it be called as part of host1x_syncpt_init()?

host1x_syncpt_init() is used for initializing the syncpt structures, and
is called in probe. host1x_syncpt_reset() should be called whenever we
think hardware state is lost, for example if VDD_CORE was rail gated due
to system suspend.

 
 + dev_info(dev-dev, initialized\n);
 
 I don't think this is very useful. We should make sure to tell people
 when things fail. When everything goes as planned we don't need to brag
 about it =)

True. I wish other kernel drivers followed that same philosophy. :-)
I'll remove that. It's mainly useful as debug help, but it's as easy to
check from sysfs the state.

 
 diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
 [...]
 +struct host1x_syncpt_ops {
 [...]
 + const char * (*name)(struct host1x_syncpt *);
 
 Why do we need this? Could we not refer to the syncpt name directly
 instead of going through this wrapper? I'd expect the name to be static.

This must be a relic. I'll remove the wrapper.

 
 +struct host1x_device_info {
 
 Maybe this should be called simply host1x_info? _device seems redundant.

Sure.

 
 + int nb_channels;/* host1x: num channels supported */
 + int 

Re: [PATCHv5,RESEND 2/8] gpu: host1x: Add syncpoint wait and interrupts

2013-02-04 Thread Terje Bergström
On 04.02.2013 02:30, Thierry Reding wrote:
 On Tue, Jan 15, 2013 at 01:43:58PM +0200, Terje Bergstrom wrote:
 [...]
 diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
 [...]
 @@ -95,7 +96,6 @@ static int host1x_probe(struct platform_device *dev)

   /* set common host1x device data */
   platform_set_drvdata(dev, host);
 -
   host-regs = devm_request_and_ioremap(dev-dev, regs);
   if (!host-regs) {
   dev_err(dev-dev, failed to remap host registers\n);
 
 This seems an unrelated (and actually undesirable) change.
 
 @@ -109,28 +109,40 @@ static int host1x_probe(struct platform_device *dev)
   }

   err = host1x_syncpt_init(host);
 - if (err)
 + if (err) {
 + dev_err(dev-dev, failed to init sync points);
   return err;
 + }
 
 This error message should probably have gone in the previous patch as
 well.

Oops, will move these to previous patch. I'm pretty sure I already fixed
this once. :-(

 
 diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
 index d8f5979..8376092 100644
 --- a/drivers/gpu/host1x/dev.h
 +++ b/drivers/gpu/host1x/dev.h
 @@ -17,11 +17,12 @@
  #ifndef HOST1X_DEV_H
  #define HOST1X_DEV_H

 +#include linux/platform_device.h
  #include syncpt.h
 +#include intr.h

  struct host1x;
  struct host1x_syncpt;
 -struct platform_device;
 
 Why include platform_device.h here?

host1x_get_host() actually needs that, so this #include should've also
been in previous patch.

 
 @@ -34,6 +35,18 @@ struct host1x_syncpt_ops {
   const char * (*name)(struct host1x_syncpt *);
  };

 +struct host1x_intr_ops {
 + void (*init_host_sync)(struct host1x_intr *);
 + void (*set_host_clocks_per_usec)(
 + struct host1x_intr *, u32 clocks);
 
 Could the above two not be combined? The only reason to keep them
 separate would be if the host1x clock was dynamically changed, but I
 don't think we support that, right?

I've left this as a placeholder to at some point start supporting host1x
clock scaling. But I don't think we're going to do that for a while, so
I could merge them.

 
 + void (*set_syncpt_threshold)(
 + struct host1x_intr *, u32 id, u32 thresh);
 + void (*enable_syncpt_intr)(struct host1x_intr *, u32 id);
 + void (*disable_syncpt_intr)(struct host1x_intr *, u32 id);
 + void (*disable_all_syncpt_intrs)(struct host1x_intr *);
 
 Can disable_all_syncpt_intrs() not be implemented generically using the
 number of syncpoints as exposed by host1x_device_info and the
 .disable_syncpt_intr() function?

disable_all_syncpt_intrs() disables all interrupts in one write (or one
per 32 sync points), so it's more efficient.

 
 @@ -46,11 +59,13 @@ struct host1x_device_info {
  struct host1x {
   void __iomem *regs;
   struct host1x_syncpt *syncpt;
 + struct host1x_intr intr;
   struct platform_device *dev;
   struct host1x_device_info info;
   struct clk *clk;

   struct host1x_syncpt_ops syncpt_op;
 + struct host1x_intr_ops intr_op;
 
 I think carrying a const pointer to the interrupt operations structure
 is a better option here.

Ok.

 
 diff --git a/drivers/gpu/host1x/hw/intr_hw.c 
 b/drivers/gpu/host1x/hw/intr_hw.c
 [...]
 +static void host1x_intr_syncpt_thresh_isr(struct host1x_intr_syncpt 
 *syncpt);
 
 Can we avoid this forward declaration?

I think we can, if I just move the isr to top of file.

 
 +static void syncpt_thresh_cascade_fn(struct work_struct *work)
 
 syncpt_thresh_work()?

Sounds good.

 
 +{
 + struct host1x_intr_syncpt *sp =
 + container_of(work, struct host1x_intr_syncpt, work);
 + host1x_syncpt_thresh_fn(sp);
 
 Couldn't we inline the host1x_syncpt_thresh_fn() implementation here?
 Why do we need to go through an external function declaration?

If I move syncpt_thresh_work() to intr.c from intr_hw.c, I could do
that. That'd simplify the interrupt path.

 
 +static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id)
 +{
 + struct host1x *host1x = dev_id;
 + struct host1x_intr *intr = host1x-intr;
 + unsigned long reg;
 + int i, id;
 +
 + for (i = 0; i  host1x-info.nb_pts / BITS_PER_LONG; i++) {
 + reg = host1x_sync_readl(host1x,
 + HOST1X_SYNC_SYNCPT_THRESH_CPU0_INT_STATUS +
 + i * REGISTER_STRIDE);
 + for_each_set_bit(id, reg, BITS_PER_LONG) {
 + struct host1x_intr_syncpt *sp =
 + intr-syncpt + (i * BITS_PER_LONG + id);
 + host1x_intr_syncpt_thresh_isr(sp);
 
 Have you considered mimicking the IRQ API and name this something like
 host1x_intr_syncpt_thresh_handle() and name the actual ISR just
 syncpt_thresh_isr()? Not so important but it makes things a bit clearer
 in my opinion.

This gets a bit confusing, because we have an ISR that calls a function
that is also called ISR. I've kept isr in names of both to emphasize
that 

Re: [PATCHv5,RESEND 4/8] gpu: host1x: Add debug support

2013-02-04 Thread Terje Bergström
On 04.02.2013 03:03, Thierry Reding wrote:
 * PGP Signed by an unknown key
 
 On Tue, Jan 15, 2013 at 01:44:00PM +0200, Terje Bergstrom wrote:
 diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
 [...]
 +static pid_t host1x_debug_null_kickoff_pid;
 +unsigned int host1x_debug_trace_cmdbuf;
 +
 +static pid_t host1x_debug_force_timeout_pid;
 +static u32 host1x_debug_force_timeout_val;
 +static u32 host1x_debug_force_timeout_channel;
 
 Please group static and non-static variables.

Will do.

 
 diff --git a/drivers/gpu/host1x/debug.h b/drivers/gpu/host1x/debug.h
 [...]
 +struct output {
 +void (*fn)(void *ctx, const char *str, size_t len);
 +void *ctx;
 +char buf[256];
 +};
 
 Do we really need this kind of abstraction? There really should be only
 one location where debug information is obtained, so I don't see a need
 for this.

This is used by debugfs code to direct to debugfs, and
nvhost_debug_dump() to send via printk.

 
 diff --git a/drivers/gpu/host1x/dev.h b/drivers/gpu/host1x/dev.h
 [...]
  struct host1x_syncpt_ops {
  void (*reset)(struct host1x_syncpt *);
  void (*reset_wait_base)(struct host1x_syncpt *);
 @@ -117,6 +133,7 @@ struct host1x {
  struct host1x_channel_ops channel_op;
  struct host1x_cdma_ops cdma_op;
  struct host1x_pushbuffer_ops cdma_pb_op;
 +struct host1x_debug_ops debug_op;
  struct host1x_syncpt_ops syncpt_op;
  struct host1x_intr_ops intr_op;
 
 Again, better to pass in a const pointer to the ops structure.

Ok.

 
 diff --git a/drivers/gpu/host1x/hw/debug_hw.c 
 b/drivers/gpu/host1x/hw/debug_hw.c
 
 +static int show_channel_command(struct output *o, u32 addr, u32 val, int 
 *count)
 +{
 +unsigned mask;
 +unsigned subop;
 +
 +switch (val  28) {
 +case 0x0:
 
 These can easily be derived by looking at the debug output, but it may
 still make sense to assign symbolic names to them.

I have another suggestion. In downstream I removed the decoding part and
I just print out a string of hex. That removes quite a bit bunch of code
from kernel. It makes the debug output also more compact.

It's much easier to write a user space program to decode than maintain
it in kernel.

 
 +static void show_channel_word(struct output *o, int *state, int *count,
 +u32 addr, u32 val, struct host1x_cdma *cdma)
 +{
 +static int start_count, dont_print;
 
 What if two processes read debug information at the same time?

show_channels() acquires cdma.lock, so that shouldn't happen.

 
 +static void do_show_channel_gather(struct output *o,
 +phys_addr_t phys_addr,
 +u32 words, struct host1x_cdma *cdma,
 +phys_addr_t pin_addr, u32 *map_addr)
 +{
 +/* Map dmaget cursor to corresponding mem handle */
 +u32 offset;
 +int state, count, i;
 +
 +offset = phys_addr - pin_addr;
 +/*
 + * Sometimes we're given different hardware address to the same
 + * page - in these cases the offset will get an invalid number and
 + * we just have to bail out.
 + */
 
 Why's that?

Because of a race - memory might've been unpinned and unmapped from
IOMMU and when we re-map (pin), we are given a new address.

But, I think this comment is a bit stale - we used to dump also old
gathers. The latest code only dumps jobs in sync queue, so the race
shouldn't happen.

 
 +map_addr = host1x_memmgr_mmap(mem);
 +if (!map_addr) {
 +host1x_debug_output(o, [could not mmap]\n);
 +return;
 +}
 +
 +/* Get base address from mem */
 +sgt = host1x_memmgr_pin(mem);
 +if (IS_ERR(sgt)) {
 +host1x_debug_output(o, [couldn't pin]\n);
 +host1x_memmgr_munmap(mem, map_addr);
 +return;
 +}
 
 Maybe you should stick with one of could not or couldn't. Makes it
 easier to search for.

I prefer could not, so I'll use that.

 
 +static void show_channel_gathers(struct output *o, struct host1x_cdma *cdma)
 +{
 +struct host1x_job *job;
 +
 +list_for_each_entry(job, cdma-sync_queue, list) {
 +int i;
 +host1x_debug_output(o,
 +\n%p: JOB, syncpt_id=%d, syncpt_val=%d,
 + first_get=%08x, timeout=%d
 + num_slots=%d, num_handles=%d\n,
 +job,
 +job-syncpt_id,
 +job-syncpt_end,
 +job-first_get,
 +job-timeout,
 +job-num_slots,
 +job-num_unpins);
 
 This could go on fewer lines.

Yes, will merge.

 
 +static void host1x_debug_show_channel_cdma(struct host1x *m,
 +struct host1x_channel *ch, struct output *o, int chid)
 +{
 [...]
 +switch (cbstat) {
 +case 0x00010008:
 
 Again, symbolic names would be nice.

I propose I remove the decoding from kernel, and save 200 lines.

Terje
--
To unsubscribe from this 

Re: [PATCHv5,RESEND 5/8] drm: tegra: Move drm to live under host1x

2013-02-04 Thread Terje Bergström
On 04.02.2013 03:08, Thierry Reding wrote:
 * PGP Signed by an unknown key
 
 On Tue, Jan 15, 2013 at 01:44:01PM +0200, Terje Bergstrom wrote:
 [...]
 diff --git a/drivers/gpu/host1x/Makefile b/drivers/gpu/host1x/Makefile
 index 697d49a..ffc8bf1 100644
 --- a/drivers/gpu/host1x/Makefile
 +++ b/drivers/gpu/host1x/Makefile
 @@ -12,4 +12,10 @@ host1x-y = \
  hw/host1x01.o
  
  host1x-$(CONFIG_TEGRA_HOST1X_CMA) += cma.o
 +
 +ccflags-y += -Iinclude/drm
 +ccflags-$(CONFIG_DRM_TEGRA_DEBUG) += -DDEBUG
 +
 +host1x-$(CONFIG_DRM_TEGRA) += drm/drm.o drm/fb.o drm/dc.o drm/host1x.o
 +host1x-$(CONFIG_DRM_TEGRA) += drm/output.o drm/rgb.o drm/hdmi.o
  obj-$(CONFIG_TEGRA_HOST1X) += host1x.o
 
 Can this be moved into a separate Makefile in the drm subdirectory?

I tried, and kernel build helpfully created two .ko files. As having
cyclic dependencies between two modules isn't nice, I merged them to
same module and that seemed to force merging Makefile.

If anybody has an idea on how to do it otherwise, I'd be happy to keep
the Makefiles separate.

 
 diff --git a/drivers/gpu/host1x/host1x_client.h 
 b/drivers/gpu/host1x/host1x_client.h
 [...]
 new file mode 100644
 index 000..fdd2920
 --- /dev/null
 +++ b/drivers/gpu/host1x/host1x_client.h
 @@ -0,0 +1,25 @@
 +/*
 + * Copyright (c) 2013, NVIDIA Corporation.
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms and conditions of the GNU General Public License,
 + * version 2, as published by the Free Software Foundation.
 + *
 + * This program is distributed in the hope it will be useful, but WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
 + * more details.
 + *
 + * You should have received a copy of the GNU General Public License
 + * along with this program.  If not, see http://www.gnu.org/licenses/.
 + */
 +
 +#ifndef HOST1X_CLIENT_H
 +#define HOST1X_CLIENT_H
 +
 +struct platform_device;
 +
 +void host1x_set_drm_data(struct platform_device *pdev, void *data);
 +void *host1x_get_drm_data(struct platform_device *pdev);
 +
 +#endif
 
 These aren't defined or used yet.

Hmm, right, they would go to patch 7.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 7/8] ARM: tegra: Add board data and 2D clocks

2013-02-04 Thread Terje Bergström
On 04.02.2013 03:26, Thierry Reding wrote:
 * PGP Signed by an unknown key
 
 On Tue, Jan 15, 2013 at 01:44:03PM +0200, Terje Bergstrom wrote:
 Add a driver alias gr2d for Tegra 2D device, and assign a duplicate
 of 2D clock to that driver alias.

 Signed-off-by: Terje Bergstrom tbergst...@nvidia.com
 ---
  arch/arm/mach-tegra/board-dt-tegra20.c|1 +
  arch/arm/mach-tegra/board-dt-tegra30.c|1 +
  arch/arm/mach-tegra/tegra20_clocks_data.c |2 +-
  arch/arm/mach-tegra/tegra30_clocks_data.c |1 +
  4 files changed, 4 insertions(+), 1 deletion(-)
 
 With Prashant's clock rework patches now merged this patch can be
 dropped.

Yes, and I'll also need to start calling 2D clock with name NULL in gr2d.c.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 8/8] drm: tegra: Add gr2d device

2013-02-04 Thread Terje Bergström
On 04.02.2013 04:56, Thierry Reding wrote:
 * PGP Signed by an unknown key
 
 On Tue, Jan 15, 2013 at 01:44:04PM +0200, Terje Bergstrom wrote:
 [...]
 diff --git a/drivers/gpu/host1x/drm/drm.c b/drivers/gpu/host1x/drm/drm.c
 @@ -270,7 +274,29 @@ static int tegra_drm_unload(struct drm_device *drm)

  static int tegra_drm_open(struct drm_device *drm, struct drm_file *filp)
  {
 - return 0;
 + struct host1x_drm_fpriv *fpriv;
 + int err = 0;
 
 Can be dropped.
 
 +
 + fpriv = kzalloc(sizeof(*fpriv), GFP_KERNEL);
 + if (!fpriv)
 + return -ENOMEM;
 +
 + INIT_LIST_HEAD(fpriv-contexts);
 + filp-driver_priv = fpriv;
 +
 + return err;
 
 return 0;

Ok.

 
 +static void tegra_drm_close(struct drm_device *drm, struct drm_file *filp)
 +{
 + struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(filp);
 + struct host1x_drm_context *context, *tmp;
 +
 + list_for_each_entry_safe(context, tmp, fpriv-contexts, list) {
 + context-client-ops-close_channel(context);
 + kfree(context);
 + }
 + kfree(fpriv);
  }
 
 Maybe you should add host1x_drm_context_free() to wrap the loop
 contents?

Makes sense. Will do.

 
 @@ -280,7 +306,204 @@ static void tegra_drm_lastclose(struct drm_device *drm)
   drm_fbdev_cma_restore_mode(host1x-fbdev);
  }

 +static int
 +tegra_drm_ioctl_syncpt_read(struct drm_device *drm, void *data,
 +  struct drm_file *file_priv)
 
 static int and function name on one line, please.

Ok, will re-split the lines.

 
 +{
 + struct host1x *host1x = drm-dev_private;
 + struct tegra_drm_syncpt_read_args *args = data;
 + struct host1x_syncpt *sp =
 + host1x_syncpt_get_bydev(host1x-dev, args-id);
 
 I don't know if we need this, except maybe to work around the problem
 that we have two different structures named host1x. The _bydev() suffix
 is misleading because all you really do here is obtain the syncpt from
 the host1x.

Yeah, it's actually working around the host1x duplicate naming.
host1x_syncpt_get takes struct host1x as parameter, but that's different
host1x than in this code.

 
 +static int
 +tegra_drm_ioctl_open_channel(struct drm_device *drm, void *data,
 +  struct drm_file *file_priv)
 +{
 + struct tegra_drm_open_channel_args *args = data;
 + struct host1x_client *client;
 + struct host1x_drm_context *context;
 + struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
 + struct host1x *host1x = drm-dev_private;
 + int err = 0;
 
 err = -ENODEV; (see below)

Ok, makes sense.

 
 +
 + context = kzalloc(sizeof(*context), GFP_KERNEL);
 + if (!context)
 + return -ENOMEM;
 +
 + list_for_each_entry(client, host1x-clients, list) {
 + if (client-class == args-class) {
 + context-client = client;
 + err = client-ops-open_channel(client, context);
 + if (err)
 + goto out;
 +
 + list_add(context-list, fpriv-contexts);
 + args-context = (uintptr_t)context;
 
 Perhaps cast this to __u64 directly instead? There's little sense in
 taking the detour via uintptr_t.

I think compiler complained about a direct cast to __u64, but I'll try
again.

 
 + goto out;
 
 return 0;
 
 + }
 + }
 + err = -ENODEV;
 +
 +out:
 + if (err)
 + kfree(context);
 +
 + return err;
 +}
 
 Then this simply becomes:
 
 kfree(context);
 return err;

Sounds good.

 
 +static int
 +tegra_drm_ioctl_close_channel(struct drm_device *drm, void *data,
 +  struct drm_file *file_priv)
 +{
 + struct tegra_drm_open_channel_args *args = data;
 + struct host1x_drm_context *context, *tmp;
 + struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
 + int err = 0;
 +
 + list_for_each_entry_safe(context, tmp, fpriv-contexts, list) {
 + if ((uintptr_t)context == args-context) {
 + context-client-ops-close_channel(context);
 + list_del(context-list);
 + kfree(context);
 + goto out;
 + }
 + }
 + err = -EINVAL;
 +
 +out:
 + return err;
 +}
 
 Same comments as for tegra_drm_ioctl_open_channel().

Ok, will apply.

 
 +static int
 +tegra_drm_ioctl_get_syncpoint(struct drm_device *drm, void *data,
 +  struct drm_file *file_priv)
 +{
 + struct tegra_drm_get_channel_param_args *args = data;
 + struct host1x_drm_context *context;
 + struct host1x_drm_fpriv *fpriv = host1x_drm_fpriv(file_priv);
 + int err = 0;
 +
 + list_for_each_entry(context, fpriv-contexts, list) {
 + if ((uintptr_t)context == args-context) {
 + args-value =
 + context-client-ops-get_syncpoint(context,
 +  

Re: [PATCHv5,RESEND 0/8] Support for Tegra 2D hardware

2013-01-22 Thread Terje Bergström
On 15.01.2013 13:43, Terje Bergstrom wrote:
> This set of patches adds support for Tegra20 and Tegra30 host1x and
> 2D. It is based on linux-next-20130114. The set was regenerated with
> git format-patch -M.

I have pushed both the kernel patches and libdrm changes to
g...@gitorious.org:linux-host1x/linux-host1x.git and
g...@gitorious.org:linux-host1x/libdrm-host1x.git.

They're not intended to compete with any other repository - I just
wanted to have one place where people can download the kernel patches,
libdrm changes and test suite. I'll remove them once they've served
their purpose.

I'd appreciate feedback on the patches. So far the only feedback has
been from Stephen about clock changes.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5,RESEND 0/8] Support for Tegra 2D hardware

2013-01-22 Thread Terje Bergström
On 15.01.2013 13:43, Terje Bergstrom wrote:
 This set of patches adds support for Tegra20 and Tegra30 host1x and
 2D. It is based on linux-next-20130114. The set was regenerated with
 git format-patch -M.

I have pushed both the kernel patches and libdrm changes to
g...@gitorious.org:linux-host1x/linux-host1x.git and
g...@gitorious.org:linux-host1x/libdrm-host1x.git.

They're not intended to compete with any other repository - I just
wanted to have one place where people can download the kernel patches,
libdrm changes and test suite. I'll remove them once they've served
their purpose.

I'd appreciate feedback on the patches. So far the only feedback has
been from Stephen about clock changes.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5 7/8] ARM: tegra: Add board data and 2D clocks

2013-01-16 Thread Terje Bergström
On 15.01.2013 20:44, Stephen Warren wrote:
> On 01/15/2013 04:26 AM, Terje Bergstrom wrote:
>> Add a driver alias gr2d for Tegra 2D device, and assign a duplicate
>> of 2D clock to that driver alias.
> 
> FYI on this one patch - it won't be applied to the Tegra tree until
> after Prashant's common clock framework changes are applied. As such, it
> will need some rework once those patches are applied, or perhaps won't
> even be relevant any more; see below.

It looks like I need to just drop the patch 7(8 "ARM: tegra: Add board
data and 2D clocks" and change the 2D code to access the clock without a
name. Do you agree?

I'll roll this into the next patch version once I've gathered more comments.

diff --git a/drivers/gpu/host1x/drm/gr2d.c b/drivers/gpu/host1x/drm/gr2d.c
index dc7d6c6..a1c5f5c 100644
--- a/drivers/gpu/host1x/drm/gr2d.c
+++ b/drivers/gpu/host1x/drm/gr2d.c
@@ -259,7 +259,7 @@ static int gr2d_probe(struct platform_device *dev)
if (!gr2d)
return -ENOMEM;

-   gr2d->clk = devm_clk_get(>dev, "gr2d");
+   gr2d->clk = devm_clk_get(>dev, NULL);
if (IS_ERR(gr2d->clk)) {
dev_err(>dev, "cannot get clock\n");
return PTR_ERR(gr2d->clk);

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5 7/8] ARM: tegra: Add board data and 2D clocks

2013-01-16 Thread Terje Bergström
On 15.01.2013 20:44, Stephen Warren wrote:
 On 01/15/2013 04:26 AM, Terje Bergstrom wrote:
 Add a driver alias gr2d for Tegra 2D device, and assign a duplicate
 of 2D clock to that driver alias.
 
 FYI on this one patch - it won't be applied to the Tegra tree until
 after Prashant's common clock framework changes are applied. As such, it
 will need some rework once those patches are applied, or perhaps won't
 even be relevant any more; see below.

It looks like I need to just drop the patch 7(8 ARM: tegra: Add board
data and 2D clocks and change the 2D code to access the clock without a
name. Do you agree?

I'll roll this into the next patch version once I've gathered more comments.

diff --git a/drivers/gpu/host1x/drm/gr2d.c b/drivers/gpu/host1x/drm/gr2d.c
index dc7d6c6..a1c5f5c 100644
--- a/drivers/gpu/host1x/drm/gr2d.c
+++ b/drivers/gpu/host1x/drm/gr2d.c
@@ -259,7 +259,7 @@ static int gr2d_probe(struct platform_device *dev)
if (!gr2d)
return -ENOMEM;

-   gr2d-clk = devm_clk_get(dev-dev, gr2d);
+   gr2d-clk = devm_clk_get(dev-dev, NULL);
if (IS_ERR(gr2d-clk)) {
dev_err(dev-dev, cannot get clock\n);
return PTR_ERR(gr2d-clk);

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5 7/8] ARM: tegra: Add board data and 2D clocks

2013-01-15 Thread Terje Bergström
On 15.01.2013 20:44, Stephen Warren wrote:
>> diff --git a/arch/arm/mach-tegra/board-dt-tegra20.c 
>> b/arch/arm/mach-tegra/board-dt-tegra20.c
> 
>> +OF_DEV_AUXDATA("nvidia,tegra20-gr2d", 0x5414, "gr2d", NULL),
> 
> I assume the only reason to add AUXDATA is to give the device a specific
> name, which will then match the driver name in the clock driver:
> 
>> -CLK_DUPLICATE("2d", "tegra_grhost", "gr2d"),
>> +CLK_DUPLICATE("2d", "gr2d", "gr2d"),
> 
> If so, this shouldn't be needed once the common clock framework patches
> are applied, since all device clocks will be retrieved from device tree,
> and hence the device name will be irrelevant; the phandle in device tree
> is all that will matter.

Yes, clock binding is the only reason for the OF_DEV_AUXDATA line. I'll
need to look into Prashant's clock changes, but I assume it's going to
be a trivial change to host1x patches.

Thanks for the heads-up.

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2013-01-15 Thread Terje Bergström
On 15.01.2013 13:30, Thierry Reding wrote:
> Sorry for not getting back to you on this earlier. I just remembered
> this thread when I saw Terje's latest patch series.
> 
> I agree that having everything in one location will make things a lot
> easier, even if it means we have to add the tegra-drm driver to a new
> location. In the long run I think this will pay off, though.
> 
> That said, I see that Terje has chosen this approach in his latest
> series, so it's all good.

*whew* Thanks Thierry. I'm not entirely sure that drivers/gpu/host1x is
the correct location, though - host1x looks pretty lonely there. If
anybody has a strong opinion about the location, I'm willing to adjust.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv5 7/8] ARM: tegra: Add board data and 2D clocks

2013-01-15 Thread Terje Bergström
On 15.01.2013 20:44, Stephen Warren wrote:
 diff --git a/arch/arm/mach-tegra/board-dt-tegra20.c 
 b/arch/arm/mach-tegra/board-dt-tegra20.c
 
 +OF_DEV_AUXDATA(nvidia,tegra20-gr2d, 0x5414, gr2d, NULL),
 
 I assume the only reason to add AUXDATA is to give the device a specific
 name, which will then match the driver name in the clock driver:
 
 -CLK_DUPLICATE(2d, tegra_grhost, gr2d),
 +CLK_DUPLICATE(2d, gr2d, gr2d),
 
 If so, this shouldn't be needed once the common clock framework patches
 are applied, since all device clocks will be retrieved from device tree,
 and hence the device name will be irrelevant; the phandle in device tree
 is all that will matter.

Yes, clock binding is the only reason for the OF_DEV_AUXDATA line. I'll
need to look into Prashant's clock changes, but I assume it's going to
be a trivial change to host1x patches.

Thanks for the heads-up.

Terje

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2013-01-15 Thread Terje Bergström
On 15.01.2013 13:30, Thierry Reding wrote:
 Sorry for not getting back to you on this earlier. I just remembered
 this thread when I saw Terje's latest patch series.
 
 I agree that having everything in one location will make things a lot
 easier, even if it means we have to add the tegra-drm driver to a new
 location. In the long run I think this will pay off, though.
 
 That said, I see that Terje has chosen this approach in his latest
 series, so it's all good.

*whew* Thanks Thierry. I'm not entirely sure that drivers/gpu/host1x is
the correct location, though - host1x looks pretty lonely there. If
anybody has a strong opinion about the location, I'm willing to adjust.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2013-01-07 Thread Terje Bergström
On 04.01.2013 22:25, Stephen Warren wrote:
> On 01/04/2013 03:09 AM, Terje Bergström wrote:
> ...
>> I think we have now two ways to go forward with cons and pros:
>>  1) Keep host1x and tegra-drm as separate driver
>>+ Code almost done
>>- we need dummy device and dummy driver
>>- extra code and API when host1x creates dummy device and its passed
>> to tegra-drm
> 
> Just to play devil's advocate:
> 
> I suspect that's only a few lines of code.

Yes, that's true. There's some overhead, but there's not too many actual
code lines.

>>- tegra-drm device would need to be a child of host1x device. Having
>> virtual and real devices as host1x children sounds weird.
> 
> And I doubt that would cause problems.

True. It could become a problem if the driver just assumed that all
host1x children are actual hardware, but we could avoid that.

>>  2) Merge host1x and tegra-drm into one module. drm is a subcomponent,
>> and whatever other personalities we wish would also be subcomponents of
>> host1x. host1x calls tegra-drm directly to handle preparation for drm
>> initialization. As they're in the same module, circular dependency is ok.
>>+ Simpler conceptually (no dummy device/driver)
>>+ Less code
>>- Proposal doesn't yet exist
> 
> But that said, I agree this approach would be very reasonable; it seems
> to me that host1x really is the main HW behind a DRM driver or a V4L2
> driver or ... As such, it seems quite reasonable for a single struct
> device to exist that represents host1x, and for the driver for that
> device to register both a DRM and a V4L2 driver etc. The code could
> physically be organized into separate modules, and under different
> Kconfig options for configurability etc.
> 
> But either way, I'll let you (Thierry and Terje) work out which way to go.

If we want separate modules, we'd need the dummy device & dummy driver
binding between them. We could also just put them in the same module.
It'd make DRM a requirement to host1x driver, but given the current
structure, I think that'd be reasonable. We could later make it more
configurable if needed. Does this now make tegra-drm and host1x too
dependent on each other? I'm not sure.

I also like the fact that we don't have to export APIs to include, but
we can (if we so choose) keep all tegra-drm-host1x-APIs in local header
files. As we have noted, the two drivers are tightly interconnected,
changing the APIs is easier if we can just work on the same directory
hierarchy.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2013-01-07 Thread Terje Bergström
On 04.01.2013 22:25, Stephen Warren wrote:
 On 01/04/2013 03:09 AM, Terje Bergström wrote:
 ...
 I think we have now two ways to go forward with cons and pros:
  1) Keep host1x and tegra-drm as separate driver
+ Code almost done
- we need dummy device and dummy driver
- extra code and API when host1x creates dummy device and its passed
 to tegra-drm
 
 Just to play devil's advocate:
 
 I suspect that's only a few lines of code.

Yes, that's true. There's some overhead, but there's not too many actual
code lines.

- tegra-drm device would need to be a child of host1x device. Having
 virtual and real devices as host1x children sounds weird.
 
 And I doubt that would cause problems.

True. It could become a problem if the driver just assumed that all
host1x children are actual hardware, but we could avoid that.

  2) Merge host1x and tegra-drm into one module. drm is a subcomponent,
 and whatever other personalities we wish would also be subcomponents of
 host1x. host1x calls tegra-drm directly to handle preparation for drm
 initialization. As they're in the same module, circular dependency is ok.
+ Simpler conceptually (no dummy device/driver)
+ Less code
- Proposal doesn't yet exist
 
 But that said, I agree this approach would be very reasonable; it seems
 to me that host1x really is the main HW behind a DRM driver or a V4L2
 driver or ... As such, it seems quite reasonable for a single struct
 device to exist that represents host1x, and for the driver for that
 device to register both a DRM and a V4L2 driver etc. The code could
 physically be organized into separate modules, and under different
 Kconfig options for configurability etc.
 
 But either way, I'll let you (Thierry and Terje) work out which way to go.

If we want separate modules, we'd need the dummy device  dummy driver
binding between them. We could also just put them in the same module.
It'd make DRM a requirement to host1x driver, but given the current
structure, I think that'd be reasonable. We could later make it more
configurable if needed. Does this now make tegra-drm and host1x too
dependent on each other? I'm not sure.

I also like the fact that we don't have to export APIs to include, but
we can (if we so choose) keep all tegra-drm-host1x-APIs in local header
files. As we have noted, the two drivers are tightly interconnected,
changing the APIs is easier if we can just work on the same directory
hierarchy.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2013-01-04 Thread Terje Bergström
On 21.12.2012 23:19, Stephen Warren wrote:
> I see the situation more like:
> 
> * There's host1x hardware.
> 
> * There's a low-level driver just for host1x itself; the host1x driver.
> 
> * There's a high-level driver for the entire host1x complex of devices.
> That is tegradrm. There may be more high-level drivers in the future
> (e.g. v4l2 camera driver if it needs to aggregate a bunch of host1x
> sub-devices liek tegradrm does).
> 
> * The presence of the host1x DT node logically implies that both the two
> drivers in the previous two points should be instantiated.
> 
> * Linux instantiates a single device per DT node.
> 
> * So, it's host1x's responsibility to instantiate the other device(s). I
> think it's reasonable for the host1x driver to know exactly what
> features the host1x HW complex supports; raw host1x access being one,
> and the higher level concept of a tegradrm being another. So, it's
> reasonable for host1x to trigger the instantiation of tegradrm.
> 
> * If DRM core didn't stomp on the device object's drvdata (or whichever
> field it is), I would recommend not creating a dummy device at all, but
> rather having the host1x driver directly implement multiple driver
> interfaces. There are many examples like this already in the kernel,
> e.g. combined GPIO+IRQ driver, combined GPIO+IRQ+pinctrl driver, etc.

We had a phone call with Stephen yesterday, and I finally understood why
unbroken chain from DT to tegra-drm is important. The goals are to be
able to modprobe tegra-drm without ill effects, and to auto-load
tegra-drm module. I had been chasing a totally different goal.

Sorry about all the churn.

I think we have now two ways to go forward with cons and pros:
 1) Keep host1x and tegra-drm as separate driver
   + Code almost done
   - we need dummy device and dummy driver
   - extra code and API when host1x creates dummy device and its passed
to tegra-drm
   - tegra-drm device would need to be a child of host1x device. Having
virtual and real devices as host1x children sounds weird.

 2) Merge host1x and tegra-drm into one module. drm is a subcomponent,
and whatever other personalities we wish would also be subcomponents of
host1x. host1x calls tegra-drm directly to handle preparation for drm
initialization. As they're in the same module, circular dependency is ok.
   + Simpler conceptually (no dummy device/driver)
   + Less code
   - Proposal doesn't yet exist

Thierry, what do you think? I'd prefer option 2, as that keeps things
simple and still fulfills the requirements. We probably would redo the
patch "Remove redundant host1x" to actually move drm under host1x, and
adds calls from host1x to parse_dt() directly. We'd probably leave the
code otherwise mostly as it is now, so we'll drop whatever modifications
we've done so far in my proposals. You can later pick up some things
from our proposals if you want, but it's then up to you.

We could also later think about generalizing some of the list management
to belong to host1x, but that'd be a follow-up and we can decide that later.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2013-01-04 Thread Terje Bergström
On 21.12.2012 23:19, Stephen Warren wrote:
 I see the situation more like:
 
 * There's host1x hardware.
 
 * There's a low-level driver just for host1x itself; the host1x driver.
 
 * There's a high-level driver for the entire host1x complex of devices.
 That is tegradrm. There may be more high-level drivers in the future
 (e.g. v4l2 camera driver if it needs to aggregate a bunch of host1x
 sub-devices liek tegradrm does).
 
 * The presence of the host1x DT node logically implies that both the two
 drivers in the previous two points should be instantiated.
 
 * Linux instantiates a single device per DT node.
 
 * So, it's host1x's responsibility to instantiate the other device(s). I
 think it's reasonable for the host1x driver to know exactly what
 features the host1x HW complex supports; raw host1x access being one,
 and the higher level concept of a tegradrm being another. So, it's
 reasonable for host1x to trigger the instantiation of tegradrm.
 
 * If DRM core didn't stomp on the device object's drvdata (or whichever
 field it is), I would recommend not creating a dummy device at all, but
 rather having the host1x driver directly implement multiple driver
 interfaces. There are many examples like this already in the kernel,
 e.g. combined GPIO+IRQ driver, combined GPIO+IRQ+pinctrl driver, etc.

We had a phone call with Stephen yesterday, and I finally understood why
unbroken chain from DT to tegra-drm is important. The goals are to be
able to modprobe tegra-drm without ill effects, and to auto-load
tegra-drm module. I had been chasing a totally different goal.

Sorry about all the churn.

I think we have now two ways to go forward with cons and pros:
 1) Keep host1x and tegra-drm as separate driver
   + Code almost done
   - we need dummy device and dummy driver
   - extra code and API when host1x creates dummy device and its passed
to tegra-drm
   - tegra-drm device would need to be a child of host1x device. Having
virtual and real devices as host1x children sounds weird.

 2) Merge host1x and tegra-drm into one module. drm is a subcomponent,
and whatever other personalities we wish would also be subcomponents of
host1x. host1x calls tegra-drm directly to handle preparation for drm
initialization. As they're in the same module, circular dependency is ok.
   + Simpler conceptually (no dummy device/driver)
   + Less code
   - Proposal doesn't yet exist

Thierry, what do you think? I'd prefer option 2, as that keeps things
simple and still fulfills the requirements. We probably would redo the
patch Remove redundant host1x to actually move drm under host1x, and
adds calls from host1x to parse_dt() directly. We'd probably leave the
code otherwise mostly as it is now, so we'll drop whatever modifications
we've done so far in my proposals. You can later pick up some things
from our proposals if you want, but it's then up to you.

We could also later think about generalizing some of the list management
to belong to host1x, but that'd be a follow-up and we can decide that later.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 5/8] drm: tegra: Remove redundant host1x

2013-01-03 Thread Terje Bergström
On 21.12.2012 16:36, Thierry Reding wrote:
> On Fri, Dec 21, 2012 at 01:39:21PM +0200, Terje Bergstrom wrote:
>> +static void tegra_drm_close(struct drm_device *drm, struct drm_file *filp)
>> +{
>> +
>> +}
>> +
> 
> This can be removed, right?

Yes, done.

> 
>> +static struct platform_driver tegra_drm_platform_driver = {
>> +.driver = {
>> +.name = "tegradrm",
> 
> This should be "tegra-drm" to match the module name.

Done.

>> -struct host1x_client;
>> +struct tegra_drm_client;
> 
> I don't see the point in renaming this. All of the devices are still
> host1x clients, right? This patch would be a whole shorter if we didn't
> rename these. None of these symbols are exported either so there's not
> much chance for them to clash with anything.

Yep, we renamed it back to make the patch smaller.

>> diff --git a/include/drm/tegra_drm.h b/include/drm/tegra_drm.h
>> new file mode 100644
>> index 000..8632f49
>> --- /dev/null
>> +++ b/include/drm/tegra_drm.h
>> @@ -0,0 +1,20 @@
>> +/*
>> + * Copyright (c) 2012, NVIDIA CORPORATION.  All rights reserved.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + * You should have received a copy of the GNU General Public License
>> + * along with this program.  If not, see .
>> + */
>> +
>> +#ifndef _TEGRA_DRM_H_
>> +#define _TEGRA_DRM_H_
>> +
>> +#endif
> 
> This can be removed as well.

Removed.

I posted another proposal on how to handle initialization in tegradrm.
It removes a lot of code and relies more on platform_bus keeping track
of devices. Have you had time to look into it?

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 5/8] drm: tegra: Remove redundant host1x

2013-01-03 Thread Terje Bergström
On 21.12.2012 16:36, Thierry Reding wrote:
 On Fri, Dec 21, 2012 at 01:39:21PM +0200, Terje Bergstrom wrote:
 +static void tegra_drm_close(struct drm_device *drm, struct drm_file *filp)
 +{
 +
 +}
 +
 
 This can be removed, right?

Yes, done.

 
 +static struct platform_driver tegra_drm_platform_driver = {
 +.driver = {
 +.name = tegradrm,
 
 This should be tegra-drm to match the module name.

Done.

 -struct host1x_client;
 +struct tegra_drm_client;
 
 I don't see the point in renaming this. All of the devices are still
 host1x clients, right? This patch would be a whole shorter if we didn't
 rename these. None of these symbols are exported either so there's not
 much chance for them to clash with anything.

Yep, we renamed it back to make the patch smaller.

 diff --git a/include/drm/tegra_drm.h b/include/drm/tegra_drm.h
 new file mode 100644
 index 000..8632f49
 --- /dev/null
 +++ b/include/drm/tegra_drm.h
 @@ -0,0 +1,20 @@
 +/*
 + * Copyright (c) 2012, NVIDIA CORPORATION.  All rights reserved.
 + *
 + * This program is free software; you can redistribute it and/or modify it
 + * under the terms and conditions of the GNU General Public License,
 + * version 2, as published by the Free Software Foundation.
 + *
 + * This program is distributed in the hope it will be useful, but WITHOUT
 + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
 + * more details.
 + *
 + * You should have received a copy of the GNU General Public License
 + * along with this program.  If not, see http://www.gnu.org/licenses/.
 + */
 +
 +#ifndef _TEGRA_DRM_H_
 +#define _TEGRA_DRM_H_
 +
 +#endif
 
 This can be removed as well.

Removed.

I posted another proposal on how to handle initialization in tegradrm.
It removes a lot of code and relies more on platform_bus keeping track
of devices. Have you had time to look into it?

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-02 Thread Terje Bergström
On 21.12.2012 15:50, Lucas Stach wrote:
> Am Freitag, den 21.12.2012, 13:39 +0200 schrieb Terje Bergstrom:
>> Some of the issues left open:
>>  * Register definitions still use static inline. There has been a
>>debate about code style versus ability to use compiler type
>>checking and code coverage analysis. There was no conclusion, so
>>I left it as was.
> This has to be resolved before merging. Personally I'm in favour of
> keeping reg access patterns close to what is done in other parts of the
> kernel.

How about if I define inline functions, and #defines to proxy to them?
Something like:

static inline u32 host1x_sync_cfpeek_ctrl_r(void)
{
return 0x74c;
}
#define HOST1X_SYNC_CFPEEK_CTRL \
host1x_sync_cfpeek_ctrl_r()

static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_ena_f(u32 v)
{
return (v & 0x1) << 31;
}
#define HOST1X_SYNC_CFPEEK_CTRL_CFPEEK_ENA_F(v) \
host1x_sync_cfpeek_ctrl_cfpeek_ena_f(v)

I'd use the macros in .c files. That way the references will look
familiar to reader of .c files, but we can still get the benefits of
compiler processing (type checking, better error codes etc) and gcov
coverage tracking.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-02 Thread Terje Bergström
On 03.01.2013 05:31, Mark Zhang wrote:
> Sorry I didn't get it. Yes, in current design, you can pin all mem
> handles in one time but I haven't found anything related with "locking
> only once per submit".
> 
> My idea is:
> - remove "job->addr_phys"
> - assign "job->reloc_addr_phys" & "job->gather_addr_phys" separately
> - In "pin_job_mem", just call "host1x_memmgr_pin_array_ids" twice to
> fill the "reloc_addr_phy" & "gather_addr_phys".
> 
> Anything I misunderstood?

The current design uses CMA, which makes pinning basically a no-op. When
we have IOMMU support, pinning requires calling into IOMMU. Updating
SMMU tables requires locking, and probably maintenance before SMMU code
also requires its own locked data tables. For example, preventing
duplicate pinning might require a global table of handles.

Putting all of the handles in one table allows doing duplicate detection
across cmdbuf and reloc tables. This allows pinning each buffer exactly
once, which reduces number of calls to IOMMU.

> "host1x_cma_pin_array_ids" doesn't return negative value right now, so
> maybe you need to take a look at it.

True, and also a consequence of using CMA: pinning can never fail. With
IOMMU, pinning can fail.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 3/8] gpu: host1x: Add channel support

2013-01-02 Thread Terje Bergström
On 02.01.2013 11:31, Mark Zhang wrote:
> On 01/02/2013 05:31 PM, Terje Bergström wrote:
>> That's intentional. Writing a job to channel is atomic, so lock is taken
>> from host1x_cdma_begin() until host1x_cdma_end().
>>
> 
> Okay. So can we consider that lock and unlock this mutex in the function
> which calls host1x_cdma_begin and host1x_cdma_end, that means, in
> current implementation, function "channel_submit"?

Yes, exactly.

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-02 Thread Terje Bergström
On 28.12.2012 11:14, Mark Zhang wrote:
> diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
> index a936902..c3ded60 100644
> --- a/drivers/gpu/drm/tegra/gr2d.c
> +++ b/drivers/gpu/drm/tegra/gr2d.c
> @@ -131,6 +131,14 @@ static int gr2d_submit(struct tegra_drm_context
> *context,
> if (err)
> goto fail;
> 
> +   /* We define CMA as an temporary solution in host1x driver.
> +  That's also why we have a CMA kernel config in it.
> +  But seems here, in tegradrm, we hardcode the CMA here.
> +  So what do you do when host1x change to IOMMU?
> +  Do we also need to define a CMA config in tegradrm
> driver,
> +  then after host1x changes to IOMMU, we add another IOMMU
> +  config in tegradrm? Or we should invent another more
> +  generic wrapper layer here? */
> cmdbuf.mem = handle_cma_to_host1x(drm, file_priv,
> cmdbuf.mem);

Lucas is working on host1x allocator, so let's defer this comment until
we have Lucas' code.

> diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
> index cc8ca2f..0867b72 100644
> --- a/drivers/gpu/host1x/job.c
> +++ b/drivers/gpu/host1x/job.c
> @@ -82,6 +82,26 @@ struct host1x_job *host1x_job_alloc(struct
> host1x_channel *ch,
> mem += num_unpins * sizeof(dma_addr_t);
> job->pin_ids = num_unpins ? mem : NULL;
> 
> +   /* I think this is a somewhat ugly design.
> +  Actually "addr_phys" is consisted by "reloc_addr_phys" and
> +  "gather_addr_phys".
> +  Why don't we just declare "reloc_addr_phys" and
> "gather_addr_phys"?
> +  In current design, let's say if one nvhost newbie changes the
> order
> +  of these 2 blocks of code in function "pin_job_mem":
> +
> +  for (i = 0; i < job->num_relocs; i++) {
> +   struct host1x_reloc *reloc = >relocarray[i];
> +   job->pin_ids[count] = reloc->target;
> +   count++;
> +  }
> +
> +  for (i = 0; i < job->num_gathers; i++) {
> +   struct host1x_job_gather *g = >gathers[i];
> +   job->pin_ids[count] = g->mem_id;
> +   count++;
> +  }
> +
> +  Then likely something weird occurs... */

We do this because this way we can implement batch pinning of memory
handles. This way we can decrease memory handle management a lot as we
need to do locking only once per submit.

Decreasing memory management overhead is really important, because in
complex graphics cases, there are might be a hundreds of buffer
references per submit, and several submits per frame. Any extra overhead
relates directly to reduced performance.

> job->reloc_addr_phys = job->addr_phys;
> job->gather_addr_phys = >addr_phys[num_relocs];
> 
> @@ -252,6 +272,10 @@ static int do_relocs(struct host1x_job *job,
> }
> }
> 
> +   /* I have question here. Does this mean the address info
> +  which need to be relocated(according to the libdrm codes,
> +  seems this address is "0xDEADBEEF") always staying at the
> +  beginning of a page? */
> __raw_writel(
> (job->reloc_addr_phys[i] +
> reloc->target_offset) >> reloc->shift,

No - the slot can be anywhere. That's why we have cmdbuf_offset in the
reloc struct.

> @@ -565,6 +589,7 @@ int host1x_job_pin(struct host1x_job *job, struct
> platform_device *pdev)
> }
> }
> 
> +   /* if (host1x_firewall && !err) { */
> if (host1x_firewall) {
> err = copy_gathers(job, pdev);
> if (err) {

Will add.

> @@ -573,6 +598,9 @@ int host1x_job_pin(struct host1x_job *job, struct
> platform_device *pdev)
> }
> }
> 
> +/* Rename this label to "out" or something else.
> +   Because if everything goes right, the codes under this label also
> +   get executed. */
>  fail:
> wmb();

Will do.

> 
> diff --git a/drivers/gpu/host1x/memmgr.c b/drivers/gpu/host1x/memmgr.c
> index f3954f7..bb5763d 100644
> --- a/drivers/gpu/host1x/memmgr.c
> +++ b/drivers/gpu/host1x/memmgr.c
> @@ -156,6 +156,9 @@ int host1x_memmgr_pin_array_ids(struct
> platform_device *dev,
> count, _data[pin_count],
> phys_addr);
> 
> +   /* I don't think the current "host1x_cma_pin_array_ids"
> +  is able to return a negative value. So this "if" doesn't
> +  make sense...*/
> if (cma_count < 0) {
> /* clean up previous handles */
> while (pin_count) {

It should return negative in case of error.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a 

Re: [PATCHv4 3/8] gpu: host1x: Add channel support

2013-01-02 Thread Terje Bergström
On 02.01.2013 09:40, Mark Zhang wrote:
> On 12/21/2012 07:39 PM, Terje Bergstrom wrote:
>> Add support for host1x client modules, and host1x channels to submit
>> work to the clients. The work is submitted in GEM CMA buffers, so
>> this patch adds support for them.
>>
>> Signed-off-by: Terje Bergstrom 
>> ---
> [...]
>> +/*
>> + * Begin a cdma submit
>> + */
>> +int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job)
>> +{
>> +struct host1x *host1x = cdma_to_host1x(cdma);
>> +
>> +mutex_lock(>lock);
>> +
>> +if (job->timeout) {
>> +/* init state on first submit with timeout value */
>> +if (!cdma->timeout.initialized) {
>> +int err;
>> +err = host1x->cdma_op.timeout_init(cdma,
>> +job->syncpt_id);
>> +if (err) {
>> +mutex_unlock(>lock);
>> +return err;
>> +}
>> +}
>> +}
>> +if (!cdma->running)
>> +host1x->cdma_op.start(cdma);
>> +
>> +cdma->slots_free = 0;
>> +cdma->slots_used = 0;
>> +cdma->first_get = host1x->cdma_pb_op.putptr(>push_buffer);
>> +
>> +trace_host1x_cdma_begin(job->ch->dev->name);
> 
> Seems missing "mutex_unlock(>lock);" here.

That's intentional. Writing a job to channel is atomic, so lock is taken
from host1x_cdma_begin() until host1x_cdma_end().

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 3/8] gpu: host1x: Add channel support

2013-01-02 Thread Terje Bergström
On 22.12.2012 06:17, Steven Rostedt wrote:
> On Fri, 2012-12-21 at 13:39 +0200, Terje Bergstrom wrote:
>> +TRACE_EVENT(host1x_cdma_begin,
>> +TP_PROTO(const char *name),
>> +
>> +TP_ARGS(name),
>> +
>> +TP_STRUCT__entry(
>> +__field(const char *, name)
>> +),
>> +
>> +TP_fast_assign(
>> +__entry->name = name;
>> +),
>> +
>> +TP_printk("name=%s",
>> +__entry->name)
>> +);
>> +
>> +TRACE_EVENT(host1x_cdma_end,
>> +TP_PROTO(const char *name),
>> +
>> +TP_ARGS(name),
>> +
>> +TP_STRUCT__entry(
>> +__field(const char *, name)
>> +),
>> +
>> +TP_fast_assign(
>> +__entry->name = name;
>> +),
>> +
>> +TP_printk("name=%s",
>> +__entry->name)
>> +);
> 
> The above two should be combined into a DECLARE_EVENT_CLASS() and
> DEFINE_EVENT()s. Saves text and data space that way.

Will do.

(...)
> If you can combine any of these others into DECLARE_EVENT_CLASS() and
> DEFINE_EVENT()s, please do.

I didn't find others with same parameters.

Thanks!

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-02 Thread Terje Bergström
On 26.12.2012 11:42, Mark Zhang wrote:
> diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
> index a936902..28954b3 100644
> --- a/drivers/gpu/drm/tegra/gr2d.c
> +++ b/drivers/gpu/drm/tegra/gr2d.c
> @@ -247,6 +247,10 @@ static int __devinit gr2d_probe(struct
> platform_device *dev)
>  {
> int err;
> struct gr2d *gr2d = NULL;
> +   /* Cause drm_device is created in host1x driver. So
> +  if host1x drivers loads after tegradrm, then in this
> +  gr2d_probe function, this "drm_device" will be NULL.
> +  How can we handle this? Defer driver probe? */

I will push a new proposal about how the devices & drivers get probed.

> struct platform_device *drm_device =
> host1x_drm_device(to_platform_device(dev->dev.parent));
> struct tegradrm *tegradrm = platform_get_drvdata(drm_device);
> @@ -273,6 +277,11 @@ static int __devinit gr2d_probe(struct
> platform_device *dev)
> 
> gr2d->syncpt = host1x_syncpt_alloc(dev, 0);
> if (!gr2d->syncpt) {
> +   /* Do we really need this?
> +  After "host1x_channel_alloc", the refcount of this
> +  channel is 0. So calling host1x_channel_put here will
> +  make the refcount going to negative.
> +  I suppose we should call "host1x_free_channel" here. */

True. I need to also export host1x_channel_free (I will change the name,
too).

> host1x_channel_put(gr2d->channel);
> return -ENOMEM;
> }
> @@ -284,6 +293,7 @@ static int __devinit gr2d_probe(struct
> platform_device *dev)
> 
> err = tegra_drm_register_client(tegradrm, >client);
> if (err)
> +   /* Add "host1x_free_channel" */

Will do.

> return err;
> 
> platform_set_drvdata(dev, gr2d);
> diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c
> index 3705cae..ed83b9e 100644
> --- a/drivers/gpu/host1x/channel.c
> +++ b/drivers/gpu/host1x/channel.c
> @@ -95,6 +95,9 @@ struct host1x_channel *host1x_channel_alloc(struct
> platform_device *pdev)
> int max_channels = host1x->info.nb_channels;
> int err;
> 
> +   /* Is it necessary to add mutex protection here?
> +  I'm just wondering in a smp system, multiple host1x clients
> +  may try to alloc their channels simultaneously... */

I'll add locking.

> if (chindex > max_channels)
> return NULL;
> 
> diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
> index 86d5c70..f2bd5aa 100644
> --- a/drivers/gpu/host1x/debug.c
> +++ b/drivers/gpu/host1x/debug.c
> @@ -164,6 +164,10 @@ static const struct file_operations
> host1x_debug_fops = {
> 
>  void host1x_debug_init(struct host1x *host1x)
>  {
> +   /* I think it's better to set this directory name the same with
> +  the driver's name -- defined in dev.c:
> +  #define DRIVER_NAME  "tegra-host1x"
> +  */
> struct dentry *de = debugfs_create_dir("tegra_host", NULL);

Will do.

> 
> if (!de)
> diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
> index 07e8813..01ed10d 100644
> --- a/drivers/gpu/host1x/dev.c
> +++ b/drivers/gpu/host1x/dev.c
> @@ -38,6 +38,7 @@
> 
>  struct host1x *host1x;
> 
> +/* Called by drm unlocked ioctl function. So do we need a lock here? */
>  void host1x_syncpt_incr_byid(u32 id)

No, host1x_syncpt_incr_byid is SMP safe.

>  {
> struct host1x_syncpt *sp = host1x->syncpt + id;
> @@ -52,6 +53,7 @@ u32 host1x_syncpt_read_byid(u32 id)
>  }
>  EXPORT_SYMBOL(host1x_syncpt_read_byid);
> 
> +/* Called by drm unlocked ioctl function. So do we need a lock here? */
>  int host1x_syncpt_wait_byid(u32 id, u32 thresh, long timeout, u32 *value)

Same here, SMP safe.

>  {
> struct host1x_syncpt *sp = host1x->syncpt + id;
> @@ -161,6 +163,8 @@ static int host1x_probe(struct platform_device *dev)
> 
> err = host1x_alloc_resources(host);
> if (err) {
> +   /* We don't have chip_ops right now, so here the
> +  error message is somewhat improper */
> dev_err(>dev, "failed to init chip support\n");
> goto fail;
> }

Actually, alloc_resources only allocates intr->syncpt, so I the code to
host1x_intr_init().

> @@ -175,6 +179,14 @@ static int host1x_probe(struct platform_device *dev)
> if (!host->syncpt)
> goto fail;
> 
> +   /* I suggest create a dedicate function for initializing nop sp.
> +  First this "_host1x_syncpt_alloc" looks like an internal/static
> +  function.
> +  Then actually "host1x_syncpt_alloc" & "_host1x_syncpt_alloc" all
> +  serve host1x client(e.g: gr2d) so it's not suitable to use them
> +  for nop sp.
> +  Just create a wrapper function to call _host1x_syncpt_alloc is OK.
> +  This will make the code more 

Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-02 Thread Terje Bergström
On 26.12.2012 11:42, Mark Zhang wrote:
 diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
 index a936902..28954b3 100644
 --- a/drivers/gpu/drm/tegra/gr2d.c
 +++ b/drivers/gpu/drm/tegra/gr2d.c
 @@ -247,6 +247,10 @@ static int __devinit gr2d_probe(struct
 platform_device *dev)
  {
 int err;
 struct gr2d *gr2d = NULL;
 +   /* Cause drm_device is created in host1x driver. So
 +  if host1x drivers loads after tegradrm, then in this
 +  gr2d_probe function, this drm_device will be NULL.
 +  How can we handle this? Defer driver probe? */

I will push a new proposal about how the devices  drivers get probed.

 struct platform_device *drm_device =
 host1x_drm_device(to_platform_device(dev-dev.parent));
 struct tegradrm *tegradrm = platform_get_drvdata(drm_device);
 @@ -273,6 +277,11 @@ static int __devinit gr2d_probe(struct
 platform_device *dev)
 
 gr2d-syncpt = host1x_syncpt_alloc(dev, 0);
 if (!gr2d-syncpt) {
 +   /* Do we really need this?
 +  After host1x_channel_alloc, the refcount of this
 +  channel is 0. So calling host1x_channel_put here will
 +  make the refcount going to negative.
 +  I suppose we should call host1x_free_channel here. */

True. I need to also export host1x_channel_free (I will change the name,
too).

 host1x_channel_put(gr2d-channel);
 return -ENOMEM;
 }
 @@ -284,6 +293,7 @@ static int __devinit gr2d_probe(struct
 platform_device *dev)
 
 err = tegra_drm_register_client(tegradrm, gr2d-client);
 if (err)
 +   /* Add host1x_free_channel */

Will do.

 return err;
 
 platform_set_drvdata(dev, gr2d);
 diff --git a/drivers/gpu/host1x/channel.c b/drivers/gpu/host1x/channel.c
 index 3705cae..ed83b9e 100644
 --- a/drivers/gpu/host1x/channel.c
 +++ b/drivers/gpu/host1x/channel.c
 @@ -95,6 +95,9 @@ struct host1x_channel *host1x_channel_alloc(struct
 platform_device *pdev)
 int max_channels = host1x-info.nb_channels;
 int err;
 
 +   /* Is it necessary to add mutex protection here?
 +  I'm just wondering in a smp system, multiple host1x clients
 +  may try to alloc their channels simultaneously... */

I'll add locking.

 if (chindex  max_channels)
 return NULL;
 
 diff --git a/drivers/gpu/host1x/debug.c b/drivers/gpu/host1x/debug.c
 index 86d5c70..f2bd5aa 100644
 --- a/drivers/gpu/host1x/debug.c
 +++ b/drivers/gpu/host1x/debug.c
 @@ -164,6 +164,10 @@ static const struct file_operations
 host1x_debug_fops = {
 
  void host1x_debug_init(struct host1x *host1x)
  {
 +   /* I think it's better to set this directory name the same with
 +  the driver's name -- defined in dev.c:
 +  #define DRIVER_NAME  tegra-host1x
 +  */
 struct dentry *de = debugfs_create_dir(tegra_host, NULL);

Will do.

 
 if (!de)
 diff --git a/drivers/gpu/host1x/dev.c b/drivers/gpu/host1x/dev.c
 index 07e8813..01ed10d 100644
 --- a/drivers/gpu/host1x/dev.c
 +++ b/drivers/gpu/host1x/dev.c
 @@ -38,6 +38,7 @@
 
  struct host1x *host1x;
 
 +/* Called by drm unlocked ioctl function. So do we need a lock here? */
  void host1x_syncpt_incr_byid(u32 id)

No, host1x_syncpt_incr_byid is SMP safe.

  {
 struct host1x_syncpt *sp = host1x-syncpt + id;
 @@ -52,6 +53,7 @@ u32 host1x_syncpt_read_byid(u32 id)
  }
  EXPORT_SYMBOL(host1x_syncpt_read_byid);
 
 +/* Called by drm unlocked ioctl function. So do we need a lock here? */
  int host1x_syncpt_wait_byid(u32 id, u32 thresh, long timeout, u32 *value)

Same here, SMP safe.

  {
 struct host1x_syncpt *sp = host1x-syncpt + id;
 @@ -161,6 +163,8 @@ static int host1x_probe(struct platform_device *dev)
 
 err = host1x_alloc_resources(host);
 if (err) {
 +   /* We don't have chip_ops right now, so here the
 +  error message is somewhat improper */
 dev_err(dev-dev, failed to init chip support\n);
 goto fail;
 }

Actually, alloc_resources only allocates intr-syncpt, so I the code to
host1x_intr_init().

 @@ -175,6 +179,14 @@ static int host1x_probe(struct platform_device *dev)
 if (!host-syncpt)
 goto fail;
 
 +   /* I suggest create a dedicate function for initializing nop sp.
 +  First this _host1x_syncpt_alloc looks like an internal/static
 +  function.
 +  Then actually host1x_syncpt_alloc  _host1x_syncpt_alloc all
 +  serve host1x client(e.g: gr2d) so it's not suitable to use them
 +  for nop sp.
 +  Just create a wrapper function to call _host1x_syncpt_alloc is OK.
 +  This will make the code more readable. */
 host-nop_sp = _host1x_syncpt_alloc(host, NULL, 0);

_host1x_syncpt_alloc is an internal function, not exported.

Re: [PATCHv4 3/8] gpu: host1x: Add channel support

2013-01-02 Thread Terje Bergström
On 22.12.2012 06:17, Steven Rostedt wrote:
 On Fri, 2012-12-21 at 13:39 +0200, Terje Bergstrom wrote:
 +TRACE_EVENT(host1x_cdma_begin,
 +TP_PROTO(const char *name),
 +
 +TP_ARGS(name),
 +
 +TP_STRUCT__entry(
 +__field(const char *, name)
 +),
 +
 +TP_fast_assign(
 +__entry-name = name;
 +),
 +
 +TP_printk(name=%s,
 +__entry-name)
 +);
 +
 +TRACE_EVENT(host1x_cdma_end,
 +TP_PROTO(const char *name),
 +
 +TP_ARGS(name),
 +
 +TP_STRUCT__entry(
 +__field(const char *, name)
 +),
 +
 +TP_fast_assign(
 +__entry-name = name;
 +),
 +
 +TP_printk(name=%s,
 +__entry-name)
 +);
 
 The above two should be combined into a DECLARE_EVENT_CLASS() and
 DEFINE_EVENT()s. Saves text and data space that way.

Will do.

(...)
 If you can combine any of these others into DECLARE_EVENT_CLASS() and
 DEFINE_EVENT()s, please do.

I didn't find others with same parameters.

Thanks!

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 3/8] gpu: host1x: Add channel support

2013-01-02 Thread Terje Bergström
On 02.01.2013 09:40, Mark Zhang wrote:
 On 12/21/2012 07:39 PM, Terje Bergstrom wrote:
 Add support for host1x client modules, and host1x channels to submit
 work to the clients. The work is submitted in GEM CMA buffers, so
 this patch adds support for them.

 Signed-off-by: Terje Bergstrom tbergst...@nvidia.com
 ---
 [...]
 +/*
 + * Begin a cdma submit
 + */
 +int host1x_cdma_begin(struct host1x_cdma *cdma, struct host1x_job *job)
 +{
 +struct host1x *host1x = cdma_to_host1x(cdma);
 +
 +mutex_lock(cdma-lock);
 +
 +if (job-timeout) {
 +/* init state on first submit with timeout value */
 +if (!cdma-timeout.initialized) {
 +int err;
 +err = host1x-cdma_op.timeout_init(cdma,
 +job-syncpt_id);
 +if (err) {
 +mutex_unlock(cdma-lock);
 +return err;
 +}
 +}
 +}
 +if (!cdma-running)
 +host1x-cdma_op.start(cdma);
 +
 +cdma-slots_free = 0;
 +cdma-slots_used = 0;
 +cdma-first_get = host1x-cdma_pb_op.putptr(cdma-push_buffer);
 +
 +trace_host1x_cdma_begin(job-ch-dev-name);
 
 Seems missing mutex_unlock(cdma-lock); here.

That's intentional. Writing a job to channel is atomic, so lock is taken
from host1x_cdma_begin() until host1x_cdma_end().

Terje

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-02 Thread Terje Bergström
On 28.12.2012 11:14, Mark Zhang wrote:
 diff --git a/drivers/gpu/drm/tegra/gr2d.c b/drivers/gpu/drm/tegra/gr2d.c
 index a936902..c3ded60 100644
 --- a/drivers/gpu/drm/tegra/gr2d.c
 +++ b/drivers/gpu/drm/tegra/gr2d.c
 @@ -131,6 +131,14 @@ static int gr2d_submit(struct tegra_drm_context
 *context,
 if (err)
 goto fail;
 
 +   /* We define CMA as an temporary solution in host1x driver.
 +  That's also why we have a CMA kernel config in it.
 +  But seems here, in tegradrm, we hardcode the CMA here.
 +  So what do you do when host1x change to IOMMU?
 +  Do we also need to define a CMA config in tegradrm
 driver,
 +  then after host1x changes to IOMMU, we add another IOMMU
 +  config in tegradrm? Or we should invent another more
 +  generic wrapper layer here? */
 cmdbuf.mem = handle_cma_to_host1x(drm, file_priv,
 cmdbuf.mem);

Lucas is working on host1x allocator, so let's defer this comment until
we have Lucas' code.

 diff --git a/drivers/gpu/host1x/job.c b/drivers/gpu/host1x/job.c
 index cc8ca2f..0867b72 100644
 --- a/drivers/gpu/host1x/job.c
 +++ b/drivers/gpu/host1x/job.c
 @@ -82,6 +82,26 @@ struct host1x_job *host1x_job_alloc(struct
 host1x_channel *ch,
 mem += num_unpins * sizeof(dma_addr_t);
 job-pin_ids = num_unpins ? mem : NULL;
 
 +   /* I think this is a somewhat ugly design.
 +  Actually addr_phys is consisted by reloc_addr_phys and
 +  gather_addr_phys.
 +  Why don't we just declare reloc_addr_phys and
 gather_addr_phys?
 +  In current design, let's say if one nvhost newbie changes the
 order
 +  of these 2 blocks of code in function pin_job_mem:
 +
 +  for (i = 0; i  job-num_relocs; i++) {
 +   struct host1x_reloc *reloc = job-relocarray[i];
 +   job-pin_ids[count] = reloc-target;
 +   count++;
 +  }
 +
 +  for (i = 0; i  job-num_gathers; i++) {
 +   struct host1x_job_gather *g = job-gathers[i];
 +   job-pin_ids[count] = g-mem_id;
 +   count++;
 +  }
 +
 +  Then likely something weird occurs... */

We do this because this way we can implement batch pinning of memory
handles. This way we can decrease memory handle management a lot as we
need to do locking only once per submit.

Decreasing memory management overhead is really important, because in
complex graphics cases, there are might be a hundreds of buffer
references per submit, and several submits per frame. Any extra overhead
relates directly to reduced performance.

 job-reloc_addr_phys = job-addr_phys;
 job-gather_addr_phys = job-addr_phys[num_relocs];
 
 @@ -252,6 +272,10 @@ static int do_relocs(struct host1x_job *job,
 }
 }
 
 +   /* I have question here. Does this mean the address info
 +  which need to be relocated(according to the libdrm codes,
 +  seems this address is 0xDEADBEEF) always staying at the
 +  beginning of a page? */
 __raw_writel(
 (job-reloc_addr_phys[i] +
 reloc-target_offset)  reloc-shift,

No - the slot can be anywhere. That's why we have cmdbuf_offset in the
reloc struct.

 @@ -565,6 +589,7 @@ int host1x_job_pin(struct host1x_job *job, struct
 platform_device *pdev)
 }
 }
 
 +   /* if (host1x_firewall  !err) { */
 if (host1x_firewall) {
 err = copy_gathers(job, pdev);
 if (err) {

Will add.

 @@ -573,6 +598,9 @@ int host1x_job_pin(struct host1x_job *job, struct
 platform_device *pdev)
 }
 }
 
 +/* Rename this label to out or something else.
 +   Because if everything goes right, the codes under this label also
 +   get executed. */
  fail:
 wmb();

Will do.

 
 diff --git a/drivers/gpu/host1x/memmgr.c b/drivers/gpu/host1x/memmgr.c
 index f3954f7..bb5763d 100644
 --- a/drivers/gpu/host1x/memmgr.c
 +++ b/drivers/gpu/host1x/memmgr.c
 @@ -156,6 +156,9 @@ int host1x_memmgr_pin_array_ids(struct
 platform_device *dev,
 count, unpin_data[pin_count],
 phys_addr);
 
 +   /* I don't think the current host1x_cma_pin_array_ids
 +  is able to return a negative value. So this if doesn't
 +  make sense...*/
 if (cma_count  0) {
 /* clean up previous handles */
 while (pin_count) {

It should return negative in case of error.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  

Re: [PATCHv4 3/8] gpu: host1x: Add channel support

2013-01-02 Thread Terje Bergström
On 02.01.2013 11:31, Mark Zhang wrote:
 On 01/02/2013 05:31 PM, Terje Bergström wrote:
 That's intentional. Writing a job to channel is atomic, so lock is taken
 from host1x_cdma_begin() until host1x_cdma_end().

 
 Okay. So can we consider that lock and unlock this mutex in the function
 which calls host1x_cdma_begin and host1x_cdma_end, that means, in
 current implementation, function channel_submit?

Yes, exactly.

Terje

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-02 Thread Terje Bergström
On 03.01.2013 05:31, Mark Zhang wrote:
 Sorry I didn't get it. Yes, in current design, you can pin all mem
 handles in one time but I haven't found anything related with locking
 only once per submit.
 
 My idea is:
 - remove job-addr_phys
 - assign job-reloc_addr_phys  job-gather_addr_phys separately
 - In pin_job_mem, just call host1x_memmgr_pin_array_ids twice to
 fill the reloc_addr_phy  gather_addr_phys.
 
 Anything I misunderstood?

The current design uses CMA, which makes pinning basically a no-op. When
we have IOMMU support, pinning requires calling into IOMMU. Updating
SMMU tables requires locking, and probably maintenance before SMMU code
also requires its own locked data tables. For example, preventing
duplicate pinning might require a global table of handles.

Putting all of the handles in one table allows doing duplicate detection
across cmdbuf and reloc tables. This allows pinning each buffer exactly
once, which reduces number of calls to IOMMU.

 host1x_cma_pin_array_ids doesn't return negative value right now, so
 maybe you need to take a look at it.

True, and also a consequence of using CMA: pinning can never fail. With
IOMMU, pinning can fail.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2013-01-02 Thread Terje Bergström
On 21.12.2012 15:50, Lucas Stach wrote:
 Am Freitag, den 21.12.2012, 13:39 +0200 schrieb Terje Bergstrom:
 Some of the issues left open:
  * Register definitions still use static inline. There has been a
debate about code style versus ability to use compiler type
checking and code coverage analysis. There was no conclusion, so
I left it as was.
 This has to be resolved before merging. Personally I'm in favour of
 keeping reg access patterns close to what is done in other parts of the
 kernel.

How about if I define inline functions, and #defines to proxy to them?
Something like:

static inline u32 host1x_sync_cfpeek_ctrl_r(void)
{
return 0x74c;
}
#define HOST1X_SYNC_CFPEEK_CTRL \
host1x_sync_cfpeek_ctrl_r()

static inline u32 host1x_sync_cfpeek_ctrl_cfpeek_ena_f(u32 v)
{
return (v  0x1)  31;
}
#define HOST1X_SYNC_CFPEEK_CTRL_CFPEEK_ENA_F(v) \
host1x_sync_cfpeek_ctrl_cfpeek_ena_f(v)

I'd use the macros in .c files. That way the references will look
familiar to reader of .c files, but we can still get the benefits of
compiler processing (type checking, better error codes etc) and gcov
coverage tracking.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2013-01-01 Thread Terje Bergström
On 21.12.2012 23:19, Stephen Warren wrote:
> * There's host1x hardware.
> 
> * There's a low-level driver just for host1x itself; the host1x driver.
> 
> * There's a high-level driver for the entire host1x complex of devices.
> That is tegradrm. There may be more high-level drivers in the future
> (e.g. v4l2 camera driver if it needs to aggregate a bunch of host1x
> sub-devices liek tegradrm does).

tegradrm is a driver for a couple of the host1x clients, namely DC and
HDMI, and now adding 2D.

> * The presence of the host1x DT node logically implies that both the two
> drivers in the previous two points should be instantiated.
> 
> * Linux instantiates a single device per DT node.
> 
> * So, it's host1x's responsibility to instantiate the other device(s). I
> think it's reasonable for the host1x driver to know exactly what
> features the host1x HW complex supports; raw host1x access being one,
> and the higher level concept of a tegradrm being another. So, it's
> reasonable for host1x to trigger the instantiation of tegradrm.

tegradrm has drivers for each device that it controls, so tegradrm via
kernel device-driver model instantiates them. host1x driver doesn't need
to do that.

> * If DRM core didn't stomp on the device object's drvdata (or whichever
> field it is), I would recommend not creating a dummy device at all, but
> rather having the host1x driver directly implement multiple driver
> interfaces. There are many examples like this already in the kernel,
> e.g. combined GPIO+IRQ driver, combined GPIO+IRQ+pinctrl driver, etc.

This is the architecture that would imply host1x instantiating
everything, and DRM being a subcomponent of host1x driver. But we didn't
choose to go there. We agreed to have tegradrm and host1x drivers separate.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 6/8] gpu: drm: tegra: Remove redundant host1x

2013-01-01 Thread Terje Bergström
On 21.12.2012 23:19, Stephen Warren wrote:
 * There's host1x hardware.
 
 * There's a low-level driver just for host1x itself; the host1x driver.
 
 * There's a high-level driver for the entire host1x complex of devices.
 That is tegradrm. There may be more high-level drivers in the future
 (e.g. v4l2 camera driver if it needs to aggregate a bunch of host1x
 sub-devices liek tegradrm does).

tegradrm is a driver for a couple of the host1x clients, namely DC and
HDMI, and now adding 2D.

 * The presence of the host1x DT node logically implies that both the two
 drivers in the previous two points should be instantiated.
 
 * Linux instantiates a single device per DT node.
 
 * So, it's host1x's responsibility to instantiate the other device(s). I
 think it's reasonable for the host1x driver to know exactly what
 features the host1x HW complex supports; raw host1x access being one,
 and the higher level concept of a tegradrm being another. So, it's
 reasonable for host1x to trigger the instantiation of tegradrm.

tegradrm has drivers for each device that it controls, so tegradrm via
kernel device-driver model instantiates them. host1x driver doesn't need
to do that.

 * If DRM core didn't stomp on the device object's drvdata (or whichever
 field it is), I would recommend not creating a dummy device at all, but
 rather having the host1x driver directly implement multiple driver
 interfaces. There are many examples like this already in the kernel,
 e.g. combined GPIO+IRQ driver, combined GPIO+IRQ+pinctrl driver, etc.

This is the architecture that would imply host1x instantiating
everything, and DRM being a subcomponent of host1x driver. But we didn't
choose to go there. We agreed to have tegradrm and host1x drivers separate.

Terje
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 5/8] drm: tegra: Remove redundant host1x

2012-12-30 Thread Terje Bergström
On 28.12.2012 23:21, Thierry Reding wrote:
> Instead of going over this back and forth, I've decided to rewrite this
> patch from scratch the way I think it should be done. Maybe that'll make
> things clearer. I haven't tested it on real hardware yet because I don't
> have access over the holidays, but I'll post the patch once I've
> verified that it actually works. The code is based on patches 1-4 of
> this series and is meant to replace patch 5.

I'm still not happy that host1x needs to know about drm. That's just a
wrong dependency, and wrong dependencies always end up biting back. We
need to figure out solution that satisfies both mine and your
requirements and reduce complexity.

DC/HDMI/GR2D probes are using the global data only for managing the
lists "drm_clients" and "clients". "clients" list is only required after
tegra_drm_load(). "drm_clients" is required to establish driver load order.

With dummy device, we can determine the registration (and probe) order.
tegra-drm can register the drivers for DC/HDMI/GR2D devices first, and
as last the device and driver for tegra-drm.

tegra-drm probe will allocate the global data, enumerate all drivers and
add them to the clients list. If one driver is not initialized, it'll
return with -EPROBE_DEFER and will be called again later. When all this
is successful, it'll call drm_platform_init().

The advantages:
 * No management of drm_clients list
 * No mucking with drvdata
 * host1x doesn't need to know about drm
 * The global data is allocated and deallocated by tegra-drm probe and
remove, and accessed only via drm_device->dev_private
 * Much less code

Something like the attached patch - not tested, as I don't have access
to hw now, but it shows the idea. It's based on patches 1-5 in the
series, and could replace patch 5.

Terje
>From a97d475d65e68ce37c345924171dc57c5e7729ee Mon Sep 17 00:00:00 2001
From: Terje Bergstrom 
Date: Sat, 29 Dec 2012 11:43:54 +0200
Subject: [PATCH] drm: tegra: Postpone usage of global data

Change-Id: Ibfd1d4eeb267ac185de4508a1547fb009b80e93a
Signed-off-by: Terje Bergstrom 
---
 drivers/gpu/drm/tegra/dc.c   |   18 -
 drivers/gpu/drm/tegra/drm.c  |  151 +-
 drivers/gpu/drm/tegra/drm.h  |1 -
 drivers/gpu/drm/tegra/gr2d.c |7 --
 drivers/gpu/drm/tegra/hdmi.c |   18 -
 5 files changed, 47 insertions(+), 148 deletions(-)

diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
index 24bcd06..73056b7 100644
--- a/drivers/gpu/drm/tegra/dc.c
+++ b/drivers/gpu/drm/tegra/dc.c
@@ -740,7 +740,6 @@ static int tegra_dc_probe(struct platform_device *pdev)
 	struct resource *regs;
 	struct tegra_dc *dc;
 	int err;
-	struct tegradrm *tegradrm = platform_get_drvdata(pdev);
 
 	dc = devm_kzalloc(>dev, sizeof(*dc), GFP_KERNEL);
 	if (!dc)
@@ -780,7 +779,6 @@ static int tegra_dc_probe(struct platform_device *pdev)
 	INIT_LIST_HEAD(>client.list);
 	dc->client.ops = _client_ops;
 	dc->client.dev = >dev;
-	dc->client.tegradrm = tegradrm;
 
 	err = tegra_dc_rgb_probe(dc);
 	if (err < 0 && err != -ENODEV) {
@@ -788,13 +786,6 @@ static int tegra_dc_probe(struct platform_device *pdev)
 		return err;
 	}
 
-	err = tegra_drm_register_client(tegradrm, >client);
-	if (err < 0) {
-		dev_err(>dev, "failed to register tegra drm client: %d\n",
-			err);
-		return err;
-	}
-
 	platform_set_drvdata(pdev, dc);
 
 	return 0;
@@ -803,15 +794,6 @@ static int tegra_dc_probe(struct platform_device *pdev)
 static int tegra_dc_remove(struct platform_device *pdev)
 {
 	struct tegra_dc *dc = platform_get_drvdata(pdev);
-	int err;
-
-	err = tegra_drm_unregister_client(dc->client.tegradrm,
-			>client);
-	if (err < 0) {
-		dev_err(>dev, "failed to unregister tegra_drm client: %d\n",
-			err);
-		return err;
-	}
 
 	clk_disable_unprepare(dc->clk);
 
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 637b621..520c281 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -25,12 +25,6 @@
 #define DRIVER_MINOR 0
 #define DRIVER_PATCHLEVEL 0
 
-struct tegra_drm_client_entry {
-	struct tegra_drm_client *client;
-	struct device_node *np;
-	struct list_head list;
-};
-
 static int tegra_drm_add_client(struct device *dev, void *data)
 {
 	static const char * const compat[] = {
@@ -42,102 +36,26 @@ static int tegra_drm_add_client(struct device *dev, void *data)
 		"nvidia,tegra30-gr2d"
 	};
 	struct tegradrm *tegradrm = data;
-	struct tegra_drm_client_entry *client;
+	struct tegra_drm_client *client;
 	unsigned int i;
 
+	/* Check if dev is a tegra-drm device, and add to client list */
 	for (i = 0; i < ARRAY_SIZE(compat); i++) {
 		if (of_device_is_compatible(dev->of_node, compat[i]) &&
 		of_device_is_available(dev->of_node)) {
-			client = kzalloc(sizeof(*client), GFP_KERNEL);
+			client = dev_get_drvdata(dev);
 			if (!client)
-return -ENOMEM;
+return -EPROBE_DEFER;
 
 			INIT_LIST_HEAD(>list);
-			client->np = of_node_get(dev->of_node);
 
-			

Re: [PATCHv4 5/8] drm: tegra: Remove redundant host1x

2012-12-30 Thread Terje Bergström
On 28.12.2012 23:21, Thierry Reding wrote:
 Instead of going over this back and forth, I've decided to rewrite this
 patch from scratch the way I think it should be done. Maybe that'll make
 things clearer. I haven't tested it on real hardware yet because I don't
 have access over the holidays, but I'll post the patch once I've
 verified that it actually works. The code is based on patches 1-4 of
 this series and is meant to replace patch 5.

I'm still not happy that host1x needs to know about drm. That's just a
wrong dependency, and wrong dependencies always end up biting back. We
need to figure out solution that satisfies both mine and your
requirements and reduce complexity.

DC/HDMI/GR2D probes are using the global data only for managing the
lists drm_clients and clients. clients list is only required after
tegra_drm_load(). drm_clients is required to establish driver load order.

With dummy device, we can determine the registration (and probe) order.
tegra-drm can register the drivers for DC/HDMI/GR2D devices first, and
as last the device and driver for tegra-drm.

tegra-drm probe will allocate the global data, enumerate all drivers and
add them to the clients list. If one driver is not initialized, it'll
return with -EPROBE_DEFER and will be called again later. When all this
is successful, it'll call drm_platform_init().

The advantages:
 * No management of drm_clients list
 * No mucking with drvdata
 * host1x doesn't need to know about drm
 * The global data is allocated and deallocated by tegra-drm probe and
remove, and accessed only via drm_device-dev_private
 * Much less code

Something like the attached patch - not tested, as I don't have access
to hw now, but it shows the idea. It's based on patches 1-5 in the
series, and could replace patch 5.

Terje
From a97d475d65e68ce37c345924171dc57c5e7729ee Mon Sep 17 00:00:00 2001
From: Terje Bergstrom tbergst...@nvidia.com
Date: Sat, 29 Dec 2012 11:43:54 +0200
Subject: [PATCH] drm: tegra: Postpone usage of global data

Change-Id: Ibfd1d4eeb267ac185de4508a1547fb009b80e93a
Signed-off-by: Terje Bergstrom tbergst...@nvidia.com
---
 drivers/gpu/drm/tegra/dc.c   |   18 -
 drivers/gpu/drm/tegra/drm.c  |  151 +-
 drivers/gpu/drm/tegra/drm.h  |1 -
 drivers/gpu/drm/tegra/gr2d.c |7 --
 drivers/gpu/drm/tegra/hdmi.c |   18 -
 5 files changed, 47 insertions(+), 148 deletions(-)

diff --git a/drivers/gpu/drm/tegra/dc.c b/drivers/gpu/drm/tegra/dc.c
index 24bcd06..73056b7 100644
--- a/drivers/gpu/drm/tegra/dc.c
+++ b/drivers/gpu/drm/tegra/dc.c
@@ -740,7 +740,6 @@ static int tegra_dc_probe(struct platform_device *pdev)
 	struct resource *regs;
 	struct tegra_dc *dc;
 	int err;
-	struct tegradrm *tegradrm = platform_get_drvdata(pdev);
 
 	dc = devm_kzalloc(pdev-dev, sizeof(*dc), GFP_KERNEL);
 	if (!dc)
@@ -780,7 +779,6 @@ static int tegra_dc_probe(struct platform_device *pdev)
 	INIT_LIST_HEAD(dc-client.list);
 	dc-client.ops = dc_client_ops;
 	dc-client.dev = pdev-dev;
-	dc-client.tegradrm = tegradrm;
 
 	err = tegra_dc_rgb_probe(dc);
 	if (err  0  err != -ENODEV) {
@@ -788,13 +786,6 @@ static int tegra_dc_probe(struct platform_device *pdev)
 		return err;
 	}
 
-	err = tegra_drm_register_client(tegradrm, dc-client);
-	if (err  0) {
-		dev_err(pdev-dev, failed to register tegra drm client: %d\n,
-			err);
-		return err;
-	}
-
 	platform_set_drvdata(pdev, dc);
 
 	return 0;
@@ -803,15 +794,6 @@ static int tegra_dc_probe(struct platform_device *pdev)
 static int tegra_dc_remove(struct platform_device *pdev)
 {
 	struct tegra_dc *dc = platform_get_drvdata(pdev);
-	int err;
-
-	err = tegra_drm_unregister_client(dc-client.tegradrm,
-			dc-client);
-	if (err  0) {
-		dev_err(pdev-dev, failed to unregister tegra_drm client: %d\n,
-			err);
-		return err;
-	}
 
 	clk_disable_unprepare(dc-clk);
 
diff --git a/drivers/gpu/drm/tegra/drm.c b/drivers/gpu/drm/tegra/drm.c
index 637b621..520c281 100644
--- a/drivers/gpu/drm/tegra/drm.c
+++ b/drivers/gpu/drm/tegra/drm.c
@@ -25,12 +25,6 @@
 #define DRIVER_MINOR 0
 #define DRIVER_PATCHLEVEL 0
 
-struct tegra_drm_client_entry {
-	struct tegra_drm_client *client;
-	struct device_node *np;
-	struct list_head list;
-};
-
 static int tegra_drm_add_client(struct device *dev, void *data)
 {
 	static const char * const compat[] = {
@@ -42,102 +36,26 @@ static int tegra_drm_add_client(struct device *dev, void *data)
 		nvidia,tegra30-gr2d
 	};
 	struct tegradrm *tegradrm = data;
-	struct tegra_drm_client_entry *client;
+	struct tegra_drm_client *client;
 	unsigned int i;
 
+	/* Check if dev is a tegra-drm device, and add to client list */
 	for (i = 0; i  ARRAY_SIZE(compat); i++) {
 		if (of_device_is_compatible(dev-of_node, compat[i]) 
 		of_device_is_available(dev-of_node)) {
-			client = kzalloc(sizeof(*client), GFP_KERNEL);
+			client = dev_get_drvdata(dev);
 			if (!client)
-return -ENOMEM;
+return -EPROBE_DEFER;
 
 			INIT_LIST_HEAD(client-list);
-			client-np = 

Re: [PATCHv4 5/8] drm: tegra: Remove redundant host1x

2012-12-21 Thread Terje Bergström
On 21.12.2012 16:36, Thierry Reding wrote:
> On Fri, Dec 21, 2012 at 01:39:21PM +0200, Terje Bergstrom wrote:
>> +static struct platform_driver tegra_drm_platform_driver = {
>> +.driver = {
>> +.name = "tegradrm",
> 
> This should be "tegra-drm" to match the module name.

We've actually created two problems.

First is that the device name should match driver name which should
match module name. But host1x doesn't know the module name of tegradrm.

Second problem is that host1x driver creates tegradrm device even if
tegradrm isn't loaded to system.

These mean that the device has to be created in tegra-drm module to have
access to the module name. So instead of just getter, we need a getter
and a setter.

Terje
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCHv4 0/8] Support for Tegra 2D hardware

2012-12-21 Thread Terje Bergström
On 21.12.2012 15:50, Lucas Stach wrote:
> This has to be resolved before merging. Personally I'm in favour of
> keeping reg access patterns close to what is done in other parts of the
> kernel.

I haven't so far received comments from too many people, so that's why I
left it as is.

>>  * Uses still CMA helpers. IOMMU support will replace CMA with host1x
>>specific allocator.
> 
> I really want the allocator issue resolved before talking about merging
> those patches. Proper IOMMU support will require a bit of rework of the
> current upstream IOMMU API, so it will still be a bit out.
> 
> But if you don't mind I'll try to implement the host1x allocator over
> the holidays and provide an incremental patch.

Do you mean host1x CMA allocator using dma mapping API? That'd be great
if you could help out. I've just tried to concentrate on getting ok for
base essentials before doing any extra work.

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   3   >