Re: [RFC v2 2/8] video: tegra: Add syncpoint wait and interrupts

2012-11-29 Thread Terje Bergström
Just replying to part of your mail.

On 30.11.2012 09:22, Thierry Reding wrote:
> Actually for the display controller we want just a notification when the
> VBLANK happens. I'm not sure if we want to do that with syncpoints at
> all since it works quite well using regular interrupts.

VBLANK isn't actually a very good example of dc's use of sync points.
That can easily be done with regular interrupts, as you mention.

More important is when we have double buffering enabled. When you draw
something to a surface, and flip it to display, you want DC to notify
when the flip has been done and rendering can continue to the back buffer.

So, what you can do is return a fence from DC when initiating a flip,
and place that fence into 2D stream as a host wait so that 2D will
patiently wait for buffer to become free before it renders.

> What I'm proposing is to leave it up to each host1x client how they want
> to handle this. For display controllers it may be enough to have their
> callback run in interrupt context but other clients may need to do more
> work so they can queue it themselves.

DC doesn't need to worry about host1x interrupts at all. It's all
internal to the host1x driver, so we're now just talking about the
internal implementation of host1x.

We have two scenarios for the syncpt interrupts. One is that a job got
finished and we need to clean up the queue and free up resources. This
must be done in threads. Other is releasing a thread that is blocked by
a syncpt wait.

It's simpler if both of these are handled with the same infrastructure,
and we've shown that latency is very good even if we handle all events
in a thread.

> I know that this looks like it might be more work, but if it turns out
> that many drivers need to do the exact same thing, that functionality
> can be factored out into a helper. But it may just as well turn out that
> the requirements for each module are slightly different that forcing a
> workqueue on them could result in ugly workarounds because it doesn't
> quite work for them.

This is just driver internal, so there's no need for other drivers to
access this part.

> If we move responsibility of managing the workqueue out of host1x as I
> proposed above, maybe a lot of this code can be removed. Maybe you can
> explain a bit what they are used for exactly in your write-up.

It's going to be a big bad boy. :-)

Terje

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 2/8] video: tegra: Add syncpoint wait and interrupts

2012-11-29 Thread Thierry Reding
On Thu, Nov 29, 2012 at 11:41:50AM -0700, Stephen Warren wrote:
> On 11/29/2012 01:44 AM, Thierry Reding wrote:
> > On Mon, Nov 26, 2012 at 03:19:08PM +0200, Terje Bergstrom wrote:
> 
> >> diff --git a/drivers/video/tegra/host/host1x/host1x_intr.c
> >> b/drivers/video/tegra/host/host1x/host1x_intr.c
> > [...]
> >> +/* Spacing between sync registers */ +#define REGISTER_STRIDE 4
> > 
> > Erm... no. The usual way you should be doing this is either make
> > the register definitions account for the stride or use accessors
> > that apply the stride. You should be doing the latter anyway to
> > make accesses. For example:
> > 
> > static inline void host1x_syncpt_writel(struct host1x *host1x, 
> > unsigned long value, unsigned long offset) { writel(value,
> > host1x->regs + SYNCPT_BASE + offset); }
> > 
> > static inline unsigned long host1x_syncpt_readl(struct host1x
> > *host1x, unsigned long offset) { return readl(host1x->regs +
> > SYNCPT_BASE + offset); }
> > 
> > Alternatively, if you want to pass the register index instead of
> > the offset, you can use just multiply the offset in that function:
> > 
> > writel(value, host1x->regs + SYNCPT_BASE + (offset << 2));
> > 
> > The same can also be done with the non-syncpt registers.
> 
> It seems like reasonable documentation to replace "<< 2" with "*
> REGISTER_STRIDE" here.

Given that it is a very common pattern, << 2 seems enough documentation
to me, but sure, if you prefer to be extra explicit that's fine with me.

Thierry


pgpYKjECswErR.pgp
Description: PGP signature


Re: [RFC v2 2/8] video: tegra: Add syncpoint wait and interrupts

2012-11-29 Thread Thierry Reding
On Thu, Nov 29, 2012 at 12:39:23PM +0200, Terje Bergström wrote:
> On 29.11.2012 10:44, Thierry Reding wrote:
> >> diff --git a/drivers/video/tegra/host/dev.c 
> >> b/drivers/video/tegra/host/dev.c
> >> index 98c9c9f..025a820 100644
> >> --- a/drivers/video/tegra/host/dev.c
> >> +++ b/drivers/video/tegra/host/dev.c
> >> @@ -43,6 +43,13 @@ u32 host1x_syncpt_read(u32 id)
> >>  }
> >>  EXPORT_SYMBOL(host1x_syncpt_read);
> >>
> >> +int host1x_syncpt_wait(u32 id, u32 thresh, u32 timeout, u32 *value)
> > 
> > The choice of data types is odd here. id refers to a syncpt so a better
> > choice would have been unsigned int because the size of the variable
> > doesn't actually matter. But as I already said in my reply to patch 1,
> > these are resources and should therefore better be abstracted through an
> > opaque pointer anyway.
> > 
> > timeout is usually signed long, so this function should reflect that. As
> > for the value this is probably fine as it will effectively be set from a
> > register value. Though you also cache them in software using atomics.
> 
> 32-bits is an architectural limit for the sync point id, so that's why I
> used it here.

But given that there are only 32 syncpoints they look rather costly, so
I don't expect more than a few hundred to ever be used in hardware,
right?

> But you're right - it doesn't really matter and could be changed to
> unsigned long.

I'd still opt for unsigned int. For no other reason than that it is how
other types of resources are enumerated.

> thresh and *value reflects that sync point value is 32-bit, and I'd keep
> that as is.

Yes, that makes sense.

> Timeout should be unsigned long, yes.

It should actually be signed long to match the type used for timeouts in
the various wait_*() functions.

> >> diff --git a/drivers/video/tegra/host/host1x/host1x_intr.c 
> >> b/drivers/video/tegra/host/host1x/host1x_intr.c
> > [...]
> >> +#include 
> >> +#include 
> >> +#include 
> >> +#include 
> >> +
> >> +#include "nvhost_intr.h"
> >> +#include "host1x/host1x.h"
> >> +
> >> +/* Spacing between sync registers */
> >> +#define REGISTER_STRIDE 4
> > 
> > Erm... no. The usual way you should be doing this is either make the
> > register definitions account for the stride or use accessors that apply
> > the stride. You should be doing the latter anyway to make accesses. For
> > example:
> > 
> > static inline void host1x_syncpt_writel(struct host1x *host1x,
> > unsigned long value,
> > unsigned long offset)
> > {
> > writel(value, host1x->regs + SYNCPT_BASE + offset);
> > }
> > 
> > static inline unsigned long host1x_syncpt_readl(struct host1x 
> > *host1x,
> > unsigned long 
> > offset)
> > {
> > return readl(host1x->regs + SYNCPT_BASE + offset);
> > }
> > 
> > Alternatively, if you want to pass the register index instead of the
> > offset, you can use just multiply the offset in that function:
> >
> > writel(value, host1x->regs + SYNCPT_BASE + (offset << 2));
> >
> > The same can also be done with the non-syncpt registers.
> 
> The register number has a stride of 4 when doing writes, and 1 when
> adding to command streams. This is why I've kept the register
> definitions as is.

Yes, that's why it makes sense to use such helpers. It allows you to
reuse the register definitions for both direct and indirect access but
doesn't require you to repeat the stride multiplication every time.

> I could add helper functions. Just as a side note, the sync register
> space has other definitions than just the syncpt registers, so the
> naming should be changed a bit.

The TRM refers to them as SYNC registers, so SYNC_BASE should be fine.

> >> +static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id)
> >> +{
> >> + struct nvhost_master *dev = dev_id;
> >> + void __iomem *sync_regs = dev->sync_aperture;
> >> + struct nvhost_intr *intr = >intr;
> >> + unsigned long reg;
> >> + int i, id;
> >> +
> >> + for (i = 0; i < dev->info.nb_pts / BITS_PER_LONG; i++) {
> >> + reg = readl(sync_regs +
> >> + 
> >> host1x_sync_syncpt_thresh_cpu0_int_status_r() +
> >> + i * REGISTER_STRIDE);
> >> + for_each_set_bit(id, , BITS_PER_LONG) {
> >> + struct nvhost_intr_syncpt *sp =
> >> + intr->syncpt + (i * BITS_PER_LONG + id);
> >> + host1x_intr_syncpt_thresh_isr(sp);
> >> + queue_work(intr->wq, >work);
> >> + }
> >> + }
> >> +
> >> + return IRQ_HANDLED;
> >> +}
> > 
> > Maybe it would be better to call the syncpt handlers in interrupt
> > context and let them schedule work if they want to. I'm thinking about
> > the display controllers which may want 

Re: [RFC v2 2/8] video: tegra: Add syncpoint wait and interrupts

2012-11-29 Thread Stephen Warren
On 11/29/2012 01:44 AM, Thierry Reding wrote:
> On Mon, Nov 26, 2012 at 03:19:08PM +0200, Terje Bergstrom wrote:

>> diff --git a/drivers/video/tegra/host/host1x/host1x_intr.c
>> b/drivers/video/tegra/host/host1x/host1x_intr.c
> [...]
>> +/* Spacing between sync registers */ +#define REGISTER_STRIDE 4
> 
> Erm... no. The usual way you should be doing this is either make
> the register definitions account for the stride or use accessors
> that apply the stride. You should be doing the latter anyway to
> make accesses. For example:
> 
> static inline void host1x_syncpt_writel(struct host1x *host1x, 
> unsigned long value, unsigned long offset) { writel(value,
> host1x->regs + SYNCPT_BASE + offset); }
> 
> static inline unsigned long host1x_syncpt_readl(struct host1x
> *host1x, unsigned long offset) { return readl(host1x->regs +
> SYNCPT_BASE + offset); }
> 
> Alternatively, if you want to pass the register index instead of
> the offset, you can use just multiply the offset in that function:
> 
> writel(value, host1x->regs + SYNCPT_BASE + (offset << 2));
> 
> The same can also be done with the non-syncpt registers.

It seems like reasonable documentation to replace "<< 2" with "*
REGISTER_STRIDE" here.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 2/8] video: tegra: Add syncpoint wait and interrupts

2012-11-29 Thread Terje Bergström
On 29.11.2012 10:44, Thierry Reding wrote:
>> diff --git a/drivers/video/tegra/host/dev.c b/drivers/video/tegra/host/dev.c
>> index 98c9c9f..025a820 100644
>> --- a/drivers/video/tegra/host/dev.c
>> +++ b/drivers/video/tegra/host/dev.c
>> @@ -43,6 +43,13 @@ u32 host1x_syncpt_read(u32 id)
>>  }
>>  EXPORT_SYMBOL(host1x_syncpt_read);
>>
>> +int host1x_syncpt_wait(u32 id, u32 thresh, u32 timeout, u32 *value)
> 
> The choice of data types is odd here. id refers to a syncpt so a better
> choice would have been unsigned int because the size of the variable
> doesn't actually matter. But as I already said in my reply to patch 1,
> these are resources and should therefore better be abstracted through an
> opaque pointer anyway.
> 
> timeout is usually signed long, so this function should reflect that. As
> for the value this is probably fine as it will effectively be set from a
> register value. Though you also cache them in software using atomics.

32-bits is an architectural limit for the sync point id, so that's why I
used it here. But you're right - it doesn't really matter and could be
changed to unsigned long.

thresh and *value reflects that sync point value is 32-bit, and I'd keep
that as is.

Timeout should be unsigned long, yes.

>> diff --git a/drivers/video/tegra/host/host1x/host1x_intr.c 
>> b/drivers/video/tegra/host/host1x/host1x_intr.c
> [...]
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#include "nvhost_intr.h"
>> +#include "host1x/host1x.h"
>> +
>> +/* Spacing between sync registers */
>> +#define REGISTER_STRIDE 4
> 
> Erm... no. The usual way you should be doing this is either make the
> register definitions account for the stride or use accessors that apply
> the stride. You should be doing the latter anyway to make accesses. For
> example:
> 
> static inline void host1x_syncpt_writel(struct host1x *host1x,
> unsigned long value,
> unsigned long offset)
> {
> writel(value, host1x->regs + SYNCPT_BASE + offset);
> }
> 
> static inline unsigned long host1x_syncpt_readl(struct host1x *host1x,
> unsigned long offset)
> {
> return readl(host1x->regs + SYNCPT_BASE + offset);
> }
> 
> Alternatively, if you want to pass the register index instead of the
> offset, you can use just multiply the offset in that function:
>
> writel(value, host1x->regs + SYNCPT_BASE + (offset << 2));
>
> The same can also be done with the non-syncpt registers.

The register number has a stride of 4 when doing writes, and 1 when
adding to command streams. This is why I've kept the register
definitions as is.

I could add helper functions. Just as a side note, the sync register
space has other definitions than just the syncpt registers, so the
naming should be changed a bit.

>> +static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id)
>> +{
>> + struct nvhost_master *dev = dev_id;
>> + void __iomem *sync_regs = dev->sync_aperture;
>> + struct nvhost_intr *intr = >intr;
>> + unsigned long reg;
>> + int i, id;
>> +
>> + for (i = 0; i < dev->info.nb_pts / BITS_PER_LONG; i++) {
>> + reg = readl(sync_regs +
>> + host1x_sync_syncpt_thresh_cpu0_int_status_r() +
>> + i * REGISTER_STRIDE);
>> + for_each_set_bit(id, , BITS_PER_LONG) {
>> + struct nvhost_intr_syncpt *sp =
>> + intr->syncpt + (i * BITS_PER_LONG + id);
>> + host1x_intr_syncpt_thresh_isr(sp);
>> + queue_work(intr->wq, >work);
>> + }
>> + }
>> +
>> + return IRQ_HANDLED;
>> +}
> 
> Maybe it would be better to call the syncpt handlers in interrupt
> context and let them schedule work if they want to. I'm thinking about
> the display controllers which may want to use syncpoints for VBLANK
> support.

Display controller can use the APIs to read, increment and wait for sync
point.

We could do more in isr, but then again, we've noticed that the current
design already gives pretty good latency, so we haven't seen the need to
move code from thread to isr.

>> +static void host1x_intr_init_host_sync(struct nvhost_intr *intr)
>> +{
>> + struct nvhost_master *dev = intr_to_dev(intr);
>> + void __iomem *sync_regs = dev->sync_aperture;
>> + int i, err, irq;
>> +
>> + writel(0xUL,
>> + sync_regs + host1x_sync_syncpt_thresh_int_disable_r());
>> + writel(0xUL,
>> + sync_regs + host1x_sync_syncpt_thresh_cpu0_int_status_r());
>> +
>> + for (i = 0; i < dev->info.nb_pts; i++)
>> + INIT_WORK(>syncpt[i].work, syncpt_thresh_cascade_fn);
>> +
>> + irq = platform_get_irq(dev->dev, 0);
>> + WARN_ON(IS_ERR_VALUE(irq));
>> + err = 

Re: [RFC v2 2/8] video: tegra: Add syncpoint wait and interrupts

2012-11-29 Thread Thierry Reding
On Mon, Nov 26, 2012 at 03:19:08PM +0200, Terje Bergstrom wrote:
[...]
> diff --git a/drivers/video/tegra/host/chip_support.h 
> b/drivers/video/tegra/host/chip_support.h
[...]
> +struct nvhost_intr_ops {
> + void (*init_host_sync)(struct nvhost_intr *);
> + void (*set_host_clocks_per_usec)(
> + struct nvhost_intr *, u32 clocks);
> + void (*set_syncpt_threshold)(
> + struct nvhost_intr *, u32 id, u32 thresh);
> + void (*enable_syncpt_intr)(struct nvhost_intr *, u32 id);
> + void (*disable_syncpt_intr)(struct nvhost_intr *, u32 id);
> + void (*disable_all_syncpt_intrs)(struct nvhost_intr *);
> + int  (*request_host_general_irq)(struct nvhost_intr *);
> + void (*free_host_general_irq)(struct nvhost_intr *);
> + int (*free_syncpt_irq)(struct nvhost_intr *);
> +};
> +
>  struct nvhost_chip_support {
>   const char *soc_name;
>   struct nvhost_syncpt_ops syncpt;
> + struct nvhost_intr_ops intr;
>  };
>  
>  struct nvhost_chip_support *nvhost_get_chip_ops(void);
>  
>  #define syncpt_op()  (nvhost_get_chip_ops()->syncpt)
> +#define intr_op()(nvhost_get_chip_ops()->intr)
>  
>  int nvhost_init_chip_support(struct nvhost_master *host);
>  

The same comments apply as for patch 1. Reducing the number of
indirections here make things a lot easier.

> diff --git a/drivers/video/tegra/host/dev.c b/drivers/video/tegra/host/dev.c
> index 98c9c9f..025a820 100644
> --- a/drivers/video/tegra/host/dev.c
> +++ b/drivers/video/tegra/host/dev.c
> @@ -43,6 +43,13 @@ u32 host1x_syncpt_read(u32 id)
>  }
>  EXPORT_SYMBOL(host1x_syncpt_read);
>  
> +int host1x_syncpt_wait(u32 id, u32 thresh, u32 timeout, u32 *value)

The choice of data types is odd here. id refers to a syncpt so a better
choice would have been unsigned int because the size of the variable
doesn't actually matter. But as I already said in my reply to patch 1,
these are resources and should therefore better be abstracted through an
opaque pointer anyway.

timeout is usually signed long, so this function should reflect that. As
for the value this is probably fine as it will effectively be set from a
register value. Though you also cache them in software using atomics.

> +static void clock_on_host(struct platform_device *dev)
> +{
> + struct nvhost_device_data *pdata = platform_get_drvdata(dev);
> + struct nvhost_master *host = nvhost_get_private_data(dev);
> + nvhost_intr_start(>intr, clk_get_rate(pdata->clk[0]));
> +}
> +
> +static int clock_off_host(struct platform_device *dev)
> +{
> + struct nvhost_master *host = nvhost_get_private_data(dev);
> + nvhost_intr_stop(>intr);
> + return 0;
> +}

This is a good example of why these indirections are wasteful. You
constantly need to look up the host pointer just to call another
function on it. With some modifications to the structure layouts it
should be possible to make this a lot more straightforward.

> diff --git a/drivers/video/tegra/host/host1x/host1x.h 
> b/drivers/video/tegra/host/host1x/host1x.h
> index 76748ac..af9bfef 100644
> --- a/drivers/video/tegra/host/host1x/host1x.h
> +++ b/drivers/video/tegra/host/host1x/host1x.h
> @@ -25,6 +25,7 @@
>  #include 
>  
>  #include "nvhost_syncpt.h"
> +#include "nvhost_intr.h"
>  
>  #define TRACE_MAX_LENGTH 128U
>  #define IFACE_NAME   "nvhost"
> @@ -33,6 +34,7 @@ struct nvhost_master {
>   void __iomem *aperture;
>   void __iomem *sync_aperture;
>   struct nvhost_syncpt syncpt;
> + struct nvhost_intr intr;
>   struct platform_device *dev;
>   struct host1x_device_info info;
>  };
> diff --git a/drivers/video/tegra/host/host1x/host1x01.c 
> b/drivers/video/tegra/host/host1x/host1x01.c
> index d53302d..5bf0e6e 100644
> --- a/drivers/video/tegra/host/host1x/host1x01.c
> +++ b/drivers/video/tegra/host/host1x/host1x01.c
> @@ -26,12 +26,14 @@
>  #include "chip_support.h"
>  
>  #include "host1x/host1x_syncpt.c"
> +#include "host1x/host1x_intr.c"
>  
>  int nvhost_init_host1x01_support(struct nvhost_master *host,
>   struct nvhost_chip_support *op)
>  {
>   host->sync_aperture = host->aperture + HOST1X_CHANNEL_SYNC_REG_BASE;
>   op->syncpt = host1x_syncpt_ops;
> + op->intr = host1x_intr_ops;
>  
>   return 0;
>  }

Also you need to touch a lot of files just to add this new feature. This
makes maintenance needlessly difficult.

> diff --git a/drivers/video/tegra/host/host1x/host1x_intr.c 
> b/drivers/video/tegra/host/host1x/host1x_intr.c
[...]
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#include "nvhost_intr.h"
> +#include "host1x/host1x.h"
> +
> +/* Spacing between sync registers */
> +#define REGISTER_STRIDE 4

Erm... no. The usual way you should be doing this is either make the
register definitions account for the stride or use accessors that apply
the stride. You should be doing the latter anyway to make accesses. For
example:

static inline void host1x_syncpt_writel(struct host1x 

Re: [RFC v2 2/8] video: tegra: Add syncpoint wait and interrupts

2012-11-29 Thread Terje Bergström
On 29.11.2012 10:44, Thierry Reding wrote:
 diff --git a/drivers/video/tegra/host/dev.c b/drivers/video/tegra/host/dev.c
 index 98c9c9f..025a820 100644
 --- a/drivers/video/tegra/host/dev.c
 +++ b/drivers/video/tegra/host/dev.c
 @@ -43,6 +43,13 @@ u32 host1x_syncpt_read(u32 id)
  }
  EXPORT_SYMBOL(host1x_syncpt_read);

 +int host1x_syncpt_wait(u32 id, u32 thresh, u32 timeout, u32 *value)
 
 The choice of data types is odd here. id refers to a syncpt so a better
 choice would have been unsigned int because the size of the variable
 doesn't actually matter. But as I already said in my reply to patch 1,
 these are resources and should therefore better be abstracted through an
 opaque pointer anyway.
 
 timeout is usually signed long, so this function should reflect that. As
 for the value this is probably fine as it will effectively be set from a
 register value. Though you also cache them in software using atomics.

32-bits is an architectural limit for the sync point id, so that's why I
used it here. But you're right - it doesn't really matter and could be
changed to unsigned long.

thresh and *value reflects that sync point value is 32-bit, and I'd keep
that as is.

Timeout should be unsigned long, yes.

 diff --git a/drivers/video/tegra/host/host1x/host1x_intr.c 
 b/drivers/video/tegra/host/host1x/host1x_intr.c
 [...]
 +#include linux/interrupt.h
 +#include linux/irq.h
 +#include linux/io.h
 +#include asm/mach/irq.h
 +
 +#include nvhost_intr.h
 +#include host1x/host1x.h
 +
 +/* Spacing between sync registers */
 +#define REGISTER_STRIDE 4
 
 Erm... no. The usual way you should be doing this is either make the
 register definitions account for the stride or use accessors that apply
 the stride. You should be doing the latter anyway to make accesses. For
 example:
 
 static inline void host1x_syncpt_writel(struct host1x *host1x,
 unsigned long value,
 unsigned long offset)
 {
 writel(value, host1x-regs + SYNCPT_BASE + offset);
 }
 
 static inline unsigned long host1x_syncpt_readl(struct host1x *host1x,
 unsigned long offset)
 {
 return readl(host1x-regs + SYNCPT_BASE + offset);
 }
 
 Alternatively, if you want to pass the register index instead of the
 offset, you can use just multiply the offset in that function:

 writel(value, host1x-regs + SYNCPT_BASE + (offset  2));

 The same can also be done with the non-syncpt registers.

The register number has a stride of 4 when doing writes, and 1 when
adding to command streams. This is why I've kept the register
definitions as is.

I could add helper functions. Just as a side note, the sync register
space has other definitions than just the syncpt registers, so the
naming should be changed a bit.

 +static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id)
 +{
 + struct nvhost_master *dev = dev_id;
 + void __iomem *sync_regs = dev-sync_aperture;
 + struct nvhost_intr *intr = dev-intr;
 + unsigned long reg;
 + int i, id;
 +
 + for (i = 0; i  dev-info.nb_pts / BITS_PER_LONG; i++) {
 + reg = readl(sync_regs +
 + host1x_sync_syncpt_thresh_cpu0_int_status_r() +
 + i * REGISTER_STRIDE);
 + for_each_set_bit(id, reg, BITS_PER_LONG) {
 + struct nvhost_intr_syncpt *sp =
 + intr-syncpt + (i * BITS_PER_LONG + id);
 + host1x_intr_syncpt_thresh_isr(sp);
 + queue_work(intr-wq, sp-work);
 + }
 + }
 +
 + return IRQ_HANDLED;
 +}
 
 Maybe it would be better to call the syncpt handlers in interrupt
 context and let them schedule work if they want to. I'm thinking about
 the display controllers which may want to use syncpoints for VBLANK
 support.

Display controller can use the APIs to read, increment and wait for sync
point.

We could do more in isr, but then again, we've noticed that the current
design already gives pretty good latency, so we haven't seen the need to
move code from thread to isr.

 +static void host1x_intr_init_host_sync(struct nvhost_intr *intr)
 +{
 + struct nvhost_master *dev = intr_to_dev(intr);
 + void __iomem *sync_regs = dev-sync_aperture;
 + int i, err, irq;
 +
 + writel(0xUL,
 + sync_regs + host1x_sync_syncpt_thresh_int_disable_r());
 + writel(0xUL,
 + sync_regs + host1x_sync_syncpt_thresh_cpu0_int_status_r());
 +
 + for (i = 0; i  dev-info.nb_pts; i++)
 + INIT_WORK(intr-syncpt[i].work, syncpt_thresh_cascade_fn);
 +
 + irq = platform_get_irq(dev-dev, 0);
 + WARN_ON(IS_ERR_VALUE(irq));
 + err = devm_request_irq(dev-dev-dev, irq,
 + syncpt_thresh_cascade_isr,
 +

Re: [RFC v2 2/8] video: tegra: Add syncpoint wait and interrupts

2012-11-29 Thread Stephen Warren
On 11/29/2012 01:44 AM, Thierry Reding wrote:
 On Mon, Nov 26, 2012 at 03:19:08PM +0200, Terje Bergstrom wrote:

 diff --git a/drivers/video/tegra/host/host1x/host1x_intr.c
 b/drivers/video/tegra/host/host1x/host1x_intr.c
 [...]
 +/* Spacing between sync registers */ +#define REGISTER_STRIDE 4
 
 Erm... no. The usual way you should be doing this is either make
 the register definitions account for the stride or use accessors
 that apply the stride. You should be doing the latter anyway to
 make accesses. For example:
 
 static inline void host1x_syncpt_writel(struct host1x *host1x, 
 unsigned long value, unsigned long offset) { writel(value,
 host1x-regs + SYNCPT_BASE + offset); }
 
 static inline unsigned long host1x_syncpt_readl(struct host1x
 *host1x, unsigned long offset) { return readl(host1x-regs +
 SYNCPT_BASE + offset); }
 
 Alternatively, if you want to pass the register index instead of
 the offset, you can use just multiply the offset in that function:
 
 writel(value, host1x-regs + SYNCPT_BASE + (offset  2));
 
 The same can also be done with the non-syncpt registers.

It seems like reasonable documentation to replace  2 with *
REGISTER_STRIDE here.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 2/8] video: tegra: Add syncpoint wait and interrupts

2012-11-29 Thread Thierry Reding
On Thu, Nov 29, 2012 at 12:39:23PM +0200, Terje Bergström wrote:
 On 29.11.2012 10:44, Thierry Reding wrote:
  diff --git a/drivers/video/tegra/host/dev.c 
  b/drivers/video/tegra/host/dev.c
  index 98c9c9f..025a820 100644
  --- a/drivers/video/tegra/host/dev.c
  +++ b/drivers/video/tegra/host/dev.c
  @@ -43,6 +43,13 @@ u32 host1x_syncpt_read(u32 id)
   }
   EXPORT_SYMBOL(host1x_syncpt_read);
 
  +int host1x_syncpt_wait(u32 id, u32 thresh, u32 timeout, u32 *value)
  
  The choice of data types is odd here. id refers to a syncpt so a better
  choice would have been unsigned int because the size of the variable
  doesn't actually matter. But as I already said in my reply to patch 1,
  these are resources and should therefore better be abstracted through an
  opaque pointer anyway.
  
  timeout is usually signed long, so this function should reflect that. As
  for the value this is probably fine as it will effectively be set from a
  register value. Though you also cache them in software using atomics.
 
 32-bits is an architectural limit for the sync point id, so that's why I
 used it here.

But given that there are only 32 syncpoints they look rather costly, so
I don't expect more than a few hundred to ever be used in hardware,
right?

 But you're right - it doesn't really matter and could be changed to
 unsigned long.

I'd still opt for unsigned int. For no other reason than that it is how
other types of resources are enumerated.

 thresh and *value reflects that sync point value is 32-bit, and I'd keep
 that as is.

Yes, that makes sense.

 Timeout should be unsigned long, yes.

It should actually be signed long to match the type used for timeouts in
the various wait_*() functions.

  diff --git a/drivers/video/tegra/host/host1x/host1x_intr.c 
  b/drivers/video/tegra/host/host1x/host1x_intr.c
  [...]
  +#include linux/interrupt.h
  +#include linux/irq.h
  +#include linux/io.h
  +#include asm/mach/irq.h
  +
  +#include nvhost_intr.h
  +#include host1x/host1x.h
  +
  +/* Spacing between sync registers */
  +#define REGISTER_STRIDE 4
  
  Erm... no. The usual way you should be doing this is either make the
  register definitions account for the stride or use accessors that apply
  the stride. You should be doing the latter anyway to make accesses. For
  example:
  
  static inline void host1x_syncpt_writel(struct host1x *host1x,
  unsigned long value,
  unsigned long offset)
  {
  writel(value, host1x-regs + SYNCPT_BASE + offset);
  }
  
  static inline unsigned long host1x_syncpt_readl(struct host1x 
  *host1x,
  unsigned long 
  offset)
  {
  return readl(host1x-regs + SYNCPT_BASE + offset);
  }
  
  Alternatively, if you want to pass the register index instead of the
  offset, you can use just multiply the offset in that function:
 
  writel(value, host1x-regs + SYNCPT_BASE + (offset  2));
 
  The same can also be done with the non-syncpt registers.
 
 The register number has a stride of 4 when doing writes, and 1 when
 adding to command streams. This is why I've kept the register
 definitions as is.

Yes, that's why it makes sense to use such helpers. It allows you to
reuse the register definitions for both direct and indirect access but
doesn't require you to repeat the stride multiplication every time.

 I could add helper functions. Just as a side note, the sync register
 space has other definitions than just the syncpt registers, so the
 naming should be changed a bit.

The TRM refers to them as SYNC registers, so SYNC_BASE should be fine.

  +static irqreturn_t syncpt_thresh_cascade_isr(int irq, void *dev_id)
  +{
  + struct nvhost_master *dev = dev_id;
  + void __iomem *sync_regs = dev-sync_aperture;
  + struct nvhost_intr *intr = dev-intr;
  + unsigned long reg;
  + int i, id;
  +
  + for (i = 0; i  dev-info.nb_pts / BITS_PER_LONG; i++) {
  + reg = readl(sync_regs +
  + 
  host1x_sync_syncpt_thresh_cpu0_int_status_r() +
  + i * REGISTER_STRIDE);
  + for_each_set_bit(id, reg, BITS_PER_LONG) {
  + struct nvhost_intr_syncpt *sp =
  + intr-syncpt + (i * BITS_PER_LONG + id);
  + host1x_intr_syncpt_thresh_isr(sp);
  + queue_work(intr-wq, sp-work);
  + }
  + }
  +
  + return IRQ_HANDLED;
  +}
  
  Maybe it would be better to call the syncpt handlers in interrupt
  context and let them schedule work if they want to. I'm thinking about
  the display controllers which may want to use syncpoints for VBLANK
  support.
 
 Display controller can use the APIs to read, increment and wait for sync
 point.

Actually for the display controller we want just a notification 

Re: [RFC v2 2/8] video: tegra: Add syncpoint wait and interrupts

2012-11-29 Thread Thierry Reding
On Thu, Nov 29, 2012 at 11:41:50AM -0700, Stephen Warren wrote:
 On 11/29/2012 01:44 AM, Thierry Reding wrote:
  On Mon, Nov 26, 2012 at 03:19:08PM +0200, Terje Bergstrom wrote:
 
  diff --git a/drivers/video/tegra/host/host1x/host1x_intr.c
  b/drivers/video/tegra/host/host1x/host1x_intr.c
  [...]
  +/* Spacing between sync registers */ +#define REGISTER_STRIDE 4
  
  Erm... no. The usual way you should be doing this is either make
  the register definitions account for the stride or use accessors
  that apply the stride. You should be doing the latter anyway to
  make accesses. For example:
  
  static inline void host1x_syncpt_writel(struct host1x *host1x, 
  unsigned long value, unsigned long offset) { writel(value,
  host1x-regs + SYNCPT_BASE + offset); }
  
  static inline unsigned long host1x_syncpt_readl(struct host1x
  *host1x, unsigned long offset) { return readl(host1x-regs +
  SYNCPT_BASE + offset); }
  
  Alternatively, if you want to pass the register index instead of
  the offset, you can use just multiply the offset in that function:
  
  writel(value, host1x-regs + SYNCPT_BASE + (offset  2));
  
  The same can also be done with the non-syncpt registers.
 
 It seems like reasonable documentation to replace  2 with *
 REGISTER_STRIDE here.

Given that it is a very common pattern,  2 seems enough documentation
to me, but sure, if you prefer to be extra explicit that's fine with me.

Thierry


pgpYKjECswErR.pgp
Description: PGP signature


Re: [RFC v2 2/8] video: tegra: Add syncpoint wait and interrupts

2012-11-29 Thread Terje Bergström
Just replying to part of your mail.

On 30.11.2012 09:22, Thierry Reding wrote:
 Actually for the display controller we want just a notification when the
 VBLANK happens. I'm not sure if we want to do that with syncpoints at
 all since it works quite well using regular interrupts.

VBLANK isn't actually a very good example of dc's use of sync points.
That can easily be done with regular interrupts, as you mention.

More important is when we have double buffering enabled. When you draw
something to a surface, and flip it to display, you want DC to notify
when the flip has been done and rendering can continue to the back buffer.

So, what you can do is return a fence from DC when initiating a flip,
and place that fence into 2D stream as a host wait so that 2D will
patiently wait for buffer to become free before it renders.

 What I'm proposing is to leave it up to each host1x client how they want
 to handle this. For display controllers it may be enough to have their
 callback run in interrupt context but other clients may need to do more
 work so they can queue it themselves.

DC doesn't need to worry about host1x interrupts at all. It's all
internal to the host1x driver, so we're now just talking about the
internal implementation of host1x.

We have two scenarios for the syncpt interrupts. One is that a job got
finished and we need to clean up the queue and free up resources. This
must be done in threads. Other is releasing a thread that is blocked by
a syncpt wait.

It's simpler if both of these are handled with the same infrastructure,
and we've shown that latency is very good even if we handle all events
in a thread.

 I know that this looks like it might be more work, but if it turns out
 that many drivers need to do the exact same thing, that functionality
 can be factored out into a helper. But it may just as well turn out that
 the requirements for each module are slightly different that forcing a
 workqueue on them could result in ugly workarounds because it doesn't
 quite work for them.

This is just driver internal, so there's no need for other drivers to
access this part.

 If we move responsibility of managing the workqueue out of host1x as I
 proposed above, maybe a lot of this code can be removed. Maybe you can
 explain a bit what they are used for exactly in your write-up.

It's going to be a big bad boy. :-)

Terje

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 2/8] video: tegra: Add syncpoint wait and interrupts

2012-11-29 Thread Thierry Reding
On Mon, Nov 26, 2012 at 03:19:08PM +0200, Terje Bergstrom wrote:
[...]
 diff --git a/drivers/video/tegra/host/chip_support.h 
 b/drivers/video/tegra/host/chip_support.h
[...]
 +struct nvhost_intr_ops {
 + void (*init_host_sync)(struct nvhost_intr *);
 + void (*set_host_clocks_per_usec)(
 + struct nvhost_intr *, u32 clocks);
 + void (*set_syncpt_threshold)(
 + struct nvhost_intr *, u32 id, u32 thresh);
 + void (*enable_syncpt_intr)(struct nvhost_intr *, u32 id);
 + void (*disable_syncpt_intr)(struct nvhost_intr *, u32 id);
 + void (*disable_all_syncpt_intrs)(struct nvhost_intr *);
 + int  (*request_host_general_irq)(struct nvhost_intr *);
 + void (*free_host_general_irq)(struct nvhost_intr *);
 + int (*free_syncpt_irq)(struct nvhost_intr *);
 +};
 +
  struct nvhost_chip_support {
   const char *soc_name;
   struct nvhost_syncpt_ops syncpt;
 + struct nvhost_intr_ops intr;
  };
  
  struct nvhost_chip_support *nvhost_get_chip_ops(void);
  
  #define syncpt_op()  (nvhost_get_chip_ops()-syncpt)
 +#define intr_op()(nvhost_get_chip_ops()-intr)
  
  int nvhost_init_chip_support(struct nvhost_master *host);
  

The same comments apply as for patch 1. Reducing the number of
indirections here make things a lot easier.

 diff --git a/drivers/video/tegra/host/dev.c b/drivers/video/tegra/host/dev.c
 index 98c9c9f..025a820 100644
 --- a/drivers/video/tegra/host/dev.c
 +++ b/drivers/video/tegra/host/dev.c
 @@ -43,6 +43,13 @@ u32 host1x_syncpt_read(u32 id)
  }
  EXPORT_SYMBOL(host1x_syncpt_read);
  
 +int host1x_syncpt_wait(u32 id, u32 thresh, u32 timeout, u32 *value)

The choice of data types is odd here. id refers to a syncpt so a better
choice would have been unsigned int because the size of the variable
doesn't actually matter. But as I already said in my reply to patch 1,
these are resources and should therefore better be abstracted through an
opaque pointer anyway.

timeout is usually signed long, so this function should reflect that. As
for the value this is probably fine as it will effectively be set from a
register value. Though you also cache them in software using atomics.

 +static void clock_on_host(struct platform_device *dev)
 +{
 + struct nvhost_device_data *pdata = platform_get_drvdata(dev);
 + struct nvhost_master *host = nvhost_get_private_data(dev);
 + nvhost_intr_start(host-intr, clk_get_rate(pdata-clk[0]));
 +}
 +
 +static int clock_off_host(struct platform_device *dev)
 +{
 + struct nvhost_master *host = nvhost_get_private_data(dev);
 + nvhost_intr_stop(host-intr);
 + return 0;
 +}

This is a good example of why these indirections are wasteful. You
constantly need to look up the host pointer just to call another
function on it. With some modifications to the structure layouts it
should be possible to make this a lot more straightforward.

 diff --git a/drivers/video/tegra/host/host1x/host1x.h 
 b/drivers/video/tegra/host/host1x/host1x.h
 index 76748ac..af9bfef 100644
 --- a/drivers/video/tegra/host/host1x/host1x.h
 +++ b/drivers/video/tegra/host/host1x/host1x.h
 @@ -25,6 +25,7 @@
  #include linux/nvhost.h
  
  #include nvhost_syncpt.h
 +#include nvhost_intr.h
  
  #define TRACE_MAX_LENGTH 128U
  #define IFACE_NAME   nvhost
 @@ -33,6 +34,7 @@ struct nvhost_master {
   void __iomem *aperture;
   void __iomem *sync_aperture;
   struct nvhost_syncpt syncpt;
 + struct nvhost_intr intr;
   struct platform_device *dev;
   struct host1x_device_info info;
  };
 diff --git a/drivers/video/tegra/host/host1x/host1x01.c 
 b/drivers/video/tegra/host/host1x/host1x01.c
 index d53302d..5bf0e6e 100644
 --- a/drivers/video/tegra/host/host1x/host1x01.c
 +++ b/drivers/video/tegra/host/host1x/host1x01.c
 @@ -26,12 +26,14 @@
  #include chip_support.h
  
  #include host1x/host1x_syncpt.c
 +#include host1x/host1x_intr.c
  
  int nvhost_init_host1x01_support(struct nvhost_master *host,
   struct nvhost_chip_support *op)
  {
   host-sync_aperture = host-aperture + HOST1X_CHANNEL_SYNC_REG_BASE;
   op-syncpt = host1x_syncpt_ops;
 + op-intr = host1x_intr_ops;
  
   return 0;
  }

Also you need to touch a lot of files just to add this new feature. This
makes maintenance needlessly difficult.

 diff --git a/drivers/video/tegra/host/host1x/host1x_intr.c 
 b/drivers/video/tegra/host/host1x/host1x_intr.c
[...]
 +#include linux/interrupt.h
 +#include linux/irq.h
 +#include linux/io.h
 +#include asm/mach/irq.h
 +
 +#include nvhost_intr.h
 +#include host1x/host1x.h
 +
 +/* Spacing between sync registers */
 +#define REGISTER_STRIDE 4

Erm... no. The usual way you should be doing this is either make the
register definitions account for the stride or use accessors that apply
the stride. You should be doing the latter anyway to make accesses. For
example:

static inline void host1x_syncpt_writel(struct host1x *host1x,
  

Re: [RFC v2 2/8] video: tegra: Add syncpoint wait and interrupts

2012-11-27 Thread Sivaram Nair
On Mon, Nov 26, 2012 at 02:19:08PM +0100, Terje Bergstrom wrote:
> +void nvhost_intr_stop(struct nvhost_intr *intr)
> +{
> +   unsigned int id;
> +   struct nvhost_intr_syncpt *syncpt;
> +   u32 nb_pts = nvhost_syncpt_nb_pts(_to_dev(intr)->syncpt);
> +
> +   mutex_lock(>mutex);
> +
> +   intr_op().disable_all_syncpt_intrs(intr);
> +
> +   for (id = 0, syncpt = intr->syncpt;
> +id < nb_pts;
> +++id, ++syncpt) {
> +   struct nvhost_waitlist *waiter, *next;
> +   list_for_each_entry_safe(waiter, next,
> +   >wait_head, list) {
> +   if (atomic_cmpxchg(>state,
> +   WLS_CANCELLED, WLS_HANDLED)
> +   == WLS_CANCELLED) {
> +   list_del(>list);
> +   kref_put(>refcount, waiter_release);
> +   }
> +   }
> +
> +   if (!list_empty(>wait_head)) {  /* output diagnostics 
> */
> +   pr_warn("%s cannot stop syncpt intr id=%d\n",
> +   __func__, id);
> +   return;

mutex_unlock() missing before return.

-Sivaram
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC v2 2/8] video: tegra: Add syncpoint wait and interrupts

2012-11-27 Thread Sivaram Nair
On Mon, Nov 26, 2012 at 02:19:08PM +0100, Terje Bergstrom wrote:
 +void nvhost_intr_stop(struct nvhost_intr *intr)
 +{
 +   unsigned int id;
 +   struct nvhost_intr_syncpt *syncpt;
 +   u32 nb_pts = nvhost_syncpt_nb_pts(intr_to_dev(intr)-syncpt);
 +
 +   mutex_lock(intr-mutex);
 +
 +   intr_op().disable_all_syncpt_intrs(intr);
 +
 +   for (id = 0, syncpt = intr-syncpt;
 +id  nb_pts;
 +++id, ++syncpt) {
 +   struct nvhost_waitlist *waiter, *next;
 +   list_for_each_entry_safe(waiter, next,
 +   syncpt-wait_head, list) {
 +   if (atomic_cmpxchg(waiter-state,
 +   WLS_CANCELLED, WLS_HANDLED)
 +   == WLS_CANCELLED) {
 +   list_del(waiter-list);
 +   kref_put(waiter-refcount, waiter_release);
 +   }
 +   }
 +
 +   if (!list_empty(syncpt-wait_head)) {  /* output diagnostics 
 */
 +   pr_warn(%s cannot stop syncpt intr id=%d\n,
 +   __func__, id);
 +   return;

mutex_unlock() missing before return.

-Sivaram
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[RFC v2 2/8] video: tegra: Add syncpoint wait and interrupts

2012-11-26 Thread Terje Bergstrom
Add support for sync point interrupts, and sync point wait. Sync point
wait uses interrupts for unblocking wait.

Signed-off-by: Terje Bergstrom 
---
 drivers/video/tegra/host/Makefile |1 +
 drivers/video/tegra/host/chip_support.h   |   17 ++
 drivers/video/tegra/host/dev.c|7 +
 drivers/video/tegra/host/host1x/host1x.c  |   33 +++
 drivers/video/tegra/host/host1x/host1x.h  |2 +
 drivers/video/tegra/host/host1x/host1x01.c|2 +
 drivers/video/tegra/host/host1x/host1x_intr.c |  263 ++
 drivers/video/tegra/host/nvhost_intr.c|  363 +
 drivers/video/tegra/host/nvhost_intr.h|  102 +++
 drivers/video/tegra/host/nvhost_syncpt.c  |  111 
 drivers/video/tegra/host/nvhost_syncpt.h  |   10 +
 include/linux/nvhost.h|2 +
 12 files changed, 913 insertions(+)
 create mode 100644 drivers/video/tegra/host/host1x/host1x_intr.c
 create mode 100644 drivers/video/tegra/host/nvhost_intr.c
 create mode 100644 drivers/video/tegra/host/nvhost_intr.h

diff --git a/drivers/video/tegra/host/Makefile 
b/drivers/video/tegra/host/Makefile
index 3edab4a..24a 100644
--- a/drivers/video/tegra/host/Makefile
+++ b/drivers/video/tegra/host/Makefile
@@ -3,6 +3,7 @@ ccflags-y = -Idrivers/video/tegra/host
 nvhost-objs = \
nvhost_acm.o \
nvhost_syncpt.o \
+   nvhost_intr.o \
dev.o \
chip_support.o
 
diff --git a/drivers/video/tegra/host/chip_support.h 
b/drivers/video/tegra/host/chip_support.h
index acfa2f1..5c8f49f 100644
--- a/drivers/video/tegra/host/chip_support.h
+++ b/drivers/video/tegra/host/chip_support.h
@@ -25,6 +25,7 @@
 struct output;
 
 struct nvhost_master;
+struct nvhost_intr;
 struct nvhost_syncpt;
 struct platform_device;
 
@@ -38,14 +39,30 @@ struct nvhost_syncpt_ops {
const char * (*name)(struct nvhost_syncpt *, u32 id);
 };
 
+struct nvhost_intr_ops {
+   void (*init_host_sync)(struct nvhost_intr *);
+   void (*set_host_clocks_per_usec)(
+   struct nvhost_intr *, u32 clocks);
+   void (*set_syncpt_threshold)(
+   struct nvhost_intr *, u32 id, u32 thresh);
+   void (*enable_syncpt_intr)(struct nvhost_intr *, u32 id);
+   void (*disable_syncpt_intr)(struct nvhost_intr *, u32 id);
+   void (*disable_all_syncpt_intrs)(struct nvhost_intr *);
+   int  (*request_host_general_irq)(struct nvhost_intr *);
+   void (*free_host_general_irq)(struct nvhost_intr *);
+   int (*free_syncpt_irq)(struct nvhost_intr *);
+};
+
 struct nvhost_chip_support {
const char *soc_name;
struct nvhost_syncpt_ops syncpt;
+   struct nvhost_intr_ops intr;
 };
 
 struct nvhost_chip_support *nvhost_get_chip_ops(void);
 
 #define syncpt_op()(nvhost_get_chip_ops()->syncpt)
+#define intr_op()  (nvhost_get_chip_ops()->intr)
 
 int nvhost_init_chip_support(struct nvhost_master *host);
 
diff --git a/drivers/video/tegra/host/dev.c b/drivers/video/tegra/host/dev.c
index 98c9c9f..025a820 100644
--- a/drivers/video/tegra/host/dev.c
+++ b/drivers/video/tegra/host/dev.c
@@ -43,6 +43,13 @@ u32 host1x_syncpt_read(u32 id)
 }
 EXPORT_SYMBOL(host1x_syncpt_read);
 
+int host1x_syncpt_wait(u32 id, u32 thresh, u32 timeout, u32 *value)
+{
+   struct nvhost_syncpt *sp = >syncpt;
+   return nvhost_syncpt_wait_timeout(sp, id, thresh, timeout, value);
+}
+EXPORT_SYMBOL(host1x_syncpt_wait);
+
 bool host1x_powered(struct platform_device *dev)
 {
bool ret = 0;
diff --git a/drivers/video/tegra/host/host1x/host1x.c 
b/drivers/video/tegra/host/host1x/host1x.c
index 77ff00b..766931b 100644
--- a/drivers/video/tegra/host/host1x/host1x.c
+++ b/drivers/video/tegra/host/host1x/host1x.c
@@ -52,8 +52,24 @@ static int power_off_host(struct platform_device *dev)
return 0;
 }
 
+static void clock_on_host(struct platform_device *dev)
+{
+   struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+   struct nvhost_master *host = nvhost_get_private_data(dev);
+   nvhost_intr_start(>intr, clk_get_rate(pdata->clk[0]));
+}
+
+static int clock_off_host(struct platform_device *dev)
+{
+   struct nvhost_master *host = nvhost_get_private_data(dev);
+   nvhost_intr_stop(>intr);
+   return 0;
+}
+
 static void nvhost_free_resources(struct nvhost_master *host)
 {
+   kfree(host->intr.syncpt);
+   host->intr.syncpt = 0;
 }
 
 static int __devinit nvhost_alloc_resources(struct nvhost_master *host)
@@ -64,6 +80,16 @@ static int __devinit nvhost_alloc_resources(struct 
nvhost_master *host)
if (err)
return err;
 
+   host->intr.syncpt = devm_kzalloc(>dev->dev,
+   sizeof(struct nvhost_intr_syncpt) *
+   nvhost_syncpt_nb_pts(>syncpt),
+   GFP_KERNEL);
+
+   if (!host->intr.syncpt) {
+   /* frees happen in the support removal phase */
+   

[RFC v2 2/8] video: tegra: Add syncpoint wait and interrupts

2012-11-26 Thread Terje Bergstrom
Add support for sync point interrupts, and sync point wait. Sync point
wait uses interrupts for unblocking wait.

Signed-off-by: Terje Bergstrom tbergst...@nvidia.com
---
 drivers/video/tegra/host/Makefile |1 +
 drivers/video/tegra/host/chip_support.h   |   17 ++
 drivers/video/tegra/host/dev.c|7 +
 drivers/video/tegra/host/host1x/host1x.c  |   33 +++
 drivers/video/tegra/host/host1x/host1x.h  |2 +
 drivers/video/tegra/host/host1x/host1x01.c|2 +
 drivers/video/tegra/host/host1x/host1x_intr.c |  263 ++
 drivers/video/tegra/host/nvhost_intr.c|  363 +
 drivers/video/tegra/host/nvhost_intr.h|  102 +++
 drivers/video/tegra/host/nvhost_syncpt.c  |  111 
 drivers/video/tegra/host/nvhost_syncpt.h  |   10 +
 include/linux/nvhost.h|2 +
 12 files changed, 913 insertions(+)
 create mode 100644 drivers/video/tegra/host/host1x/host1x_intr.c
 create mode 100644 drivers/video/tegra/host/nvhost_intr.c
 create mode 100644 drivers/video/tegra/host/nvhost_intr.h

diff --git a/drivers/video/tegra/host/Makefile 
b/drivers/video/tegra/host/Makefile
index 3edab4a..24a 100644
--- a/drivers/video/tegra/host/Makefile
+++ b/drivers/video/tegra/host/Makefile
@@ -3,6 +3,7 @@ ccflags-y = -Idrivers/video/tegra/host
 nvhost-objs = \
nvhost_acm.o \
nvhost_syncpt.o \
+   nvhost_intr.o \
dev.o \
chip_support.o
 
diff --git a/drivers/video/tegra/host/chip_support.h 
b/drivers/video/tegra/host/chip_support.h
index acfa2f1..5c8f49f 100644
--- a/drivers/video/tegra/host/chip_support.h
+++ b/drivers/video/tegra/host/chip_support.h
@@ -25,6 +25,7 @@
 struct output;
 
 struct nvhost_master;
+struct nvhost_intr;
 struct nvhost_syncpt;
 struct platform_device;
 
@@ -38,14 +39,30 @@ struct nvhost_syncpt_ops {
const char * (*name)(struct nvhost_syncpt *, u32 id);
 };
 
+struct nvhost_intr_ops {
+   void (*init_host_sync)(struct nvhost_intr *);
+   void (*set_host_clocks_per_usec)(
+   struct nvhost_intr *, u32 clocks);
+   void (*set_syncpt_threshold)(
+   struct nvhost_intr *, u32 id, u32 thresh);
+   void (*enable_syncpt_intr)(struct nvhost_intr *, u32 id);
+   void (*disable_syncpt_intr)(struct nvhost_intr *, u32 id);
+   void (*disable_all_syncpt_intrs)(struct nvhost_intr *);
+   int  (*request_host_general_irq)(struct nvhost_intr *);
+   void (*free_host_general_irq)(struct nvhost_intr *);
+   int (*free_syncpt_irq)(struct nvhost_intr *);
+};
+
 struct nvhost_chip_support {
const char *soc_name;
struct nvhost_syncpt_ops syncpt;
+   struct nvhost_intr_ops intr;
 };
 
 struct nvhost_chip_support *nvhost_get_chip_ops(void);
 
 #define syncpt_op()(nvhost_get_chip_ops()-syncpt)
+#define intr_op()  (nvhost_get_chip_ops()-intr)
 
 int nvhost_init_chip_support(struct nvhost_master *host);
 
diff --git a/drivers/video/tegra/host/dev.c b/drivers/video/tegra/host/dev.c
index 98c9c9f..025a820 100644
--- a/drivers/video/tegra/host/dev.c
+++ b/drivers/video/tegra/host/dev.c
@@ -43,6 +43,13 @@ u32 host1x_syncpt_read(u32 id)
 }
 EXPORT_SYMBOL(host1x_syncpt_read);
 
+int host1x_syncpt_wait(u32 id, u32 thresh, u32 timeout, u32 *value)
+{
+   struct nvhost_syncpt *sp = nvhost-syncpt;
+   return nvhost_syncpt_wait_timeout(sp, id, thresh, timeout, value);
+}
+EXPORT_SYMBOL(host1x_syncpt_wait);
+
 bool host1x_powered(struct platform_device *dev)
 {
bool ret = 0;
diff --git a/drivers/video/tegra/host/host1x/host1x.c 
b/drivers/video/tegra/host/host1x/host1x.c
index 77ff00b..766931b 100644
--- a/drivers/video/tegra/host/host1x/host1x.c
+++ b/drivers/video/tegra/host/host1x/host1x.c
@@ -52,8 +52,24 @@ static int power_off_host(struct platform_device *dev)
return 0;
 }
 
+static void clock_on_host(struct platform_device *dev)
+{
+   struct nvhost_device_data *pdata = platform_get_drvdata(dev);
+   struct nvhost_master *host = nvhost_get_private_data(dev);
+   nvhost_intr_start(host-intr, clk_get_rate(pdata-clk[0]));
+}
+
+static int clock_off_host(struct platform_device *dev)
+{
+   struct nvhost_master *host = nvhost_get_private_data(dev);
+   nvhost_intr_stop(host-intr);
+   return 0;
+}
+
 static void nvhost_free_resources(struct nvhost_master *host)
 {
+   kfree(host-intr.syncpt);
+   host-intr.syncpt = 0;
 }
 
 static int __devinit nvhost_alloc_resources(struct nvhost_master *host)
@@ -64,6 +80,16 @@ static int __devinit nvhost_alloc_resources(struct 
nvhost_master *host)
if (err)
return err;
 
+   host-intr.syncpt = devm_kzalloc(host-dev-dev,
+   sizeof(struct nvhost_intr_syncpt) *
+   nvhost_syncpt_nb_pts(host-syncpt),
+   GFP_KERNEL);
+
+   if (!host-intr.syncpt) {
+   /* frees happen in the