Re: When is a wake_up() not a wake up?

2009-04-08 Thread Andy Walls
On Sat, 2009-04-04 at 04:07 -0700, Trent Piepho wrote:
 On Fri, 3 Apr 2009, Andy Walls wrote:
  1. A work queue thread or read() call needs to send a command to the
  CX23418 using the cx18_api_call() function
  2. It fills out a mailbox with a command for the CX23418
  3. It prepares to wait, just in case a wait is needed
  4. A SW1 interrupt is sent to the CX23418 telling it a mailbox is ready
  5. The ack filed in the mailbox, a PCI MMIO location, is checked to see
  if the mailbox was ack'ed already
  6. If not, schedule_timeout() for up to 10 msecs (a near eternity...)
  7. Clean up the wait and move on
 
 10ms isn't that long.  With a 100 HZ kernel it's only one jiffie.  The
 shortest time msleep() can sleep on a 100 HZ kernel is 20 ms.
 
 Is it possible that it's simply taking more than 10 ms for your process to
 get run?

After some testing with fine grained timestamps as you suggested, that
indeed appears to be the case.

For the high volume API command I care about, in 6097 out of 6100
samples, the firmware acknowledgment interrupt comes back in and the
process is awakened back to TASK_RUNNABLE in under 150 us.  0 out of
6100 samples finished the wakeup to TASK_RUNNABLE faster than 55 us.
However, I had 22 samples, where coming back from schedule_timeout()
took 10 msec or greater, despite the wakeup happening in under 150 us.

So the simple answer is to poll the ack field of the mailbox, with some
50 us busy waits perhaps, for this one high volume API command.

Regards,
Andy

--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: When is a wake_up() not a wake up?

2009-04-04 Thread Trent Piepho
On Fri, 3 Apr 2009, Andy Walls wrote:
 1. A work queue thread or read() call needs to send a command to the
 CX23418 using the cx18_api_call() function
 2. It fills out a mailbox with a command for the CX23418
 3. It prepares to wait, just in case a wait is needed
 4. A SW1 interrupt is sent to the CX23418 telling it a mailbox is ready
 5. The ack filed in the mailbox, a PCI MMIO location, is checked to see
 if the mailbox was ack'ed already
 6. If not, schedule_timeout() for up to 10 msecs (a near eternity...)
 7. Clean up the wait and move on

10ms isn't that long.  With a 100 HZ kernel it's only one jiffie.  The
shortest time msleep() can sleep on a 100 HZ kernel is 20 ms.

Is it possible that it's simply taking more than 10 ms for your process to
get run?

 Mar 31 23:36:56 palomino kernel: cx18-0:  irq: sending interrupt SW1: 8 to 
 send CX18_CPU_DE_SET_MDL
 Mar 31 23:36:56 palomino kernel: cx18-0:  api: scheduling to wait for ack of 
 CX18_CPU_DE_SET_MDL: req 51267 ack 51266, pid 21092, state 2
 Mar 31 23:36:56 palomino kernel: cx18-0:  irq: received interrupts SW1: 0  
 SW2: 8  HW2: 0
 Mar 31 23:36:56 palomino kernel: cx18-0:  irq: Wake up initiated on pid 21092 
 in state 2
 Mar 31 23:36:56 palomino kernel: cx18-0:  irq: Wake up succeeded on pid 
 21092, state 2 - 0
 Mar 31 23:36:56 palomino kernel: cx18-0:  api: done wait for ack of 
 CX18_CPU_DE_SET_MDL: req 51267 ack 51267, current pid 21092, current state 0, 
 state 0
 Mar 31 23:36:56 palomino kernel: cx18-0:  warning: failed to be awakened upon 
 RPU acknowledgment sending CX18_CPU_DE_SET_MDL; timed out waiting 10 msecs

Try some higher resolution time stamps, you can't really tell much about
when things are happening.

Here's some code I wrote to do it on x86.

#include linux/calc64.h
#define MHZ 1666.787
#define INT (unsigned int)(MHZ * (112) + 0.5)
#define FRAC(unsigned int)(MHZ * (121) / 1000.0 + 0.5)
#define timestamp(str, args...) { u64 _time; u32 _rem; rdtscll(_time); \
_time = (_time - starttime)12; \
_rem = do_div(_time, INT); \
printk(%s - %u.%03u us  str \n, __func__, \
(unsigned int)_time, (_rem  9) / FRAC , ## args); }
static u64 starttime;

Change MHZ to what /proc/cpuinfo tells you.  Call 'rdtscll(starttime)' some
time when your driver inits or a device gets opened or something like that.
Makes the time stamps easier to read when they start near zero.
--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


When is a wake_up() not a wake up?

2009-04-03 Thread Andy Walls
Hi.  I've got a problem where in the cx18 driver, the wake_up() call
doesn't seem to be waking up a process or work queue thread that tried
to wait on a wait queue.

While doing some investigation, I found I was unable to understand the
conditions under which try_to_wake_up() will return success or not.  Can
anyone give short summary on the conditions under which it does not?

FYI:
$ uname -a
Linux palomino.walls.org 2.6.27.9-159.fc10.x86_64 #1 SMP Tue Dec 16 14:47:52 
EST 2008 x86_64 x86_64 x86_64 GNU/Linux


Also, just to provide some context on the larger problem, here's what's
supposed to happen in the cx18 driver:

1. A work queue thread or read() call needs to send a command to the
CX23418 using the cx18_api_call() function
2. It fills out a mailbox with a command for the CX23418
3. It prepares to wait, just in case a wait is needed
4. A SW1 interrupt is sent to the CX23418 telling it a mailbox is ready
5. The ack filed in the mailbox, a PCI MMIO location, is checked to see
if the mailbox was ack'ed already
6. If not, schedule_timeout() for up to 10 msecs (a near eternity...)
7. Clean up the wait and move on

In response to the SW1 interrupt in step 4 above, the CX23418 processes
the command mailbox, fills out the ack field in the mailbox with a
good value, and sends a SW2 ack interrupt back.  The cx18_irq_hander()
calls wake_up() in response to this.

However, I am running across occasions where the maximum timeout has
expired, and the mailbox has been ack'ed.  It's as if the wake_up()
didn't work.  Here's a sample from the log:


Mar 31 23:36:56 palomino kernel: cx18-0:  irq: sending interrupt SW1: 8 to send 
CX18_CPU_DE_SET_MDL
Mar 31 23:36:56 palomino kernel: cx18-0:  api: scheduling to wait for ack of 
CX18_CPU_DE_SET_MDL: req 51267 ack 51266, pid 21092, state 2
Mar 31 23:36:56 palomino kernel: cx18-0:  irq: received interrupts SW1: 0  SW2: 
8  HW2: 0
Mar 31 23:36:56 palomino kernel: cx18-0:  irq: Wake up initiated on pid 21092 
in state 2
Mar 31 23:36:56 palomino kernel: cx18-0:  irq: Wake up succeeded on pid 21092, 
state 2 - 0
Mar 31 23:36:56 palomino kernel: cx18-0:  api: done wait for ack of 
CX18_CPU_DE_SET_MDL: req 51267 ack 51267, current pid 21092, current state 0, 
state 0
Mar 31 23:36:56 palomino kernel: cx18-0:  warning: failed to be awakened upon 
RPU acknowledgment sending CX18_CPU_DE_SET_MDL; timed out waiting 10 msecs

A wait of 10 msecs is not impossible I guess.  However, they should be
much less frequent that what I am seeing in my logs, given that this is
an ATSC video stream capture. 

Here's the highlights from a patched cx18 driver that I have in place to
try and see what's going on:

static int cx18_autoremove_wake_function(wait_queue_t *w, unsigned mode,
 int sync, void *key)
{
struct cx18 *cx = key;
struct task_struct *t = w-private;
long old_state;
int ret;

old_state = t-state;
CX18_DEBUG_HI_IRQ(Wake up initiated on pid %d in state %lx\n,
  task_pid_nr(t), old_state);

ret = default_wake_function(w, mode, sync, NULL);

if (ret) {
CX18_DEBUG_HI_IRQ(Wake up succeeded on pid %d, 
  state %lx - %lx\n,
  task_pid_nr(t), old_state, t-state);
list_del_init(w-task_list);
} else {
CX18_DEBUG_WARN(Wake up failed on pid %d, 
state %lx - %lx\n,
task_pid_nr(t), old_state, t-state);
}
return ret;
}

static int cx18_api_call(struct cx18 *cx, u32 cmd, int args, u32 data[])
{
u32 state, irq, req, ack, err;
struct cx18_mailbox __iomem *mb;
wait_queue_head_t *waitq;
struct mutex *mb_lock;
long int timeout, ret;
int i;
wait_queue_t w;   

... 

/* Setup pointers and values we need */
waitq = cx-mb_cpu_waitq;
mb_lock = cx-epu2cpu_mb_lock;
irq = IRQ_EPU_TO_CPU;
mb = cx-scb-epu2cpu_mb;
...

/* Acquire the mutex for the mailbox, and put a request in it */
/* blah blah blah */
...

timeout = msecs_to_jiffies(10);
CX18_DEBUG_HI_IRQ(sending interrupt SW1: %x to send %s\n,
  irq, info-name);
ret = timeout;

/* setup for a wait */
init_wait(w);
w.func = cx18_autoremove_wake_function;
prepare_to_wait(waitq, w, TASK_UNINTERRUPTIBLE);

/* Tell the device our request is in it's mailbox */
cx18_write_reg_expect(cx, irq, SW1_INT_SET, irq, irq);

/* Did it ack our request already? */
ack = cx18_readl(cx, mb-ack);
if (req != ack) {
CX18_DEBUG_HI_API(scheduling to wait for ack of %s: req %u 
  ack %u, pid %u, state %lx\n,