Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100

2010-04-08 Thread christian pellegrin
On Mon, Mar 29, 2010 at 4:48 AM, Feng Tang feng.t...@intel.com wrote:

 +             tx |= MAX3100_WD | MAX3100_RTS;

 Does this imply to have to work with HW flow control? on my platform
 I have to remove the RTS bit to make it work.


Finally I had time to check this. If you compare the 8250 (or similar)
data-sheet with the MAX31x0 one you see that the handling of the
RTX/CTS bits is exactly the same (8250 about RTS bit for example:
When any of these bits are cleared, the associated output is forced
high. and MAX3110: Request-to-Send Bit. Controls the state of the
RTS output. This bit is reset on power-up (RTS
bit = 0 sets the RTS pin = logic high).). If you look at the 8250.c
driver (grep for UART_MCR_RTS) you notice that the bit is set on
device open (together with DTR of course). So I think the driver is
doing the right thing here.

-- 
Christian Pellegrin, see http://www.evolware.org/chri/
Real Programmers don't play tennis, or any other sport which requires
you to change clothes. Mountain climbing is OK, and Real Programmers
wear their climbing boots to work in case a mountain should suddenly
spring up in the middle of the computer room.

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
spi-devel-general mailing list
spi-devel-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/spi-devel-general


Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100

2010-04-05 Thread christian pellegrin
Hi,

first of all I'm sorry for the late response, just now I found time to
work on this.

On Wed, Mar 31, 2010 at 8:04 AM, Grant Likely grant.lik...@secretlab.ca wrote:

 Is checking the T bit really necessary if Linux instead tracks the
 timing between bytes internally?  ie.  If the driver uses an hrtimer
 to schedule the submission of SPI transfers such that the time between
 SPI submissions is always greater than the time required for a serial
 character to be transmitted?

I think you have to check it anyway. For example the SPI bus may be
shared with another device so we don't know when our char will be sent
(it might be delayed for more than the duration of a char being sent
on the serial line if the initial execution of the SPI command is
delayed). But using a hrtimer will be for sure more fair than polling
T bit as far as resource usage is concerned. I was always hesitant
about using hrtimers: I really don't know if all platforms support
them with the needed granularity (at 115200 a char takes around 100us)
and the aren't many users of them in the drivers directory (quite all
of them are in the staging one). But it's definitely a good idea if
hrtimers do work. I'll make some tests.


 You may be able to set this up into a free-running state machine.
 Submit the SPI transfers asynchronously, and use a callback to be
 notified when the transfer is completed.  In most cases, the transfer
 will be completed every time, but if it is not, then the state machine
 can just wait another time period before submitting.  By doing it this
 way, all the work can be handled at the atomic context without any
 workqueues or threaded IRQ handlers.


yes a completely async design could improve performance (the greatest
culprit for low performance (not mentioning slow SPI master drivers)
is the latency in the delayed work being started). When I first wrote
this driver I wanted to keep it simple so I was a bit frightened by a
state-machine like design, but it can be done for sure. My concern
here is: everything is the kernel is moving to doing as much as
possible in a delayed work mechanics (see the introduction of threaded
interrupts (which could became the default) or the coming soon death
of IRQF_DISABLED). Is doing a big part of the work (of course I would
use spi_async directly in the interrupt handler and handle the
incoming/outgoing chars in spi_async callback which is usually called
in an interrupt context) in an interrupt context antihistorical,
isn't it?

BTW: both of the design changes you mentioned seem sensible to me for
better performance of the driver. But they don't do any form of
batching and won't help if the underlying SPI master driver uses some
form of delayed work itself.

-- 
Christian Pellegrin, see http://www.evolware.org/chri/
Real Programmers don't play tennis, or any other sport which requires
you to change clothes. Mountain climbing is OK, and Real Programmers
wear their climbing boots to work in case a mountain should suddenly
spring up in the middle of the computer room.

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
spi-devel-general mailing list
spi-devel-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/spi-devel-general


Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100

2010-04-05 Thread Grant Likely
On Mon, Apr 5, 2010 at 12:19 PM, christian pellegrin chrip...@fsfe.org wrote:
 Hi,

 first of all I'm sorry for the late response, just now I found time to
 work on this.

 On Wed, Mar 31, 2010 at 8:04 AM, Grant Likely grant.lik...@secretlab.ca 
 wrote:

 Is checking the T bit really necessary if Linux instead tracks the
 timing between bytes internally?  ie.  If the driver uses an hrtimer
 to schedule the submission of SPI transfers such that the time between
 SPI submissions is always greater than the time required for a serial
 character to be transmitted?

 I think you have to check it anyway. For example the SPI bus may be
 shared with another device so we don't know when our char will be sent
 (it might be delayed for more than the duration of a char being sent
 on the serial line if the initial execution of the SPI command is
 delayed). But using a hrtimer will be for sure more fair than polling
 T bit as far as resource usage is concerned. I was always hesitant
 about using hrtimers: I really don't know if all platforms support
 them with the needed granularity (at 115200 a char takes around 100us)
 and the aren't many users of them in the drivers directory (quite all
 of them are in the staging one). But it's definitely a good idea if
 hrtimers do work. I'll make some tests.

Another thing to think about: this device is never going to be high
performance.  If tx gets delayed, then it isn't a big deal because
there won't be any data loss.  The real concern is RX where if you
don't process fast enough then characters get dropped.  Fortunately
you have an 8 byte deep buffer.  So if each timer fire schedules more
than one transfer (even if only one char is actually transmitted),
then it is okay if the timer granularity is 300-400us at 115200.



 You may be able to set this up into a free-running state machine.
 Submit the SPI transfers asynchronously, and use a callback to be
 notified when the transfer is completed.  In most cases, the transfer
 will be completed every time, but if it is not, then the state machine
 can just wait another time period before submitting.  By doing it this
 way, all the work can be handled at the atomic context without any
 workqueues or threaded IRQ handlers.


 yes a completely async design could improve performance (the greatest
 culprit for low performance (not mentioning slow SPI master drivers)
 is the latency in the delayed work being started). When I first wrote
 this driver I wanted to keep it simple so I was a bit frightened by a
 state-machine like design, but it can be done for sure. My concern
 here is: everything is the kernel is moving to doing as much as
 possible in a delayed work mechanics (see the introduction of threaded
 interrupts (which could became the default) or the coming soon death
 of IRQF_DISABLED). Is doing a big part of the work (of course I would
 use spi_async directly in the interrupt handler and handle the
 incoming/outgoing chars in spi_async callback which is usually called
 in an interrupt context) in an interrupt context antihistorical,
 isn't it?

If you only use spi_async (which must be atomic), and you don't do any
heavy data processing or udelay()s in the irq path, then your driver
will not be adding major latency.  Use a ring buffer for both the tx
and rx paths so that the console processing can be done in a workqueue
or something and you should be just fine.

 BTW: both of the design changes you mentioned seem sensible to me for
 better performance of the driver. But they don't do any form of
 batching and won't help if the underlying SPI master driver uses some
 form of delayed work itself.

Then the SPI master driver is broken.  It must be fixed.  :-)
Submission of SPI transfers via spi_async() is supposed to always be
atomic.

g.

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
spi-devel-general mailing list
spi-devel-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/spi-devel-general


Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100

2010-03-31 Thread Grant Likely
On Tue, Mar 30, 2010 at 6:03 AM, christian pellegrin chrip...@fsfe.org wrote:
 On Tue, Mar 30, 2010 at 10:46 AM, Alan Cox a...@lxorguk.ukuu.org.uk wrote:

 There I would partly disagree. Fixing the spi driver clearly makes sense
 but the serial driver should be batching better. Is there a reason the
 driver couldn't adjust the size based upon the tty speed when getting a
 termios request ?

 Just a small note for those who don't know max31x0. There are just 2
 instructions to tx/rx data: simple data rx (in which we read one data
 from the fifo (the data is valid if R status bit is set), control
 lines (CTS) and status bits) and a combined tx/rx (like before but we
 also send one byte (if the T status bit the char will be accepted) and
 set control lines (RTS)). Every of these instructions is 2 bytes long.
 It's a full SPI message (so you have to assert CS before and de-assert
 it afterwards to make it work). TX buffer is just 1 byte deep, RX one
 is 8 bytes.

Is checking the T bit really necessary if Linux instead tracks the
timing between bytes internally?  ie.  If the driver uses an hrtimer
to schedule the submission of SPI transfers such that the time between
SPI submissions is always greater than the time required for a serial
character to be transmitted?

You may be able to set this up into a free-running state machine.
Submit the SPI transfers asynchronously, and use a callback to be
notified when the transfer is completed.  In most cases, the transfer
will be completed every time, but if it is not, then the state machine
can just wait another time period before submitting.  By doing it this
way, all the work can be handled at the atomic context without any
workqueues or threaded IRQ handlers.

Here's some draft )completely untested and uncompiled) code for the
bottom half:  The time rate would be calculated when the baud rate is
set.  I've glossed over a bunch of details and corner cases, but you
get the idea...

void max3100_spi_complete(void *__max3100)
{
  struct max3100_port *max3100 = __max3100;
  spin_lock(max3100-lock);
  max3100-busy = 0;
  /* Process received characters here.  Schedule a workqueue if
needed.  A ring buffer would probably work well. */
  spin_unlock(max3100-lock);
}

enum hrtimer_restart max3100_hrtimer_callback( struct hrtimer *timer )
{
  struct max3100_port *max3100 = container_of(timer, struct
max3100_port, hrtimer);

  if (max3100-busy)
return HRTIMER_RESTART; /* still busy, try again later */

  spin_lock(max3100-lock);
  /* The following 7 lines can probably actually be done once at
driver probe time */
  spi_message_init(max3100-spi_msg);
  spi_message_add_tail(max3100-spi_tran);
  max3100-spi_tran.tx_buf = mac3100-tx_buf;
  max3100-spi_tran.rx_buf = mac3100-rx_buf;
  max3100-spi_tran.len = 2;  /* can this be larger to receive
multiple bytes in 1 transfer? */
  max3100-spi_msg.complete = max3100_spi_complete;
  max3100-spi_msg.context = max3100;
  /* Alternately, a bunch of transfers could be queued up with only
the first actual
   * containing tx data.  The rest would be for RX */

  if (max3100-tx_pending) {
/* a ring buffer may be a better idea here */
max3100-tx_buf[0] = max3100-tx_pending_data;
max3100-tx_pending = 0;
  }
  spi_async(max3100-spi_msg);
  max3100-busy = 1;  /* cleared by the completion callback */

  spin_unlock(max3100-lock);

  return HRTIMER_RESTART;
}

g.

-- 
Grant Likely, B.Sc., P.Eng.
Secret Lab Technologies Ltd.

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
spi-devel-general mailing list
spi-devel-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/spi-devel-general


Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100

2010-03-30 Thread christian pellegrin
On Tue, Mar 30, 2010 at 4:14 AM, Feng Tang feng.t...@intel.com wrote:
 On Mon, 29 Mar 2010 20:55:51 +0800

 DW controller driver don't need max3110 driver to use big spi transfer,
 the early version of our max3110 is also one word per spi transfer mode,
 and the 2 drivers work fine, though the rx performance is not good (copy
  paste a bulk message to console).

What I'm saying is that the reason of bad performance is that the
underlying SPI driver has bad performance. A SPI master driver should
do as little work as possible. For small transfers (2 bytes == 16
bits) at 5MHz (== 3.2 us) it's much better to spin in spi_transfer
waiting for SPI done than scheduling anything or setting up a DMA
transfer. Especially if you do *2* task schedule (spi_transfer queues
a workqueue that schedules a tasklet) be prepared for *big* latency. I
hope other people can comment on this if I'm saying something wrong.


 When the HW works at 230400 bps, when we copy bulk data to the console,
 max3110 will probably receive about 20K characters, so the time for
 handling each char is 50us. If you use one char per spi transfer, it
 will have to execute 20K spi_transfer in one second, and each need be
 done in 50us, in this 50us you'll have to deal with controller IRQ
 handler + spi system overhead + tty receiving overhead. Is it a little
 over-killing to use one char per spi transfer? while the HW does
 have a 8 words RX FIFO

this is too hackish, max3100 wasn't conceived this way. Let me point
some problems of such an approach:

1) latency on rx chars becomes very high because we can process
incoming transfers only after a full 8 byte (or whatever the spi
transfer dimension is). For a 9600 this is too much.

2) even worse is that we can do flow control decision only on such boundary.

3) this is not reliable: think of what happens if the actual SPI
transfer speed we get will be slower that the one we requested. We
won't be emptying the RX buffer fastly enough even if could.

4) for low speeds we are going to monopolize the SPI bus for ages.

So I'm rather convinced that the SPI transfer has to happen as soon as
possible at a SPI device driver level without any guess on how the
data will be clocked out. I suggest you to improve the dw_spi driver
for better performance.

 [[    ..00  iiuu  eessoo  44rr11((eeggffnn--77  
 ggccvvrriinn11((bbnnuu1144bbnnuu
 [[    ..00  BBOO--8800  1100--8800  uuaall))

huh, here I'm seeing another kind of problem: the same char is
repeated two times (for example BBOO is written instead of BIOS). I'm
quite sure (but I will triple check shortly) the the console driver
doesn't do this (it's quite an easy loop in max3100_console_work, I
will check wit a printascii to another serial port what is sent). So I
guess the problem can be in the SPI master driver: is it correctly
handling the CS changes? Isn't it having problems with DMA for example
(I always found DMA for small data transfer (such 2 bytes in this
case) problematic (for example many platforms just do it on a 4 byte
boundary))? Have you tested it with other devices?

-- 
Christian Pellegrin, see http://www.evolware.org/chri/
Real Programmers don't play tennis, or any other sport which requires
you to change clothes. Mountain climbing is OK, and Real Programmers
wear their climbing boots to work in case a mountain should suddenly
spring up in the middle of the computer room.

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
spi-devel-general mailing list
spi-devel-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/spi-devel-general


Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100

2010-03-30 Thread Feng Tang
On Tue, 30 Mar 2010 14:49:57 +0800
christian pellegrin chrip...@fsfe.org wrote:

 On Tue, Mar 30, 2010 at 4:14 AM, Feng Tang feng.t...@intel.com
 wrote:
  On Mon, 29 Mar 2010 20:55:51 +0800
 
  When the HW works at 230400 bps, when we copy bulk data to the
  console, max3110 will probably receive about 20K characters, so the
  time for handling each char is 50us. If you use one char per spi
  transfer, it will have to execute 20K spi_transfer in one second,
  and each need be done in 50us, in this 50us you'll have to deal
  with controller IRQ handler + spi system overhead + tty receiving
  overhead. Is it a little over-killing to use one char per spi
  transfer? while the HW does have a 8 words RX FIFO
 
 this is too hackish, max3100 wasn't conceived this way. Let me point
 some problems of such an approach:
 
 1) latency on rx chars becomes very high because we can process
 incoming transfers only after a full 8 byte (or whatever the spi
 transfer dimension is). For a 9600 this is too much.
 
 2) even worse is that we can do flow control decision only on such
 boundary.
 
 3) this is not reliable: think of what happens if the actual SPI
 transfer speed we get will be slower that the one we requested. We
 won't be emptying the RX buffer fastly enough even if could.
 
 4) for low speeds we are going to monopolize the SPI bus for ages.

No words in my last email suggested to use dynamic SPI clock speed,
I admited that method is brutal and not mature when my driver was
reviewed. In early version of our 3110 driver, we don't use dynamic
clock speed, but the maxim clock supports by 3110, while we still use 
8 words buffer read for RX.

Let me cut my question clear: why not use the 8 words RX HW fifo? For
bulk RX in 230400 case, is 50us enough to handle controller IRQ +
spi subsystem overhead + tty receive overhead for one char per spi
transfer model?

 
 So I'm rather convinced that the SPI transfer has to happen as soon as
 possible at a SPI device driver level without any guess on how the
 data will be clocked out. I suggest you to improve the dw_spi driver
 for better performance.
Will consider that, thanks

 
  [[    ..00  iiuu  eessoo  44rr11((eeggffnn--77
   ggccvvrriinn11((bbnnuu1144bbnnuu [[    ..00
   BBOO--8800  1100--8800  uuaall))
 
 huh, here I'm seeing another kind of problem: the same char is
 repeated two times (for example BBOO is written instead of BIOS). I'm
 quite sure (but I will triple check shortly) the the console driver
 doesn't do this (it's quite an easy loop in max3100_console_work, I
 will check wit a printascii to another serial port what is sent). So I
 guess the problem can be in the SPI master driver: is it correctly
 handling the CS changes? Isn't it having problems with DMA for example
 (I always found DMA for small data transfer (such 2 bytes in this
 case) problematic (for example many platforms just do it on a 4 byte
 boundary))? Have you tested it with other devices?
 
Yes, the controller HW has a register to chose slave devices and thus
the HW CS handling is transparent to software. The controller supports several
slave devices connecting to it simultaneously, one is a 3G modem whose
throughput is about several Mega bits. The current in-tree controller driver
lacks DMA part, which is rejected as the related DMA controller driver never
show up publicly.



--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
spi-devel-general mailing list
spi-devel-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/spi-devel-general


Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100

2010-03-30 Thread christian pellegrin
On Tue, Mar 30, 2010 at 9:19 AM, Feng Tang feng.t...@intel.com wrote:

 No words in my last email suggested to use dynamic SPI clock speed,
 I admited that method is brutal and not mature when my driver was
 reviewed. In early version of our 3110 driver, we don't use dynamic
 clock speed, but the maxim clock supports by 3110, while we still use
 8 words buffer read for RX.

 Let me cut my question clear: why not use the 8 words RX HW fifo? For
 bulk RX in 230400 case, is 50us enough to handle controller IRQ +
 spi subsystem overhead + tty receive overhead for one char per spi
 transfer model?


For TX it's impossible. Before sending a char we have to check if T
bit says that the TX buffer is empty. Otherwise if SPI bus speed 
UART baud rate we will loose TX chars for sure. For RX maybe it might
be worth to always do 8 byte transfers and the look at R bit of
transfer i to see if we have a valid char in (i+1) instead of doing
single transfers. But here again we are fixing the insane latency per
start of transfer of the underlying SPI master controller. And let me
say that in the actual driver the 8 byte RX fifo *is* used: the loop
in max3100_ist drains RX buffer at a speed greater than the chars
coming (also at 115200). The 8-bytes RX buffer fills as a consequence
of delays in interrupt thread being awaken.

The tests I did with zmodem show that in the direction max3100 - PC
has very good efficiency compared to max3100 - PC. I guess the reason
is that the tx buffer is just 1 byte deep and we don't have any other
way to handle the T bit other than the do loop in max3100_ist. Maybe I
will try to do some instrumentation with systemtap to show this (it's
always nice to find some uses for systemtap). Anyway I'm sure that
it's always better to not loose chars being received than delaying
transmitted one for a bit.

-- 
Christian Pellegrin, see http://www.evolware.org/chri/
Real Programmers don't play tennis, or any other sport which requires
you to change clothes. Mountain climbing is OK, and Real Programmers
wear their climbing boots to work in case a mountain should suddenly
spring up in the middle of the computer room.

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
spi-devel-general mailing list
spi-devel-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/spi-devel-general


Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100

2010-03-30 Thread Alan Cox
 1) latency on rx chars becomes very high because we can process
 incoming transfers only after a full 8 byte (or whatever the spi
 transfer dimension is). For a 9600 this is too much.

There I would partly disagree. Fixing the spi driver clearly makes sense
but the serial driver should be batching better. Is there a reason the
driver couldn't adjust the size based upon the tty speed when getting a
termios request ?

 2) even worse is that we can do flow control decision only on such boundary.

For serial flow control it doesn't matter, its implicitly asynchronous
and if you only turn the fifo on at high speed you response time will be
reasonably bounded.

 3) this is not reliable: think of what happens if the actual SPI
 transfer speed we get will be slower that the one we requested. We
 won't be emptying the RX buffer fastly enough even if could.

Consoles are not usually balanced for I/O. I grant you probably don't
want to be using full fifo sized blocks but I'm not sure I understand why
there is a problem below that ?

Alan

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
spi-devel-general mailing list
spi-devel-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/spi-devel-general


Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100

2010-03-30 Thread christian pellegrin
On Tue, Mar 30, 2010 at 10:46 AM, Alan Cox a...@lxorguk.ukuu.org.uk wrote:

 There I would partly disagree. Fixing the spi driver clearly makes sense
 but the serial driver should be batching better. Is there a reason the
 driver couldn't adjust the size based upon the tty speed when getting a
 termios request ?

Just a small note for those who don't know max31x0. There are just 2
instructions to tx/rx data: simple data rx (in which we read one data
from the fifo (the data is valid if R status bit is set), control
lines (CTS) and status bits) and a combined tx/rx (like before but we
also send one byte (if the T status bit the char will be accepted) and
set control lines (RTS)). Every of these instructions is 2 bytes long.
It's a full SPI message (so you have to assert CS before and de-assert
it afterwards to make it work). TX buffer is just 1 byte deep, RX one
is 8 bytes.

Unfortunately we cannot batch TX if SPI speed (in chars going in /
time unit) is higher than the baud rate of serial communication. We
need to check that T bis is asserted before sending a char, otherwise
it is lost. For low baud rates doing this means monopolizing the bus.
And there is a subtle problem if we reduce SPI speed to the UART baud
rate: we can ask only for a maximum speed on the SPI. In case this is
lower the one we asked (maybe because of SPI clock divisors) and we do
a TX batch big enough we could not be emptying the RX FIFO fast enough
and so we risk to loose chars in a full duplex operation (*).

We could do batch RX operation but perhaps we are just transferring
bytes for nothing because we don't know in advance if the R bit is
set. And we cannot interleave TX and RX like current drivers do
because we are forced to use the simple rx operation and not the
combined rx/tx.

There's another point about batching: single instructions are complete
SPI messages. Many SPI master drivers I worked with (S3C24xx, AT91
because of a bug in the hardware) manage CS as a normal gpio. Another
problem here is that many devices require a configurable delay on CS
change (see delay_usecs field of spi_transfer) which is not handle by
hardware and so the drivers must somehow break a SPI messages in
single transfers and do something at the end of every one of these. So
even we batch 10 tx/rx instructions I really don't think many SPI
master controllers will be able to do a DMA of 20 bytes. Anyway I
agree that if we find a way of doing batching it's a good thing
because better controllers can benefit.


 2) even worse is that we can do flow control decision only on such boundary.

 For serial flow control it doesn't matter, its implicitly asynchronous
 and if you only turn the fifo on at high speed you response time will be
 reasonably bounded.


if we somehow manage to batch a TX of n bytes and after the first one
CTS changes we are going to send n-1 chars anyway. OK, maybe our peer
de-asserted CTS when still having n-1 byes of buffer free and somehow
higher layer protocols will recover. So I agree this could be not so
worrisome. But we should keep n small enough.

 3) this is not reliable: think of what happens if the actual SPI
 transfer speed we get will be slower that the one we requested. We
 won't be emptying the RX buffer fastly enough even if could.

 Consoles are not usually balanced for I/O. I grant you probably don't
 want to be using full fifo sized blocks but I'm not sure I understand why
 there is a problem below that ?


My concern is expressed above (*). Perhaps it's me missing some point.
To make it clear: I'm more than happy to test a driver that takes
another approach and switch to it if it's better (but it has to have
support of multiple instances and a fair usage of SPI bandwidth at
least). But I really don't see a reliable way of adding batching of
tx/rx. And I think it will be better to provide a patch to the current
driver than to rewrite it completely.

-- 
Christian Pellegrin, see http://www.evolware.org/chri/
Real Programmers don't play tennis, or any other sport which requires
you to change clothes. Mountain climbing is OK, and Real Programmers
wear their climbing boots to work in case a mountain should suddenly
spring up in the middle of the computer room.

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
spi-devel-general mailing list
spi-devel-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/spi-devel-general


Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100

2010-03-29 Thread christian pellegrin
On Mon, Mar 29, 2010 at 4:48 AM, Feng Tang feng.t...@intel.com wrote:
 Hi,

Hi,


 I modified the code a little and run it on our HW platform, it really show
 some sigh of life: it can boots to console (the print format is not so good),
 I can input command and it execute correctly, but very slow, I type 3
 characters and it takes about 2 seconds to echo back on screen and start the
 execution, and after about 1 minute, the console hang there and input stopped
 to work.

never seen such a behavior. Which platform are you using? Which SPI
driver? Do you have a low level printk (printascii) that puts output
somewhere else so I can send you a patch with some debugging output?
Can you log in some other way (like via network) and see if the CPU
load is at 100% for some reason?

 This patch adds console support for the MAX3100 UART
 (console=ttyMAX0,11500). The SPI subsystem and an

 115200?


ack

 Does this imply to have to work with HW flow control? on my platform
 I have to remove the RTS bit to make it work.


no, I put RTS on  because it looks like a good default. I can make it
configurable. I just noticed on the data sheet that RTS is actually
inverted so a more sensible default would be to put it off. For
testing you should have flow control set to none on the machine you
are using as a terminal emulator.

 +             max3100_sr(s, tx, rx);

 It doesn't handle received characters here? If the console is printing out
 a bulk of message while user input some command, the command may be ignored.
 Myself have met the same problem in our driver.


yes but I think it's quite difficult to solve this problem in every
case. Console output is massively used only on boot when the user is
not supposed to type a lot.

 +     if (next != s-console_tail) {
 +             s-console_buf[next] = ch;
 +             s-console_head = next;
 +     }

 Also I saw max3100_sr() uses cpu_to_be16() and be16_to_cpu(), is it really
 necessary, our platform is little-endian(x86), and I have to disable them
 to make the code work. Is your test platform big-endian?


Have you configured your SPI controller as LSB first somehow, haven't
you? BTW my platform is a quite usual ARM9 S3C2440 which is little
endian.


-- 
Christian Pellegrin, see http://www.evolware.org/chri/
Real Programmers don't play tennis, or any other sport which requires
you to change clothes. Mountain climbing is OK, and Real Programmers
wear their climbing boots to work in case a mountain should suddenly
spring up in the middle of the computer room.

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
spi-devel-general mailing list
spi-devel-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/spi-devel-general


Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100

2010-03-29 Thread Feng Tang
On Mon, 29 Mar 2010 14:11:15 +0800
christian pellegrin chrip...@fsfe.org wrote:

  I modified the code a little and run it on our HW platform, it
  really show some sigh of life: it can boots to console (the print
  format is not so good), I can input command and it execute
  correctly, but very slow, I type 3 characters and it takes about 2
  seconds to echo back on screen and start the execution, and after
  about 1 minute, the console hang there and input stopped to work.
 
 never seen such a behavior. Which platform are you using? Which SPI
 driver? Do you have a low level printk (printascii) that puts output
 somewhere else so I can send you a patch with some debugging output?
 Can you log in some other way (like via network) and see if the CPU
 load is at 100% for some reason?

Hi,

Our platform is Intel Moorestown platform, and use a spi controller core
from Designware (drivers/spi/dw_*.c). I know the problem may probably be
caused by my setting, but the dw_spi driver works fine with our own
3110 driver.

For debug method, sadly I don't get another output port yet, but if you
have some debug patch, that's great, it will help when I find another debug
output than max3110.

 
  +             max3100_sr(s, tx, rx);
 
  It doesn't handle received characters here? If the console is
  printing out a bulk of message while user input some command, the
  command may be ignored. Myself have met the same problem in our
  driver.
 
 
 yes but I think it's quite difficult to solve this problem in every
 case. Console output is massively used only on boot when the user is
 not supposed to type a lot.

It's difficult but not impossible, actually our driver checks every word
read back and handle it if it contains a valid data

 
  +     if (next != s-console_tail) {
  +             s-console_buf[next] = ch;
  +             s-console_head = next;
  +     }
 
  Also I saw max3100_sr() uses cpu_to_be16() and be16_to_cpu(), is it
  really necessary, our platform is little-endian(x86), and I have to
  disable them to make the code work. Is your test platform
  big-endian?
 
 
 Have you configured your SPI controller as LSB first somehow, haven't
 you? BTW my platform is a quite usual ARM9 S3C2440 which is little
 endian.

yeah, you hit the point that our spi controller is LSB naturally (not
configured to), here may need a check for whether to do a swap


--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
spi-devel-general mailing list
spi-devel-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/spi-devel-general


Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100

2010-03-29 Thread christian pellegrin
On Mon, Mar 29, 2010 at 9:06 AM, Feng Tang feng.t...@intel.com wrote:

 Our platform is Intel Moorestown platform, and use a spi controller core
 from Designware (drivers/spi/dw_*.c). I know the problem may probably be
 caused by my setting, but the dw_spi driver works fine with our own
 3110 driver.

I had a look at the dw_spi driver. The spi_transfer path queues some
work to a worqueue that itself schedules a tasklet. I don't think this
is good for latency, I won't bet that such an architecture could
deliver good performance. Now I see why you needed to do only big fat
SPI transfers. Anyway this doesn't justify the 2 seconds delay between
chars coming in and going out through the max31x0 you are seeing. I
will try to analyze what's going on. BTW is only input slow or output
is sluggish too? The initial messages from the console are coming out
fast? If you disable the MAX3110 for console but you use just as a
normal terminal (set console=/dev/null in the kernel command line and
add getty 115200 /dev/ttyMAX0 in iniitab) is the interaction with the
system fine? Thanks for helping sorting out this.


 yes but I think it's quite difficult to solve this problem in every
 case. Console output is massively used only on boot when the user is
 not supposed to type a lot.

 It's difficult but not impossible, actually our driver checks every word
 read back and handle it if it contains a valid data


Of course it is possible, I just wanted to keep the max3100 a small
clean driver. Unfortunately console and serial drivers are two
different beings in the Linux kernel, but the max3100 implements the
tx-rx in one indivisible instruction (that is slow compared to
registers IO and has to be called in an preemptible contex for added
fun).  To implement what you are saying we need:

1) the console code has to check if the serial port associated to the
same physical max3100 is up (console driver start their life much
before serial ones).

2) if yes send data to the tty associated to the serial driver.
Locking is needed here.

I will implement this ASAP.

 Have you configured your SPI controller as LSB first somehow, haven't
 you? BTW my platform is a quite usual ARM9 S3C2440 which is little
 endian.

 yeah, you hit the point that our spi controller is LSB naturally (not
 configured to), here may need a check for whether to do a swap


ok, I think the dw_spi driver has to be fixed.


-- 
Christian Pellegrin, see http://www.evolware.org/chri/
Real Programmers don't play tennis, or any other sport which requires
you to change clothes. Mountain climbing is OK, and Real Programmers
wear their climbing boots to work in case a mountain should suddenly
spring up in the middle of the computer room.

--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
spi-devel-general mailing list
spi-devel-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/spi-devel-general


Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100

2010-03-29 Thread Feng Tang
On Mon, 29 Mar 2010 20:55:51 +0800
christian pellegrin chrip...@fsfe.org wrote:

 On Mon, Mar 29, 2010 at 9:06 AM, Feng Tang feng.t...@intel.com
 wrote:
 
  Our platform is Intel Moorestown platform, and use a spi controller
  core from Designware (drivers/spi/dw_*.c). I know the problem may
  probably be caused by my setting, but the dw_spi driver works fine
  with our own 3110 driver.
 
 I had a look at the dw_spi driver. The spi_transfer path queues some
 work to a worqueue that itself schedules a tasklet. I don't think this
 is good for latency, I won't bet that such an architecture could
 deliver good performance. Now I see why you needed to do only big fat
 SPI transfers. Anyway this doesn't justify the 2 seconds delay between
Hi

DW controller driver don't need max3110 driver to use big spi transfer,
the early version of our max3110 is also one word per spi transfer mode,
and the 2 drivers work fine, though the rx performance is not good (copy
 paste a bulk message to console).

Let me use some example to explain why I use big spi transfer for 3110:

When the HW works at 230400 bps, when we copy bulk data to the console,
max3110 will probably receive about 20K characters, so the time for
handling each char is 50us. If you use one char per spi transfer, it
will have to execute 20K spi_transfer in one second, and each need be
done in 50us, in this 50us you'll have to deal with controller IRQ 
handler + spi system overhead + tty receiving overhead. Is it a little
over-killing to use one char per spi transfer? while the HW does
have a 8 words RX FIFO 

 chars coming in and going out through the max31x0 you are seeing. I
 will try to analyze what's going on. BTW is only input slow or output
 is sluggish too? The initial messages from the console are coming out
 fast? If you disable the MAX3110 for console but you use just as a
 normal terminal (set console=/dev/null in the kernel command line and
 add getty 115200 /dev/ttyMAX0 in iniitab) is the interaction with the
 system fine? Thanks for helping sorting out this.

The output is not so slow as input, if we set the console=/dev/ttyMAX0,
the output is sluggish, looks like below, but when enter command line
console the output is smooth, while the input is very slow.

--
   kernel print log of max3100 over Moorestown platform
--

0.00] Moorestown CPU Lincroft identified
[[..00  iiuu  eessoo  44rr11((eeggffnn--77  
ggccvvrriinn11((bbnnuu1144bbnnuu
[[..00  BBOO--8800  1100--8800  uuaall))


]]  IISSee22::00
11 - 0f200 (usabl))
   [ .000  BIOS-e822:: 
022000110
00((eserved)
[0.00]  BIISSe820:: ee0 --0fec01000 (reseree))

  
]]  IISSee200
[[..00  BBOO--8800  ff00--  rrssrree))

  
]]NNttcc::NN
[[..00  8800uuddtt  aagg::((ssbbee  
==  rrssrree))


  00
00]]ee22  eeooeerrnnee  00aa--0011  uuaall))


]]llss__ffxx
2200mmxxaacc__ffxx00

]]MMRR  eeaall  yyee  nnaahhbbee


]]MMRR  iiee  aa
[[..00--  rrtt--aakk

[[..00--  rrtt--aakk]]  AABBuuaall

]]MMRR  aaiibbeerrnnee  
nnbbee::




 1) the console code has to check if the serial port associated to the
 same physical max3100 is up (console driver start their life much
 before serial ones).
 
 2) if yes send data to the tty associated to the serial driver.
 Locking is needed here.
 
 I will implement this ASAP.

cool!


--
Download Intel#174; Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs
proactively, and fine-tune applications for parallel performance.
See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
spi-devel-general mailing list
spi-devel-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/spi-devel-general