Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100
On Mon, Mar 29, 2010 at 4:48 AM, Feng Tang feng.t...@intel.com wrote: + tx |= MAX3100_WD | MAX3100_RTS; Does this imply to have to work with HW flow control? on my platform I have to remove the RTS bit to make it work. Finally I had time to check this. If you compare the 8250 (or similar) data-sheet with the MAX31x0 one you see that the handling of the RTX/CTS bits is exactly the same (8250 about RTS bit for example: When any of these bits are cleared, the associated output is forced high. and MAX3110: Request-to-Send Bit. Controls the state of the RTS output. This bit is reset on power-up (RTS bit = 0 sets the RTS pin = logic high).). If you look at the 8250.c driver (grep for UART_MCR_RTS) you notice that the bit is set on device open (together with DTR of course). So I think the driver is doing the right thing here. -- Christian Pellegrin, see http://www.evolware.org/chri/ Real Programmers don't play tennis, or any other sport which requires you to change clothes. Mountain climbing is OK, and Real Programmers wear their climbing boots to work in case a mountain should suddenly spring up in the middle of the computer room. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ spi-devel-general mailing list spi-devel-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/spi-devel-general
Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100
Hi, first of all I'm sorry for the late response, just now I found time to work on this. On Wed, Mar 31, 2010 at 8:04 AM, Grant Likely grant.lik...@secretlab.ca wrote: Is checking the T bit really necessary if Linux instead tracks the timing between bytes internally? ie. If the driver uses an hrtimer to schedule the submission of SPI transfers such that the time between SPI submissions is always greater than the time required for a serial character to be transmitted? I think you have to check it anyway. For example the SPI bus may be shared with another device so we don't know when our char will be sent (it might be delayed for more than the duration of a char being sent on the serial line if the initial execution of the SPI command is delayed). But using a hrtimer will be for sure more fair than polling T bit as far as resource usage is concerned. I was always hesitant about using hrtimers: I really don't know if all platforms support them with the needed granularity (at 115200 a char takes around 100us) and the aren't many users of them in the drivers directory (quite all of them are in the staging one). But it's definitely a good idea if hrtimers do work. I'll make some tests. You may be able to set this up into a free-running state machine. Submit the SPI transfers asynchronously, and use a callback to be notified when the transfer is completed. In most cases, the transfer will be completed every time, but if it is not, then the state machine can just wait another time period before submitting. By doing it this way, all the work can be handled at the atomic context without any workqueues or threaded IRQ handlers. yes a completely async design could improve performance (the greatest culprit for low performance (not mentioning slow SPI master drivers) is the latency in the delayed work being started). When I first wrote this driver I wanted to keep it simple so I was a bit frightened by a state-machine like design, but it can be done for sure. My concern here is: everything is the kernel is moving to doing as much as possible in a delayed work mechanics (see the introduction of threaded interrupts (which could became the default) or the coming soon death of IRQF_DISABLED). Is doing a big part of the work (of course I would use spi_async directly in the interrupt handler and handle the incoming/outgoing chars in spi_async callback which is usually called in an interrupt context) in an interrupt context antihistorical, isn't it? BTW: both of the design changes you mentioned seem sensible to me for better performance of the driver. But they don't do any form of batching and won't help if the underlying SPI master driver uses some form of delayed work itself. -- Christian Pellegrin, see http://www.evolware.org/chri/ Real Programmers don't play tennis, or any other sport which requires you to change clothes. Mountain climbing is OK, and Real Programmers wear their climbing boots to work in case a mountain should suddenly spring up in the middle of the computer room. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ spi-devel-general mailing list spi-devel-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/spi-devel-general
Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100
On Mon, Apr 5, 2010 at 12:19 PM, christian pellegrin chrip...@fsfe.org wrote: Hi, first of all I'm sorry for the late response, just now I found time to work on this. On Wed, Mar 31, 2010 at 8:04 AM, Grant Likely grant.lik...@secretlab.ca wrote: Is checking the T bit really necessary if Linux instead tracks the timing between bytes internally? ie. If the driver uses an hrtimer to schedule the submission of SPI transfers such that the time between SPI submissions is always greater than the time required for a serial character to be transmitted? I think you have to check it anyway. For example the SPI bus may be shared with another device so we don't know when our char will be sent (it might be delayed for more than the duration of a char being sent on the serial line if the initial execution of the SPI command is delayed). But using a hrtimer will be for sure more fair than polling T bit as far as resource usage is concerned. I was always hesitant about using hrtimers: I really don't know if all platforms support them with the needed granularity (at 115200 a char takes around 100us) and the aren't many users of them in the drivers directory (quite all of them are in the staging one). But it's definitely a good idea if hrtimers do work. I'll make some tests. Another thing to think about: this device is never going to be high performance. If tx gets delayed, then it isn't a big deal because there won't be any data loss. The real concern is RX where if you don't process fast enough then characters get dropped. Fortunately you have an 8 byte deep buffer. So if each timer fire schedules more than one transfer (even if only one char is actually transmitted), then it is okay if the timer granularity is 300-400us at 115200. You may be able to set this up into a free-running state machine. Submit the SPI transfers asynchronously, and use a callback to be notified when the transfer is completed. In most cases, the transfer will be completed every time, but if it is not, then the state machine can just wait another time period before submitting. By doing it this way, all the work can be handled at the atomic context without any workqueues or threaded IRQ handlers. yes a completely async design could improve performance (the greatest culprit for low performance (not mentioning slow SPI master drivers) is the latency in the delayed work being started). When I first wrote this driver I wanted to keep it simple so I was a bit frightened by a state-machine like design, but it can be done for sure. My concern here is: everything is the kernel is moving to doing as much as possible in a delayed work mechanics (see the introduction of threaded interrupts (which could became the default) or the coming soon death of IRQF_DISABLED). Is doing a big part of the work (of course I would use spi_async directly in the interrupt handler and handle the incoming/outgoing chars in spi_async callback which is usually called in an interrupt context) in an interrupt context antihistorical, isn't it? If you only use spi_async (which must be atomic), and you don't do any heavy data processing or udelay()s in the irq path, then your driver will not be adding major latency. Use a ring buffer for both the tx and rx paths so that the console processing can be done in a workqueue or something and you should be just fine. BTW: both of the design changes you mentioned seem sensible to me for better performance of the driver. But they don't do any form of batching and won't help if the underlying SPI master driver uses some form of delayed work itself. Then the SPI master driver is broken. It must be fixed. :-) Submission of SPI transfers via spi_async() is supposed to always be atomic. g. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ spi-devel-general mailing list spi-devel-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/spi-devel-general
Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100
On Tue, Mar 30, 2010 at 6:03 AM, christian pellegrin chrip...@fsfe.org wrote: On Tue, Mar 30, 2010 at 10:46 AM, Alan Cox a...@lxorguk.ukuu.org.uk wrote: There I would partly disagree. Fixing the spi driver clearly makes sense but the serial driver should be batching better. Is there a reason the driver couldn't adjust the size based upon the tty speed when getting a termios request ? Just a small note for those who don't know max31x0. There are just 2 instructions to tx/rx data: simple data rx (in which we read one data from the fifo (the data is valid if R status bit is set), control lines (CTS) and status bits) and a combined tx/rx (like before but we also send one byte (if the T status bit the char will be accepted) and set control lines (RTS)). Every of these instructions is 2 bytes long. It's a full SPI message (so you have to assert CS before and de-assert it afterwards to make it work). TX buffer is just 1 byte deep, RX one is 8 bytes. Is checking the T bit really necessary if Linux instead tracks the timing between bytes internally? ie. If the driver uses an hrtimer to schedule the submission of SPI transfers such that the time between SPI submissions is always greater than the time required for a serial character to be transmitted? You may be able to set this up into a free-running state machine. Submit the SPI transfers asynchronously, and use a callback to be notified when the transfer is completed. In most cases, the transfer will be completed every time, but if it is not, then the state machine can just wait another time period before submitting. By doing it this way, all the work can be handled at the atomic context without any workqueues or threaded IRQ handlers. Here's some draft )completely untested and uncompiled) code for the bottom half: The time rate would be calculated when the baud rate is set. I've glossed over a bunch of details and corner cases, but you get the idea... void max3100_spi_complete(void *__max3100) { struct max3100_port *max3100 = __max3100; spin_lock(max3100-lock); max3100-busy = 0; /* Process received characters here. Schedule a workqueue if needed. A ring buffer would probably work well. */ spin_unlock(max3100-lock); } enum hrtimer_restart max3100_hrtimer_callback( struct hrtimer *timer ) { struct max3100_port *max3100 = container_of(timer, struct max3100_port, hrtimer); if (max3100-busy) return HRTIMER_RESTART; /* still busy, try again later */ spin_lock(max3100-lock); /* The following 7 lines can probably actually be done once at driver probe time */ spi_message_init(max3100-spi_msg); spi_message_add_tail(max3100-spi_tran); max3100-spi_tran.tx_buf = mac3100-tx_buf; max3100-spi_tran.rx_buf = mac3100-rx_buf; max3100-spi_tran.len = 2; /* can this be larger to receive multiple bytes in 1 transfer? */ max3100-spi_msg.complete = max3100_spi_complete; max3100-spi_msg.context = max3100; /* Alternately, a bunch of transfers could be queued up with only the first actual * containing tx data. The rest would be for RX */ if (max3100-tx_pending) { /* a ring buffer may be a better idea here */ max3100-tx_buf[0] = max3100-tx_pending_data; max3100-tx_pending = 0; } spi_async(max3100-spi_msg); max3100-busy = 1; /* cleared by the completion callback */ spin_unlock(max3100-lock); return HRTIMER_RESTART; } g. -- Grant Likely, B.Sc., P.Eng. Secret Lab Technologies Ltd. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ spi-devel-general mailing list spi-devel-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/spi-devel-general
Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100
On Tue, Mar 30, 2010 at 4:14 AM, Feng Tang feng.t...@intel.com wrote: On Mon, 29 Mar 2010 20:55:51 +0800 DW controller driver don't need max3110 driver to use big spi transfer, the early version of our max3110 is also one word per spi transfer mode, and the 2 drivers work fine, though the rx performance is not good (copy paste a bulk message to console). What I'm saying is that the reason of bad performance is that the underlying SPI driver has bad performance. A SPI master driver should do as little work as possible. For small transfers (2 bytes == 16 bits) at 5MHz (== 3.2 us) it's much better to spin in spi_transfer waiting for SPI done than scheduling anything or setting up a DMA transfer. Especially if you do *2* task schedule (spi_transfer queues a workqueue that schedules a tasklet) be prepared for *big* latency. I hope other people can comment on this if I'm saying something wrong. When the HW works at 230400 bps, when we copy bulk data to the console, max3110 will probably receive about 20K characters, so the time for handling each char is 50us. If you use one char per spi transfer, it will have to execute 20K spi_transfer in one second, and each need be done in 50us, in this 50us you'll have to deal with controller IRQ handler + spi system overhead + tty receiving overhead. Is it a little over-killing to use one char per spi transfer? while the HW does have a 8 words RX FIFO this is too hackish, max3100 wasn't conceived this way. Let me point some problems of such an approach: 1) latency on rx chars becomes very high because we can process incoming transfers only after a full 8 byte (or whatever the spi transfer dimension is). For a 9600 this is too much. 2) even worse is that we can do flow control decision only on such boundary. 3) this is not reliable: think of what happens if the actual SPI transfer speed we get will be slower that the one we requested. We won't be emptying the RX buffer fastly enough even if could. 4) for low speeds we are going to monopolize the SPI bus for ages. So I'm rather convinced that the SPI transfer has to happen as soon as possible at a SPI device driver level without any guess on how the data will be clocked out. I suggest you to improve the dw_spi driver for better performance. [[ ..00 iiuu eessoo 44rr11((eeggffnn--77 ggccvvrriinn11((bbnnuu1144bbnnuu [[ ..00 BBOO--8800 1100--8800 uuaall)) huh, here I'm seeing another kind of problem: the same char is repeated two times (for example BBOO is written instead of BIOS). I'm quite sure (but I will triple check shortly) the the console driver doesn't do this (it's quite an easy loop in max3100_console_work, I will check wit a printascii to another serial port what is sent). So I guess the problem can be in the SPI master driver: is it correctly handling the CS changes? Isn't it having problems with DMA for example (I always found DMA for small data transfer (such 2 bytes in this case) problematic (for example many platforms just do it on a 4 byte boundary))? Have you tested it with other devices? -- Christian Pellegrin, see http://www.evolware.org/chri/ Real Programmers don't play tennis, or any other sport which requires you to change clothes. Mountain climbing is OK, and Real Programmers wear their climbing boots to work in case a mountain should suddenly spring up in the middle of the computer room. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ spi-devel-general mailing list spi-devel-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/spi-devel-general
Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100
On Tue, 30 Mar 2010 14:49:57 +0800 christian pellegrin chrip...@fsfe.org wrote: On Tue, Mar 30, 2010 at 4:14 AM, Feng Tang feng.t...@intel.com wrote: On Mon, 29 Mar 2010 20:55:51 +0800 When the HW works at 230400 bps, when we copy bulk data to the console, max3110 will probably receive about 20K characters, so the time for handling each char is 50us. If you use one char per spi transfer, it will have to execute 20K spi_transfer in one second, and each need be done in 50us, in this 50us you'll have to deal with controller IRQ handler + spi system overhead + tty receiving overhead. Is it a little over-killing to use one char per spi transfer? while the HW does have a 8 words RX FIFO this is too hackish, max3100 wasn't conceived this way. Let me point some problems of such an approach: 1) latency on rx chars becomes very high because we can process incoming transfers only after a full 8 byte (or whatever the spi transfer dimension is). For a 9600 this is too much. 2) even worse is that we can do flow control decision only on such boundary. 3) this is not reliable: think of what happens if the actual SPI transfer speed we get will be slower that the one we requested. We won't be emptying the RX buffer fastly enough even if could. 4) for low speeds we are going to monopolize the SPI bus for ages. No words in my last email suggested to use dynamic SPI clock speed, I admited that method is brutal and not mature when my driver was reviewed. In early version of our 3110 driver, we don't use dynamic clock speed, but the maxim clock supports by 3110, while we still use 8 words buffer read for RX. Let me cut my question clear: why not use the 8 words RX HW fifo? For bulk RX in 230400 case, is 50us enough to handle controller IRQ + spi subsystem overhead + tty receive overhead for one char per spi transfer model? So I'm rather convinced that the SPI transfer has to happen as soon as possible at a SPI device driver level without any guess on how the data will be clocked out. I suggest you to improve the dw_spi driver for better performance. Will consider that, thanks [[ ..00 iiuu eessoo 44rr11((eeggffnn--77 ggccvvrriinn11((bbnnuu1144bbnnuu [[ ..00 BBOO--8800 1100--8800 uuaall)) huh, here I'm seeing another kind of problem: the same char is repeated two times (for example BBOO is written instead of BIOS). I'm quite sure (but I will triple check shortly) the the console driver doesn't do this (it's quite an easy loop in max3100_console_work, I will check wit a printascii to another serial port what is sent). So I guess the problem can be in the SPI master driver: is it correctly handling the CS changes? Isn't it having problems with DMA for example (I always found DMA for small data transfer (such 2 bytes in this case) problematic (for example many platforms just do it on a 4 byte boundary))? Have you tested it with other devices? Yes, the controller HW has a register to chose slave devices and thus the HW CS handling is transparent to software. The controller supports several slave devices connecting to it simultaneously, one is a 3G modem whose throughput is about several Mega bits. The current in-tree controller driver lacks DMA part, which is rejected as the related DMA controller driver never show up publicly. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ spi-devel-general mailing list spi-devel-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/spi-devel-general
Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100
On Tue, Mar 30, 2010 at 9:19 AM, Feng Tang feng.t...@intel.com wrote: No words in my last email suggested to use dynamic SPI clock speed, I admited that method is brutal and not mature when my driver was reviewed. In early version of our 3110 driver, we don't use dynamic clock speed, but the maxim clock supports by 3110, while we still use 8 words buffer read for RX. Let me cut my question clear: why not use the 8 words RX HW fifo? For bulk RX in 230400 case, is 50us enough to handle controller IRQ + spi subsystem overhead + tty receive overhead for one char per spi transfer model? For TX it's impossible. Before sending a char we have to check if T bit says that the TX buffer is empty. Otherwise if SPI bus speed UART baud rate we will loose TX chars for sure. For RX maybe it might be worth to always do 8 byte transfers and the look at R bit of transfer i to see if we have a valid char in (i+1) instead of doing single transfers. But here again we are fixing the insane latency per start of transfer of the underlying SPI master controller. And let me say that in the actual driver the 8 byte RX fifo *is* used: the loop in max3100_ist drains RX buffer at a speed greater than the chars coming (also at 115200). The 8-bytes RX buffer fills as a consequence of delays in interrupt thread being awaken. The tests I did with zmodem show that in the direction max3100 - PC has very good efficiency compared to max3100 - PC. I guess the reason is that the tx buffer is just 1 byte deep and we don't have any other way to handle the T bit other than the do loop in max3100_ist. Maybe I will try to do some instrumentation with systemtap to show this (it's always nice to find some uses for systemtap). Anyway I'm sure that it's always better to not loose chars being received than delaying transmitted one for a bit. -- Christian Pellegrin, see http://www.evolware.org/chri/ Real Programmers don't play tennis, or any other sport which requires you to change clothes. Mountain climbing is OK, and Real Programmers wear their climbing boots to work in case a mountain should suddenly spring up in the middle of the computer room. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ spi-devel-general mailing list spi-devel-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/spi-devel-general
Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100
1) latency on rx chars becomes very high because we can process incoming transfers only after a full 8 byte (or whatever the spi transfer dimension is). For a 9600 this is too much. There I would partly disagree. Fixing the spi driver clearly makes sense but the serial driver should be batching better. Is there a reason the driver couldn't adjust the size based upon the tty speed when getting a termios request ? 2) even worse is that we can do flow control decision only on such boundary. For serial flow control it doesn't matter, its implicitly asynchronous and if you only turn the fifo on at high speed you response time will be reasonably bounded. 3) this is not reliable: think of what happens if the actual SPI transfer speed we get will be slower that the one we requested. We won't be emptying the RX buffer fastly enough even if could. Consoles are not usually balanced for I/O. I grant you probably don't want to be using full fifo sized blocks but I'm not sure I understand why there is a problem below that ? Alan -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ spi-devel-general mailing list spi-devel-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/spi-devel-general
Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100
On Tue, Mar 30, 2010 at 10:46 AM, Alan Cox a...@lxorguk.ukuu.org.uk wrote: There I would partly disagree. Fixing the spi driver clearly makes sense but the serial driver should be batching better. Is there a reason the driver couldn't adjust the size based upon the tty speed when getting a termios request ? Just a small note for those who don't know max31x0. There are just 2 instructions to tx/rx data: simple data rx (in which we read one data from the fifo (the data is valid if R status bit is set), control lines (CTS) and status bits) and a combined tx/rx (like before but we also send one byte (if the T status bit the char will be accepted) and set control lines (RTS)). Every of these instructions is 2 bytes long. It's a full SPI message (so you have to assert CS before and de-assert it afterwards to make it work). TX buffer is just 1 byte deep, RX one is 8 bytes. Unfortunately we cannot batch TX if SPI speed (in chars going in / time unit) is higher than the baud rate of serial communication. We need to check that T bis is asserted before sending a char, otherwise it is lost. For low baud rates doing this means monopolizing the bus. And there is a subtle problem if we reduce SPI speed to the UART baud rate: we can ask only for a maximum speed on the SPI. In case this is lower the one we asked (maybe because of SPI clock divisors) and we do a TX batch big enough we could not be emptying the RX FIFO fast enough and so we risk to loose chars in a full duplex operation (*). We could do batch RX operation but perhaps we are just transferring bytes for nothing because we don't know in advance if the R bit is set. And we cannot interleave TX and RX like current drivers do because we are forced to use the simple rx operation and not the combined rx/tx. There's another point about batching: single instructions are complete SPI messages. Many SPI master drivers I worked with (S3C24xx, AT91 because of a bug in the hardware) manage CS as a normal gpio. Another problem here is that many devices require a configurable delay on CS change (see delay_usecs field of spi_transfer) which is not handle by hardware and so the drivers must somehow break a SPI messages in single transfers and do something at the end of every one of these. So even we batch 10 tx/rx instructions I really don't think many SPI master controllers will be able to do a DMA of 20 bytes. Anyway I agree that if we find a way of doing batching it's a good thing because better controllers can benefit. 2) even worse is that we can do flow control decision only on such boundary. For serial flow control it doesn't matter, its implicitly asynchronous and if you only turn the fifo on at high speed you response time will be reasonably bounded. if we somehow manage to batch a TX of n bytes and after the first one CTS changes we are going to send n-1 chars anyway. OK, maybe our peer de-asserted CTS when still having n-1 byes of buffer free and somehow higher layer protocols will recover. So I agree this could be not so worrisome. But we should keep n small enough. 3) this is not reliable: think of what happens if the actual SPI transfer speed we get will be slower that the one we requested. We won't be emptying the RX buffer fastly enough even if could. Consoles are not usually balanced for I/O. I grant you probably don't want to be using full fifo sized blocks but I'm not sure I understand why there is a problem below that ? My concern is expressed above (*). Perhaps it's me missing some point. To make it clear: I'm more than happy to test a driver that takes another approach and switch to it if it's better (but it has to have support of multiple instances and a fair usage of SPI bandwidth at least). But I really don't see a reliable way of adding batching of tx/rx. And I think it will be better to provide a patch to the current driver than to rewrite it completely. -- Christian Pellegrin, see http://www.evolware.org/chri/ Real Programmers don't play tennis, or any other sport which requires you to change clothes. Mountain climbing is OK, and Real Programmers wear their climbing boots to work in case a mountain should suddenly spring up in the middle of the computer room. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ spi-devel-general mailing list spi-devel-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/spi-devel-general
Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100
On Mon, Mar 29, 2010 at 4:48 AM, Feng Tang feng.t...@intel.com wrote: Hi, Hi, I modified the code a little and run it on our HW platform, it really show some sigh of life: it can boots to console (the print format is not so good), I can input command and it execute correctly, but very slow, I type 3 characters and it takes about 2 seconds to echo back on screen and start the execution, and after about 1 minute, the console hang there and input stopped to work. never seen such a behavior. Which platform are you using? Which SPI driver? Do you have a low level printk (printascii) that puts output somewhere else so I can send you a patch with some debugging output? Can you log in some other way (like via network) and see if the CPU load is at 100% for some reason? This patch adds console support for the MAX3100 UART (console=ttyMAX0,11500). The SPI subsystem and an 115200? ack Does this imply to have to work with HW flow control? on my platform I have to remove the RTS bit to make it work. no, I put RTS on because it looks like a good default. I can make it configurable. I just noticed on the data sheet that RTS is actually inverted so a more sensible default would be to put it off. For testing you should have flow control set to none on the machine you are using as a terminal emulator. + max3100_sr(s, tx, rx); It doesn't handle received characters here? If the console is printing out a bulk of message while user input some command, the command may be ignored. Myself have met the same problem in our driver. yes but I think it's quite difficult to solve this problem in every case. Console output is massively used only on boot when the user is not supposed to type a lot. + if (next != s-console_tail) { + s-console_buf[next] = ch; + s-console_head = next; + } Also I saw max3100_sr() uses cpu_to_be16() and be16_to_cpu(), is it really necessary, our platform is little-endian(x86), and I have to disable them to make the code work. Is your test platform big-endian? Have you configured your SPI controller as LSB first somehow, haven't you? BTW my platform is a quite usual ARM9 S3C2440 which is little endian. -- Christian Pellegrin, see http://www.evolware.org/chri/ Real Programmers don't play tennis, or any other sport which requires you to change clothes. Mountain climbing is OK, and Real Programmers wear their climbing boots to work in case a mountain should suddenly spring up in the middle of the computer room. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ spi-devel-general mailing list spi-devel-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/spi-devel-general
Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100
On Mon, 29 Mar 2010 14:11:15 +0800 christian pellegrin chrip...@fsfe.org wrote: I modified the code a little and run it on our HW platform, it really show some sigh of life: it can boots to console (the print format is not so good), I can input command and it execute correctly, but very slow, I type 3 characters and it takes about 2 seconds to echo back on screen and start the execution, and after about 1 minute, the console hang there and input stopped to work. never seen such a behavior. Which platform are you using? Which SPI driver? Do you have a low level printk (printascii) that puts output somewhere else so I can send you a patch with some debugging output? Can you log in some other way (like via network) and see if the CPU load is at 100% for some reason? Hi, Our platform is Intel Moorestown platform, and use a spi controller core from Designware (drivers/spi/dw_*.c). I know the problem may probably be caused by my setting, but the dw_spi driver works fine with our own 3110 driver. For debug method, sadly I don't get another output port yet, but if you have some debug patch, that's great, it will help when I find another debug output than max3110. + max3100_sr(s, tx, rx); It doesn't handle received characters here? If the console is printing out a bulk of message while user input some command, the command may be ignored. Myself have met the same problem in our driver. yes but I think it's quite difficult to solve this problem in every case. Console output is massively used only on boot when the user is not supposed to type a lot. It's difficult but not impossible, actually our driver checks every word read back and handle it if it contains a valid data + if (next != s-console_tail) { + s-console_buf[next] = ch; + s-console_head = next; + } Also I saw max3100_sr() uses cpu_to_be16() and be16_to_cpu(), is it really necessary, our platform is little-endian(x86), and I have to disable them to make the code work. Is your test platform big-endian? Have you configured your SPI controller as LSB first somehow, haven't you? BTW my platform is a quite usual ARM9 S3C2440 which is little endian. yeah, you hit the point that our spi controller is LSB naturally (not configured to), here may need a check for whether to do a swap -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ spi-devel-general mailing list spi-devel-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/spi-devel-general
Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100
On Mon, Mar 29, 2010 at 9:06 AM, Feng Tang feng.t...@intel.com wrote: Our platform is Intel Moorestown platform, and use a spi controller core from Designware (drivers/spi/dw_*.c). I know the problem may probably be caused by my setting, but the dw_spi driver works fine with our own 3110 driver. I had a look at the dw_spi driver. The spi_transfer path queues some work to a worqueue that itself schedules a tasklet. I don't think this is good for latency, I won't bet that such an architecture could deliver good performance. Now I see why you needed to do only big fat SPI transfers. Anyway this doesn't justify the 2 seconds delay between chars coming in and going out through the max31x0 you are seeing. I will try to analyze what's going on. BTW is only input slow or output is sluggish too? The initial messages from the console are coming out fast? If you disable the MAX3110 for console but you use just as a normal terminal (set console=/dev/null in the kernel command line and add getty 115200 /dev/ttyMAX0 in iniitab) is the interaction with the system fine? Thanks for helping sorting out this. yes but I think it's quite difficult to solve this problem in every case. Console output is massively used only on boot when the user is not supposed to type a lot. It's difficult but not impossible, actually our driver checks every word read back and handle it if it contains a valid data Of course it is possible, I just wanted to keep the max3100 a small clean driver. Unfortunately console and serial drivers are two different beings in the Linux kernel, but the max3100 implements the tx-rx in one indivisible instruction (that is slow compared to registers IO and has to be called in an preemptible contex for added fun). To implement what you are saying we need: 1) the console code has to check if the serial port associated to the same physical max3100 is up (console driver start their life much before serial ones). 2) if yes send data to the tty associated to the serial driver. Locking is needed here. I will implement this ASAP. Have you configured your SPI controller as LSB first somehow, haven't you? BTW my platform is a quite usual ARM9 S3C2440 which is little endian. yeah, you hit the point that our spi controller is LSB naturally (not configured to), here may need a check for whether to do a swap ok, I think the dw_spi driver has to be fixed. -- Christian Pellegrin, see http://www.evolware.org/chri/ Real Programmers don't play tennis, or any other sport which requires you to change clothes. Mountain climbing is OK, and Real Programmers wear their climbing boots to work in case a mountain should suddenly spring up in the middle of the computer room. -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ spi-devel-general mailing list spi-devel-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/spi-devel-general
Re: [spi-devel-general] [PATCH v1 3/4] max3100: adds console support for MAX3100
On Mon, 29 Mar 2010 20:55:51 +0800 christian pellegrin chrip...@fsfe.org wrote: On Mon, Mar 29, 2010 at 9:06 AM, Feng Tang feng.t...@intel.com wrote: Our platform is Intel Moorestown platform, and use a spi controller core from Designware (drivers/spi/dw_*.c). I know the problem may probably be caused by my setting, but the dw_spi driver works fine with our own 3110 driver. I had a look at the dw_spi driver. The spi_transfer path queues some work to a worqueue that itself schedules a tasklet. I don't think this is good for latency, I won't bet that such an architecture could deliver good performance. Now I see why you needed to do only big fat SPI transfers. Anyway this doesn't justify the 2 seconds delay between Hi DW controller driver don't need max3110 driver to use big spi transfer, the early version of our max3110 is also one word per spi transfer mode, and the 2 drivers work fine, though the rx performance is not good (copy paste a bulk message to console). Let me use some example to explain why I use big spi transfer for 3110: When the HW works at 230400 bps, when we copy bulk data to the console, max3110 will probably receive about 20K characters, so the time for handling each char is 50us. If you use one char per spi transfer, it will have to execute 20K spi_transfer in one second, and each need be done in 50us, in this 50us you'll have to deal with controller IRQ handler + spi system overhead + tty receiving overhead. Is it a little over-killing to use one char per spi transfer? while the HW does have a 8 words RX FIFO chars coming in and going out through the max31x0 you are seeing. I will try to analyze what's going on. BTW is only input slow or output is sluggish too? The initial messages from the console are coming out fast? If you disable the MAX3110 for console but you use just as a normal terminal (set console=/dev/null in the kernel command line and add getty 115200 /dev/ttyMAX0 in iniitab) is the interaction with the system fine? Thanks for helping sorting out this. The output is not so slow as input, if we set the console=/dev/ttyMAX0, the output is sluggish, looks like below, but when enter command line console the output is smooth, while the input is very slow. -- kernel print log of max3100 over Moorestown platform -- 0.00] Moorestown CPU Lincroft identified [[..00 iiuu eessoo 44rr11((eeggffnn--77 ggccvvrriinn11((bbnnuu1144bbnnuu [[..00 BBOO--8800 1100--8800 uuaall)) ]] IISSee22::00 11 - 0f200 (usabl)) [ .000 BIOS-e822:: 022000110 00((eserved) [0.00] BIISSe820:: ee0 --0fec01000 (reseree)) ]] IISSee200 [[..00 BBOO--8800 ff00-- rrssrree)) ]]NNttcc::NN [[..00 8800uuddtt aagg::((ssbbee == rrssrree)) 00 00]]ee22 eeooeerrnnee 00aa--0011 uuaall)) ]]llss__ffxx 2200mmxxaacc__ffxx00 ]]MMRR eeaall yyee nnaahhbbee ]]MMRR iiee aa [[..00-- rrtt--aakk [[..00-- rrtt--aakk]] AABBuuaall ]]MMRR aaiibbeerrnnee nnbbee:: 1) the console code has to check if the serial port associated to the same physical max3100 is up (console driver start their life much before serial ones). 2) if yes send data to the tty associated to the serial driver. Locking is needed here. I will implement this ASAP. cool! -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ spi-devel-general mailing list spi-devel-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/spi-devel-general