Hi,
On 04.02.2013 09:22, Laurent Gauch wrote:
> On 04.02.2013 04:20, Peter Henn wrote:
>> Hi Laurent,
>>
>> On 02.02.2013 17:23, Laurent Gauch wrote:
>>> so for me I cannot see any trouble on the use of the send immediat 87
>>> command regarding and the full-speed and high-speed device.
>>> We have used it a lot as with our Amontec JTAGkey SVF Player or vector
>>> player....
>>> We never see any bad effect using this command.
>>>
>> That is an important information, because that means I can in principle
>> emulate the FT2232D chip simply by running a FT2232H on a USB1.1 hub and
>> trace it with the LeCroy.
>>
>> Yesterday I made some first tests with an LPC-3131 ARM base SoC. The
>> data read tests were really impressive using the ftdi driver and the
>> FT2232H based dongle:
First, here is an updated
MEASUREMENT TABLE, ARM MEMORY READ PERFORMANCE:
JTAG clock        !    6 MHz    !    3 MHz    !    2 MHz    !    1 MHz
------------------+-------------+-------------+-------------+-------------
ftdi   --- USB2.0 !  *145 KiB/s !  82.9 KiB/s !  56.2 KiB/s !  30.3 KiB/s
ftdi   2ms USB2.0 ! 130.5 KiB/s !    76 KiB/s !  54.5 KiB/s !  30.0 KiB/s
ftdi   --- USB1.1 !   *59 KiB/s !    43 KiB/s !  56.0 KiB/s !  30.2 KiB/s
ftdi   2ms USB1.1 !   *58 KiB/s !    59 KiB/s !  52.9 KiB/s !  29.6 KiB/s
ft2232 --- USB2.0 ! 126.4 KiB/s !  76.3 KiB/s !  55.2 KiB/s !  29.8 KiB/s
ft2232 0ms USB2.0 !   114 KiB/s !  72.2 KiB/s !  53.2 KiB/s !  29.4 KiB/s
ft2232 1ms USB2.0 !  93.3 KiB/s !  48.4 KiB/s !  47.2 KiB/s !  25.4 KiB/s
ft2232 2ms USB2.0 !  47.9 KiB/s !  47.7 KiB/s !  46.3 KiB/s !  24.1 KiB/s
ft2232 --- USB1.1 !   *30 KiB/s !    37 KiB/s ! *45.4 KiB/s !  25.1 KiB/s
ft2232 0ms USB1.1 ! *46.4 KiB/s !  40.9 KiB/s ! *32.0 KiB/s !  20.1 KiB/s
ft2232 1ms USB1.1 !  46.1 KiB/s !  46.4 KiB/s !    36 KiB/s !  23.1 KiB/s
ft2232 2ms USB1.1 !   *30 KiB/s !    28 KiB/s !  26.1 KiB/s !  19.2 KiB/s
Tested with:
- OpenOCD based on 0.6.1
- Host: Fujitsu S7220 Notebook with Debian Squeeze 64bit
- Target: Olimex LPC-H3131
- Action: reading out internal ROM.
- Calculation: average of three measured test runs
Notes:
If the maximum deviation from the average was > 1, I left the position
after the decimal point empty.
I captured LeCroy USB traces for the memory read throughput values
marked with "*".
A patched libusb also allows me to set latency timer values of 0 for
the FT2232H chip.
A patched ftdi driver that does not use the "send immediate command" and
sets a latency timeout of 2ms produced the measurements marked with
"ftdi 2ms USBx.x", for comparison against the standard ft2232 driver.
This patch can be found at the bottom of this email.
An ARM-JTAG memory read results in a long continuous JTAG read data
stream, because the internal CPU registers of the ARM are used as a
buffer and no special acknowledge synchronization is required, unlike
a MIPS-JTAG memory read in PRACC mode. Therefore a "send immediate
command", which works as a flush of the JTAG read data from the target,
cannot increase the read performance for an ARM CPU in the same way as
it does for a MIPS CPU running in JTAG PRACC mode.
EXPLANATION 1ST COLUMN:
The 1st column shows the driver type, latency time and USB bus type. If
the "send immediate command" is used, the latency time is marked with
"---", although naturally a default latency value has still been set up
by the software.
ftdi --- : means the new ftdi driver as built by OpenOCD-0.6.1
ftdi 2ms : means the ftdi driver _without_ "send immediate command" and
with the latency time set to 2ms instead of the default 255ms
ft2232 --- : means the ft2232 driver + "ft2232 flush before read" patch,
using the default 2ms latency timer value
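As a rough illustration of what "latency time setup" means on the wire, the sketch below fills in the fields of the USB control request that sets the FTDI latency timer. The struct and function names are hypothetical, and the numeric values 0x40 and 0x09 are assumed from the libftdi headers and should be verified there.

```c
#include <stdint.h>

/* Fields of a USB control SETUP packet (names follow the USB spec). */
struct usb_setup {
    uint8_t  bmRequestType;
    uint8_t  bRequest;
    uint16_t wValue;
    uint16_t wIndex;
    uint16_t wLength;
};

/* Assumed values, taken from libftdi's ftdi.h; please double check. */
#define FTDI_DEVICE_OUT_REQTYPE       0x40 /* vendor request, host to device */
#define SIO_SET_LATENCY_TIMER_REQUEST 0x09

/* Hypothetical helper: describe a "set latency timer" request.
 * latency_ms goes into wValue, the MPSSE channel index into wIndex. */
static struct usb_setup make_set_latency(uint8_t latency_ms,
                                         uint16_t channel_index)
{
    struct usb_setup s = {
        .bmRequestType = FTDI_DEVICE_OUT_REQTYPE,
        .bRequest      = SIO_SET_LATENCY_TIMER_REQUEST,
        .wValue        = latency_ms,
        .wIndex        = channel_index,
        .wLength       = 0, /* no data stage */
    };
    return s;
}
```

Calling make_set_latency(2, index) corresponds to the "2ms" rows and make_set_latency(255, index) to the ftdi driver default mentioned above.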
ANALYZING RESULTS:
Interestingly, both the ftdi and the ft2232 driver with the "send
immediate command" show a better memory read throughput on USB1.1 at a
JTAG clock rate of 2MHz than at 3MHz.
This is related to timing interference between the USB transaction
cycles and the JTAG clock rate.
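One way to make this effect visible is to convert each table entry into TCK cycles spent per transferred byte. The helper below is only a sketch with hypothetical function names; the numbers are the "ftdi --- USB1.1" row from the table above.

```c
#include <stdio.h>

/* TCK cycles spent per transferred byte for a measured throughput.
 * tck_hz: JTAG clock in Hz, kib_per_s: measured read rate in KiB/s. */
static double tck_cycles_per_byte(double tck_hz, double kib_per_s)
{
    return tck_hz / (kib_per_s * 1024.0);
}

/* Print the per-byte cost for the "ftdi --- USB1.1" table row. */
static void print_ftdi_usb11_row(void)
{
    const double mhz[]   = { 6.0, 3.0, 2.0, 1.0 };
    const double kib_s[] = { 59.0, 43.0, 56.0, 30.2 };

    for (int i = 0; i < 4; i++)
        printf("%.0f MHz: %.1f TCK cycles/byte\n",
               mhz[i], tck_cycles_per_byte(mhz[i] * 1e6, kib_s[i]));
}
```

For this row the cost stays at roughly 32 to 35 TCK cycles per byte at 1 and 2 MHz, but rises to about 68 at 3 MHz and about 99 at 6 MHz. That would fit the picture that the read is limited by the JTAG clock at the low rates and by USB transaction timing at the higher ones.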
FTDI DRIVER:
The LeCroy traces show that the new ftdi driver no longer uses one
buffer for USB-OUT data and then the same buffer for USB-IN data. Here
USB-OUT and USB-IN transactions appear interleaved on the USB bus,
although naturally the USB-IN transactions were very often NAKed (= Not
Acknowledged by the USB device). Running USB-IN and USB-OUT in different
threads seems to be the real throughput gain here, not the inserted
"send immediate command".
Measurement standard ftdi driver with USB1.1 / USB2.0:
A USB-IN NAK of 9us alternates with a USB-OUT transaction of 53us. A
successful USB-IN requires 55us. At the other JTAG clock speeds and
with USB2.0, you again see USB-IN (NAK) and USB-OUT transactions
alternating on the USB bus.
Optimization could be done by reducing the number of USB-IN NAKs to free
bus bandwidth for the USB-OUT. But that would require USB isochronous or
interrupt transfers, which seem not to be supported by the FT2232H chip
for these endpoints.
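To put a number on the possible gain, the small sketch below computes which share of the alternating NAK/OUT pattern is spent on NAKed USB-IN transactions. It assumes exactly one NAK per USB-OUT transaction, as in the trace described above; the function name is hypothetical.

```c
/* Share of bus time spent on NAKed USB-IN transactions when one NAK
 * (nak_us long) alternates with one USB-OUT transaction (out_us long). */
static double nak_share(double nak_us, double out_us)
{
    return nak_us / (nak_us + out_us);
}
```

With the measured 9us and 53us, nak_share(9.0, 53.0) is about 0.145, so at most around 14.5% of that pattern could be recovered for USB-OUT data by avoiding the NAKs.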
FT2232 DRIVER:
Looking at the LeCroy traces of the patched ft2232 driver, which uses
the "send immediate command", it doesn't look like the FT2232H chip
processes this command at the required time in every scenario. This
fact may be interesting for speeding up the ftdi driver, especially for
USB1.1, although I think using USB2.0 is the better way to increase the
performance anyway. Using the ft2232 driver with USB2.0 I haven't seen
that "doesn't react to send immediate command" effect. But USB2.0 also
runs with a 512 byte maximum Bulk-In transfer size compared to the
64 byte maximum of USB1.1, which means: one USB-IN packet at USB2.0
speed can transfer the whole ARM memory read data chunk per out-in
cycle.
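The packet arithmetic can be sketched as follows. The helper name is hypothetical, the 260 byte chunk size used in the note below is only an assumed example value, and the sketch relies on the FTDI chips prefixing every USB-IN packet with two status bytes.

```c
#include <stddef.h>

/* Every FTDI USB-IN packet starts with two modem/line status bytes,
 * so only (max_packet_size - 2) bytes per packet carry read data. */
#define FTDI_STATUS_BYTES 2

/* Number of USB-IN packets needed to return payload_bytes of read data.
 * max_packet_size: 64 for full-speed (USB1.1), 512 for high-speed (USB2.0). */
static size_t usb_in_packets(size_t payload_bytes, size_t max_packet_size)
{
    size_t per_packet = max_packet_size - FTDI_STATUS_BYTES;

    return (payload_bytes + per_packet - 1) / per_packet; /* round up */
}
```

For an assumed read chunk of 260 bytes, usb_in_packets(260, 64) gives 5 (four full packets plus one short packet), while usb_in_packets(260, 512) gives 1.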
Just for the record: in general there are different effects that have
an impact on memory read performance:
a) ft2232 unpatched driver, 6MHz J-TCK, 2ms Latency, USB1.1:
four 64 byte USB-IN transactions are sent immediately, but the 5th
USB-IN short packet (<64 byte) was delayed by several USB-IN-NAK
transactions.
b) ft2232 "send immediate" patch, 6MHz J-TCK, 2ms Latency, USB1.1:
The order of USB transactions looked basically the same as above! That
means: although the "send immediate command" was always seen in the
last USB-OUT packet before the USB-IN transactions, the 5th USB-IN
short packet was *not* transferred immediately.
c) ft2232 unpatched original driver, 6MHz J-TCK, 0ms Latency, USB1.1:
All USB-IN data were sent immediately. The last USB-IN is also sent
immediately as a short packet transfer. But it took an additional
675us before the next data transfer - a USB-OUT transaction -
was possible.
At slower JTAG clock rates, immediately sent USB-IN short packets can
also be seen, mostly beginning with the 2nd USB-IN transaction.
d) ft2232 "send immediate" patch, 2MHz J-TCK, 2ms Latency, USB1.1:
The USB-IN data transfer was interrupted by two short USB-IN-NAK
transactions. The last, 5th USB-IN transaction was sent immediately
without additional delay. Together, the two short USB-IN-NAK
transactions delayed the data input by roughly 500us.
So here the inserted "send immediate command" has an effect.
By the way, I worked on that "add ft2232 flush before read" patch at a
time when the ftdi driver was not yet available, because of the really
poor memory read performance of my MIPS target: with a cheap parallel
port dongle you get better memory read throughput on MIPS CPUs in JTAG
PRACC mode than with the original ft2232 driver on USB2.0!
Today you can work with the ftdi driver, which has the better
performance anyway. Therefore there is no need for that "add ft2232
flush before read" patch. So just take my findings as feedback to this
great OpenOCD project.
Looking at MIPS memory reads, I found with all drivers repeated
cycles of two separate USB-OUT transactions followed by one USB-IN
transaction. That seems to be related to the "upper layers" and is
something for a different mail thread on increasing MIPS read
performance in JTAG PRACC mode.
>>> But in an other way, the patch touch the buffer by using buffer_write
>>> inside in the send_and_recv function.
>> Yepp, and therefore I took especially double care on that, including a
>> analyzing tour what happen inside the used libftdi!
>> Buffer_write is called several times in ft2232.c to setup the JTAG
>> commands and then finally handing over to the libftdi, which does an
>> additional copy of the whole data message again from RAM to RAM,
>> including the MPSSE command setup. This copying is really crazy and now
>> in the ftdi/libusb solution much better done.
>>
>>> This could generate issue if the buffer was full before the
>>> send_and_receiv.
>> Nop, that is hide by "buffer_write" function. Looking at line 414 in
>> ft2232.c gives you an answer:
>>
>> static inline void buffer_write(uint8_t val)
>> {
>> assert(ft2232_buffer);
>> assert((unsigned) ft2232_buffer_size< (unsigned) FT2232_BUFFER_SIZE);
>> ft2232_buffer[ft2232_buffer_size++] = val;
>> }
>> Or did you see a crash message during your tests?
> Yes, but this is an assert !!!
> adding the send immediat command should be done before calling the
> send_reicv anyway.
Yepp, the assert results in a program abort, which naturally should
never occur in a final solution; here just tested with a 32 byte
buffer size instead of the default 128 kByte:
openocd: ft2232.c:417: buffer_write: Assertion `(unsigned)
ft2232_buffer_size < (unsigned) 32' failed.
Aborted
So you are right. Looking into the ft2232.c code, we have two options.
Either we reserve one spare byte for the "send immediate command 0x87"
in the several checks that prevent an overflow of the USB output
buffer of FT2232_BUFFER_SIZE = 128 kByte, for the case when a long
output stream is directly followed by a USB input, OR we check before
inserting the "0x87" byte whether enough free space is available,
to prevent the above assertion.
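Both options can be sketched as below. This is not the ft2232.c code: the struct and function names are hypothetical and only illustrate the two checks on a generic command buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MPSSE_SEND_IMMEDIATE 0x87

/* Hypothetical stand-in for the driver's output buffer. */
struct cmd_buffer {
    uint8_t *data;
    size_t   size;     /* bytes queued so far */
    size_t   capacity; /* total buffer size */
};

static size_t buffer_free(const struct cmd_buffer *b)
{
    return b->capacity - b->size;
}

/* Option 1: overflow check that keeps one spare byte for the trailing
 * 0x87. Returns true if n more command bytes can be queued. */
static bool buffer_fits_with_flush(const struct cmd_buffer *b, size_t n)
{
    return buffer_free(b) > n; /* n bytes plus one spare byte */
}

/* Option 2: check right before inserting 0x87. Returns false instead
 * of running into the assertion when the buffer is already full. */
static bool buffer_append_flush(struct cmd_buffer *b)
{
    if (buffer_free(b) < 1)
        return false;
    b->data[b->size++] = MPSSE_SEND_IMMEDIATE;
    return true;
}
```

With option 2 the caller still has to decide what to do on a false return, for example sending the queued data first and then retrying, so option 1 is probably the smaller change.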
> But as you say, the ft2232.c should be reworked and it should call the
> mpsse.c sub layer.
I think that driver rework is already done with the ftdi driver.
>>>>>>
>>>>>> Here are my measurements:
>>>>>> 1. I used an Core-2 Notebook and did the measurement running Debian
>>>>>> Lenny i386 and Debian Squeeze AMD64.
>>>>>> 2. The AR7161 JTAG speed was 10MHz in case of the USB FT2232H based
>>>>>> dongle and 0.5MHz in case of the Parport Wiggler dongle. Higher clock
>>>>>> rates result sometimes in incorrect CPU behaviour.
>>>>>> 3. I guess you see with your ARM system much higher throughput. However
>>>>>> I measured only read performance, because write was dramatically faster
>>>>>> on the MIPS system:
>>>>>> - 0.1 KiB/s with the original ft2232 driver and Olimex ARM-USB-OCD-H
>>>>>> - 0.5 KiB/s with the flush patch included in the ft2232 driver
>>>>>> - 0.5 KiB/s with a parport driver
>>>>>> - 5.0 KiB/s with a ftdi driver including all newest MIPS performance
>>>>>> patches using the Olimex ARM-USB-OCD-H
>>>>>> 4. Sometimes I measured a higher read performance, but the read data
>>>>>> were incorrect. The above reads are naturally related to correct data
>>>>>> read.
>>>>>>
...
>>>>>> Omit the FTDI flush command just for test purposes, because with FT2232D
>>>>>> the read throughput may be decreased. But note that this patch is still
>>>>>> untested.
Here is the updated patch that I used for my measurements:
--- a/src/jtag/drivers/mpsse.c
+++ b/src/jtag/drivers/mpsse.c
@@ -305,7 +305,7 @@
}
err = libusb_control_transfer(ctx->usb_dev, FTDI_DEVICE_OUT_REQTYPE,
- SIO_SET_LATENCY_TIMER_REQUEST, 255, ctx->index, NULL, 0,
+ SIO_SET_LATENCY_TIMER_REQUEST, 2, ctx->index, NULL, 0,
ctx->usb_write_timeout);
if (err < 0) {
LOG_ERROR("unable to set latency timer: %d", err);
@@ -777,7 +777,7 @@
struct libusb_transfer *read_transfer = 0;
struct transfer_result read_result = { .ctx = ctx, .done = true };
if (ctx->read_count) {
- buffer_write_byte(ctx, 0x87); /* SEND_IMMEDIATE */
+// buffer_write_byte(ctx, 0x87); /* SEND_IMMEDIATE */
read_result.done = false;
read_transfer = libusb_alloc_transfer(0);
libusb_fill_bulk_transfer(read_transfer, ctx->usb_dev,
ctx->in_ep, ctx->read_chunk,
>>>>>>
>>>>>>
>>>>>> Another way might be checking how often polling a JTAG status register
>>>>>> miss with/without the flush (patched) included, expecting that an ARM
>>>>>> system requires as well polling a status register in read mode. That
>>>>>> could give us a hint, if the decreased read performance is related to an
>>>>>> interference problem of the polling cycle time with the JTAG instruction
>>>>>> cycles. But when you measure higher read throughput using a different or
>>>>>> slower JTAG clock rates may give us also an indication here.
>>>>>>
>>>>>> Thanks and Kind Regards,
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>> On 31.01.2013 13:43, Spencer Oliver (Code Review) wrote:
>>>>>>> Spencer Oliver has posted comments on this change.
>>>>>>>
>>>>>>> Change subject: ft2232 flush before read
>>>>>>> ......................................................................
>>>>>>>
>>>>>>>
>>>>>>> Patch Set 1: I would prefer that you didn't submit this
>>>>>>>
>>>>>>> (1 inline comment)
>>>>>>>
>>>>>>> I have done some basic testing using a FT2232 and a FT2232H on a stm32
>>>>>>> board.
>>>>>>>
>>>>>>> A increase of about 4KiB/s was seen during a read test using the H
>>>>>>> adapter, however the older chip (non H) seemed to be slightly slower
>>>>>>> 3KiB/s. Did not see any change in write speed.
>>>>>>>
>>>>>>> Perhaps we need to check the type and use flush only for the newer H
>>>>>>> parts?
>>>>>>>
>>>>>>> The other issue is do we update this driver and add potential
>>>>>>> regressions when it is as such deprecated. Maybe needs discussing on
>>>>>>> the ml.
>>>>>>>
>>>>>>> ....................................................
>>>>>>> File src/jtag/drivers/ft2232.c
>>>>>>> Line 810: LOG_DEBUG("Send Immediate buffer to PC");
>>>>>>> can you remove any debug messages as they can make it a bit to noisy.
>>>>>>> if they are needed then use a _DEBUG_USB_IO_ block
_______________________________________________
OpenOCD-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openocd-devel