Nicholas S-A wrote:
>>
>> I have not used it myself, but parallel transfer is potentially even
>> simpler than serial, as it uses a basic FIFO interface. It looks like
>> this:
>>
>> D0-D7 (inout): Data bus.
>> TXE# (output): When low, data can be written to the FIFO.
>> RXF# (output): When low, data can be read from the FIFO.
>> WR# (input): Writes data to the FIFO on negedge.
>> RD# (input): New data byte can be read on D0-D7 after negedge.
>>              Data output enabled while low.
>>
>> Pin descriptions and timing diagrams are on pages 10 and 11, here:
>> http://www.ftdichip.com/Documents/DataSheets/DS_FT245R_v105.pdf
>>
>
> I suggested this earlier...
> What about one of FTDI's USB - FIFO converters? (or USB-UART)
> The FIFO ones have
> "Data transfer rate to 1 Megabyte / second - D2XX Direct Drivers."
> or
> "Data transfer rate to 300 kilobyte / second - VCP Drivers."
> I am not quite sure what D2XX or VCP means, but you can get D2XX for
> Mac OS X, Linux, etc.
>
> http://www.ftdichip.com/Products/FT245R.htm
>
> At one megabyte per second, we can transfer all 256 megabytes in
> approx 4 min, 16 seconds.
> ...
> There is also a so-called "USB to UART Switch" manufactured by
> Dallas/Maxim Semiconductors that can switch the two signals - this
> would be useful for having both, but just USB would probably work.
>
> Did people receive this? (I just wanted to make sure these are getting
> through)
>
>
> Anyway, I think that a FIFO interface is easier than a serial one.
> Remember, however, that we have a 4 MILLION gate FPGA with almost
> nothing on it, so we don't need an insanely optimized download
> system (in terms of latency in the FPGA, since any performance gains
> are destroyed by the FIFO) unless we go with the download-on-the-fly
> idea proposed by Daniel (which I rather like - more later).
> Because of this, it might be more practical to use the UART, since
> most computers can get cheap USB <-> RS232 adapters instead of an
> FTDI solution that is a pain to connect up. If we are planning on
> soldering these ahead of time for developers, I would just as soon go
> with the FIFO (to save us programming time), but if people are
> attaching these themselves it might be easier to use UART.
>
> As for Daniel's idea:
> Rather use a FIFO where the input is fed by 10x oversampled PCICLK and
> output is read as fast as the transport link allows with given encoding
> (transmitting changes only). It should work with a buffer for 10 samples
> and you need at least half or less of the transfer rate of input,
> because the signals somewhere MUST stay stable (otherwise we're talking
> about 330MHz "noise" on all lines). These stable levels on input will
> allow the buffer to be cleared, and they will occur once per PCICLK
> period.

The signals are not all guaranteed to be stable on any given clock
cycle. Read up on "stepping the PCI data/address lines". Effectively,
the reflections of the signals on the bus are used to allow weaker bus
drivers on the PCI device. In this case it takes a number of clock
cycles for all the data/address lines to stabilize. During this
multi-clock transition time the signals will be in indeterminate states,
which the capture may interpret as either high or low.

>
> Essentially an infinite buffer (the hard drive) is on the
> tracer machine, so we might as well store the signals there.

<snip>

I doubt we can compress even a mildly active PCI bus enough to
transfer it over a 1Mb or similar serial link in real time.

> It is a simple matter of sending data over the UART in "real time". The
> signals are still coming in faster than we can shove them out but as
> daniel said there is no way that 330 MHz noise is omnipresent on the
> bus (at least, I should hope not...)

No, but there is potentially 33MBps of legitimate change happening on
the bus under high load. It is pretty much impossible to compress this
much data enough to send it in real time to the trace machine.

>
> Thoughts?

Thinking about it, perhaps we are going about this the wrong way. From
what it sounds like, we have generally settled on some sort of serial
UART interface. So, let's focus on getting a state capture implemented
on the card first. We capture on the PCI bus clock just like a normal
PCI card does.

Let's start with a 32 bit 33MHz bus. That gives us 59 signals (if I
counted correctly) to watch. We have 40 signals for interconnect. Take
out eight for control signals and use 32 for data. If we transfer 96
bits between the two FPGAs per sample, we can have all 59 signals plus
a 37 bit sample clock. This gives us room for (2^37)-1 samples, or about
1h 9m worth of wall clock time. We would run out of memory well before
that. Assuming no compression, that would be about 1.6TB of data at
12 bytes per sample.

Now, take out the clock. Since we capture every clock cycle, the system
doesn't need to track the clock count with the samples; it can be
inferred. This drops us to 64 bits per PCI clock. Uncompressed we could
store about 8.1sec worth of PCI traffic, but all the compression can be
done on the big FPGA to allow for longer captures. We can certainly
handle 66MHz transfers between the two FPGAs.

As for storing the data in card memory, we can use an RLE encoding with
a maximum run length of 32. This allows us to pack all 59 signals and
the run length in a 64bit word.

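To make the RLE word format concrete, here is a minimal sketch in
Python (chosen only for readability; the real thing would be FPGA
logic). The bit layout is an assumption for illustration: run length
minus one in the top 5 bits, the 59 captured signals in the low 59 bits.
The function names are hypothetical too.

```python
SIGNAL_BITS = 59                      # captured PCI signals per sample
RUN_BITS = 5                          # 5 bits hold run lengths 1..32
MAX_RUN = 1 << RUN_BITS               # 32
SIGNAL_MASK = (1 << SIGNAL_BITS) - 1


def pack_word(signals, run):
    """Pack one run into a 64-bit word (assumed layout: run-1 | signals)."""
    if not 1 <= run <= MAX_RUN:
        raise ValueError("run length must be 1..32")
    if signals & ~SIGNAL_MASK:
        raise ValueError("signals must fit in 59 bits")
    return ((run - 1) << SIGNAL_BITS) | signals


def unpack_word(word):
    """Return (signals, run) from a 64-bit word."""
    return word & SIGNAL_MASK, (word >> SIGNAL_BITS) + 1


def rle_encode(samples):
    """Collapse consecutive identical samples into words, max 32 per word."""
    words = []
    prev, run = None, 0
    for s in samples:
        if s == prev and run < MAX_RUN:
            run += 1
        else:
            if run:
                words.append(pack_word(prev, run))
            prev, run = s, 1
    if run:
        words.append(pack_word(prev, run))
    return words


def rle_decode(words):
    """Expand words back into one sample per PCI clock."""
    samples = []
    for w in words:
        signals, run = unpack_word(w)
        samples.extend([signals] * run)
    return samples
```

With this layout, 40 idle clocks followed by 3 clocks of another state
take three words (runs of 32, 8 and 3) instead of 43. Storing the run
length minus one is what lets a 5 bit field reach 32.
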
To further reduce transfer time to the display host, the stored words
can be compressed again at transfer time. I would argue for a fixed
block size Huffman encoding, mainly because it would be really easy to
implement with even the most basic MPU core.

For those concerned about searching back and forth through the capture,
we can use a couple of the BRAMs to implement a lookup table that stores
the memory address of key clock counts. If I did the math right, we
could segment the memory up into 32k sample chunks with two BRAMs. From
that point a moderately quick MPU core could find any sample within a
second.

Once we have the state capture logic working we can see what our
limitations are and try working on timing capture. Since we won't be
modifying the card, and any add-ons will work in either mode, we can
start with the simpler (but still useful) mode first.

I would also suggest that we re-use anything from OpenCores that we
possibly can. For example, a quick search reveals a good MIPS I based
CPU core with C compiler support, a DDR SDRAM controller, and a 16550
based UART. Add an interface to the capture logic and some memory
address translation to tie it all together...

Patrick M
_______________________________________________
Open-graphics mailing list
[email protected]
http://lists.duskglow.com/mailman/listinfo/open-graphics
List service provided by Duskglow Consulting, LLC (www.duskglow.com)
