[casper] BEE2 hanging
Hi all. We're working hard on cleaning up our 800 MHz coherent dedispersion pulsar machine for production. We have it working with 8 GPU machines and from 64 to 2048 coarse channels. One problem we have is with our output FPGA, which rearranges the data and ships it off simultaneously over four 10 GbE ports: sometimes sending an arm() command (which tells the system to start on the next 1 PPS) locks up communication with that FPGA. The arm command (Python) just does two writes to the same register, first a zero, then, after sleeping for a second, a one. If we kill the program that's trying to write to the FPGA, we can unload the BOF and reload it, and it starts working again. Then it fails again on an arm() some random number of attempts later. It seems to fail more often when we run the system at high speed: Paul says it doesn't fail at all with a 200 MHz ADC clock instead of our usual 800 MHz. Our previous design, for the regular GUPPI modes, does not do this. Any ideas where to look for this? Does trying to read or write a non-existent register make BORPH unhappy enough to smite us? Thanks for any insight. John
Re: [casper] BEE2 hanging
Hi John, Are you running this arm() command on the BEE2, or are you using a UDP or TCP server? Does it write the value in ASCII or binary mode? BORPH has occasionally acted strangely for us when we use ASCII mode, so we don't use it anymore. Mark

(In reply to John Ford, jf...@nrao.edu, Fri, Jan 29, 2010 at 1:23 PM.)
Re: [casper] BEE2 hanging
> Hi John, Are you running this arm() command on the BEE2, or are you using a UDP or TCP server?

There is a server on the BEE2 that receives the arm() command from a client and then executes it locally on the control FPGA.

> Does it write the value in ASCII or binary mode?

Don't know. Will find out.

> BORPH has occasionally acted strangely for us when we use ASCII mode, so we don't use it anymore.

Good to know this. By the way, this is all with version 7.1. Thanks. John
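For reference, the arm() sequence John describes (write 0 to a register, sleep one second, write 1) can be sketched as below, using raw binary writes as Mark recommends. This is a minimal sketch under assumptions, not the GUPPI code: it assumes the register is exposed as a BORPH-style file (something like /proc/<pid>/hw/ioreg/<name>; the path is assumed) that takes a single 32-bit big-endian word at offset 0. The names write_reg and arm are illustrative.

```python
import struct
import time


def write_reg(path, value):
    """Write one 32-bit word to a register file in binary mode.

    Assumes a BORPH-style register file that accepts a 4-byte big-endian
    word at offset 0 (path, width and byte order are all assumptions).
    """
    data = struct.pack(">I", value & 0xFFFFFFFF)
    # Unbuffered binary I/O, so the word is handed to the OS in one write().
    with open(path, "r+b", buffering=0) as f:
        f.seek(0)
        f.write(data)


def arm(path, delay_s=1.0):
    """Arm on the next 1 PPS: write 0, wait, then write 1 to the same register."""
    write_reg(path, 0)
    time.sleep(delay_s)
    write_reg(path, 1)
```

Usage would be arm(path_to_arm_register). Writing exactly four raw bytes means nothing depends on how BORPH would parse ASCII input, which is the mode Mark reports trouble with.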
[casper] ROACH-based pulsar machine?
I'm trying to scope the hardware required for SERENDIP-type science piggy-backing on DSN down-link (passive, no transmitter) tracks. As a baseline, I'm assuming one ROACH per antenna per activity. Possible activities would be:
* searching for pulsars and transient pulses
* SETI
* kurtosis for electrostatic discharges (lightning)
For scoping the first task, is anyone working on a pulsar machine using one or more ROACH boards? How big a cluster of CPU/GPU units is reasonable for the real-time searching? Has anyone looked at porting SETI to a ROACH? Any suggestions for what else one might do with the unused bandwidth would be welcome. Thanks and regards, Tom
Re: [casper] ROACH-based pulsar machine?
hi tom, there's a lot of current work in the areas you asked about: terry filiba recently ported the ibob-based pulsar instrumentation to roach (peter mcmahon and she developed this for parkes pulsar work). jonathan kocz and matthew bailes are working on roach porting as well. see peter's thesis and talk with terry and jonathan for more info. each GPU can handle 100 to 200 MHz dual pol, depending on whether you are doing coherent dedispersion (timing) or spectroscopy (searching). matthew and jonathan are the experts at reading data from ibob/roach and using a CPU cluster to do pulsar/transient search. john ford, paul demorest, scott ransom et al. are the experts at using ibob/bee2 to packetize data (800 MHz dual pol) for a GPU-based pulsar cluster (see their fantastic GUPPI instrument). laura spitler, terry and mark wagner are working on porting setispec to roach. terry is also working on a GPU seti instrument, using roach or ibob to coarse-channelize data, packetize it, and send it to CPU/GPU for fine spectral analysis, thresholding, etc. andrew siemion and marin anderson have developed a kurtosis spectrometer for ibob and bee2, modeled after the kurtosis ibob spectrometer developed by zhiwei liu and dale gary. best wishes, dan

(In reply to Tom Kuiper, 1/29/2010 3:02 PM.)
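Kurtosis spectrometry is cheap in hardware because a spectral kurtosis (SK) estimate needs only two running sums per frequency channel: the sum of the power samples and the sum of their squares. The sketch below uses the generalized SK estimator of Nita and Gary for M single-sample power values; whether the ibob/bee2 kurtosis spectrometers mentioned above use exactly this form is an assumption. For Gaussian noise the estimate is close to 1, a steady tone drives it toward 0, and impulsive bursts such as lightning push it well above 1.

```python
def spectral_kurtosis(powers):
    """Generalized spectral kurtosis estimator for one frequency channel.

    powers: M instantaneous power values |X|^2 from successive spectra.
    Only two accumulations are needed: S1 = sum(P) and S2 = sum(P^2).
    """
    m = len(powers)
    if m < 2:
        raise ValueError("need at least two power samples")
    s1 = sum(powers)
    s2 = sum(p * p for p in powers)
    if s1 == 0:
        raise ValueError("total power is zero")
    return (m + 1) / (m - 1) * (m * s2 / (s1 * s1) - 1)


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    # Complex Gaussian noise: power = re^2 + im^2 for each spectral sample.
    noise = [rng.gauss(0, 1) ** 2 + rng.gauss(0, 1) ** 2 for _ in range(20000)]
    print(spectral_kurtosis(noise))        # close to 1
    print(spectral_kurtosis([1.0] * 100))  # steady tone: 0.0
```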
Re: [casper] ROACH-based pulsar machine?
Hi Tom, One of the main bandwidth limitations in pulsar processing is the length of the dedispersion chirp function, which goes down quadratically with increasing frequency. Generally people split the band up into several ~4 MHz channels and coherently dedisperse each one separately. Each of these channels will have a very short chirp response, something like 50 microseconds at 8 GHz even for a high DM of 1000, so I'm pretty sure you're going to be limited by I/O bandwidth rather than processing power. You can run up to one GPU per processing core, but I don't have experience myself with where the bottleneck would be. Also keep in mind that timing pulsars may not be a good piggyback operation, since you need to dwell on the pulsar for a few minutes. Glenn

On Fri, Jan 29, 2010 at 4:17 PM, Tom Kuiper kui...@jpl.nasa.gov wrote: We could have up to 1400 MHz at once, 8200-8600 MHz and 31,500-32,500 MHz, but I think only one polarization. I saw that John Ford is using 8 GPUs for 800 MHz. Can you get several GPUs on the single bus of a multi-core host, or does that cause too much of a bottleneck? I also should think about doing the various piggy-back tasks in parallel. I'm guessing that setispec on a ROACH is a tight fit. How about two? The kurtosis is a very light task, I think, so can some of the leftover resources be used to expand the SETI bandwidth or refine the resolution? Anyway, for now it's some high-level wishing, so I'll scope one unit at three dual-channel ADCs, three ROACHes, two 4-core hosts, and 8 GPUs.
Does that seem reasonable? About $40K? (We have to pay Xilinx :-( .) Thanks for your help Tom
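Glenn's ~50 µs figure can be checked with the standard dispersion-smearing approximation, Δt ≈ 8.3 µs × DM × B_MHz × f_GHz^-3, for a channel of width B centered at f (DM in pc cm^-3). The dispersion delay itself goes as f^-2; the smearing across a fixed-width channel falls off as f^-3, which is why it is so short at 8 GHz. The sketch below is an illustration, not anyone's pipeline code. Converting the chirp to a sample count assumes complex sampling at the channel bandwidth; in an overlap-save coherent dedisperser the FFT has to be comfortably longer than that count.

```python
import math

K_US = 8.3  # smearing constant: microseconds, with DM in pc cm^-3, B in MHz, f in GHz


def chirp_us(dm, chan_bw_mhz, freq_ghz):
    """Dispersive smearing across one channel, in microseconds."""
    return K_US * dm * chan_bw_mhz / freq_ghz ** 3


def chirp_samples(dm, chan_bw_mhz, freq_ghz):
    """Chirp length in complex samples at the channel bandwidth (us * MHz = count)."""
    return math.ceil(chirp_us(dm, chan_bw_mhz, freq_ghz) * chan_bw_mhz)


if __name__ == "__main__":
    print(chirp_us(1000, 4, 8))       # 64.84375 (microseconds)
    print(chirp_samples(1000, 4, 8))  # 260
```

That is about 65 µs, the same order as Glenn's estimate; the same 4 MHz channel at 1.4 GHz would smear by roughly 12 ms.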
Re: [casper] ROACH-based pulsar machine?
> Anyway, for now it's some high-level wishing so I'll scope one unit at three dual-channel ADCs, three ROACHes, two 4 core hosts, and 8 GPUs. Does that seem reasonable? About $40K?

I think you'll run out of PCIe slots and/or bandwidth if you try to do it in 2 hosts. The 10 GbE cards need 8 lanes, and the GPUs need 16 lanes each. You'll need at least two 10 GbE ports to service 4 GPUs. That's 4 x16 slots and 2 x8 slots. Paul Demorest spec'd out 8 hosts in our GPU cluster because of the I/O requirements of both the 10 GbE cards and the GPUs. He may have been a bit conservative, but beware! My quick estimate says $45K or so, assuming 4 hosts. It might be nice if we could come up with some benchmarks that show how much we can process with each GPU, how many GPUs and 10 GbE ports can be supported per host, etc. John
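John's slot arithmetic can be written out as a small budget calculator. The lane widths (x16 per GPU, x8 per 10 GbE card) and the ratio of two GPUs per 10 GbE port come from his message; one port per card is an assumption. The 100 to 200 MHz of dual-pol bandwidth per GPU comes from Dan's reply and is a rough figure, not a benchmark. The names gpus_needed and host_budget are illustrative.

```python
import math
from dataclasses import dataclass

GPU_LANES = 16     # "the GPUs need 16 lanes each"
NIC_LANES = 8      # "the 10 GbE cards need 8 lanes" (assumed one port per card)
GPUS_PER_PORT = 2  # "at least 2 10 GbE ports to service 4 GPUs"


@dataclass
class HostBudget:
    gpus: int
    ports: int
    x16_slots: int
    x8_slots: int
    lanes: int


def gpus_needed(bandwidth_mhz, mhz_per_gpu):
    """GPUs required if each one handles mhz_per_gpu of dual-pol bandwidth."""
    return math.ceil(bandwidth_mhz / mhz_per_gpu)


def host_budget(gpus_per_host):
    """Slots and PCIe lanes one host must provide for its GPUs and 10 GbE cards."""
    ports = math.ceil(gpus_per_host / GPUS_PER_PORT)
    return HostBudget(
        gpus=gpus_per_host,
        ports=ports,
        x16_slots=gpus_per_host,
        x8_slots=ports,
        lanes=gpus_per_host * GPU_LANES + ports * NIC_LANES,
    )


if __name__ == "__main__":
    print(gpus_needed(800, 100), gpus_needed(800, 200))  # 8 4
    print(host_budget(4))  # HostBudget(gpus=4, ports=2, x16_slots=4, x8_slots=2, lanes=80)
```

Tom's plan of 8 GPUs in two hosts puts four GPUs in each, so each host needs 4 x16 and 2 x8 slots (80 lanes), which is the configuration John warns will run out of slots or bandwidth.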
Re: [casper] ROACH-based pulsar machine?
Hi Tom, a couple of thoughts about the pulsar applications: If your only frequency options will be 8 and 31 GHz, there's probably not much point in doing coherent dedispersion, unless you're interested in sub-microsecond time resolution (like Glenn's giant pulse stuff). We use it for timing pulsars, but at much lower frequencies, generally 0.3-2.0 GHz. You don't need coherent dedispersion for pulsar searches. You mentioned real-time searching with GPUs. That could be an interesting application, but I don't have a good feeling for how much bandwidth per card is possible in this case. In standard pulsar searches we record fast-sampled spectra to disk (at 25-100 MB/s), then do the searching offline.
Also, most pulsars are pretty weak at 8 GHz, and extremely weak at 31 GHz. The typical spectral index is something like -1.8. Hope this helps! -Paul
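Paul's point about weak pulsars at high frequency follows from the power law S ∝ f^α with α ≈ -1.8. The sketch below scales flux density from a reference frequency; taking 1.4 GHz as the reference is an assumption, chosen only because it falls in the 0.3-2.0 GHz range he mentions for timing, and real pulsars scatter widely around the typical index. The two test frequencies are the centers of Tom's DSN bands.

```python
def relative_flux(freq_ghz, ref_ghz=1.4, spectral_index=-1.8):
    """Flux density at freq_ghz relative to ref_ghz, for S proportional to f**spectral_index."""
    return (freq_ghz / ref_ghz) ** spectral_index


if __name__ == "__main__":
    for f in (8.4, 32.0):
        print(f"{f} GHz: {relative_flux(f):.4f} of the 1.4 GHz flux density")
```

With these assumptions a typical pulsar is roughly 25 times fainter at 8.4 GHz than at 1.4 GHz, and roughly 280 times fainter at 32 GHz.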
Re: [casper] ROACH-based pulsar machine?
On Friday 29 January 2010 10:18:42 pm John Ford wrote:
> I think you'll run out of PCIe slots and/or bandwidth if you try to do it in 2 hosts. The 10 GbE cards need 8 lanes, and the GPUs need 16 lanes each. You'll need at least two 10 GbE ports to service 4 GPUs. That's 4 x16 slots and 2 x8 slots. Paul Demorest spec'd out 8 hosts in our GPU cluster because of the I/O requirements of both the 10 GbE cards and the GPUs. He may have been a bit conservative, but beware!

I just finished a travel day from hell and was going to respond exactly to this point, but John beat me to it. I think the real limitation with wide-bandwidth pulsar processing on CPUs/GPUs nowadays is the I/O. So consider this email strong support of John's comments. Scott

--
Scott M. Ransom
Address: NRAO, 520 Edgemont Rd., Charlottesville, VA 22903 USA
Phone: (434) 296-0320
email: sran...@nrao.edu
GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989
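Scott's and John's point that I/O dominates can be made concrete with a raw data-rate estimate. This is a sketch under stated assumptions: complex (I/Q) baseband sampled at a rate equal to the bandwidth, 8 bits per component, and only 80% of each 10 GbE port's line rate usable after packet overhead and headroom. None of these figures come from the thread; adjust them to the real packet format.

```python
import math


def baseband_gbps(bandwidth_mhz, npol, bits_per_component=8):
    """Raw rate in Gb/s for complex baseband: bandwidth * npol samples/s, 2 components each."""
    samples_per_s = bandwidth_mhz * 1e6 * npol
    return samples_per_s * 2 * bits_per_component / 1e9


def ports_needed(rate_gbps, port_gbps=10.0, usable_fraction=0.8):
    """10 GbE ports required if only usable_fraction of line rate is available (assumed)."""
    return math.ceil(rate_gbps / (port_gbps * usable_fraction))


if __name__ == "__main__":
    for label, bw, npol in (("1400 MHz single pol", 1400, 1),
                            ("800 MHz dual pol", 800, 2)):
        rate = baseband_gbps(bw, npol)
        print(f"{label}: {rate:.1f} Gb/s, {ports_needed(rate)} ports")
```

Under these assumptions Tom's 1400 MHz single-pol case needs about 22.4 Gb/s (three ports) and an 800 MHz dual-pol stream about 25.6 Gb/s (four ports), before any host-to-GPU PCIe traffic is counted.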