[casper] BEE2 hanging

2010-01-29 Thread John Ford
Hi all.

We're working hard on cleaning up our 800 MHz Coherent Dedispersion pulsar
machine for production.  We have it working with 8 GPU machines, and from
64 to 2048 coarse channels.
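As a rough illustration of spreading that many coarse channels over 8 GPU machines, here is a minimal Python sketch that gives each host an equal contiguous block of channels. The contiguous-block layout and the helper name `channels_for_host` are assumptions for illustration, not necessarily how this design actually distributes its channels.

```python
def channels_for_host(nchan, nhosts, host):
    """Return the coarse-channel indices handled by one GPU host.

    Hypothetical layout: equal contiguous blocks, so nchan must be a
    multiple of nhosts (true for 64..2048 channels over 8 hosts).
    """
    if nchan % nhosts != 0:
        raise ValueError("nchan must be a multiple of nhosts")
    per_host = nchan // nhosts
    start = host * per_host
    return range(start, start + per_host)


if __name__ == "__main__":
    for nchan in (64, 2048):
        block = channels_for_host(nchan, nhosts=8, host=0)
        print(nchan, "channels ->", len(block), "per host, host 0 gets",
              block.start, "to", block.stop - 1)
```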

One problem we have is with our output FPGA, which rearranges the data
and ships it off simultaneously over four 10 GbE ports: sometimes sending
an arm() command (which tells the system to start on the next 1 PPS) locks
up communication with that FPGA.

The arm command (python) just does 2 writes to the same register, first
sending a zero, then sending a one after sleeping for a second.
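A minimal Python sketch of that two-write sequence, assuming the register is exposed through BORPH as a file (a path like `/proc/<pid>/hw/ioreg/arm` is a hypothetical example) and that values are written in binary as 4-byte big-endian words. The `sleep` parameter only exists so the sequence can be exercised without actually waiting.

```python
import struct
import time


def write_reg(path, value):
    """Write one 32-bit unsigned word to a register file in binary mode.

    Assumes a BORPH-style register file and big-endian byte order
    (assumed because the BEE2 control processor is a PowerPC).
    """
    with open(path, "r+b") as f:
        f.seek(0)
        f.write(struct.pack(">I", value))


def arm(path, settle_s=1.0, sleep=time.sleep):
    """Write 0, wait settle_s seconds, then write 1 to the arm register.

    path is hypothetical, e.g. "/proc/<pid>/hw/ioreg/arm".
    """
    write_reg(path, 0)
    sleep(settle_s)
    write_reg(path, 1)
```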

If we kill the program that's trying to write to the FPGA, we can unload
the .bof and reload it, and it starts working again.  Then it fails again
on an arm() some random number of calls later.
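Because a hung register write blocks its caller indefinitely, one way to at least detect the lock-up, rather than leaving the client hanging, is to run the write in a worker thread and stop waiting after a timeout. This is a minimal sketch with a hypothetical helper `write_with_timeout`; it only detects the hang and cannot un-wedge the FPGA. The stuck thread stays blocked until the process exits, which is why it is a daemon thread.

```python
import threading


def write_with_timeout(write_fn, value, timeout_s=2.0):
    """Call write_fn(value) in a daemon thread; return True if it finished in time.

    A False return means the write is still blocked (e.g. the register
    access has hung); the caller can log it and reload the design.
    Exceptions raised by write_fn are re-raised in the caller.
    """
    errors = []

    def worker():
        try:
            write_fn(value)
        except Exception as exc:  # hand the error back to the caller
            errors.append(exc)

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)
    if t.is_alive():
        return False
    if errors:
        raise errors[0]
    return True
```

Any function that performs the actual register write can be passed as `write_fn`, for example a lambda wrapping the file write.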

It seems to fail more often when we run the system at high speed.  Paul says
it doesn't fail at all with a 200 MHz ADC clock, as opposed to our usual
800 MHz.

Our previous design, for the regular GUPPI modes, does not do this.

Any ideas where to look for this?

Does trying to read or write a non-existent register make BORPH unhappy
enough to smite us?
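Whatever the answer turns out to be, a cheap defensive measure is to check that a register exists before touching it. A minimal sketch, assuming the design's registers appear as files in a single directory (for BORPH something like `/proc/<pid>/hw/ioreg/`, which is an assumption here); `require_register` is a hypothetical helper.

```python
import os


def require_register(ioreg_dir, name):
    """Return the full path of register `name`, or raise if the design lacks it.

    Checking up front turns a mistyped or stale register name into a clear
    Python error instead of an access to something that isn't there.
    """
    available = set(os.listdir(ioreg_dir))
    if name not in available:
        raise KeyError(
            f"register {name!r} not found; available: {sorted(available)}"
        )
    return os.path.join(ioreg_dir, name)
```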

Thanks for any insight.

John





Re: [casper] BEE2 hanging

2010-01-29 Thread Mark Wagner
Hi John,

Are you running this arm() command on the BEE2, or are you using a UDP or TCP
server?  Does it write the value in ASCII or binary mode?  BORPH has
occasionally acted strangely for us when we use ASCII mode, so we don't use
it anymore.
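To make the distinction concrete: in binary mode a register write is a fixed-size raw word, while in ASCII mode the value travels as text that has to be parsed on the other end. This sketch only shows the two byte encodings of the same value; the 4-byte big-endian word and the hex-text format are assumptions, not a description of BORPH's exact ASCII syntax.

```python
import struct


def encode_binary(value):
    """Raw 32-bit big-endian word: always exactly 4 bytes."""
    return struct.pack(">I", value)


def encode_ascii(value):
    """Text form (assumed hex plus newline): length varies and must be parsed."""
    return f"{value:x}\n".encode("ascii")


for v in (0, 1, 0xDEADBEEF):
    print(v, encode_binary(v), encode_ascii(v))
```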

Mark

On Fri, Jan 29, 2010 at 1:23 PM, John Ford jf...@nrao.edu wrote:







Re: [casper] BEE2 hanging

2010-01-29 Thread John Ford
 Hi John,

 Are you running this arm() command on the BEE2 or are you using a udp or
 tcp server?

There is a server on the BEE2 that receives the arm() command from a
client and then executes it locally on the control FPGA.
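For reference, the general shape of such a setup (a small server on the BEE2 that accepts a command from a remote client and performs the register writes locally) can be sketched with Python's standard `socketserver` module. The one-line text protocol, the `COMMANDS` table, and the port choice are hypothetical; the handler just calls a local function and replies `ok` or an error.

```python
import socketserver
import threading

# Hypothetical command table: command name -> local function doing the writes.
COMMANDS = {}


class CommandHandler(socketserver.StreamRequestHandler):
    """Read one command per line and reply with 'ok' or 'error: ...'."""

    def handle(self):
        for raw in self.rfile:
            name = raw.decode("ascii", "replace").strip()
            func = COMMANDS.get(name)
            if func is None:
                reply = f"error: unknown command {name!r}"
            else:
                try:
                    func()
                    reply = "ok"
                except Exception as exc:
                    reply = f"error: {exc}"
            self.wfile.write((reply + "\n").encode("ascii"))


def start_server(host="127.0.0.1", port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = socketserver.ThreadingTCPServer((host, port), CommandHandler)
    server.daemon_threads = True
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Replying explicitly to every command also gives the client a natural place to apply a timeout, so a hung write shows up as a missing reply rather than a silent stall.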

 Does it write the value in ascii or binary mode?

Don't know; will find out.

 BORPH has
 occasionally acted strangely for us when we use ascii mode so we don't use
 it anymore.

Good to know this.  By the way, this is all with version 7.1.

Thanks.

John


 Mark

 On Fri, Jan 29, 2010 at 1:23 PM, John Ford jf...@nrao.edu wrote:











[casper] ROACH-based pulsar machine?

2010-01-29 Thread Tom Kuiper
I'm trying to scope the hardware required for SERENDIP-type science 
piggy-backing on DSN down-link (passive, no transmitter) tracks.
As a baseline, I'm assuming one ROACH per antenna per activity.  
Possible activities would be:


   * searching for pulsars and transient pulses
   * SETI
   * kurtosis for electrostatic discharges (lightning)

For scoping the first task, is anyone working on a pulsar machine using 
one or more ROACH boards? How big a cluster of CPU/GPU units is 
reasonable for the real-time searching?


Has anyone looked at porting SETI to a ROACH?

Any suggestions for what else one might do with the unused bandwidth 
would be welcome.


Thanks and regards

Tom


Re: [casper] ROACH-based pulsar machine?

2010-01-29 Thread Dan Werthimer


hi tom,

there's a lot of current work in the areas you asked about:

terry filiba recently ported the ibob-based pulsar instrumentation to roach
(peter mcmahon and she developed this for parkes pulsar work).
jonathan kocz and matthew bailes are working on roach porting as well.
see peter's thesis and talk with terry and jonathan for more info.
each GPU can handle 100 to 200 MHz dual pol depending on whether
you are doing coherent dedispersion (timing), or spectroscopy (searching).
matthew and jonathan are the experts at reading data from ibob/roach  and
using CPU cluster to do pulsar/transient search.
john ford, paul demorest, scott ransom et al are the experts at using
ibob/bee2 to packetize data (800 MHz dual pol) for GPU based pulsar cluster
(see their fantastic GUPPI instrument).
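Using the per-GPU figure above, a back-of-the-envelope GPU count is just the bandwidth divided by what one GPU can handle. This sketch assumes that a single-polarization stream costs half as much GPU time as a dual-pol one, which is an assumption rather than something stated in this thread; `gpus_needed` is a hypothetical helper.

```python
import math


def gpus_needed(bandwidth_mhz, npol, mhz_per_gpu_dual_pol):
    """Rough GPU count from the per-GPU dual-pol processing bandwidth.

    Assumes cost scales linearly with bandwidth times number of polarizations.
    """
    dual_pol_equivalent = bandwidth_mhz * npol / 2
    return math.ceil(dual_pol_equivalent / mhz_per_gpu_dual_pol)


for per_gpu in (100, 200):
    print(f"{per_gpu} MHz/GPU: 800 MHz dual pol -> {gpus_needed(800, 2, per_gpu)} GPUs, "
          f"1400 MHz single pol -> {gpus_needed(1400, 1, per_gpu)} GPUs")
```

At 100 MHz per GPU this gives the 8 GPUs used for 800 MHz dual pol; the 1400 MHz single-polarization case comes up later in the thread.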

laura spitler, terry and mark wagner are working on porting setispec to
roach.
terry is also working on a GPU seti instrument, using roach or ibob to
coarse-channelize data, packetize it, and send it to CPU/GPU for fine
spectral analysis, thresholding, etc.


andrew siemion and marin anderson have developed a kurtosis spectrometer
for ibob and bee2, modeled after the kurtosis ibob spectrometer developed by
zhiwei liu and dale gary.
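For readers unfamiliar with the technique, the core of a kurtosis spectrometer is a per-channel statistic computed from M accumulated power samples. This sketch uses the commonly quoted spectral kurtosis estimator SK = (M+1)/(M-1) x (M x S2 / S1^2 - 1), where S1 and S2 are the sums of the power and of the power squared. It is about 1 for Gaussian noise and departs from 1 for non-Gaussian signals such as impulsive discharges or steady tones. It is a plain-Python illustration, not the ibob/bee2 gateware.

```python
import random


def spectral_kurtosis(powers):
    """Spectral kurtosis estimator for one channel from M power samples.

    SK = (M+1)/(M-1) * (M*S2/S1**2 - 1), with S1 = sum(P), S2 = sum(P**2).
    Expected value is ~1 for Gaussian noise, below 1 for a steady tone and
    above 1 for intermittent (impulsive) power.
    """
    m = len(powers)
    if m < 2:
        raise ValueError("need at least two power samples")
    s1 = sum(powers)
    s2 = sum(p * p for p in powers)
    return (m + 1) / (m - 1) * (m * s2 / (s1 * s1) - 1)


rng = random.Random(0)
M = 10000
# Power of complex Gaussian noise is exponentially distributed.
noise = [rng.expovariate(1.0) for _ in range(M)]
tone = [1.0] * M  # constant power, as from a steady sinusoid
# Noise with a 50x power burst in 1% of the samples.
bursts = [rng.expovariate(1.0) * (50.0 if i % 100 == 0 else 1.0) for i in range(M)]
for label, p in (("noise", noise), ("tone", tone), ("bursts", bursts)):
    print(label, round(spectral_kurtosis(p), 3))
```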


best wishes,

dan

On 1/29/2010 3:02 PM, Tom Kuiper wrote:




Re: [casper] ROACH-based pulsar machine?

2010-01-29 Thread G Jones
Hi Tom,
One of the main bandwidth limitations in pulsar processing is the length of
the dedispersion chirp function, which falls steeply with increasing
frequency (the dispersion delay scales as f^-2, so the smearing across a
channel of fixed width scales as f^-3). Generally people split the band up into several ~4 MHz
channels and coherently dedisperse each one separately. Each of these
channels will have a very short chirp response, something like 50
microseconds at 8 GHz even for a high DM of 1000, so I'm pretty sure you're
going to be limited by I/O bandwidth rather than processing power.
You can run up to one GPU per processing core, but I don't have experience
myself with where the bottleneck would be.
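The chirp length follows from the cold-plasma dispersion law: the delay at frequency f is about 4.15 ms x DM / f_GHz^2, so across a channel of width df the smearing is roughly 8.3 μs x DM x df_MHz / f_GHz^3. A small Python version of that standard approximation (the function and constant names are illustrative):

```python
K_DM_MS = 4.148808  # dispersion constant in ms GHz^2 cm^3 / pc


def smearing_us(dm, chan_bw_mhz, freq_ghz):
    """Dispersive smearing across one channel, in microseconds.

    Derivative of t = K_DM * DM / f**2 gives dt = 2 * K_DM * DM * df / f**3.
    """
    chan_bw_ghz = chan_bw_mhz / 1000.0
    dt_ms = 2.0 * K_DM_MS * dm * chan_bw_ghz / freq_ghz**3
    return dt_ms * 1000.0


print(f"{smearing_us(1000, 4, 8.0):.1f} us")
print(f"{smearing_us(1000, 4, 1.4):.0f} us")
```

For DM 1000 and a 4 MHz channel this gives about 65 μs at 8 GHz, the same order as the ~50 μs above, versus roughly 12 ms at 1.4 GHz.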

Also keep in mind that timing pulsars may not be a good piggyback operation
since you need to dwell on the pulsar for a few minutes.

Glenn

On Fri, Jan 29, 2010 at 4:17 PM, Tom Kuiper kui...@jpl.nasa.gov wrote:

  Dan Werthimer wrote:

 each GPU can handle 100 to 200 MHz dual pol depending on whether
 you are doing coherent dedispersion (timing), or spectroscopy (searching).
 matthew and jonathan are the experts at reading data from ibob/roach  and
 using CPU cluster to do pulsar/transient search.
 john ford, paul demorest, scott ransom et al are the experts at using
 ibob/bee2
 to packetize data (800 MHz dual pol) for GPU based pulsar cluster
 (see their fantastic GUPPI instrument).

 We could have up to 1400 MHz at once, 8200-8600 and 31,500-32,500 MHz but I
 think only one polarization.  I saw that John Ford is using 8 GPUs for 800
 MHz.  Can you get several GPUs on the single bus of a multi-core host or
 does that cause too much of a bottle-neck?  I also should think about doing
 the various piggy-back tasks in parallel.  I'm guessing that setispec on a
 ROACH is a tight fit.  How about two?  The kurtosis is a very light task, I
 think, so can some of the left-over resources be used to expand the SETI
 bandwidth or refine the resolution?

 Anyway, for now it's some high-level wishing so I'll scope one unit at
 three dual-channel ADCs, three ROACHes, two 4 core hosts, and 8 GPUs. Does
 that seem reasonable?  About $40K? (We have to pay Xilinx :-( .)

 Thanks for your help

 Tom




Re: [casper] ROACH-based pulsar machine?

2010-01-29 Thread John Ford
 Dan Werthimer wrote:
 each GPU can handle 100 to 200 MHz dual pol depending on whether
 you are doing coherent dedispersion (timing), or spectroscopy
 (searching).
 matthew and jonathan are the experts at reading data from ibob/roach
 and
 using CPU cluster to do pulsar/transient search.
 john ford, paul demorest, scott ransom et al are the experts at using
 ibob/bee2
 to packetize data (800 MHz dual pol) for GPU based pulsar cluster
 (see their fantastic GUPPI instrument).
 We could have up to 1400 MHz at once, 8200-8600 and 31,500-32,500 MHz
 but I think only one polarization.  I saw that John Ford is using 8 GPUs
 for 800 MHz.  Can you get several GPUs on the single bus of a multi-core
 host or does that cause too much of a bottle-neck?  I also should think
 about doing the various piggy-back tasks in parallel.  I'm guessing that
 setispec on a ROACH is a tight fit.  How about two?  The kurtosis is a
 very light task, I think, so can some of the left-over resources be used
 to expand the SETI bandwidth or refine the resolution?

 Anyway, for now it's some high-level wishing so I'll scope one unit at
 three dual-channel ADCs, three ROACHes, two 4 core hosts, and 8 GPUs.
 Does that seem reasonable?  About $40K? (We have to pay Xilinx :-( .)

I think you'll run out of PCIe slots and/or bandwidth if you try to do it
in 2 hosts.  The 10 GbE cards need 8 lanes, and the GPUs need 16 lanes
each.  You'll need at least two 10 GbE ports to service 4 GPUs.  That's
four x16 slots and two x8 slots per host.  Paul Demorest spec'd out 8 hosts
in our GPU cluster due to the I/O requirements, both 10 GbE and GPUs.  He
may have been a bit conservative, but beware!
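The slot arithmetic above, written out for a proposal of 8 GPUs. This sketch assumes x16 per GPU, x8 per 10 GbE card with one port per card, and one 10 GbE port per two GPUs, as in the message; it ignores how real motherboards lay out and share lanes. `host_io` is a hypothetical helper.

```python
import math

LANES_PER_GPU = 16  # each GPU wants an x16 slot
LANES_PER_NIC = 8   # each 10 GbE card wants an x8 slot (one port per card assumed)
GPUS_PER_PORT = 2   # "at least 2 10 GbE ports to service 4 GPUs"


def host_io(gpus):
    """Return (10 GbE cards, total PCIe lanes) for one host with `gpus` GPUs."""
    nics = math.ceil(gpus / GPUS_PER_PORT)
    return nics, gpus * LANES_PER_GPU + nics * LANES_PER_NIC


TOTAL_GPUS = 8
for hosts in (2, 4, 8):
    gpus = TOTAL_GPUS // hosts
    nics, lanes = host_io(gpus)
    print(f"{hosts} hosts: {gpus} x16 + {nics} x8 slots, {lanes} lanes per host")
```

With two hosts each one needs four x16 and two x8 slots (80 lanes), matching the count above; spreading the same 8 GPUs over 4 hosts halves that per host.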

My quick estimate says $45K or so, assuming 4 hosts.

It might be nice if we could come up with some benchmarks that show how
much we can process with each GPU, how many GPUs and 10 GbE ports can be
supported per host, etc.

John

 Thanks for your help

 Tom







Re: [casper] ROACH-based pulsar machine?

2010-01-29 Thread Paul Demorest

On Fri, 29 Jan 2010, Tom Kuiper wrote:


Hi Tom,

couple thoughts about the pulsar applications:

If your only frequency options will be 8 and 31 GHz, there's probably not 
much point in doing coherent dedispersion, unless you're interested 
in sub-microsecond time resolution (like Glenn's giant pulse stuff).  We use 
it for timing pulsars, but at much lower frequencies, generally 0.3-2.0 GHz.  
You don't need coherent dedispersion for pulsar searches.


You mentioned real-time searching with GPUs.  That could be an interesting 
application, but I don't have a good feeling for how much BW/card is 
possible in this case.  In standard psr searches we record fast-sampled 
spectra to disk (at 25-100 MB/s) then do the searching offline.
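The disk rate for search-mode recording is just channels x spectra per second x bytes per sample. The parameters below (2048 channels, 64 μs sampling, 8- or 4-bit samples) are illustrative assumptions, not the settings of any particular instrument, chosen to show how a rate in the quoted 25-100 MB/s range arises.

```python
def filterbank_rate_mb_s(nchan, tsamp_us, nbits, npol=1):
    """Data rate in MB/s (1 MB = 1e6 bytes) for fast-sampled spectra.

    npol counts recorded polarization products (1 = total intensity).
    """
    spectra_per_s = 1e6 / tsamp_us
    bytes_per_s = nchan * npol * spectra_per_s * nbits / 8
    return bytes_per_s / 1e6


print(filterbank_rate_mb_s(2048, 64, 8))          # total intensity, 8-bit
print(filterbank_rate_mb_s(2048, 64, 4, npol=4))  # four products, 4-bit
```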


Also, most pulsars are pretty weak at 8 GHz, and extremely weak at 31 GHz. 
The typical spectral index is something like -1.8.
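With a power-law spectrum S ∝ f^α and α ≈ -1.8, the drop relative to a typical 1.4 GHz observation is easy to quantify. A short sketch; the 1.4 GHz reference and the band centers 8.4 and 32 GHz are our choices for illustration.

```python
def flux_ratio(freq_ghz, ref_ghz, alpha=-1.8):
    """S(freq)/S(ref) for a power-law spectrum S ~ f**alpha."""
    return (freq_ghz / ref_ghz) ** alpha


for f in (8.4, 32.0):
    print(f"{f} GHz: {flux_ratio(f, 1.4):.4f} of the 1.4 GHz flux density")
```

That is roughly 4% of the 1.4 GHz flux density at 8.4 GHz and about 0.4% at 32 GHz.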


Hope this helps!

-Paul



Re: [casper] ROACH-based pulsar machine?

2010-01-29 Thread Scott Ransom
On Friday 29 January 2010 10:18:42 pm John Ford wrote:
 I think you'll run out of PCIe slots and/or bandwidth if you try to
  do it in 2 hosts.  The 10 GbE cards need 8 lanes, and the GPUs need
  16 lanes each.  You'll need at least 2 10 GbE ports to service 4
  GPUs.  That's 4 X16 slots and 2 X8 slots.  Paul Demorest spec'd out
  8 hosts in our GPU cluster due to the I/O requirements, both 10 Gbe
  and GPU's.  He may have been a bit conservative, but beware!

I just finished a travel day from hell and was going to respond exactly 
to this point, but John beat me to it.  I think the real limitation 
with wide-bandwidth pulsar processing on CPUs/GPUs nowadays is the I/O.  So 
consider this email strong support of John's comments.
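To put numbers on the I/O point: the raw input rate is bandwidth x polarizations x bits per complex sample. The sketch assumes complex (I/Q) baseband sampling at the Nyquist rate with 8 bits per component; the sample widths actually used by GUPPI or a ROACH design may differ, and packet overhead is ignored.

```python
import math


def input_rate_gbps(bandwidth_mhz, npol, bits_per_component=8):
    """Raw data rate in Gb/s for complex baseband sampling at the Nyquist rate.

    Complex sampling of B MHz needs B Msamples/s, each with I and Q parts.
    """
    bits_per_s = bandwidth_mhz * 1e6 * npol * 2 * bits_per_component
    return bits_per_s / 1e9


for bw, npol in ((800, 2), (1400, 1)):
    rate = input_rate_gbps(bw, npol)
    print(f"{bw} MHz x {npol} pol: {rate:.1f} Gb/s, "
          f"at least {math.ceil(rate / 10)} x 10 GbE links at full line rate")
```

Links never run at full line rate in practice, so more ports are needed; John's coherent dedispersion design ships its 800 MHz output over four 10 GbE ports.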

Scott

-- 
Scott M. Ransom          Address:  NRAO
Phone:  (434) 296-0320   520 Edgemont Rd.
email:  sran...@nrao.edu Charlottesville, VA 22903 USA
GPG Fingerprint: 06A9 9553 78BE 16DB 407B  FFCA 9BFA B6FF FFD3 2989