Re: [casper] about boffile download using tut3.py

2014-10-27 Thread Marc Welz
On Sat, Oct 25, 2014 at 2:33 PM, Wang Jinqing jqw...@shao.ac.cn wrote:

 For tut1 I can use telnet to the roach2, then use a command like

 nc -w 2 -q 2 192.168.111.10   name.bof

 to download the bof file. But tut3.py doesn't seem to work that way. What
 should I do?


I would try using the same approach as for tut 1.


 By the way, is there a Linux system on the roach2?


Yes, there are flash chips soldered onto the roach. They contain several
partitions, and one of them is a writable filesystem.


 I ask because I can't even find an SD card on the board.

 error information:

 192.168.40.60: ?progdev tut3_2014_Oct_24_0848.bof



 192.168.40.60: #log info 992952462866 raw attempting\_to\_empty\_fpga

 192.168.40.60: #log info 992952462866 raw
 attempting\_to\_program\_tut3_2014_Oct_24_0848.bof

 192.168.40.60: #log error 992952462867 raw
 unable\_to\_open\_boffile\_./tut3_2014_Oct_24_0848.bof:\_No\_such\_file\_or\_directory


Progdev requires a file on the local filesystem - if it hasn't been
transferred/uploaded to the board previously, it won't be found. Use the
upload* requests to transfer the bof file onto the roach.
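
For reference, a minimal sketch of that sequence using the Python corr package
(which the tutorial scripts are built on). It assumes your corr/katcp version
provides FpgaClient.upload_bof(); if it doesn't, copying the bof file onto the
board's filesystem (e.g. over NFS) before calling progdev achieves the same
thing:

import time
import corr

fpga = corr.katcp_wrapper.FpgaClient('192.168.40.60')  # ROACH2 IP or hostname
time.sleep(1)                                          # let the KATCP connection come up

boffile = 'tut3_2014_Oct_24_0848.bof'

# Push the bof file from the control PC onto the board's filesystem first,
# otherwise ?progdev fails with "No such file or directory".
fpga.upload_bof(boffile, 60000)   # 60000 is an arbitrary free TCP port for the transfer
print(fpga.listbof())             # the uploaded file should now be listed

# The file now exists locally on the ROACH, so progdev can find it.
fpga.progdev(boffile)
print(fpga.listdev())             # sanity check: list the registers of the running design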

regards

marc


Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Jason Manley
Just a note that I don't recommend you adjust FPGA clock frequencies while it's 
operating. In theory, you should do a global reset in case the PLL/DLLs lose 
lock during clock transitions, in which case the logic could be in an uncertain 
state. But the Sysgen flow just does a single POR.

A better solution might be to keep the 10GbE cores turned off (enable line 
pulled low) on initialisation, until things are configured (tgtap started etc), 
and only then enable the transmission using a SW register.
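
In control-script terms, the gist of that suggestion is roughly the following
(a minimal Python sketch using corr's FpgaClient; 'gbe_en' is a hypothetical
software-register name, since the real name depends on the design):

import time
import corr

fpga = corr.katcp_wrapper.FpgaClient('roach2')
time.sleep(1)

fpga.write_int('gbe_en', 0)   # keep the 10GbE cores gated off from power-up
# ... configure the cores here: start tgtap, set IP/MAC/port, arm syncs ...
fpga.write_int('gbe_en', 1)   # only then let packets flow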

Jason Manley
CBF Manager
SKA-SA

Cell: +27 82 662 7726
Work: +27 21 506 7300

On 25 Oct 2014, at 10:34, peter peterniu...@163.com wrote:

 Hi Richard, Joe, all,
 Thanks for your help. It can finally receive packets now!
 As you pointed out, after enabling the ADC card and running the bof file
 (./adc_init.rb roach1 bof file) at 200 MHz (or higher), we need to run the
 f-engine init script (./paper_feng_init.rb roach1:0) at about 75 MHz. That
 allows the packets to flow, and then we can turn the frequency higher.
 However, the final ADC clock frequency only reaches about 120 MHz in my
 experiment. Our target ADC frequency is 250 MHz. Maybe I need to run the bof
 file at a higher ADC frequency first to reach a steady 250 MHz ADC clock.
 Why does it need to initialise at a lower frequency and then be turned up?
 That doesn't make sense. Is the hardware going wrong? Since the yellow block
 adc16*250-8 is designed for 250 MHz, it should be fine at 200 MHz or 250 MHz.
 What is the final frequency in your experiment?
 Any reply will be helpful!
 Best regards,
 peter
 
 
 
 
 
 
 At 2014-10-25 00:36:52, Richard Black aeldstes...@gmail.com wrote:
 Peter,
 
 That's correct. We downloaded the FPGA firmware and programmed the ROACH with 
 the precompiled bitstream. When we didn't get any data beyond that single 
 packet, we stuck some overflow status registers in the model and found that 
 we were overflowing at 1025 64-bit words (i.e. 8200 bytes).
 
 We have actually found a way to get packets to flow, but it isn't a good fix. 
 When we turn the ADC clock frequency down to about 75 MHz, the packets begin 
 to flow. There is an opinion in our group that the 10-GbE buffer overflow is 
 a transient behavior, and, hence, if we slowly turn up the clock frequency 
 after the ROACH has started up, packets may continue to flow in steady-state 
 operation. We haven't tested this yet, though.
 
 Richard Black
 
 On Thu, Oct 23, 2014 at 8:39 PM, peter peterniu...@163.com wrote:
 Hi Richard, All,
 As you said, the size of the isolated packet changes every time. :(
 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
 listening on px1-2, link-type EN10MB (Ethernet), capture size 65535 bytes
 10:10:55.622053 IP 10.10.2.1.8511  10.10.2.9.8511: UDP, length 4616
 Did you download the PAPER gateware from the CASPER wiki
 (https://casper.berkeley.edu/wiki/PAPER_Correlator_Manifest ) directly? How
 does the PAPER bof file run on your system? Have you seen this overflow
 before? I downloaded and installed the PAPER model as the website says, but
 the overflow shows up when I run paper_feng_netstat.rb.
 Thanks for your information.
 peter
 
 
 
 
 
 At 2014-10-24 09:59:12, Richard Black aeldstes...@gmail.com wrote:
 Peter,
 
 I don't mean to hijack your thread, but we've been having a very similar (and 
 time-absorbing) issue with the PAPER f-engine FPGA firmware here at BYU. Out 
 of curiosity, does this single packet that you're receiving in tcpdump change 
 in size every time you reprogram the ROACH? We've seen this happen, and we're 
 pretty sure that this isolated packet is the 10-GbE buffer flushing when the 
 10-GbE core is initialized (i.e. the enable signal isn't sync'd with the 
 start of a new packet).
 
 Regardless of whether we have the same issue, I'm very interested to see this 
 problem's resolution.
 
 Good luck,
 
 Richard Black
 
 On Thu, Oct 23, 2014 at 7:50 PM, peter peterniu...@163.com wrote:
 Hi Joe, all,
 I found something this morning: there is one packet sent out from the roach
 when I run the PAPER model, which I captured on the HPC with tcpdump:
 tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
 listening on px1-2, link-type EN10MB (Ethernet), capture size 65535 bytes
 09:04:02.757813 IP 10.10.2.1.8511  10.10.2.9.8511: UDP, length 6456
 
 The length is not the expected 8200+8, and is far below the full TX buffer
 size of 8K+512. And the other packets are stopped by the overflow.
 I have tried changing the tutorial 2 packet size to 8200 bytes and to 8K+512
 bytes, and both transfer fine. I also made sure the boundary is indeed
 8K+512, because when I change the size to 8K+513 bytes, no data is sent. So
 the packet received this morning with length 6456 is well under the limit.
 But what causes the other packets to overflow?
 Any suggestions would be helpful!
 peter
 
 
 
 
 
 
 At 2014-10-24 00:37:14, Kujawski, Joseph jkujaw...@siena.edu wrote:
 Peter,
 
 By cadence of the broadcast, I mean how often are the 8200 byte packets 

[casper] OS for development: Ubuntu 14.04?

2014-10-27 Thread Schoenwald, Adam (GSFC-5640)[GSFC - HIGHER EDUCATION]
Hi everyone,
I'm just getting started with a roach2 board and am about to 
set up a development station. So far I have been using Windows 7 and 
encountering issues. A new computer will be coming in soon, but has Ubuntu 
14.04 LTS on it. Has anyone had any issues with this? Should I be planning to 
wipe it and install 12.04, or are the compatibility issues minimal and easily 
resolved?

My plan was to just follow the instructions at 
https://casper.berkeley.edu/wiki/MSSGE_Setup_with_Xilinx_14.x_and_Matlab_2012b 
but then I saw there were some problems with 13.03 
(http://www.mail-archive.com/casper%40lists.berkeley.edu/msg04260.html).

Any input here would be helpful,

Thanks,
Adam Schoenwald


Re: [casper] OS for development: Ubuntu 14.04?

2014-10-27 Thread Jack Hickish
Hi Adam,

I'm using Ubuntu 14.04 and things seem to work as they should, as long
as you follow the instructions on that wiki page. Though not used by
the toolflow, Vivado 2014.3 officially supports Ubuntu 14.04, if
that's a concern to you.

Having said that, I think if I were to go through the setup process
again, on a machine that wasn't my everyday desktop, I'd probably go
for one of the free RedHat-like distros such as CentOS, just to try to
minimize unforeseen headaches (I've occasionally had Ubuntu updates
break things, but nothing more serious than going back and redoing some
of the steps in the wiki).

Good luck!
Jack





Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Richard Black
Jason,

Thanks for your comments. While I agree that changing the ADC frequency
mid-operation is non-kosher and could result in uncertain behavior, the
issue at hand for us is to figure out what is going on with the PAPER model
that has been published on the CASPER wiki. This naturally won't be (and
shouldn't be) the end-all solution to this problem.

This is a reportedly fully-functional model that shouldn't require any
major changes in order to operate. However, this has clearly not been the
case in at least two independent situations (us and Peter). This begs the
question: what's so different about our use of PAPER?

We at BYU have painstakingly made sure that our IP addressing schemes,
switch ports, and scripts are all configured correctly (thanks to David
MacMahon for that, btw), but we still have hit the proverbial brick wall of
10-GbE overflow.  When I last corresponded with David, he explained that he
remembers having a similar issue before, but can't recall exactly what the
problem was.

In any case, the fact that turning down the ADC clock prior to start-up
prevents the 10-GbE core from overflowing is a major lead for us at BYU
(we've been spinning our wheels on this issue for several months now). By
no means are we proposing mid-run ADC clock modifications, but this appears
to be a very subtle (and quite sinister, in my opinion) bug.

Any thoughts as to what might be going on?

Richard Black


Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Jason Manley
I suspect the 10GbE core's input FIFO is overflowing on startup. One key thing 
with this core is to ensure that your design keeps the enable port held low 
until the core has been configured. The core becomes unusable once the TX FIFO 
overflows. This has been a long-standing bug (my emails trace back to 2009), but 
it's so easy to work around that I don't think anyone's bothered looking into 
fixing it.

Jason Manley
CBF Manager
SKA-SA

Cell: +27 82 662 7726
Work: +27 21 506 7300


Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Richard Black
By "enable port", I assume you mean the "valid" port. I've been looking at
the PAPER model carefully for some time now, and that is how it operates:
it has a valid signal gated by a software register on each 10-GbE core.

Once again, this is not our model. This is one made available on the CASPER
wiki and run without modifications.

Richard Black


Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Jason Manley
Yep, OK, so whoever did it (Dave?) already knows about this issue and has dealt 
with it. So scratch that idea then! The only other thing to check is to make sure 
you don't actually toggle that software register until the core is configured.

Jason Manley
CBF Manager
SKA-SA

Cell: +27 82 662 7726
Work: +27 21 506 7300


Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Richard Black
Jason,

Fair point. One of our guys is currently trying to get ChipScope configured
to make sure all our control signals are correct. We'll definitely look at
that signal too. Hopefully that will finally put this issue to rest.

Thanks for the tip,

Richard Black


Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Jack Hickish
Hi Richard,

I've just had a very brief look at the design / software, so take this
email with a pinch of salt, but on the off-chance you haven't checked
this...

It looks like the PAPER F-engine setup, when running the start script for
the out-of-the-box software / firmware, is --

1. Disable all ethernet interfaces
2. Arm sync generator, wait 1 second for PPS
3. Reset ethernet interfaces
4. Enable interfaces.

These four steps seem like they should be safe, yet the behaviour
you're describing sounds like the design is midway through sending a
packet, then gets a sync, gives up without sending an end-of-frame, and
starts sending a new packet, at which point the old packet + the new
packet = overflow.
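
Roughly, that four-step sequence in a control script might look like the
sketch below (Python, using corr's FpgaClient; the register names 'eth_en',
'eth_rst' and 'sync_arm' are hypothetical stand-ins -- the real
paper_feng_init.rb is Ruby and uses the design's actual register names):

import time
import corr

fpga = corr.katcp_wrapper.FpgaClient('roach1')
time.sleep(1)

# 1. Disable all ethernet interfaces
fpga.write_int('eth_en', 0)

# 2. Arm the sync generator and wait (at least) a second for the PPS edge
fpga.write_int('sync_arm', 0)
fpga.write_int('sync_arm', 1)
time.sleep(1)

# 3. Reset the ethernet interfaces
fpga.write_int('eth_rst', 1)
fpga.write_int('eth_rst', 0)

# 4. Enable the interfaces
fpga.write_int('eth_en', 1)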

Knowing that the design works for PAPER, my wondering is whether, after
arming the sync generator, syncs are flowing through the design before
the ethernet interface is enabled. Do you have a PPS-like input? The
f-engine initialisation script seems to wait for a second after arming,
but if your sync input is something significantly slower, you could
have problems.

I'm sceptical about this theory (I think the symptoms would be lots of
OK packets when you brought up the interface, and then it dying when
the sync arrives, rather than a single good packet like you're
seeing), but if the firmware + software really is the same as that
working with PAPER, and the wiki hasn't just got out of sync with the
PAPER devs, perhaps the problem is in your hardware setup.

Cheers,
Jack


Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Richard Black
Jack,

I appreciate your help. I tend to agree that the issue is likely a hardware
configuration problem, but we have been trying to match the PAPER hardware
setup as closely as possible.

We do feed a 1-PPS signal into the board, but I'm hazy on the details of
the other pulse parameters. I'll look into that as well.

So, if I understand you correctly, you believe that the sync pulse is
reaching the ethernet interfaces *after* the cores are enabled? If that is
the case, couldn't we delay enabling the 10-GbE cores for another second to
fix it? This might be a quick way to test that theory, but please correct
me if I've misunderstood.

Richard Black


Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Jack Hickish
Hi Richard,

That's my theory, though I doubt it's right. But as you say, an easy
test is just to delay after issuing a sync for a couple more seconds
and see if that helps. But if your PPS is a real PPS (rather than just
a square wave at some vague 1s period) then I can't see what
difference this would make.
When that doesn't help, my inclination would be to start prodding the
10gbe control signals from software to make sure the reset / sw
enables are working / see if a tge reset without a new sync behaves
differently. But I can't imagine how that would be broken unless the
stuff on github is out of date (which I doubt).

Jack


Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread David MacMahon
Hi, Richard,

On Oct 27, 2014, at 9:25 AM, Richard Black wrote:

 This is a reportedly fully-functional model that shouldn't require any major 
 changes in order to operate. However, this has clearly not been the case in 
 at least two independent situations (us and Peter). This begs the question: 
 what's so different about our use of PAPER?

I just verified that the roach2_fengine_2013_Oct_14_1756.bof.gz file is the one 
being used by the PAPER correlator currently fielded in South Africa.  It is 
definitely a fully functional model.  That image (and all source files for it) 
is available from the git repo listed on the PAPER Correlator Manifest page of 
the CASPER Wiki:

https://casper.berkeley.edu/wiki/PAPER_Correlator_Manifest

 We, at BYU, have made painstakingly sure that our IP addressing schemes, 
 switch ports, and scripts are all configured correctly (thanks to David 
 MacMahon for that, btw), but we still have hit the proverbial brick wall of 
 10-GbE overflow.  When I last corresponded with David, he explained that he 
 remembers having a similar issue before, but can't recall exactly what the 
 problem was.

Really?  I recall saying that I often forget about increasing the MTU of the 10 
GbE switch and NICs.  I don't recall saying that I had a similar issue before 
but couldn't remember the problem.

 In any case, the fact that by turning down the ADC clock prior to start-up 
 prevents the 10-GbE core from overflowing is a major lead for us at BYU 
 (we've been spinning our wheels on this issue for several months now). By no 
 means are we proposing mid-run ADC clock modifications, but this appears to 
 be a very subtle (and quite sinister, in my opinion) bug.
 
 Any thoughts as to what might be going on?

I cannot explain the 10 GbE overflow that you and Peter are experiencing.  I 
have pushed some updates to the rb-papergpu.git repository listed on the PAPER 
Correlator Manifest page.  The paper_feng_init.rb script now verifies that the 
ADC clocks are locked and provides options for issuing a software sync (only 
recommended for lab use) and for not storing the time of synchronization in 
redis (also only recommended for lab use).

The 10 GbE cores can overflow if they are fed valid data (i.e. tx_valid=1) 
while they are held in reset.  Since you are using the paper_feng_init.rb 
script, this should not be happening (unless something has gone wrong during 
the running of that script) because that script specifically and explicitly 
disables the tx_valid signal before putting the cores into reset and it takes 
the cores out of reset before enabling the tx_valid signal.  So assuming that 
this is not the cause of the overflows, there must be something else that is 
causing the 10 GbE cores to be unable to transmit data fast enough to keep up 
with the data stream it is being fed.  Two things that could cause this are 1) 
running the design faster than the 200 MHz sample clock that it was built for 
and/or 2) some link issue that prevents the core from sending data.  
Unfortunately, I think both of those ideas are also pretty far fetched given 
all you've done to try to get the system working.  I wonder whether there is 
some difference in the ROACH2 firmware (u-boot version or CPLD programming) or 
PPC Linux setup or tcpborphserver revision or ???.
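
To illustrate the ordering described above (tx_valid gated off before the cores
go into reset, and only re-enabled after they come out of reset), here is a
minimal Python sketch using corr's FpgaClient; the register names 'eth_sw_en'
and 'eth_rst' are hypothetical, and the real paper_feng_init.rb (Ruby) uses the
design's own names:

import time
import corr

fpga = corr.katcp_wrapper.FpgaClient('roach1')
time.sleep(1)

# Never feed valid data to a 10 GbE core while it is held in reset:
fpga.write_int('eth_sw_en', 0)   # 1. gate off tx_valid first
fpga.write_int('eth_rst', 1)     # 2. put the cores into reset
fpga.write_int('eth_rst', 0)     # 3. take the cores out of reset
fpga.write_int('eth_sw_en', 1)   # 4. only now re-enable tx_valid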

Have you tried using adc16_dump_chans.rb to dump snapshots of the ADC data to 
make sure that it looks OK?

Dave




Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread Richard Black
David,

We'll take another close look at what model we are actually using, just to
be safe.

I went back and looked at our e-mails, and sure enough, you're right. You
were referring to the MTU issue as being the problem you tend to suppress
all memory of. It was just that you stated it in a separate paragraph, so,
out-of-context, I extrapolated that you have had the same problem before.
My bad for dragging your good name through the mud. :)

We will also update our local repositories, in the event some bizarre race
condition exists on our end.

I didn't know that the buffer could fill up while reset was asserted. We'll
definitely have to check up on that too.

We haven't tried dumping raw ADC data yet since we have been trying to get
the data link working first. After that, we were planning to inject signal
and examine outputs.

Thanks,

Richard Black

On Mon, Oct 27, 2014 at 2:26 PM, David MacMahon dav...@astro.berkeley.edu
wrote:

 Hi, Richard,

 On Oct 27, 2014, at 9:25 AM, Richard Black wrote:

  This is a reportedly fully-functional model that shouldn't require any
 major changes in order to operate. However, this has clearly not been the
 case in at least two independent situations (us and Peter). This begs the
 question: what's so different about our use of PAPER?

 I just verified that the roach2_fengine_2013_Oct_14_1756.bof.gz file is
 the one being used by the PAPER correlator currently fielded in South
 Africa.  It is definitely a fully functional model.  That image (and all
 source files for it) is available from the git repo listed on the PAPER
 Correlator Manifest page of the CASPER Wiki:

 https://casper.berkeley.edu/wiki/PAPER_Correlator_Manifest

  We, at BYU, have made painstakingly sure that our IP addressing schemes,
 switch ports, and scripts are all configured correctly (thanks to David
 MacMahon for that, btw), but we still have hit the proverbial brick wall of
 10-GbE overflow.  When I last corresponded with David, he explained that he
 remembers having a similar issue before, but can't recall exactly what the
 problem was.

 Really?  I recall saying that I often forget about increasing the MTU of
 the 10 GbE switch and NICs.  I don't recall saying that I had a similar
 issue before but couldn't remember the problem.

  In any case, the fact that by turning down the ADC clock prior to
 start-up prevents the 10-GbE core from overflowing is a major lead for us
 at BYU (we've been spinning our wheels on this issue for several months
 now). By no means are we proposing mid-run ADC clock modifications, but
 this appears to be a very subtle (and quite sinister, in my opinion) bug.
 
  Any thoughts as to what might be going on?

 I cannot explain the 10 GbE overflow that you and Peter are experiencing.
 I have pushed some updates to the rb-papergpu.git repository listed on the
 PAPER Correlator Manifest page.  The paper_feng_init.rb script now verifies
 that the ADC clocks are locked and provides options for issuing a software
 sync (only recommended for lab use) and for not storing the time of
 synchronization in redis (also only recommended for lab use).

 The 10 GbE cores can overflow if they are fed valid data (i.e. tx_valid=1)
 while they are held in reset.  Since you are using the paper_feng_init.rb
 script, this should not be happening (unless something has gone wrong
 during the running of that script) because that script specifically and
 explicitly disables the tx_valid signal before putting the cores into reset
 and it takes the cores out of reset before enabling the tx_valid signal.
 So assuming that this is not the cause of the overflows, there must be
 something else that is causing the 10 GbE cores to be unable to transmit
 data fast enough to keep up with the data stream they are being fed.  Two
 things that could cause this are 1) running the design faster than the 200
 MHz sample clock that it was built for and/or 2) some link issue that
 prevents the cores from sending data.  Unfortunately, I think both of those
 ideas are also pretty far-fetched given all you've done to try to get the
 system working.  I wonder whether there is some difference in the ROACH2
 firmware (u-boot version or CPLD programming) or PPC Linux setup or
 tcpborphserver revision or ???.

 Have you tried using adc16_dump_chans.rb to dump snapshots of the ADC data
 to make sure that it looks OK?
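
 A rough Python equivalent (not the actual adc16_dump_chans.rb logic, and
 'adc_snap' is a placeholder for whatever snapshot block your design has)
 is to force a capture and unpack the samples, assuming 8-bit signed data:

     import corr, struct

     fpga = corr.katcp_wrapper.FpgaClient('roach2', 7147)
     # man_trig/man_valid force a capture even without a sync or valid signal
     snap = fpga.snapshot_get('adc_snap', man_trig=True, man_valid=True)
     samples = struct.unpack('>%db' % snap['length'], snap['data'])
     print 'first 16 samples:', samples[:16]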

 Dave




Re: [casper] Problem about the adc frequency in PAPER model.

2014-10-27 Thread David MacMahon
Hi, Richard and Peter,

Another possibility that crossed my mind is that perhaps your ROACH2s were from
the batch where the incorrect oscillator was installed for U72.  This seems 
unlikely for Richard based on this email (which also describes the incorrect 
oscillator problem in general):

https://www.mail-archive.com/casper@lists.berkeley.edu/msg04909.html

Maybe it's worth a double check anyway?
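
One quick sanity check, assuming a design built with the standard CASPER
toolflow (which includes the sys_clkcounter register) is already programmed,
is to ask katcp_wrapper for an estimate of the fabric clock and compare it
against what you expect:

    import corr

    fpga = corr.katcp_wrapper.FpgaClient('roach2', 7147)
    # est_brd_clk() counts sys_clkcounter over a short interval, returns MHz
    print 'estimated FPGA clock: %.1f MHz' % fpga.est_brd_clk()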

Dave





[casper] spectrometer implementation using LX110T instead of SX95T

2014-10-27 Thread Louis Dartez
Hi all, 

I have implemented a 4-channel, 200 MHz correlating spectrometer on a 
ROACH 1 using the Virtex-5 SX95T. Currently, I am trying to get this same design 
to compile and run on an LX110T chip instead. I know that the SX95T is much more 
DSP-rich and better suited to this sort of thing. During compilation for 
the LX110T I ran into the expected resource issues, with the design trying to use 
more than is available on the LX110T. I was wondering if anyone had any 
tips/advice on how to go about this? Has anyone out there run into similar 
situations? What knobs should I be able to tweak to get the design to compile 
for an LX110T? Is it even possible?

I’d be more than happy to share my mdl (slx) files if needed. :)

Thanks in advance!
L
 Louis P. Dartez
 Graduate Research Assistant
 STARGATE
 Center for Advanced Radio Astronomy
 University of Texas at Brownsville
 (956) 372-5812



Re: [casper] spectrometer implementation using LX110T instead of SX95T

2014-10-27 Thread Dan Werthimer
hi louis,

are you running out of memory?   dsp48's?  slices?

if memory, the easiest thing to do is cut back on
number of frequency channels.
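
As a rough illustration (illustrative arithmetic only, not measured numbers):
the big memories in a correlating spectrometer, the FFT reorder buffers and
the vector accumulators, scale roughly linearly with the number of channels,
so halving the channel count roughly halves the block RAM bill.  For example:

    # 4 inputs -> 10 baselines (including autos), complex 32-bit accumulators
    def accum_bits(n_chans, n_inputs=4, acc_bits=32):
        n_baselines = n_inputs * (n_inputs + 1) / 2
        return n_chans * n_baselines * 2 * acc_bits   # x2 for real + imag

    for n in (4096, 2048, 1024):
        print '%5d channels -> %6.0f kbits of accumulator RAM' % (
            n, accum_bits(n) / 1024.)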

best,

dan







Re: [casper] spectrometer implementation using LX110T instead of SX95T

2014-10-27 Thread Louis Dartez
Hi Dan, 
Slices seem to be the problem, judging from the error report (which I can 
send around tomorrow morning when I’m back in the lab). I seem to remember 
that the compiler raised an error stating that I was trying to use ~80k slices 
when only ~60k are available. I knew this would be a slippery slope when I 
started. But it would be great if we could salvage our LX110T. 

Any chance someone out there has a ROACH 1 SX95T that’s just collecting 
dust?
L
 Louis P. Dartez
 Graduate Research Assistant
 STARGATE
 Center for Advanced Radio Astronomy
 University of Texas Rio Grande Valley
 (956) 372-5812




Re: [casper] spectrometer implementation using LX110T instead of SX95T

2014-10-27 Thread Jack Hickish
Hi louis,

I've just checked the spec sheet - 64 multipliers!! I'm guessing you ran
out of slices when you (or maybe the compiler) pushed lots of multipliers
into logic? (the lx has more slices than the sx)
Maybe send around your utilisation summary tomorrow - it sounds like you
might need to find some pretty substantial savings from somewhere.

A few things which might help save logic:
- hard code the fft shift schedule
- lower the number of fft bits, or use bit growth
- if the fft uses pipelined convert cores for rounding, using a cheap
rounding strategy with low latency should help.
- combine fir filters in a single block which shares logic (if you haven't
already), same with the fft.

But you might be better off finding a different roach :)
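
For a sense of scale, taking the numbers Louis remembers at face value (~80k
requested vs ~60k available, whatever the exact units in the map report), the
design has to shed on the order of a quarter of its logic, which is why the
suggestions above go after the big consumers rather than small tweaks:

    requested, available = 80e3, 60e3
    over = requested - available
    print 'over capacity by %.0f%%' % (100 * over / available)              # ~33%
    print 'need to cut %.0f%% of current usage' % (100 * over / requested)  # ~25%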

Jack