Re: [casper] Fwd: Re: SPDO ROACH spectrometer

2009-11-04 Thread Jason Manley
As you say, NFS mounts work correctly which would indicate that the  
network is operating as expected. WRT other errors, are you certain  
that all reads/writes on FPGA are on 32-bit boundaries? Byte-sized and  
16-bit reads are supposed to work, but we have found that for some  
reason they sometimes cause crashes. It doesn't break immediately, but  
on subsequent bus transactions. These manifest as kernel crashes.
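
A minimal sketch of the alignment rule described above, assuming register access goes through some user-space helper that takes a byte offset and a length. The function names are hypothetical and do not come from any ROACH or KATCP library. The sketch only shows how to widen an arbitrary byte range to whole 32-bit words and then slice out the bytes that were originally wanted.

```python
WORD_BYTES = 4  # width of one 32-bit bus word


def is_word_aligned(offset, length):
    """True if the access starts and ends on a 32-bit boundary."""
    return length > 0 and offset % WORD_BYTES == 0 and length % WORD_BYTES == 0


def widen_to_words(offset, length):
    """Return (aligned_offset, aligned_length) covering [offset, offset + length)."""
    if length <= 0:
        raise ValueError("length must be positive")
    start = offset - (offset % WORD_BYTES)   # round the start down
    end = offset + length
    end += (-end) % WORD_BYTES               # round the end up
    return start, end - start


def extract(word_data, offset, length):
    """Slice the originally requested bytes out of a widened read."""
    skip = offset % WORD_BYTES
    return word_data[skip:skip + length]
```

With this approach a 2-byte read at offset 5 becomes one 4-byte read at offset 4, so the bus only ever sees whole-word transactions.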


Jason

On 04 Nov 2009, at 07:56, Kjetil Wormnes wrote:


Hi all,

For reference I've attached a summary of our problems below, and a  
few things I have attempted to do to isolate it. The short of it is  
that we are unable to transfer large amounts of data across the  
ethernet reliably regardless of;

--kernel version
--whether we are usb mount or nfs mount root file system.
--network protocol used for transfer

The way the crash happens varies, and is not repeatable. Sometimes it
seems to be a userspace crash, sometimes it is a kernel panic. I
have been unable to see any real pattern in the crash reports. This  
to me seems to indicate that the root cause of the problem may be  
common, and either an obscure kernel problem or possibly something  
in the interface between the kernel and the hardware or in the  
hardware itself.


It wouldn't be a big effort to reimplement our software to run on a
remote machine and talk to the ROACH over KATCP, rather than run  
locally on the ppc. But since it would require a complete rewrite of  
the software, we haven't tested this yet. Perhaps it is worth trying.


The catch is that I am still really unsure whether we are dealing  
with many symptoms of the same problem; or many different problems.


Anyway, I would like to thank you for all your input, and will let  
you know if and how we find a satisfactory solution.


cheers

Kjetil



Here is the summary:



*The problem*
The system crashes when downloading large files. There appears to be  
varying causes for this crash that may or may not have a common  
underlying reason.


I have attempted to isolate the problem by
• Downloading using different protocols and software; ssh and two  
different ftp servers.

• Mounting the filesystem over NFS as opposed to USB
• Installing well-known and used kernels, and comparing to custom  
kernels.

*SSH*
SSH always crashes with “Invalid MAC on input” or related error  
messages. This appears to be a problem with SSH.


*FTP*
System instabilities were observed using two different ftp servers;  
proftpd and pure-ftpd.


In the best case, with pure-ftpd, I was able to download 2-3 files
of about 2 GB each before the system crashed. The call stack seemed
to indicate that the crash happened in the EMAC (i.e. Ethernet)
interface functions.


However, we have no way of knowing whether these crashes are in fact  
rather side-effects of the USB subsystem misbehaving. Jason from the  
Casper mailing list has once again reconfirmed that USB on powerpcs  
is notoriously unreliable.


*DIFFERENT KERNELS - DIFFERENT PROBLEMS*
With some kernels (the latest), the link was unable to come up at
all, while both a custom-compiled older kernel (from a couple of
months ago) and a downloaded image, uImage-20091006-mmcfix, saw the
link come up, but with all the crashes described.


*ELIMINATING USB AS A CAUSE*
To eliminate the effects of USB, I mounted the root filesystem  
remotely using NFS. I make a few observations;


*SSH*
Still dies from time to time with the Invalid MAC error message.  
This was expected as we have already pretty much determined that  
this error is ssh-specific and not related to our other worries.


*ETHERNET*
Comes up nicely. System mounts remotely and file access has not  
caused any obvious problems. In fact I have not really had any  
problems that I can trace directly back to the Ethernet.


That being said, the system seems to crash after a little while
with this setup also. The error messages have varied. Only once
has it been a kernel crash, and then, looking at the call stack,
it no longer appears to crash inside EMAC access functions.


The download speeds seem quite variable, but this is more likely
due to the network (since the root filesystem is mounted over NFS)
than to the ROACH board itself.






Jason Manley wrote:
Marc Welz or David George built that kernel. They are the best
people to ask about this. I've cc'd them, though I'm not sure
either would have the config file from that release. It might be
easiest to check out an older svn version.


Might I suggest that instead of recording data to a USB HDD, you
rather record it across the network to another computer? If you
don't want to use KATCP for dumping the data directly from your
FPGA, you can always mount an NFS network share on your ROACH and
record the data there. USB on the PPC platforms is notoriously
unreliable.


Jason

On 03 Nov 2009, at 03:05, Kjetil Wormnes wrote:



Hi Jason,

Thank you again for your 

Re: [casper] program ROACH over JTAG

2009-11-04 Thread C-H Cheng

Hello Suraj

Thanks for your explanation.
Maybe I did not describe it clearly.
I mean that I want to program the FPGA on ROACH over JTAG.
Also, the bit file is generated by Xilinx ISE, not the CASPER toolflow.
At the moment, the ucf file of my design only includes the IO pins of my
design.

But a lot of pins that ROACH needs are not included in my design.
How could I program the FPGA of ROACH with my bit file?

Thanks,
C-H Cheng


Hello,

On Nov 3, 2009, at 6:39 PM, C-H Cheng wrote:


Hello All

If I want to simulate a design in ISE and generate a bit file to 
download to ROACH over JTAG.
You can do this using the .bit generated by the CASPER toolflow,
available in the same location as the .bof. bof files are generated from
.bit files using the script 'mkbof' distributed in 'XPS_ROACH_BASE'.



The problem I have is the FPGA pin assignment.
For example, in ISE I select vxs95t as the device, and the FPGA pin
assignment in the ucf file follows my design.
But ROACH has a lot of FPGA pins which are not in my design but are needed
for ROACH: sys_clk_n, sys_clk_p, aux0_clk_p, aux_clk_n, ppc_irq_n,
...etc.
Part of the magic of the toolflow is that it adds the necessary IO pins
to the .ucf depending on which IO blocks you have selected to use (10gbe,
adc, etc.). It's part of the reason that IO blocks get special
designation as yellowblocks, as they are processed differently for each
block.


Hence, I can't add the FPGA pins that ROACH needs to the ucf file of my
design.
It would definitely be faster to just put the I/O blocks you want to use
into a blank model file, and use the CASPER toolflow to generate the
.ucf, by un-checking the boxes for 'update system design', 'system
generator', and 'ISE/EDK/bitgen' in the 'bee_xps' dialog. This should
take about a minute to complete.


-Suraj 





Re: [casper] Fwd: Re: SPDO ROACH spectrometer

2009-11-04 Thread Jason Manley
Also, make sure you're running newer versions of uboot and the CPLD
image. Bus settings changed some months back and improved stability
significantly.


Uboot will report the versions, and I recommend:

U-Boot 2008.10-svn2226 (Aug  7 2009 - 16:06:44)
...
Monitor Revision: 8.3.1698
CPLD Revision:8.1.0

at the very least, you should have CPLD Revision 8.0.1588.
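
As a rough sketch of how the minimum-revision check above could be automated, assuming the boot banner has been captured as text (for example from a serial console log). The helper names are hypothetical. Comparing the dotted revision field by field as integers is an assumption about how these revision strings are ordered. A plain string comparison would order them incorrectly.

```python
import re

MIN_CPLD = (8, 0, 1588)  # minimum CPLD revision quoted in this message


def parse_revision(text):
    """Turn '8.0.1588' into (8, 0, 1588) so revisions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def cpld_revision(banner):
    """Find the 'CPLD Revision:' line in captured U-Boot output; None if absent."""
    match = re.search(r"CPLD Revision:\s*(\d+(?:\.\d+)*)", banner)
    return parse_revision(match.group(1)) if match else None


def cpld_is_recent_enough(banner, minimum=MIN_CPLD):
    """True only if a CPLD revision was found and is at least the minimum."""
    rev = cpld_revision(banner)
    return rev is not None and rev >= minimum
```

Feeding it the banner text quoted above, with "CPLD Revision:8.1.0", gives a revision of (8, 1, 0), which passes the check.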

The only outstanding bug that regularly affects me is that u-boot  
sometimes doesn't detect the PPC's SDRAM on startup. The system then  
hangs. Replacing the DIMM with registered memory (same as FPGA DIMM)  
apparently fixes this.


Jason

Re: [casper] program ROACH over JTAG

2009-11-04 Thread Jason Manley
If you already have a bitstream, simply plug a JTAG programmer into P2
(labelled Xilinx JTAG), and use iMPACT.


But if I read your email correctly, you haven't configured clocks or  
anything so I'm not sure what the point of this exercise is. I agree  
with Suraj, easiest would be to start with a CASPER-toolflow generated  
base system and add/modify from there.


Jason



[casper] 10.1 designs on BEE2

2009-11-04 Thread Jack Hickish
Hi all,

I'm trying to retire an old 7.1 virtual machine to the digital grave that it
deserves. I ported a BEE2 design to 10.1, and successfully compiled the
relevant bof files for the BEE.

Unfortunately, when I run the boffile, the BEE hangs. If I run the process
in the background, I can see the /proc/... filesystem is populated with the
right registers/brams, but these hang when I try to access these with either
cat/echo or C-based commands (which work in an identical 7.1 design).

Glenn Jones seems to have experienced a similar thing
(http://www.mail-archive.com/casper@lists.berkeley.edu/msg00358.html), but
the maillist archive hasn't led me to any solutions so far. Does anyone have
an explanation/fix/info?

Any help appreciated,

Jack




Re: [casper] Fwd: Re: SPDO ROACH spectrometer

2009-11-04 Thread Kjetil Wormnes

Hi Jason,

Thanks for your pointers; I am currently not actually using the FPGA. 
Just focusing on being able to talk to the powerpc reliably at the moment.


The system does also crash when using NFS, but as I said and you noted,
it is more difficult to trace the crashes directly back to EMAC-related
kernel functions. It may very well be a secondary symptom of something else.


Now your suggested versions for Uboot/CPLD/Monitor are interesting.

We have two roach boards; the newer one that I have been testing is 
reporting


U-Boot 2008.10-svn2157 (Jul  31 2009 - 17:15:22)
...
Monitor Revision: 7.3.0
CPLD Revision:7.5.6

Whereas the older Roach that Wan has been using reports

U-Boot 2008.10-svn1923 (May  29 2009 - 17:22:43)
...
Monitor Revision: 6.5.1429
CPLD Revision:2.0.5


Leaving this older one aside for reference for now, I have upgraded the U-boot 
image on the newer roach to 20090807-uboot-nohack.bin, which is actually from 
revision 2212, but seemed to be the closest to the suggested revision I could 
find without compiling the image myself.


I have looked around without success for how to upgrade the CPLD/Monitor.
Would you be able to point me in the right direction?


I'll test for any improvements with the new uboot now.

Thanks again

Kjetil


Re: [casper] Fwd: Re: SPDO ROACH spectrometer

2009-11-04 Thread John Ford
Hi all.

I think it would be very cool if someone who knows could make a wiki page
to tell us what the suggested set of cpld/uboot/linux versions is, and
whether the suggested versions are different for different purposes. There
are getting to be quite a few choices in the archive.

We are firing up our ROACH development and I would like to start out with
the most stable set of firmware I can.

Thanks

John



Re: [casper] Fwd: Re: SPDO ROACH spectrometer

2009-11-04 Thread Wan.Cheng
Hi Jason:

As Kjetil mentioned, we are only working on the OS. The OS crashes and
network problems appear without our program running.

Cheers

Wan 


Re: [casper] Fwd: Re: SPDO ROACH spectrometer

2009-11-04 Thread Wan.Cheng
Hi Jason:

Thanks for your help.
But I could not find tut4 in the workshop. Could you please provide me with
the exact link? Thanks.

As far as I know, KATCP only provides data in ASCII. Is this right? Also,
with KATCP all commands and data are transferred over the network. But we
experience some network failures even without running our own program, so I
guess it might not be a good idea to run our application over the network at
the moment.

From my experience, the KATCP failures fall into two cases: one is that the
OS can crash when I run a bof file; the other is that the network connection
can break when I access the data or registers. I can provide more details
when I see them again.

Anyway, thanks Jason for all information you provide.

Cheers

Wan

-Original Message-
From: Jason Manley [mailto:jasonman...@gmail.com] 
Sent: Wednesday, 4 November 2009 4:35 PM
To: Cheng, Wan (ATNF, Marsfield)
Cc: Wormnes, Kjetil (ATNF, Marsfield); m...@ska.ac.za; david.geo...@ska.ac.za; 
casper@lists.berkeley.edu
Subject: Re: [casper] Fwd: Re: SPDO ROACH spectrometer

comments appended below...

On 04 Nov 2009, at 01:04, wan.ch...@csiro.au wan.ch...@csiro.au  
wrote:

 Hi Jason:

 Could you please let me know how do you get data using KATCP?
Look at the wideband poco example (tut4) from the workshop. It is a  
fully-functional correlator on a single ROACH board, and includes  
python scripts which demonstrate the use of KATCP. Pulling data from  
DRAM is the same as retrieving it from BRAM or QDR.

 I also used KATCP for a while. But I cannot find an efficient way
 to read a large amount of data from the FPGA, like 1GB of data stored
 in the DRAM.
This is possible without any complicated trickery using KATCP. Expect  
data rates of about 10MB/s if you implement a reasonable form of  
hardware/software handshaking.
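
A minimal sketch of what such a handshake could look like from the software side, assuming the design exposes a readable buffer plus two flag registers. The register names (`buf_ready`, `buf_ack`, `buf`) and the `FakeRoach` class are hypothetical and exist only so the loop can run without hardware; the method names `read`, `read_int` and `write_int` are likewise assumptions about the client interface, not a statement of any library's API. All sizes are kept to multiples of 4 bytes so that every access stays on a 32-bit boundary.

```python
import time


class FakeRoach:
    """In-memory stand-in for a ROACH client (hypothetical, for illustration)."""

    def __init__(self, dram, buf_bytes):
        self._dram = dram
        self._buf_bytes = buf_bytes
        self._pos = 0
        self._regs = {"buf_ready": 0, "buf_ack": 0}
        self._buf = b""
        self._refill()

    def _refill(self):
        # "Hardware" side: copy the next slice of DRAM into the shared buffer
        # and raise the ready flag if there was anything left to copy.
        self._buf = self._dram[self._pos:self._pos + self._buf_bytes]
        self._pos += len(self._buf)
        self._regs["buf_ready"] = 1 if self._buf else 0

    def read_int(self, name):
        return self._regs[name]

    def write_int(self, name, value):
        self._regs[name] = value
        if name == "buf_ack" and value == 1:
            self._refill()  # the acknowledge lets the hardware move on

    def read(self, name, size, offset=0):
        return self._buf[offset:offset + size]


def dump_dram(fpga, total_bytes, chunk_bytes, timeout_s=1.0):
    """Pull total_bytes out of the design, one acknowledged chunk at a time."""
    if total_bytes % 4 or chunk_bytes % 4:
        raise ValueError("sizes must be multiples of 4 bytes")
    out = bytearray()
    while len(out) < total_bytes:
        # Wait until the hardware says the buffer holds fresh data.
        deadline = time.monotonic() + timeout_s
        while fpga.read_int("buf_ready") == 0:
            if time.monotonic() > deadline:
                raise TimeoutError("buffer never became ready")
            time.sleep(0.001)
        want = min(chunk_bytes, total_bytes - len(out))
        out += fpga.read("buf", want)
        # Tell the hardware the buffer may be overwritten.
        fpga.write_int("buf_ack", 1)
    return bytes(out)


if __name__ == "__main__":
    data = bytes(range(256)) * 4
    fpga = FakeRoach(data, buf_bytes=256)
    print(dump_dram(fpga, len(data), 256) == data)
```

The point of the ready/acknowledge pair is that software never reads a buffer the hardware is still filling, and hardware never overwrites a buffer that has not been read. The timeout keeps a stalled design from hanging the client.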

 And there are a few difficulties as well. For example, the network is  
 not reliable, and the OS sometimes crashes when I download a bof file.
This should never happen. Please provide details.

Jason

 -Original Message-
 From: Jason Manley [mailto:jasonman...@gmail.com]
 Sent: Tuesday, 3 November 2009 5:40 PM
 To: Wormnes, Kjetil (ATNF, Marsfield); Marc Welz; David George
 Cc: Cheng, Wan (ATNF, Marsfield); casper@lists.berkeley.edu
 Subject: Re: [casper] Fwd: Re: SPDO ROACH spectrometer

 Marc Welz or David George built that kernel. They are the best people
 to ask about this. I've cc'd them, though I'm not sure either would
 have the config file from that release. It might be easiest to
 checkout an older svn version.

 Might I suggest that instead of recording data to a USB HDD, that you
 rather record it across the network to another computer? If you don't
 want to use KATCP for dumping the data directly from your FPGA, you
 can always mount an NFS network share on your ROACH and record the
 data there. USB on the PPC platforms is notoriously unreliable.

 Jason

 On 03 Nov 2009, at 03:05, Kjetil Wormnes wrote:

 Hi Jason,

 Thank you again for your reply. I can use FTP or even write my own
 little raw socket transfer routine, and it seems to work, I can
 transfer  a few gigabyte-size files.

 However, at the end of this, the other problem kicks in; causing a
 system crash. I believe this is a kernel problem, as it exhibits
 itself differently with different kernels I have tried.

 So, putting the ssh problem aside as something that we can work
 around and returning to the other request I made;

 I am compiling my own kernel because I seem to need to in order to
 get EHCI and EXT3 to work properly.

 However, when I do, EMAC can't autonegotiate a link, and even
 forcing it to something doesn't work. The link comes up, then drops
 out again... repeatedly.

 The interesting thing is this problem *does not* occur when I
 compile my kernel using an svn checkout from a couple of months ago.
 Even with the exact same .config file.

 At least this is the case as far as I can tell.

 Now, in order to be 100% sure that it is in fact a difference in the
 source that is causing this problem, rather than just the .config. I
 would love it if you could send me the .config file used to compile
 the uImage-20091006-mmcfix kernel.

 The ethernet interface does appear to be more stable with that
 kernel, but unfortunately I can't use it as it doesn't allow USB 2.0
 speeds, so if you please, the .config file would be very useful.

 Thanks again for all your help


 Kjetil


 Jason Manley wrote:
 There appears to be some issue with ssh on ROACH with large
 transfers. It is definitely not a hardware problem, as other
 network transfers work fine. Both Andrew Martens and I
 regularly transfer large amounts of data (1GB) using KATCP. This
 ssh bug has become a low priority for us as we concentrate on
 other things. If you do not want to try to debug it yourself, I
 recommend you try an FTP server.

 Kjetil, you are correct; at present, KATCP does not support
 transfer of arbitrary files from the filesystem.

 Jason

 On 02 Nov 2009, at 00:51, Kjetil 

Re: [casper] Fwd: Re: SPDO ROACH spectrometer

2009-11-04 Thread Wan.Cheng
Hi Jason:

I guess the latest CPLD is quite important for us.

But for U-Boot, I am not sure. Will all PPC registers be re-initialized by the 
Linux kernel? Or does the kernel use the default values initialized by U-Boot? 
Does U-Boot load the OS into DRAM before setting the program pointer to the OS 
start address? Or can the OS load itself into DRAM?

Thanks

Wan

-Original Message-
From: Jason Manley [mailto:jasonman...@gmail.com]
Sent: Wednesday, 4 November 2009 7:40 PM
To: Wormnes, Kjetil (ATNF, Marsfield)
Cc: Marc Welz; David George; Cheng, Wan (ATNF, Marsfield); 
casper@lists.berkeley.edu
Subject: Re: [casper] Fwd: Re: SPDO ROACH spectrometer

Also, make sure you're running newer versions of uboot and the CPLD
image. Bus settings changed some months back and improved stability
significantly.

Uboot will report the versions, and I recommend:

U-Boot 2008.10-svn2226 (Aug  7 2009 - 16:06:44)
...
Monitor Revision: 8.3.1698
CPLD Revision:8.1.0

at the very least, you should have CPLD Revision 8.0.1588.
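
A small sketch of how the CPLD check could be automated against a captured boot banner. It assumes the banner line has the form shown above and that revisions compare field by field as integers; both are assumptions rather than documented behaviour, and the function names are made up for illustration.

```python
import re

# Minimum CPLD revision recommended above, as a tuple of integers.
MIN_CPLD = (8, 0, 1588)


def parse_cpld_revision(banner):
    """Return the CPLD revision found in the banner as a tuple, or None."""
    match = re.search(r"CPLD Revision:\s*(\d+(?:\.\d+)*)", banner)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def cpld_ok(banner, minimum=MIN_CPLD):
    """True if the banner reports a CPLD revision at or above the minimum."""
    revision = parse_cpld_revision(banner)
    return revision is not None and revision >= minimum


if __name__ == "__main__":
    banner = (
        "U-Boot 2008.10-svn2226 (Aug  7 2009 - 16:06:44)\n"
        "Monitor Revision: 8.3.1698\n"
        "CPLD Revision:8.1.0\n"
    )
    print(parse_cpld_revision(banner), cpld_ok(banner))
```

Comparing tuples of integers avoids the trap of comparing the revisions as strings, where "8.10.0" would sort before "8.9.0".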

The only outstanding bug that regularly affects me is that u-boot
sometimes doesn't detect the PPC's SDRAM on startup. The system then
hangs. Replacing the DIMM with registered memory (same as FPGA DIMM)
apparently fixes this.

Jason

On 04 Nov 2009, at 07:56, Kjetil Wormnes wrote:

 Hi all,

 For reference I've attached a summary of our problems below, and a
 few things I have attempted to do to isolate it. The short of it is
 that we are unable to transfer large amounts of data across the
 ethernet reliably regardless of:
 --kernel version
 --whether the root file system is mounted over USB or NFS
 --network protocol used for transfer

 The way the crash happens varies, and is not repeatable. Sometimes it
 seems to be a userspace crash, sometimes it is a kernel panic. I
 have been unable to see any real pattern in the crash reports. This
 to me seems to indicate that the root cause of the problem may be
 common, and either an obscure kernel problem or possibly something
 in the interface between the kernel and the hardware or in the
 hardware itself.

 It wouldn't be a big effort to reimplement our software to run on a
 remote machine and talk to the ROACH over KATCP, rather than run
 locally on the ppc. But since it would require a complete rewrite of
 the software, we haven't tested this yet. Perhaps it is worth trying.

 The catch is that I am still really unsure whether we are dealing
 with many symptoms of the same problem; or many different problems.

 Anyway, I would like to thank you for all your input, and will let
 you know if and how we find a satisfactory solution.

 cheers

 Kjetil



 Here is the summary:



 *The problem*
 The system crashes when downloading large files. There appear to be
 varying causes for this crash that may or may not have a common
 underlying reason.

 I have attempted to isolate the problem by
 * Downloading using different protocols and software; ssh and two
 different ftp servers.
 * Mounting the filesystem over NFS as opposed to USB
 * Installing well-known and used kernels, and comparing to custom
 kernels.
 *SSH*
 SSH always crashes with "Invalid MAC on input" or related error
 messages. This appears to be a problem with SSH.

 *FTP*
 System instabilities were observed using two different ftp servers;
 proftpd and pure-ftpd.

 In the best case, with pure-ftpd, I was able to download 2-3 files
 of about 2GB each before the system crashed. The call stack seemed
 to indicate that the crash happened in EMAC interface functions
 (i.e. Ethernet).

 However, we have no way of knowing whether these crashes are in fact
 rather side-effects of the USB subsystem misbehaving. Jason from the
 Casper mailing list has once again reconfirmed that USB on powerpcs
 is notoriously unreliable.

 *DIFFERENT KERNELS - DIFFERENT PROBLEMS*
 Using some kernels (the latest) saw the link unable to come up at
 all, while both a custom compiled older kernel (a couple of months
 ago) and a downloaded image, uImage-20091006-mmcfix both saw the
 link come up, but with all the crashes described.

 *ELIMINATING USB AS A CAUSE*
 To eliminate the effects of USB, I mounted the root filesystem
 remotely using NFS. I make a few observations;

 *SSH*
 Still dies from time to time with the Invalid MAC error message.
 This was expected as we have already pretty much determined that
 this error is ssh-specific and not related to our other worries.

 *ETHERNET*
 Comes up nicely. System mounts remotely and file access has not
 caused any obvious problems. In fact I have not really had any
 problems that I can trace directly back to the Ethernet.

 That being said, the system seems to crash after a little while
 with this setup also. The error messages have been varying. Only
 once has it been a kernel crash, and then, looking at the call stack
 it no longer appears to crash inside EMAC access functions.

 The download speeds seem quite variable; but this is probably more
 likely due to the 

[casper] xlUpdateModel

2009-11-04 Thread John Ford
Hi all. I'm trying to port a rather complex model to 10.1, and I had
hoped that xlUpdateModel would allow me to do it rather easily, but when I
run it, Matlab crashes. It seems to choke on the gavrt library's vacc
module. Has anyone gotten this to work, or should I forget it and just
redraw the model?

John





Re: [casper] xlUpdateModel

2009-11-04 Thread G Jones
John,
I think the shortcomings of xlUpdateModel are what made the transition
from 7.1 to 10.1 so painful. Dynamically drawn blocks like the vacc
will not be handled correctly in general. Therefore, I think it will
be much easier and more reliable to simply redraw the diagram block by
block in 10.1.
Glenn

On Wed, Nov 4, 2009 at 4:12 PM, John Ford jf...@nrao.edu wrote:
 Hi all. I'm trying to port a rather complex model to 10.1, and I had
 hoped that xlUpdateModel would allow me to do it rather easily, but when I
 run it, Matlab crashes. It seems to choke on the gavrt library's vacc
 module. Has anyone gotten this to work, or should I forget it and just
 redraw the model?

 John