RapidIO: MC Exception when enumerating peer to peer connection

2010-10-27 Thread Thomas Taranowski
Hi all,

I'm trying to bring up RapidIO on my P2020 with v2.6.36-rc7.  I'm
running into an issue when the host tries to enumerate the agent
devices, and it fails miserably.  The rio driver does an
fsl_rio_config_read with a destid of 255, after which it hangs until
I get a timeout exception (handled now, thanks!).  The port connection
gets trained, and everything looks good, but I get no response.  Any
ideas on what to look at to debug this?

Also, the driver seems to hang after the timeout, presumably because
the port is in an error-stopped state: when I use my trusty JTAG to
issue a Port Link Maintenance Request and request status, I get back an
unrecoverable ackID error for port 1 and an error-stopped port_status
for port 2.

0_ffec0140 : 0004 8050 02000202   ...P
0_ffec0150 :   00120002 4261  B`..
0_ffec0160 : 0004 0005    
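A small C sketch for decoding a link-maintenance response value, assuming the RapidIO LP-Serial layout (response-valid in bit 31, ackID status in bits 9..5, port_status in bits 4..0) and the spec's port_status codes. The register values in the dump above lost digits in the archive, so they are not decoded here; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout of a Port n Link Maintenance Response CSR value:
 * bit 31 = response valid, bits 9..5 = ackID status,
 * bits 4..0 = port_status from the link-response control symbol. */
#define LM_RSP_VALID      0x80000000u
#define LM_RSP_ACKID(x)   (((x) >> 5) & 0x1fu)
#define LM_RSP_STATUS(x)  ((x) & 0x1fu)

/* Name the port_status codes defined for the link-response symbol. */
static const char *port_status_name(uint32_t status)
{
    switch (status) {
    case 0x02: return "error";
    case 0x04: return "retry-stopped";
    case 0x05: return "error-stopped";
    case 0x10: return "OK";
    default:   return "reserved";
    }
}

/* Returns the port_status code, or -1 if the response-valid bit is clear.
 * On a valid response the ackID status is stored through *ackid. */
static int decode_link_response(uint32_t rsp, unsigned *ackid)
{
    if (!(rsp & LM_RSP_VALID))
        return -1;
    *ackid = LM_RSP_ACKID(rsp);
    return (int)LM_RSP_STATUS(rsp);
}
```

For example, under these assumptions a response of 0x80000005 decodes to ackID 0 with port_status 0x05, i.e. error-stopped.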


Reference information below:


U-Boot LAW configuration:

Local Access Window Configuration
LAWBAR1 : 0x000a, LAWAR1 : 0x80c00017 /* SRIO Port 1 */
LAWBAR2 : 0x000a1000, LAWAR2 : 0x80d00017 /* SRIO Port 2 */
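To read the LAWAR values above, here is a C sketch assuming the usual 85xx LAWAR layout: enable in bit 31, target ID in bits 24..20, and a SIZE field in bits 5..0 meaning a window of 2^(SIZE+1) bytes. Under that assumption both windows are enabled and 16 MiB each, with target IDs 0x0c and 0x0d, which lines up with the SRIO Port 1/Port 2 comments. The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed 85xx/P2020 LAWAR layout: EN = bit 31, TRGT_ID = bits 24..20,
 * SIZE = bits 5..0, window size = 2^(SIZE + 1) bytes. */
struct law_attr {
    int enabled;
    unsigned target;   /* 0x0c / 0x0d here, per the U-Boot SRIO comments */
    uint64_t size;     /* window size in bytes; 0 for codes treated as reserved */
};

static struct law_attr decode_lawar(uint32_t lawar)
{
    struct law_attr a;
    unsigned sz = lawar & 0x3fu;

    a.enabled = (int)((lawar >> 31) & 1u);
    a.target = (lawar >> 20) & 0x1fu;
    /* Codes below 0x0b (under 4 KiB) and above 0x23 (beyond a 36-bit,
     * 64 GiB space) are treated as reserved in this sketch. */
    a.size = (sz >= 0x0b && sz <= 0x23) ? (uint64_t)1 << (sz + 1) : 0;
    return a;
}
```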


Excerpt from linux boot:

Setting up RapidIO peer-to-peer network /rapi...@ffec

~ fsl_rio_setup 

of:fsl-of-rio ffec.rapidio: Of-device full name /rapi...@ffec
of:fsl-of-rio ffec.rapidio: Regs: [mem 0xffec-0xffed]
of:fsl-of-rio ffec.rapidio: LAW start 0xa000, size 0x0100.
of:fsl-of-rio ffec.rapidio: pwirq: 48, bellirq: 50, txirq: 53, rxirq 54
of:fsl-of-rio ffec.rapidio: Overriding RIO_PORT setting to single lane 0
of:fsl-of-rio ffec.rapidio: RapidIO PHY type: serial
of:fsl-of-rio ffec.rapidio: Hardware port width: 4
of:fsl-of-rio ffec.rapidio: Training connection status: Single-lane 0
of:fsl-of-rio ffec.rapidio: RapidIO Common Transport System size: 256
EIPWQBAR: 0x IPWQBAR: 0x1f107000
IPWMR: 0x00100120 IPWSR: 0x
RIO: enumerate master port 0, RIO0 mport
rio_enum_host: port `RIO0 mport`
fsl_local_config_write: index 0 offset 0068 data 
fsl_local_config_read: index 0 offset 0068
fsl_local_config_write: index 0 offset 0060 data 
master port device id set to 0 (next=0)
fsl_local_config_read: index 0 offset 000c
fsl_local_config_read: index 0 offset 0100
fsl_local_config_read: index 0 offset 0158
RIO: Rio network device created
rio_enable_rx_tx_port(local = 1, destid = 0, hopcount = 0, port_num = 0)
fsl_local_config_read: index 0 offset 000c
fsl_local_config_read: index 0 offset 0100
fsl_local_config_read: index 0 offset 015c
fsl_local_config_write: index 0 offset 015c data 4261
Enumerating rionet.
RIO: acquiring  device lock...
fsl_rio_config_read: index 0 destid 255 hopcount 0 offset 0068 len 4
RIO: fsl_rio_mcheck_exception - MC Exception handled. reason=0x8000
RIO: cfg_read error -14 for ff:0:68
RIO_LTLEDCSR = 0x0


Thanks!
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug,

2010-10-27 Thread pacman
Segher Boessenkool writes:
> 
> >> 1) Figure out what exactly is going on;
> >
> > I thought we were past that.
> 
> We are not.
> 
> > The startup sequence leaves the device in a bad state (writing 1000
> > times per second to memory that the kernel believes is not in use), so
> > it needs to be given a reset command before the kernel tries to use
> > that memory.
> 
> The question now is what causes the firmware to do that, and then
> what is the best way to stop it from doing that.

As far as I can tell, it turns on the host controller during the global
probe, which is not wrong because USB devices could theoretically be used for
booting, or for console display. Then it never turns off the host controller
because someone forgot to put in the code to turn it off.

It's not easy to figure out exactly where that should have been done. Turning
off the host controller too soon would rule out booting from USB, but leaving
it running while the OS is starting up has caused a major problem.

So is it wrong to leave the host controller enabled when the OS is booted? If
not, then the error must be in the communication of which memory addresses
are in use by OF. I've got a node /mem...@0 whose "available" property looks
like this:
  0040
 00584000 0007c000
 0092a1d8 4e28
 00a2f000 005d1000
 0180 0e3fd000
 0fbffab4 054c
From that list, it looks to me like OF is telling the kernel that it should
not attempt to use any address above 0xfbffab4+0x54c == 0xfc00000. The
addresses being written to by the OHCI controller are 0xfc5c080 and
0xfc61080. If the kernel is staying within the "available" list, there won't
be a problem.
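The range check described above can be sketched in C, assuming the "available" property is a list of (base, size) pairs with one cell each. Only the rows whose digits survived the archive are included, so this reproduces the 0x0fc00000 upper bound and shows that both OHCI target addresses fall outside the listed ranges; it says nothing about the rows that were lost. Names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One (base, size) pair from a memory node's "available" property,
 * assuming #address-cells = #size-cells = 1. */
struct mem_range {
    uint32_t base;
    uint32_t size;
};

/* Fully legible pairs from the listing above; rows that lost digits in
 * the archive are left out. */
static const struct mem_range avail[] = {
    { 0x00584000u, 0x0007c000u },
    { 0x00a2f000u, 0x005d1000u },
    { 0x0fbffab4u, 0x0000054cu },
};
#define N_AVAIL (sizeof avail / sizeof avail[0])

/* 1 if addr lies in some [base, base + size), else 0. */
static int addr_is_available(const struct mem_range *r, size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (addr >= r[i].base && addr - r[i].base < r[i].size)
            return 1;
    return 0;
}

/* Highest end address (base + size) over all ranges. */
static uint32_t available_top(const struct mem_range *r, size_t n)
{
    uint32_t top = 0;
    for (size_t i = 0; i < n; i++)
        if (r[i].base + r[i].size > top)
            top = r[i].base + r[i].size;
    return top;
}
```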

Later, when the kernel decides it's done using OF, what's supposed to happen?
It closes stdin, but that doesn't help here since the offending device is a
bus node, not an input node. It looks to me like the kernel makes the
assumption that all devices other than stdin and stdout will have been
deactivated already when the kernel starts, and that this assumption has
been violated. Who is wrong, from the perspective of the OF standard, the
assumer or the violator?

Then there's the "quiesce" call, which I don't understand at all since it's
not mentioned in any of the specification documents I've been able to find.
It's been mentioned as an Apple-only thing. Seems like it would be a good
name for a "make all the devices stop puking on the RAM" function. Since the
OF spec doesn't include this function, they must not have thought it was
necessary.

> > /p...@8000/u...@5/assigned-addresses
> >  02002810  8000  1000
> 
> Lovely, incorrect data (it should start with 82002810, i.e.,
> not relocatable -- it is already an assigned address!).

Now you see how I have trouble relating the docs to the reality...

> 
> This means: 32-bit MMIO address space for bus 0 dev 5 fn 0,
> first BAR; assigned to address 8000; size is 1000.

But "address 8000" is a physical address (I think), so do I need to do a
map-in on it before using it?

> 
> You could try a boot script like this:
> 
> 
> dev /pci
> 0 04 DO 0 i config-w! -100 +LOOP
> device-end
> 
> 
> which should disable all PCI devices on all busses, on that

Almost all of my devices are under that PCI node. What will I prove by
disabling them?

-- 
Alan Curry


Re: [REPOST] [PATCH 0/6] fixes and MPC8308 support for the mpc512x_dma driver

2010-10-27 Thread Ilya Yanok

Hi Piotr,

On 27.10.2010 11:24, Piotr Zięcik wrote:
> Currently I am not able to deal with this as I am much involved in
> other development.


I see. Excuse me for disturbing you then.

Guys, can anybody review/test/pull these patches?

Regards, Ilya.




Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug,

2010-10-27 Thread Segher Boessenkool
>> 1) Figure out what exactly is going on;
>
> I thought we were past that.

We are not.

> The startup sequence leaves the device in a bad state (writing 1000
> times per second to memory that the kernel believes is not in use), so
> it needs to be given a reset command before the kernel tries to use
> that memory.

The question now is what causes the firmware to do that, and then
what is the best way to stop it from doing that.

>> > The big question that I'm still stumbling over is how to access the
>> device
>> > registers. The "reg" property looks like this:
>>
>> You should look at "assigned-addresses", not "reg".  Well,
>> you first need to look at "reg" to figure out what entry
>> in "assigned-addresses" to use.

Ignore this part, I was confused.

> The properties look like this:
>
> /p...@8000/u...@5/assigned-addresses
>  02002810  8000  1000

Lovely, incorrect data (it should start with 82002810, i.e.,
not relocatable -- it is already an assigned address!).

This means: 32-bit MMIO address space for bus 0 dev 5 fn 0,
first BAR; assigned to address 8000; size is 1000.
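Segher's decoding of 02002810 follows the PCI binding's phys.hi layout (npt000ss bbbbbbbb dddddfff rrrrrrrr). A minimal C sketch of that decoding; the struct and function names are illustrative only.

```c
#include <stdint.h>

/* Fields of an Open Firmware PCI phys.hi cell:
 * npt000ss bbbbbbbb dddddfff rrrrrrrr */
struct pci_phys_hi {
    int non_relocatable;  /* n, bit 31 */
    int prefetchable;     /* p, bit 30 */
    int aliased;          /* t, bit 29 */
    unsigned space;       /* ss: 0 config, 1 I/O, 2 32-bit mem, 3 64-bit mem */
    unsigned bus;         /* bits 23..16 */
    unsigned dev;         /* bits 15..11 */
    unsigned fn;          /* bits 10..8 */
    unsigned reg;         /* bits 7..0, config offset of the BAR (0x10 = BAR0) */
};

static struct pci_phys_hi decode_phys_hi(uint32_t hi)
{
    struct pci_phys_hi p;

    p.non_relocatable = (int)((hi >> 31) & 1u);
    p.prefetchable    = (int)((hi >> 30) & 1u);
    p.aliased         = (int)((hi >> 29) & 1u);
    p.space = (hi >> 24) & 0x3u;
    p.bus   = (hi >> 16) & 0xffu;
    p.dev   = (hi >> 11) & 0x1fu;
    p.fn    = (hi >> 8) & 0x7u;
    p.reg   = hi & 0xffu;
    return p;
}
```

With this, 0x02002810 gives space 2 (32-bit memory), bus 0, device 5, function 0, register 0x10 (the first BAR) with n clear; 0x82002810 differs only in having n set.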

You could try a boot script like this:


dev /pci
0 04 DO 0 i config-w! -100 +LOOP
device-end


which should disable all PCI devices on all busses, on that
PCI host bus (it disables every device behind pci-pci bridges
separately, as long as every such bridge has a higher secondary
bus number than primary bus number; if you only want to disable
everything on the root bus (which should be sufficient), use
ff04 instead of 04).
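A C sketch of what the boot script's loop does, under stated assumptions: config addresses are encoded as bus << 16 | dev << 11 | fn << 8 | reg, as in the PCI binding, and the start is taken as ff04 (the root-bus variant mentioned above), since the start value of the all-busses version appears to have lost digits in the archive. config_w is a hypothetical hook standing in for config-w!.

```c
#include <stddef.h>
#include <stdint.h>

/* Open Firmware PCI config-space address for bus/device/function/register. */
static uint32_t pci_config_addr(unsigned bus, unsigned dev, unsigned fn, unsigned reg)
{
    return (uint32_t)(bus & 0xffu) << 16 | (uint32_t)(dev & 0x1fu) << 11 |
           (uint32_t)(fn & 0x7u) << 8 | (reg & 0xffu);
}

/* Hypothetical stand-in for config-w! ( w config-addr -- ). */
typedef void (*config_w_fn)(uint16_t value, uint32_t config_addr);

/* Approximates "0 START DO 0 i config-w! -100 +LOOP": visit START,
 * START - 0x100, ... down to the last index that is still >= 0, writing 0
 * to each address. With START ending in 04, each write hits the PCI
 * command register (offset 4) of one function; writing 0 clears its I/O,
 * memory and bus-master enables. Returns the number of writes. */
static unsigned disable_all(uint32_t start, config_w_fn config_w)
{
    unsigned count = 0;

    for (int64_t i = (int64_t)start; i >= 0; i -= 0x100) {
        if (config_w != NULL)
            config_w(0, (uint32_t)i);
        count++;
    }
    return count;
}
```

Starting from 0xff04 (bus 0, device 31, function 7, register 4), this visits 256 addresses: register 4 of every function of every device slot on bus 0.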


Segher



Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug,

2010-10-27 Thread pacman
Segher Boessenkool writes:
> 
> >> > |1. How do I locate all usb nodes in the device tree?
> >> > |
> >> > |2. How do I know if a particular usb node is OHCI?
> 
> You look for compatible "usb-ohci".

There is no "compatible" there. I can probably use class-code since the
parent is a PCI bus.

> 
> But this doesn't help you.  You do not know yet if the
> problem happens for all usb-ohci; for example, it could be
> that you have the console output device on usb; or as another
> example, it could be that this firmware leaves all pci devices
> in some active state.
> 
> So as I see it you have only two options:
> 
> 1) Figure out what exactly is going on;

I thought we were past that. The startup sequence leaves the device in a bad
state (writing 1000 times per second to memory that the kernel believes is
not in use), so it needs to be given a reset command before the kernel tries
to use that memory.

> > The big question that I'm still stumbling over is how to access the device
> > registers. The "reg" property looks like this:
> 
> You should look at "assigned-addresses", not "reg".  Well,
> you first need to look at "reg" to figure out what entry
> in "assigned-addresses" to use.

The properties look like this:

/p...@8000/u...@5/assigned-addresses
 02002810  8000  1000
/p...@8000/u...@5/reg
 2800    
 02002810    1000

I'm not sure how I'm supposed to know which entry from "reg" is the right
one. I've been guessing that it's the second one, since that one matches the
only entry in "assigned-addresses". It's supposed to go the other direction?
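One reading of the advice above, sketched in C: take a BAR's phys.hi from "reg" and look for the "assigned-addresses" entry with the same space code, bus, device, function and register, ignoring the n, p and t bits (so 02002810 in "reg" would match either 02002810 or 82002810). The mask and function name are assumptions for illustration, not a documented OF routine.

```c
#include <stddef.h>
#include <stdint.h>

/* Keep ss (space code), bus, device, function and register; drop the
 * n, p and t bits, since n is expected to be set in assigned-addresses. */
#define PHYS_HI_MATCH_MASK 0x03ffffffu

/* Index of the assigned-addresses entry matching a "reg" phys.hi,
 * or -1 if that BAR has no assigned address. */
static int find_assigned(uint32_t reg_hi, const uint32_t *assigned_hi, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if ((assigned_hi[i] & PHYS_HI_MATCH_MASK) == (reg_hi & PHYS_HI_MATCH_MASK))
            return (int)i;
    return -1;
}
```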

-- 
Alan Curry


Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug,

2010-10-27 Thread Segher Boessenkool
>> > |1. How do I locate all usb nodes in the device tree?
>> > |
>> > |2. How do I know if a particular usb node is OHCI?

You look for compatible "usb-ohci".

But this doesn't help you.  You do not know yet if the
problem happens for all usb-ohci; for example, it could be
that you have the console output device on usb; or as another
example, it could be that this firmware leaves all pci devices
in some active state.

So as I see it you have only two options:

1) Figure out what exactly is going on;
or 2) make the kernel shut down all pci devices early (either
in actual kernel code, or in an OF boot script).

> The big question that I'm still stumbling over is how to access the device
> registers. The "reg" property looks like this:

You should look at "assigned-addresses", not "reg".  Well,
you first need to look at "reg" to figure out what entry
in "assigned-addresses" to use.


Segher



Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug,

2010-10-27 Thread pacman
Olaf Hering writes:
> 
> On Wed, Oct 27, pac...@kosh.dhis.org wrote:
> 
> > |1. How do I locate all usb nodes in the device tree?
> > |
> > |2. How do I know if a particular usb node is OHCI?
> 
> In the installed system, run 'lspci | grep -i usb', this gives the pci
> bus numbers.  Then run 'find /sys -name devspec', and look for the bus

Once the system is running, I have no problem figuring it out. What I meant
was how do I write some code to identify OHCI devices correctly, from within
the limited environment of the Forth interpreter, which will work in the
general case.

I already know that /p...@8000/u...@5 and /p...@8000/u...@5,1 are the
problem nodes on my machine. And I've learned enough about OF to do a full
recursive device tree search to find the USB nodes, so the first question is
answered.

But the UHCI and OHCI nodes look very much alike in the OF properties. "name"
is just "usb" and there's no "compatible".

The big question that I'm still stumbling over is how to access the device
registers. The "reg" property looks like this:
 phys size
 -- -
 2800    
 02002810    1000
so I take the second group of 5 words, which should be the device registers,
and try to map it to a virtual address. The members are unpacked on the stack
like this:
    02002810  1000
which looks like this stack diagram from OF spec:
  map-in ( phys.lo ... phys.hi size -- virt )
and the method call goes like this:
  " map-in" $call-parent
The result: "invalid pointer". But I notice it only popped 4 items. I think
maybe the "size" for map-in is not the same as the "size" found in the reg
property. Maybe #size-cells applies in one place but not the other. Thanks
for not documenting that! Try again:
    02002810 1000 " map-in" $call-parent
This one doesn't complain, but leaves me a 0 on the stack as its answer. The
OHCI registers have been mapped to virtual address 0? Doesn't seem likely.
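A minimal C sketch of the argument shuffling described above, assuming the PCI binding's #address-cells = 3 and #size-cells = 2 for "reg", and that map-in takes a single-cell size (which would be consistent with it popping only 4 items). It only covers the argument order, not why map-in returned 0; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* A PCI "reg" entry, assuming #address-cells = 3 and #size-cells = 2:
 * phys.hi phys.mid phys.lo size.hi size.lo (property order). */
struct pci_reg_entry {
    uint32_t phys_hi, phys_mid, phys_lo;
    uint32_t size_hi, size_lo;
};

/* Unpack one 5-cell entry, most significant cell first. */
static struct pci_reg_entry reg_entry_from_cells(const uint32_t cells[5])
{
    struct pci_reg_entry e = { cells[0], cells[1], cells[2], cells[3], cells[4] };
    return e;
}

/* Produce the items for map-in ( phys.lo phys.mid phys.hi size -- virt ),
 * deepest first, assuming a one-cell size. Returns -1 if the two-cell
 * size does not fit in one 32-bit cell. */
static int map_in_args(const struct pci_reg_entry *e, uint32_t args[4])
{
    if (e->size_hi != 0)
        return -1;
    args[0] = e->phys_lo;   /* pushed first (bottom of stack) */
    args[1] = e->phys_mid;
    args[2] = e->phys_hi;
    args[3] = e->size_lo;   /* top of stack */
    return 0;
}
```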

-- 
Alan Curry


Re: [PATCH] drivers/char/hvc_console.c: reduce max idle timeout

2010-10-27 Thread Chris Metcalf
On 10/27/2010 3:21 PM, Alan Cox wrote:
> On Wed, 27 Oct 2010 12:54:27 -0400 Chris Metcalf  wrote:
>> The tile architecture uses this framework for our serial console,
> That may be a mistake unless your console is genuinely only capable of
> polled input.

The console is in fact interrupt-driven within the hypervisor, and data is
buffered there.  However, the current hypervisor console API is only
"write" and "read".  We have a bugzilla to add console interrupts to the
hypervisor API and use them from Linux, but we haven't done it yet.

-- 
Chris Metcalf, Tilera Corp.
http://www.tilera.com



Re: [PATCH] drivers/char/hvc_console.c: reduce max idle timeout

2010-10-27 Thread Alan Cox
On Wed, 27 Oct 2010 12:54:27 -0400
Chris Metcalf  wrote:

> The tile architecture uses this framework for our serial console,

That may be a mistake unless your console is genuinely only capable of
polled input.

Alan


Re: [PATCH] powerpc/5121: pdm360ng: fix touch irq if 8xxx gpio driver is enabled

2010-10-27 Thread Grant Likely
On Sat, Sep 25, 2010 at 10:22:44PM +0200, Anatolij Gustschin wrote:
> On Wed, 15 Sep 2010 20:38:23 -0600
> Grant Likely  wrote:
> 
> > On Wed, Sep 15, 2010 at 10:12:57PM +0200, Anatolij Gustschin wrote:
> > > Enabling the MPC8xxx GPIO driver with MPC512x GPIO extension
> > > breaks touch screen support on this board since the GPIO
> > > interrupt will be mapped to the 8xxx GPIO irq host, resulting in
> > > an interrupt that the touch screen driver cannot request. Fix
> > > it by mapping the touch interrupt on the 8xxx GPIO irq host.
> > 
> > This looks wrong to me.  The touchscreen code should not go mucking
> > about in the GPIO controller registers; that is the job of the gpio
> > driver.
> 
> But if there is no GPIO driver (as it was the case before adding
> mpc512x support in the 8xxx gpio driver) or if the driver is not
> enabled in the kernel configuration? Then the platform specific
> callback (called from touchscreen driver) returns the pin state
> and acknowledges the interrupt.

So, basically the touchscreen device node has an interrupts property
which does not use the gpio controller as the interrupt controller,
but instead points directly at the interrupt controller that the gpio
controller is cascaded from.

Really it sounds like the device tree data is broken.  The preferred
solution would be to fix the device tree to declare the gpio node as
an interrupt controller.

> 
> >  What is the reason that the touch interrupt isn't
> > requestable?
> 
> The 8xxx gpio driver sets up gpio irq host and installs
> the chained irq handler for GPIO interrupt 78 using
> set_irq_chained_handler() which sets the status field of
> the irq_desc structure to IRQ_NOREQUEST | IRQ_NOPROBE.
> Other drivers can't request this GPIO interrupt any more,
> request_threaded_irq() checks the IRQ_NOREQUEST status
> flag and returns -EINVAL if it is set. The gpio interrupts
> for each gpio pin are now handled by the
> mpc8xxx_gpio_irq_cascade() handler as they should.
> 
> >  It looks like the 8xxx gpio driver is designed to hand
> > out a separate virq number for each gpio pin (I've not had time to dig
> > into details, so you'll need to educate me on the problem details)
> 
> Yes, exactly. This patch adds code to request the
> board's pen_down gpio pin and to use its virq number in
> the touchscreen driver. The touchscreen driver can
> request this virq interrupt and it is now properly handled
> by the chained handler in the gpio driver.

...but it does so by hard coding the irq number (via the GPIO number)
into the board code; a situation which I've tried very hard to avoid.
This is what I'm not okay with.

Another solution is to modify the 8xxx gpio driver to cascade off the
normal request_irq() path instead of via set_irq_chained_handler(),
but that *might* have unacceptable performance impact for other users.
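The mechanism Anatolij describes can be modeled in a few lines of C. This is not kernel code: the flag values, struct and functions below are hypothetical stand-ins that only mimic the behaviour described in the thread (the chained handler marks the parent GPIO interrupt unrequestable, while each per-pin virq stays requestable).

```c
#include <errno.h>

/* Hypothetical stand-ins for irq_desc status flags (values arbitrary). */
#define MOCK_IRQ_NOREQUEST 0x1u
#define MOCK_IRQ_NOPROBE   0x2u

struct mock_irq_desc {
    unsigned status;
};

/* Models the effect described for set_irq_chained_handler(): the parent
 * (cascade) interrupt is marked unrequestable and unprobeable. */
static void mock_set_chained_handler(struct mock_irq_desc *parent)
{
    parent->status |= MOCK_IRQ_NOREQUEST | MOCK_IRQ_NOPROBE;
}

/* Models the check described for request_threaded_irq(). */
static int mock_request_irq(const struct mock_irq_desc *desc)
{
    if (desc->status & MOCK_IRQ_NOREQUEST)
        return -EINVAL;
    return 0;
}
```

In this model, once the parent is chained, requesting it fails with -EINVAL, while a separate descriptor standing for a per-pin virq is still requestable, which is why the touchscreen driver needs the pin's virq rather than GPIO interrupt 78 itself.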

Unfortunately, I've been slow on this patch, so I cannot get anything
into 2.6.37 (sorry).  However, I've not asked Linus to pull the 8xxx
gpio driver changes either so nothing in mainline will get broken.

g.



Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55)

2010-10-27 Thread Benjamin Herrenschmidt

> Since then, the silence has been deafening.
> 
> My assumption now is that this is not ever getting fixed. I'm certainly not
> able to fix it. I'm not even a kernel programmer! I got far enough to
> diagnose the cause just with the "add more printk's and boot it again"
> technique. Hundreds of reboots trying to figure it out. I was a conscientious
> bug-reporter, I thought.

I'm happy to help you fix it but I'm travelling at the moment and won't
have much time for a couple of weeks.

Cheers,
Ben.

> I could pull the PCI card and be done with it. I never used those USB ports
> anyway. But after all the suffering I went through to find this bug... the
> crashing e2fsck's and consequent filesystem corruption... I hate the idea of
> surrendering to it. There are possibly other affected users who I'd be
> abandoning to suffer similarly in the future.
> 
> For the last week I've studied OpenFirmware as hard as I can. I read the spec
> cover to cover. And the USB annex, and the PCI annex. But I'm still lost in
> all the different address formats.
> 
> I took my best guess on how to handle this problem, and ran with it, ending
> up with a 97-line Forth script, and that was just to get a virtual address,
> not to actually do anything with it, and it used a hardcoded device path. But
> it didn't work, all I got was an "invalid pointer" error. I made another
> guess at something that wasn't documented anywhere (the fact that this stuff
> is insufficiently documented is the one thing I can state with complete
> confidence!) and out came a successful translation to a virtual address: 0.
> 
> If I'm the only one fighting this bug, the bug wins.
> 




RE: Freescale P2020/ 85xx PCIe: DMA low throughput

2010-10-27 Thread Jenkins, Clive
>   Hi,
>
>   I'm working on bring-up for a new board based on Freescale's P2020.
>   I have a programmable FPGA as a PCIe device with a buffer I can
>   write to and read from.
>   I want to test the performance of the PCIe bus.
>   I encountered a problem while doing DMA between the FPGA and DDR.
>   The whole buffer moves to and from the device without
>   mismatches, but with low throughput.
>   The problem is that the buffer is divided into many byte-sized
>   transactions instead of being transferred in a burst.
>   I must mention that even a word-sized buffer is divided into byte
>   transactions by the DMA (the core can read a word, so it seems to be
>   the DMA's fault).
>   I tried changing the latency timer, max latency, min latency and
>   cache line size in the configuration space on both sides of the PCIe
>   bus. It didn't help.
>   Do you have an idea what it could be?
>
>   Thanks,
>   Natalie.
>
>   Assuming the P2020 has the usual 85xx-style DMA engine, you may have
>   the Bandwidth Control (BWC) field cleared to 0. This 4-bit field
>   restricts the transfer size to 2^BWC bytes, for BWC=0,1,..0xa; 0xb-0xe
>   are reserved. 0xf disables bandwidth sharing to allow uninterrupted
>   transfers from each channel, so if you are using several channels,
>   one channel can completely lock out the others. BWC=0x8 at reset
>   (2^8 = 256 bytes). See the P2020 manual for more details.
>
>   BWC is the field with mask 0x0f00 in the mode register (MR) for the
>   channel (0, 1, 2, 3), at offset 0x100, 0x180, 0x200, 0x280 relative
>   to the base of the DMA controller.
>
>   Clive
>
> Hi, thanks.
> I changed the BWC, but the transactions are still byte-sized instead
> of bursts. Do you have another idea?
>
> Natalie.

Sorry, no.

Are you sure you have modified the FSL-DMA driver in the kernel so it
does not write zero to BWC?

Clive
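A C sketch of reading and setting BWC with a read-modify-write, so the other mode-register bits are preserved. It assumes BWC occupies bits 27..24 of the 32-bit mode register (the "0x0f00" above read as the upper halfword, i.e. 0x0f000000 in the full register) and uses the per-channel offsets quoted above; the function names are hypothetical, and the P2020 manual is the authority on the layout.

```c
#include <stdint.h>

/* Assumed BWC position in the 32-bit channel mode register. */
#define DMA_MR_BWC_SHIFT 24
#define DMA_MR_BWC_MASK  (0xfu << DMA_MR_BWC_SHIFT)

/* Offset of channel n's mode register from the DMA controller base
 * (0x100, 0x180, 0x200, 0x280 for channels 0..3). */
static uint32_t dma_mr_offset(unsigned chan)
{
    return 0x100u + 0x80u * chan;
}

/* Replace only the BWC field, keeping every other mode-register bit. */
static uint32_t mr_set_bwc(uint32_t mr, unsigned bwc)
{
    return (mr & ~DMA_MR_BWC_MASK) | ((uint32_t)(bwc & 0xfu) << DMA_MR_BWC_SHIFT);
}

/* Transfer size selected by BWC: 2^BWC bytes for 0..0xa, UINT32_MAX for
 * 0xf (bandwidth sharing disabled), 0 for the reserved codes 0xb..0xe. */
static uint32_t bwc_bytes(uint32_t mr)
{
    unsigned bwc = (mr & DMA_MR_BWC_MASK) >> DMA_MR_BWC_SHIFT;

    if (bwc <= 0xa)
        return 1u << bwc;
    if (bwc == 0xf)
        return UINT32_MAX;
    return 0;
}
```

If something later writes the whole register with BWC=0, bwc_bytes() on the result reports 1, which would match the byte-sized transactions Natalie describes.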


Re: Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55)

2010-10-27 Thread Olaf Hering
On Wed, Oct 27, pac...@kosh.dhis.org wrote:

> |1. How do I locate all usb nodes in the device tree?
> |
> |2. How do I know if a particular usb node is OHCI?

In the installed system, run 'lspci | grep -i usb', this gives the pci
bus numbers.  Then run 'find /sys -name devspec', and look for the bus
numbers from the lspci output.  Each devspec file contains the firmware
path.  The ohci node may have subdirectories. Run 'words' in each of
them at the firmware prompt. Perhaps there is one to shutdown the
controller?

I just noticed that older firmware did not have a node for OHCI; newer
versions may have a /p...@8000/u...@5 node.

Good luck.

Olaf


Re: Freescale P2020/ 85xx PCIe: DMA low throughput

2010-10-27 Thread Natalie Shapira

Jenkins, Clive wrote:
> > Hi,
> >
> > I'm working on bring-up for a new board based on Freescale's P2020.
> > I have a programmable FPGA as a PCIe device with a buffer I can
> > write to and read from.
> > I want to test the performance of the PCIe bus.
> > I encountered a problem while doing DMA between the FPGA and DDR.
> > The whole buffer moves to and from the device without
> > mismatches, but with low throughput.
> > The problem is that the buffer is divided into many byte-sized
> > transactions instead of being transferred in a burst.
> > I must mention that even a word-sized buffer is divided into byte
> > transactions by the DMA (the core can read a word, so it seems to be
> > the DMA's fault).
> > I tried changing the latency timer, max latency, min latency and
> > cache line size in the configuration space on both sides of the PCIe
> > bus. It didn't help.
> > Do you have an idea what it could be?
> >
> > Thanks,
> > Natalie.
>
> Assuming the P2020 has the usual 85xx-style DMA engine, you may have
> the Bandwidth Control (BWC) field cleared to 0. This 4-bit field
> restricts the transfer size to 2^BWC bytes, for BWC=0,1,..0xa; 0xb-0xe
> are reserved. 0xf disables bandwidth sharing to allow uninterrupted
> transfers from each channel, so if you are using several channels,
> one channel can completely lock out the others. BWC=0x8 at reset
> (2^8 = 256 bytes). See the P2020 manual for more details.
>
> BWC is the field with mask 0x0f00 in the mode register (MR) for the
> channel (0, 1, 2, 3), at offset 0x100, 0x180, 0x200, 0x280 relative
> to the base of the DMA controller.
>
> Clive

Hi, thanks.
I changed the BWC, but the transactions are still byte-sized instead
of bursts.

Do you have another idea?

Natalie.

Pegasos OHCI bug (was Re: PROBLEM: memory corrupting bug, bisected to 6dda9d55)

2010-10-27 Thread pacman
Benjamin Herrenschmidt writes:
> 
> Ok so you'll have to make up a "workaround" in prom_init that looks for
> OHCI's in the device-tree and disable them.
> 
> Check if the OHCI node has some existing f-code words you can use for
> that with "dev /path-to-ohci words" in OF for example. If not, you may
> need to use the low level register accessors. Use OF client interface
> "interpret" to run forth code from C.

I responded with a long list of reasons that I'm not qualified to do that
work myself:
|Here are the major problems:
|
|1. How do I locate all usb nodes in the device tree?
|
|2. How do I know if a particular usb node is OHCI?
|
|3. Knowing that a node is OHCI, how do I know where its control registers
|are? I'm sure this is calculated from the "reg" property but I don't see how.
|
|4. Knowing where the control registers are, how do I access them? Do I need
|to request a virt-to-phys mapping or can I assume that it's already mapped,
|or that the "rl!" command will do the right thing with a physical address?
|
|5. Which control register should I use to tell the OHCI to be quiet? Just do
|a general reset, or is there something that specifically turns off the
|counter that's been causing the trouble?

Since then, the silence has been deafening.

My assumption now is that this is not ever getting fixed. I'm certainly not
able to fix it. I'm not even a kernel programmer! I got far enough to
diagnose the cause just with the "add more printk's and boot it again"
technique. Hundreds of reboots trying to figure it out. I was a conscientious
bug-reporter, I thought.

I could pull the PCI card and be done with it. I never used those USB ports
anyway. But after all the suffering I went through to find this bug... the
crashing e2fsck's and consequent filesystem corruption... I hate the idea of
surrendering to it. There are possibly other affected users who I'd be
abandoning to suffer similarly in the future.

For the last week I've studied OpenFirmware as hard as I can. I read the spec
cover to cover. And the USB annex, and the PCI annex. But I'm still lost in
all the different address formats.

I took my best guess on how to handle this problem, and ran with it, ending
up with a 97-line Forth script, and that was just to get a virtual address,
not to actually do anything with it, and it used a hardcoded device path. But
it didn't work, all I got was an "invalid pointer" error. I made another
guess at something that wasn't documented anywhere (the fact that this stuff
is insufficiently documented is the one thing I can state with complete
confidence!) and out came a successful translation to a virtual address: 0.

If I'm the only one fighting this bug, the bug wins.

-- 
Alan Curry


Re: [REPOST] [PATCH 0/6] fixes and MPC8308 support for the mpc512x_dma driver

2010-10-27 Thread Piotr Zięcik
On Wednesday 27 October 2010 01:52:54 Ilya Yanok wrote:
> Hello everybody,
> 
> meanwhile I've fixed one more issue in mpc512x_dma driver.
> 
> Any comments? Anybody interested in this driver? Piotr?
> 
> Still unsure how to deal with bitfield structures in IO space...
> 

Currently I am not able to deal with this as I am much involved in other 
development.

-- 
Best Regards,
Piotr Zięcik.