Re: [PATCH RFC v2] Add support for Janz ICAN3 CAN module

Ira W. Snyder Tue, 16 Feb 2010 15:47:42 -0800

On Tue, Feb 16, 2010 at 11:20:26PM +0100, Wolfgang Grandegger wrote:
> Ira W. Snyder wrote:
> > On Mon, Feb 15, 2010 at 06:13:11PM +0100, Wolfgang Grandegger wrote:
> >> Hi Ira,
> >>
> >> Ira W. Snyder wrote:
> 
> >>> +/* Get the MSYNC bits from the "old-style" interface control registers */
> >>> +static void janz_get_msync(struct janz_ican3 *mod, u8 *locl, u8 *peer)
> >>> +{
> >>> + janz_set_page(mod, 0);
> >>> + *peer = ioread8(mod->regs + MSYNC_PEER);
> >>> + *locl = ioread8(mod->regs + MSYNC_LOCL);
> >>> +}
> >> What are your arguments against using structures to describe the
> >> register layout?
> >>
> > 
> > I don't have any strong arguments against using a structure to describe
> > register layout, except:
> > 
> > 1) all datasheets I've ever used have registers described by hexadecimal
> > offsets. Using a structure, you have to manually calculate offsets when
> > checking that the structure is defined correctly. Once it is correct,
> > this isn't a big deal.
> 
> In general structs are preferred as they allow type checking and result
> usually in more readable code. #defines are more handy if you deal with
> offset instead of addresses.
> 
> > 2) These registers only exist in a single DPM page, page #0. They are
> > not "always accessible", as in a device which uses mmio. You must switch
> > the DPM page before accessing them.
> > 
> > Does it really make sense to create a structure just for calculating the
> > offsets in one DPM page? For the "janz onboard registers" (described
> > with "JANZ_OB_* defines), I agree. I'll make a structure for those.
> 
> Please check if it results in better code.
> 
> [snip]


Ok, rethinking my earlier decision for the "onboard registers", I think
that a structure makes things much harder to read. The registers have
completely different meaning depending on the type of access. Meaning,
reads have one meaning, and writes have another. This cannot be
expressed with structures.

For example, the register at offset 0x3.

Read: MODULbus number (the value from the hex switch on the board)
Write: MODULbus interrupt enable (enables a certain daughterboard's
interrupt line)

Those are two completely different functions. If I were to use a
structure, one of those meanings will be lost in the name.

It is much more clear that this code is correct, and reads the hex
switch:

/* read the hexadecimal switch */
hex = ioread8(regs + JANZ_OB_MBUS_NUM);

Than this:

/* read the hexadecimal switch */
hex = ioread8(&regs->interrupt_enable);

I'd bet that someone will question the latter code example. The naming
just seems wrong for the function the comment describes.

> >>> + if ((xord & 0x30) == 0x30) {
> >>> +         dev_err(mod->dev, "no mbox for writing\n");
> >>> +         return -ENOMEM;
> >>> + }
> >> Here and in many other places macro definitions would make the code more
> >> readable.
> >>
> > 
> > Yep, There are a few of these. My problem is this: what names do I use?
> > The way this device works is stupid. You'll want to read pages C-333 to
> > C-335 in the manual.
> > 
> > The basics are:
> > You have two registers, MSYNC_PEER and MSYNC_LOCL. The firmware can
> > only write to MSYNC_PEER, and the driver can only write to MSYNC_LOCL.
> > 
> > The registers are laid out like this:
> > 
> > MSYNC_PEER:
> > bit0: RB0
> > bit1: RB1
> > bit2: LW
> > bit4: WB0
> > bit5: WB1
> > 
> > MSYNC_LOCL:
> > bit0: RB0
> > bit1: RB1
> > bit4: WB0
> > bit5: WB1
> > bit6: LW
> > 
> > When you get an interrupt, and you want to read a message from the
> > controller, you must XOR the RBn bits. If you get a 1 back for a certain
> > bit position, then you have a buffer to read. The same is true for the
> > WBn bits, except that the XOR will be 0 when you have a buffer to read.
> > 
> > In addition, you must read/set the LW (last written) bit to correspond
> > to the buffer you last wrote to.
> > 
> > I would be happy to use defines to make this easier to read, but I have
> > *no idea* what to name them. Suggestions please.
> 
> #define MSYNC_PEER_RB0 0x01
> #define MSYNC_PEER_RB1 0x02
> 
> and so on.
> 
> BTW: an inline function for that bit gymnastics would be useful as well.
> 
> [snip]
> >>> + memset(&msg, 0, sizeof(msg));
> >>> + msg.control = 0x00;
> >>> + msg.spec    = MSG_INITFDPMQUEUE;
> >>> + msg.len     = cpu_to_le16(8);
> >> Please do not align expressions. Just use *one* space before and after
> >> "=". Please fix globally.
> >>
> > 
> > Ok. This will be in the next version of the patch. I thought it made the
> > code easier to read.
> 
> It's required by the coding style.
> 
> >>> +static void can_to_janz(struct janz_ican3 *mod, struct can_frame *cf,
> >>> +                 struct janz_fast_desc *desc)
> >> Use a better name please.
> >>
> > 
> > I'll try to think something up. All these two functions do is convert a
> > struct can_frame to the appropriate structure for the firmware, and
> > back. How about "janz_convert_tocan() and janz_convert_fromcan()"?
> 
> can_frame_to_janz and janz_to_can_frame would already be fine. "can" is
> to general.
> 
> [snip]
> >>> +
> >>> + cf->can_id |= idflags;
> >>> + cf->data[1] = d1;
> >> Hm, the data field to be used depends on the error type.
> >>
> > 
> > This code is copied from esd_pci331.c. If I don't have a good example,
> > how can I be expected to do things correctly!
> > 
> > How do I find out exactly which field belongs to which data type?
> 
> See include/linux/can/error.h
> 
> >>> +
> >>> + netif_rx(skb);
> >>> +
> >>> + stats->rx_packets++;
> >>> + stats->rx_bytes += cf->can_dlc;
> >>> + return 0;
> >>> +}
> 
> [snip]
> >>> + rxerr = msg->data[4];
> >>> + txerr = msg->data[5];
> >> Should go to field 6 and 7.
> >>
> > 
> > You're confused. These come from the firmware, they are not going into a
> > struct can_frame. These are the buffer locations in the message sent by
> > the firmware for rx/tx error counter registers.
> 
> Right. Sorry for the noise.
> 
> >>> +
> >>> + /* state is error-active by default */
> >>> + state = CAN_STATE_ERROR_ACTIVE;
> >>> +
> >>> + if (rxerr >= 96 || txerr >= 96)
> >>> +         state = CAN_STATE_ERROR_WARNING;
> >>> +
> >>> + if (rxerr >= 128 || txerr >= 128)
> >>> +         state = CAN_STATE_ERROR_PASSIVE;
> >>> +
> >>> + if (rxerr >= 255 || txerr >= 255)
> >>> +         state = CAN_STATE_BUS_OFF;
> >> You could use "else if" if you revert the order.
> > 
> > Ok, I'll try that in the next version.
> > 
> >> Also, 255 does not yet
> >> mean bus-error, strictly speaking. Have you seen the device going bus-off?
> >>
> > 
> > Well, these are byte-sized registers. They can never exceed 255 (0xff),
> > they would exceed their bit-width. What *should* I do here?
> > 
> > I don't think I've seen the device go bus-off. How would I cause that
> > situation?
> 
> Short-circuit the CAN low and high wires and send a message. Do you get
> any notification?
> 

Yep, this gave the following:

r...@labslcor3 ~/socketcan/can-utils # ./candump -t d any,0:0,#FFFFFFFF
 (0.000000)  can0  20000004  [8] 00 20 00 00 00 00 00 00   ERRORFRAME
 (0.000000)  can0  20000004  [8] 00 08 00 00 00 00 00 00   ERRORFRAME

In the other window:

r...@labslcor3 ~/socketcan/can-utils # ./cansend can0 5a1#aabbccdd

After I get these two messages, the controller doesn't send anything
else until I reset it. I'll dig through the datasheet and see if there
is something else I need to enable to get more/better error messages.

I tried enabling the "bus-error" feature, but it does not change the
messages the firmware sends to me, both without a cable connected, and
with the CAN low and high wires shorted together.

> >>> +
> >>> + /* check if we should generate an error frame at all */
> >>> + if (state == mod->can.state || mod->can.state == CAN_STATE_STOPPED) {
> >>> +         dev_dbg(mod->dev, "no error frame needed: state %d\n", state);
> >>> +         return;
> >>> + }
> >> Shouldn't this check be done first?
> >>
> > 
> > The variable "state" isn't set then. This code is basically copied from
> > esd_pci331.c, but I moved the checking of priv->can.state out of each
> > 'case', into one place.
> > 
> >>> + dev_dbg(mod->dev, "state change: state %d\n", state);
> >>> + mod->can.state = state;
> >>> +
> >>> + if (state == CAN_STATE_BUS_OFF) {
> >>> +         janz_err_frame(mod, CAN_ERR_BUSOFF, 0);
> >>> +         return;
> >>> + }
> >>> +
> >>> + if (state == CAN_STATE_ERROR_PASSIVE) {
> >>> +         err = (rxerr >= 128) ? CAN_ERR_CRTL_RX_PASSIVE
> >>> +                              : CAN_ERR_CRTL_TX_PASSIVE;
> >>> +         janz_err_frame(mod, CAN_ERR_CRTL, err);
> >>> +         return;
> >>> + }
> >>> +
> >>> + if (state == CAN_STATE_ERROR_WARNING) {
> >>> +         err = (rxerr >= 96) ? CAN_ERR_CRTL_RX_WARNING
> >>> +                             : CAN_ERR_CRTL_TX_WARNING;
> >>> +         janz_err_frame(mod, CAN_ERR_CRTL, err);
> >>> +         return;
> >>> + }
> >> If you use "else if" as suggested, the code could be shortened a lot, I
> >> think (by doing everything within the if/else block).
> >>
> >> Just for curiosity, what does "candump -t d any,0:0,#FFFFFFFF" report
> >> when you trigger a bus-error or when you send a message with no cable
> >> connected.
> >>
> > 
> > Nothing. The device just eventually runs out of send buffers when no
> > cable is connected. It doesn't send any error messages.
> 
> Hm, if you do not get any notification this code is useless.
> 

Actually, I must have been running something incorrectly. See the output
from candump provided above. I now get this in both cases:
1) no cable connected
2) CAN high and low wires shorted together

> >>> + /* nothing needed for error-active state */
> >>> +}
> >>> +
> >>> +static void janz_handle_unknown(struct janz_ican3 *mod, struct janz_msg 
> >>> *msg)
> >> Handle what?
> >>
> > 
> > The firmware has about 50 different messages it can send to the driver.
> > I only handled the few that we expect to see. I put this in place so we
> > get a log of message types that the driver doesn't handle yet.
> > 
> > Should I remove it, and just drop all messages we don't handle without
> > printing anything? Seems like a hindrance to debugging problems.
> 
> I have just a problem with the name, which is not specific enough. A
> better name would be "janz_handle_unknown_message".
> 
> [snip]
> >>> + switch (msg->spec) {
> >>> + case MSG_IDVERS:
> >>> +         janz_handle_idvers(mod, msg);
> >>> +         break;
> >>> + case MSG_MSGLOST:
> >>> + case MSG_FMSGLOST:
> >>> +         janz_handle_msglost(mod, msg);
> >>> +         break;
> >> You shoud create error messages for msglost as well.
> >>
> > 
> > Ok. What kind of messages? This message means that the driver is not
> > reading CAN frames fast enough, and the firmware ran out of locations to
> > store them. It is exactly the same as this driver's xmit() routine
> > running out of buffers to place CAN messages for sending.
> 
> CAN_ERR_CRTL_RX_OVERFLOW and CAN_ERR_CRTL_RX_OVERFLOW from
> http://lxr.linux.no/#linux+v2.6.32/include/linux/can/error.h.
> 
> [snip]
> >>> +         /* write back the control bits with IVALID unset */
> >>> +         control &= ~DESC_IVALID;
> >>> +         iowrite8(control, addr);
> >> This seems to be duplicated code. Here a helper function would make
> >> sense in contrast to your *one* line functions, e.g. to enable the
> >> interrupts, which just increases code size.
> >>
> > 
> > The one-liners are marked inline, so they should get compiled to
> > io(read|write)8(), without any function call overhead. I used them
> > because their *names* are descriptive.
> 
> You do that in a few cases but in many other you use ioread8() directly.
> I'm not worried about overhead.
> 
> > It is much easier to tell that this clears interrupts:
> > janz_clr_int(mod);
> > 
> > Than this:
> > ioread8(mod->regs + DPM_REG_INT);
> > 
> > Don't you think?
> 
> /* Clear interrupt */
> ioread8(mod->regs + DPM_REG_INT);
> 
> does make pretty clear what the call does and I do not need to lookup
> how it's implemented. If the interrupt is cleared in more than one place
> or if it's a complex action, a function is useful, of course.
> 
> > I'm not sure how to clearly function-ize the clearing of the IVALID bit
> > in descriptors, but I'll try for the next version.
> 
> Yes, a set of functions would be handy, I think.
> 
> [snip]
> >>> + /* bring the bus online */
> >>> + ret = janz_set_bus_state(mod, true);
> >>> + if (ret) {
> >>> +         dev_err(mod->dev, "unable to set bus-on\n");
> >>> +         return ret;
> >>> + }
> >> How is bus-off recovery supposed to work? In general, if the card/hw
> >> recovery automatically, we use the following procedure:
> >>
> >> restart_ms == 0: the device should be *stopped* on bus-off allowing
> >>                  to user/app to restart it manually using this function.
> >>
> >> restart_ms  > 0: the device is allowed to recover from bus-off
> >>                  automatically.
> >>
> >> Could that be implemented?
> >>
> > 
> > The datasheet doesn't mention the word "restart" or "recover". I guess
> > it doesn't recover automatically. Again, this code was copied from the
> > esd_pci331.c driver. Is that driver's error recovery broken? What should
> > I be doing here?
> 
> Makes sense as the SJA1000 does not recover automatically. But then some
> kind of notification should be send otherwise the app does not known
> what's going on.
> 

What kind of notification should be sent? Are the error messages
provided in the candump output above sufficient?

> >>> + /* disable our IRQ, then hookup the IRQ handler */
> >>> + janz_disable_interrupts(mod);
> >>> + ret = request_irq(mod->irq, janz_irq, IRQF_SHARED, DRV_NAME, mod);
> >>> + if (ret) {
> >>> +         dev_err(dev, "unable to request IRQ\n");
> >>> +         goto out_iounmap_ctrl;
> >>> + }
> >> Is this interrupt exclisively for CAN? ... or do you need a dispatcher
> >> in the MODULbus driver?
> >>
> > 
> > The interrupt is a function of the PCI bridge chip on the CMOD-IO
> > carrier board. However, it uses a "write-1 to enable" and "write-1 to
> > disable" interface, meaning each MODULbus module can have their own
> > interrupt handler. The interrupt handler must check that its module is
> > interrupting, just like any handler on a shared IRQ line.
> > 
> > I don't need a dispatcher, but each module does need to know it's module
> > number. See the janz_irq() function, and notice how I use a bitwise and
> > with (1 << mod->num) to determine if this module is interrupting.
> > 
> > AFAIK, Linux always calls all interrupt handlers attached to a shared
> > interrupt line.
> 
> OK, that's fine.
> 

Thanks again for the review. I've started incorporating your comments,
and I'll have another version soon.

Ira
_______________________________________________
Socketcan-core mailing list
[email protected]
https://lists.berlios.de/mailman/listinfo/socketcan-core

Re: [PATCH RFC v2] Add support for Janz ICAN3 CAN module

Reply via email to