Hi Bram, Nicolai,

I'm unable to grasp the description (attached below) given in the regexp.c file. At first glance the diagrams look like NFA fragments for the operators |, +, * and so on, but then I'm in doubt again (in particular, I don't understand what a node is in this context). A little help would be greatly appreciated, perhaps a pointer to some other information. I believe this is a very simple thing, sorry for my incompetence...
- Asiri
/*
 * Structure for regexp "program".  This is essentially a linear encoding
 * of a nondeterministic finite-state machine (aka syntax charts or
 * "railroad normal form" in parsing technology).  Each node is an opcode
 * plus a "next" pointer, possibly plus an operand.  "Next" pointers of
 * all nodes except BRANCH and BRACES_COMPLEX implement concatenation; a "next"
 * pointer with a BRANCH on both ends of it is connecting two alternatives.
 * (Here we have one of the subtle syntax dependencies: an individual BRANCH
 * (as opposed to a collection of them) is never concatenated with anything
 * because of operator precedence).  The "next" pointer of a BRACES_COMPLEX
 * node points to the node after the stuff to be repeated.
 * The operand of some types of node is a literal string; for others, it is a
 * node leading into a sub-FSM.  In particular, the operand of a BRANCH node
 * is the first node of the branch.
 * (NB this is *not* a tree structure: the tail of the branch connects to the
 * thing following the set of BRANCHes.)
 *
 * pattern      is coded like:
 *
 *                        +-----------------+
 *                        |                 V
 * <aa>\|<bb>   BRANCH <aa> BRANCH <bb> --> END
 *                   |      ^    |          ^
 *                   +------+    +----------+
 *
 *
 *                     +------------------+
 *                     V                  |
 * <aa>*        BRANCH BRANCH <aa> --> BACK BRANCH --> NOTHING --> END
 *                   |      |               ^                      ^
 *                   |      +---------------+                      |
 *                   +---------------------------------------------+
 *
 *
 *                     +----------------------+
 *                     V                      |
 * <aa>\+       BRANCH <aa> --> BRANCH --> BACK  BRANCH --> NOTHING --> END
 *                   |               |           ^                      ^
 *                   |               +-----------+                      |
 *                   +--------------------------------------------------+
 *
 *
 *                                      +-------------------------+
 *                                      V                         |
 * <aa>\{}      BRANCH BRACE_LIMITS --> BRACE_COMPLEX <aa> --> BACK  END
 *                   |                              |                ^
 *                   |                              +----------------+
 *                   +-----------------------------------------------+
 *
 *
 * <aa>\@!<bb>  BRANCH NOMATCH <aa> --> END  <bb> --> END
 *                   |       |                ^       ^
 *                   |       +----------------+       |
 *                   +--------------------------------+
 *
 *                                                    +---------+
 *                                                    |         V
 * \z[abc]      BRANCH BRANCH  a  BRANCH  b  BRANCH  c  BRANCH  NOTHING --> END
 *                   |      |          |          |     ^                   ^
 *                   |      |          |          +-----+                   |
 *                   |      |          +----------------+                   |
 *                   |      +---------------------------+                   |
 *                   +------------------------------------------------------+
 *
 * They all start with a BRANCH for "\|" alternatives, even when there is only
 * one alternative.
 */
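
One possible reading of "node" in the comment above, as a minimal sketch rather than Vim's actual code: a node is a few bytes in a flat array, holding an opcode, an offset to the "next" node, and optionally an operand stored right behind it. The sketch below encodes the first diagram, <aa>\|<bb>, that way. The opcode values, the Program type, the helper names (emit, set_next, next_node, run, matches) and the 3-byte header layout are all assumptions made for illustration; the real definitions in regexp.c may differ, and BACK-style backward links are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical opcodes for this sketch; not the values used in regexp.c. */
enum { END = 0, BRANCH = 1, EXACTLY = 2 };

/* Assumed node layout: 1 opcode byte, 2 bytes of forward offset to the
 * "next" node (0 means "no next"), then an optional operand. */
#define OP(p)      ((p)[0])
#define OPERAND(p) ((p) + 3)

typedef struct { unsigned char code[64]; size_t len; } Program;

/* Append one node; return its offset in the program. */
size_t emit(Program *pr, int op, const char *operand)
{
    size_t at = pr->len;
    size_t extra = operand ? strlen(operand) + 1 : 0;
    assert(at + 3 + extra <= sizeof pr->code);
    pr->code[at] = (unsigned char)op;
    pr->code[at + 1] = 0;
    pr->code[at + 2] = 0;
    if (operand)
        memcpy(pr->code + at + 3, operand, extra);
    pr->len = at + 3 + extra;
    return at;
}

/* Set the "next" pointer of the node at 'from' (forward links only). */
void set_next(Program *pr, size_t from, size_t to)
{
    size_t off = to - from;
    assert(to > from && off < 65536);
    pr->code[from + 1] = (unsigned char)(off >> 8);
    pr->code[from + 2] = (unsigned char)(off & 0xff);
}

const unsigned char *next_node(const unsigned char *p)
{
    size_t off = ((size_t)p[1] << 8) | p[2];
    return off ? p + off : NULL;
}

/* BRANCH <aa> BRANCH <bb> END, linked as in the first diagram. */
void build_aa_or_bb(Program *pr)
{
    pr->len = 0;
    size_t b1  = emit(pr, BRANCH, NULL);
    size_t aa  = emit(pr, EXACTLY, "aa");
    size_t b2  = emit(pr, BRANCH, NULL);
    size_t bb  = emit(pr, EXACTLY, "bb");
    size_t end = emit(pr, END, NULL);
    set_next(pr, b1, b2);   /* BRANCH -> BRANCH: links the alternatives */
    set_next(pr, aa, end);  /* tail of a branch -> what follows the set */
    set_next(pr, b2, end);
    set_next(pr, bb, end);
}

/* Walk the program; returns 1 only if all of s is consumed at END. */
int run(const unsigned char *p, const char *s)
{
    while (p != NULL) {
        switch (OP(p)) {
        case EXACTLY: {
            const char *lit = (const char *)OPERAND(p);
            size_t n = strlen(lit);
            if (strncmp(s, lit, n) != 0)
                return 0;
            s += n;
            break;
        }
        case BRANCH:
            /* The operand of a BRANCH is the first node of that branch;
             * its "next" is the following alternative (or what comes after). */
            for (; p != NULL && OP(p) == BRANCH; p = next_node(p))
                if (run(OPERAND(p), s))
                    return 1;
            return 0;
        case END:
            return *s == '\0';
        default:
            return 0;
        }
        p = next_node(p);
    }
    return 0;
}

int matches(const Program *pr, const char *s)
{
    return run(pr->code, s);
}
```

Calling build_aa_or_bb() on a Program and then matches() with "aa" or "bb" follows one of the two branches to END. Any other input fails both alternatives. This also shows why the comment says the structure is not a tree: both branch tails link to the same END node.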