Hi,
as a way to learn J, I'm working on a sample project: implementing a C/C++
preprocessor in J. Not being very familiar with the J libraries, I tend to
implement everything in J directly; fortunately the task doesn't seem to be too
large.
However, this area seems somewhat awkward for the parallel approach that
array-oriented languages favor. The input is read left to right, and the
processing of later parts depends on the results of processing earlier parts.
One part of the preprocessor has to be the extraction of preprocessing tokens,
which may be header-names, identifiers, pp-numbers, character constants,
string literals, punctuators, or other non-white-space characters.
Suppose the program reads a character, determines the type of the pp-token, and
then needs to finish reading that token. This is a rather typical task.
How to read a string literal after the initial " has been read? It might be
possible, assuming the whole file is in memory, to do this thinking in vectors,
but that will probably be too slow; the file may be huge compared to the length
of the literal. So a char-by-char solution is sought. It may be done using the
^:_ construct with properly chosen arguments. The problems are with escape
sequences: ones like \a or \t, which need to be replaced by the corresponding
characters, and \\, \", \q, which need to be replaced with \, ", q
respectively.
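For the replacement itself, a table lookup might be enough. This is only a
minimal sketch: the names escFrom, escTo and unesc are made up for
illustration, y is assumed to be the single character that follows the \, and
only the simple one-character escapes are covered (no octal or hex forms).

```j
NB. Hypothetical helper: map the character after a backslash to its value.
escFrom =: 'abfnrtv'                  NB. letters with a special meaning
escTo   =: 7 8 12 10 13 9 11 { a.     NB. their character codes, same order
NB. i. gives # escFrom when y is not found, which selects the appended y,
NB. so any other character (\, ", q, ...) stands for itself.
unesc   =: 3 : '(escFrom i. y) { escTo , y'

unesc 'q'                             NB. q
(9 { a.) -: unesc 't'                 NB. 1, i.e. \t gives TAB
```

Appending y to the table keeps the "everything else maps to itself" case out
of any explicit branch.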
x u ^:_ y means x u x u x u ... x u y, with iterations going on until the result
stops changing. If the input file, as a string, is y, then x u y has to be
another, different string, otherwise the iterations will stop. It's not clear
how to handle escape sequences: we also have to maintain the processing state,
at least a flag which says that a \ was just read. At this point one could try
having x be the input file and y a vector of integers describing the position
in the input and the escape state. Also, the input may end unexpectedly, which
is a syntax error but still has to be handled gracefully.
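A rough sketch of that last idea, assuming the position just after the opening
" is already known. The names step and txt are made up for illustration, and
escapes are only skipped here, not replaced.

```j
NB. step: x is the whole input, y is (position , escape flag).
NB. At the closing quote or at end of input y is returned unchanged,
NB. so ^:_ sees a fixed point and stops.
step =: 4 : 0
  'pos esc' =. y
  if. pos >: # x do. y return. end.           NB. input ended: stop
  if. esc do. (pos + 1) , 0 return. end.      NB. escaped char: consume, clear flag
  c =. pos { x
  if. c = '\' do. (pos + 1) , 1 return. end.  NB. backslash: set flag
  if. c = '"' do. y return. end.              NB. closing quote: stop
  (pos + 1) , 0                               NB. ordinary character
)

txt =: '"a\"b" rest'
txt step^:_ (1 0)       NB. 5 0: the closing quote is at index 5
```

A final position equal to # txt would mean that the input ended inside the
literal, so the caller can report the error instead of failing.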
The sequential machine (dyadic ;:) also doesn't seem to be quite the right tool.
First, it's not clear how to construct its input: either by hand, which
seems hard to debug, or with a program that converts some other notation to the
proper argument. Second, the sudden EOF issue is still not addressed.
What does this mean? Is it just a task that J is not well suited for? Are some
libraries practically required for tasks like this? Or am I missing a solution?
Alexander
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm