Just one additional comment inline below:
On 11/13/2008 1:44 PM, Duncan Murdoch wrote:
On 11/13/2008 11:51 AM, Simon Urbanek wrote:
Duncan,
I had a quick look at the parser differences and I'm worried about
points 1. and 2. (on p.6) -- does that imply that \R{} is illegal and
so is any \foo{} for any macro \foo that doesn't take any arguments?
IMHO that would be fatal (if I understand it correctly), since that
construct is very often used (and I know of no alternatives) in cases
where you are referencing a macro that is followed by something that
is not a space. (E.g.: 1\foo{}2 cannot be written as 1\foo2 as per 6.,
so if \foo{} is disallowed there is no way to call \foo between 1 and
2 when you don't want any spaces to be generated.)
Maybe I'm just interpreting it incorrectly, so I just wanted to point
out that issue.
Thanks for the comment. You are interpreting it correctly, and that is
something that probably needs to change.
The reasoning behind the current choice is that macros with optional
arguments are ambiguous: for example, in R code, {} might be part of
the code, not something for the Rd parser. We currently have \eqn and
\deqn that have one or two args, but they're not going to occur in R
code, so things currently work. (But if you want to see ugly Bison
coding, look at how those VERBMACRO2 macros are handled. The Rd format
is not easy to parse, being a mix of latex-like stuff, R code, and just
about anything else in verbatim sections.)
So I'd really strongly prefer to say that \foo *always* requires an arg,
rather than let it be optional, if there are circumstances where it
needs one.
If we say that \foo never takes an arg, we'll need a way to distinguish
between the following space being significant or not. One way is to
allow {} or some other marker that signals a break without inserting
anything, and is only interpreted in Latex-like mode. Another way (that
I prefer) is described below.
I should say that allowing {} to immediately follow one of the 5 no-arg
macros, and having it gobbled up by the lexer, would be relatively easy
to implement. So then the two examples below could be coded as
"1\dots{}10" versus "1\dots 10", which is I think what you were asking
for. I have a mild preference for adding \sp (I don't like special
cases), but not a strong one.
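To make the "gobbled up by the lexer" option concrete, here is a rough
sketch in Python. It is only an illustration, not the actual Bison/C
lexer: the tokenize() function and the token names are made up, and
anything that is not one of the 5 no-arg macros is passed through as
plain text.

```python
import re

# The five no-argument macros named in this thread.
NO_ARG_MACROS = ("cr", "dots", "ldots", "R", "tab")

# Either a no-arg macro (not followed by another letter) with an optional
# empty {} swallowed along with it, or a run of ordinary text.
_TOKEN = re.compile(
    r"\\(?P<macro>" + "|".join(NO_ARG_MACROS) + r")(?![A-Za-z])(?:\{\})?"
    r"|(?P<text>[^\\]+|\\)"
)

def tokenize(text):
    """Split text into ("MACRO", name) and ("TEXT", chunk) tokens.

    Hypothetical helper: an empty {} right after a no-arg macro is
    consumed as part of the macro token and never reaches the parser.
    """
    tokens = []
    for m in _TOKEN.finditer(text):
        if m.group("macro") is not None:
            tokens.append(("MACRO", m.group("macro")))
        else:
            tokens.append(("TEXT", m.group("text")))
    return tokens
```

With this, "1\dots{}10" yields the macro followed directly by the text
"10", while "1\dots 10" keeps the space at the start of the following
text chunk, so the two spellings stay distinguishable.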
Duncan Murdoch
We could relax things a lot, and allow balanced braces as no-ops in
Latex-like mode, but that will miss some typos. I fixed typos in 10
files in r46908, and at least one of those was caught this way, in
methods/man/Classes.Rd. It would also introduce an ambiguity, because
\eqn and \deqn *are* going to occur in Latex-like mode. So
\eqn{foo}{}bar
could be either the two-arg version or the one-arg version followed by a
no-op before the bar. (The default handling in Bison is that it would
be the two-arg version.) And I think it would be tricky to write the
parser so that {} was handled differently in Latex-like mode from the
way it's handled in the other modes. (The other modes count braces and
echo them out.)
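The ambiguity can be shown with a small Python sketch. It is not how
the Bison grammar works; readings() is a hypothetical helper, and it
assumes arguments without nested braces to keep the patterns short.

```python
import re

# One brace group with no nested braces (a simplifying assumption).
_GROUP = r"\{([^{}]*)\}"
_TWO_ARG = re.compile(r"\\eqn" + _GROUP + _GROUP)
_ONE_ARG = re.compile(r"\\eqn" + _GROUP)

def readings(text):
    """Return both candidate parses of a string that starts with \\eqn."""
    two = _TWO_ARG.match(text)
    one = _ONE_ARG.match(text)
    return {
        # \eqn{a}{b} followed by the remaining text
        "two_arg": (two.group(1), two.group(2), text[two.end():]) if two else None,
        # \eqn{a} followed by the remaining text, which starts with the no-op {}
        "one_arg_then_noop": (one.group(1), text[one.end():]) if one else None,
    }
```

For "\eqn{foo}{}bar" both entries are filled in, which is exactly the
problem: nothing in the input says which reading was intended.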
There are currently only 5 macros which take no args: \cr, \dots,
\ldots, \R, and \tab. I think the issue will only arise with \dots and
\ldots. So my preferred decision would be to push this up a level:
when the code is interpreted, \dots and \ldots are not followed by a
space. To allow for a user who wants a space, we should introduce a 6th
no-argument macro, \sp. Then "1\dots 10" will be rendered as "1...10"
and "1\dots\sp 10" will be rendered as "1... 10".
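A rough Python sketch of that rendering rule, for illustration only:
render() is a made-up name, \sp is the proposed macro and does not
exist yet, and dropping the delimiter space after \sp is an assumption
taken from the second example above.

```python
import re

# \dots or \ldots, plus the single space that merely ends the macro name.
_DOTS = re.compile(r"\\l?dots(?![A-Za-z]) ?")
# The proposed \sp macro; its own delimiter space is assumed to be dropped too.
_SP = re.compile(r"\\sp(?![A-Za-z]) ?")

def render(text):
    """Render \\dots and \\ldots with no trailing space, and \\sp as one space."""
    text = _DOTS.sub("...", text)
    return _SP.sub(" ", text)
```

Here render("1\\dots 10") gives "1...10" and render("1\\dots\\sp 10")
gives "1... 10", matching the two renderings described above.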
Duncan Murdoch
Thanks,
Simon
On Nov 13, 2008, at 11:02 , Duncan Murdoch wrote:
I've just committed the parse_Rd() function to R-devel. This is a
parser for Rd files, described in
http://developer.r-project.org/parseRd.pdf
It is not identical to the current parser, and about a dozen of the
base man pages currently signal syntax errors. It also detected
errors in 10 files that were errors according to both definitions,
but were missed by the current system, and I've already fixed
those. I plan to patch the rest so that they work in both systems
soon. The differences between the two systems are described in the
document above.
I would like to hear comments about the changes -- some of them are
still optional. I will be continuing to work on support functions
for the parser, e.g. the print routine is currently quite primitive.
I expect there may be incompatibilities with platforms on which I
haven't tested. I developed the parser on Windows, and have tested
it on a Linux system. There may be problems handling Rd files with
unusual encodings (UTF-8 and Latin1 should be supported, but I don't
know about others, and haven't even tested those yet).
Duncan Murdoch
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel