Just one additional comment inline below:

On 11/13/2008 1:44 PM, Duncan Murdoch wrote:
On 11/13/2008 11:51 AM, Simon Urbanek wrote:
Duncan,

I had a quick look at the parser differences and I'm worried about points 1. and 2. (on p.6) -- does that imply that \R{} is illegal and so is any \foo{} for any macro \foo that doesn't take any arguments? IMHO that would be fatal (if I understand it correctly), since that construct is very often used (and I know of no alternatives) in cases where you are referencing a macro that is followed by something that is not a space. E.g.: 1\foo{}2 cannot be written as 1\foo2 as per 6., so if \foo{} is disallowed there is no way to call \foo between 1 and 2 when you don't want any spaces to be generated. Maybe I'm just interpreting it incorrectly, so I just wanted to point out that issue.

Thanks for the comment. You are interpreting it correctly, and that is something that probably needs to change.

The reasoning behind the current choice is that macros with optional arguments are ambiguous: for example, in R code, {} might be part of the code, not something for the Rd parser. We currently have \eqn and \deqn that have one or two args, but they're not going to occur in R code, so things currently work. (But if you want to see ugly Bison coding, look at how those VERBMACRO2 macros are handled. The Rd format is not easy to parse, being a mix of latex-like stuff, R code, and just about anything else in verbatim sections.)

So I'd really strongly prefer to say that \foo *always* requires an arg, rather than let it be optional, if there are circumstances where it needs one.

If we say that \foo never takes an arg, we'll need a way to distinguish between the following space being significant or not. One way is to allow {} or some other marker that signals a break without inserting anything, and is only interpreted in Latex-like mode. Another way (that I prefer) is described below.

I should say that allowing {} to immediately follow one of the 5 no-arg macros, and having it gobbled up by the lexer, would be relatively easy to implement. So then the two examples below could be coded as "1\dots{}10" versus "1\dots 10", which is I think what you were asking for. I have a mild preference for adding \sp (I don't like special cases), but not a strong one.
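For illustration only, the "{} gobbled by the lexer" rule can be sketched in a few lines of Python. This is not the Bison/C code in R-devel; the function name render_gobble and the replacement strings in NOARG are assumptions made up for the sketch.

```python
import re

# The five no-argument macros named in this thread.
# The replacement text for each is illustrative, not R's actual output.
NOARG = {"cr": "\n", "dots": "...", "ldots": "...", "R": "R", "tab": "\t"}

# A no-arg macro name (not followed by another letter), optionally
# followed immediately by an empty brace pair.
_NOARG_RE = re.compile(r"\\(cr|dots|ldots|R|tab)(?![A-Za-z])(\{\})?")


def render_gobble(text):
    """Expand no-arg macros, discarding a '{}' that directly follows one.

    Whitespace after the macro is left alone, so it stays significant.
    """
    return _NOARG_RE.sub(lambda m: NOARG[m.group(1)], text)


print(render_gobble(r"1\dots{}10"))  # braces gobbled, no space inserted
print(render_gobble(r"1\dots 10"))   # the space survives
```

Under this rule the brace pair is purely a terminator for the macro name, which is the LaTeX-style usage Simon describes.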

Duncan Murdoch


We could relax things a lot, and allow balanced braces as no-ops in Latex-like mode, but that will miss some typos. I fixed typos in 10 files in r46908, and at least one of those was caught this way, in methods/man/Classes.Rd. It would also introduce an ambiguity, because \eqn and \deqn *are* going to occur in Latex-like mode. So

\eqn{foo}{}bar

could be either the two-arg version or the one-arg version followed by a no-op before the bar. (The default handling in Bison is that it would be the two-arg version.) And I think it would be tricky to write the parser so that {} was handled differently in Latex-like mode from the way it's handled in the other modes. (The other modes count braces and echo them out.)
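To make the ambiguity concrete, here is a small Python sketch that lists both readings of such an input. It is a toy under stated assumptions, not the real grammar: the helper name eqn_readings is hypothetical, and it only handles a single \eqn with a brace-free first argument.

```python
import re


def eqn_readings(text):
    """Return the two candidate parses of '\\eqn{...}{}rest'.

    Each reading is (label, list of \\eqn arguments, trailing text).
    """
    m = re.fullmatch(r"\\eqn\{([^{}]*)\}\{\}(.*)", text)
    if m is None:
        return []
    latex, rest = m.group(1), m.group(2)
    return [
        # '{}' is taken as an empty second argument of \eqn
        ("two-arg", [latex, ""], rest),
        # '{}' is taken as a no-op separating \eqn{...} from the text
        ("one-arg + no-op", [latex], rest),
    ]


for reading in eqn_readings(r"\eqn{foo}{}bar"):
    print(reading)
```

Both readings leave the same trailing text, so nothing later in the input can help the parser choose between them.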

There are currently only 5 macros which take no args: \cr, \dots, \ldots, \R, and \tab. I think the issue will only arise with \dots and \ldots. So my preferred decision would be to push this up a level: when the code is interpreted, \dots and \ldots are not followed by a space. To allow for a user who wants a space, we should introduce a 6th no-argument macro, \sp. Then "1\dots 10" will be rendered as "1...10" and "1\dots\sp 10" will be rendered as "1... 10".
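A rough Python sketch of the \sp proposal follows, reproducing the two renderings just given. The function name render_sp is hypothetical. The sketch also assumes that the whitespace ending the \sp macro name is itself dropped, which the proposal does not spell out.

```python
import re


def render_sp(text):
    """Render \\dots, \\ldots and the proposed \\sp macro.

    \\dots and \\ldots swallow any whitespace that follows them;
    \\sp emits exactly one space (assumption: whitespace that
    terminates the \\sp name is discarded).
    """
    text = re.sub(r"\\l?dots(?![A-Za-z])\s*", "...", text)
    text = re.sub(r"\\sp(?![A-Za-z])\s*", " ", text)
    return text


print(render_sp(r"1\dots 10"))     # no space in the output
print(render_sp(r"1\dots\sp 10"))  # one explicit space
```

With this rule the space after a macro name is never significant, so a visible space has to be requested explicitly.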

Duncan Murdoch


Thanks,
Simon


On Nov 13, 2008, at 11:02 , Duncan Murdoch wrote:

I've just committed the parse_Rd() function to R-devel. This is a parser for Rd files, described in

http://developer.r-project.org/parseRd.pdf

It is not identical to the current parser, and about a dozen of the base man pages currently signal syntax errors. It also detected errors in 10 files that were errors according to both definitions, but were missed by the current system, and I've already fixed those. I plan to patch the rest so that they work in both systems soon. The differences between the two systems are described in the document above.

I would like to hear comments about the changes -- some of them are still optional. I will be continuing to work on support functions for the parser, e.g. the print routine is currently quite primitive.

I expect there may be incompatibilities with platforms on which I haven't tested. I developed the parser on Windows, and have tested it on a Linux system. There may be problems handling Rd files with unusual encodings (UTF-8 and Latin1 should be supported, but I don't know about others, and haven't even tested those yet).

Duncan Murdoch

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel



