Re: [Rd] Suggestion: Custom filename patterns for non-Sweave vignettes

2013-02-16 Thread Henrik Bengtsson
Hi,

as said at the end, all comments are now in the light of R 3.x.0 (x > 0).


On Fri, Feb 15, 2013 at 11:30 AM, Duncan Murdoch
murdoch.dun...@gmail.com wrote:
 On 13-02-15 1:53 PM, Henrik Bengtsson wrote:

 Hi Duncan,

 thank you for your prompt reply.


 On Fri, Feb 15, 2013 at 1:15 AM, Duncan Murdoch
 murdoch.dun...@gmail.com wrote:

 There are several reasons I decided against that:

- two packages may request overlapping patterns, making it much messier to
 do the matching, checking etc, since the matching would have to depend on
 the package being processed.


 So, isn't that somewhat already taken care of by the 'VignetteBuilder'
 field in DESCRIPTION?  It specifies additional builders in addition to
 the default/builtin Sweave builder.


 No, it specifies additional packages besides utils. Packages may specify
 multiple engines.

I think we're on the same page here - by builders I meant packages
that provide engines for building vignettes.

 For example, knitr can handle Sweave-like knitr
 vignettes, and markdown-based vignettes.  Yihui chose to use the same engine
 for both, but it might make more sense to specify different engines.

Just to add a tiny FYI related to this comment; RSP markup is
independent of the output format, so in that case it makes sense to
have a single engine regardless of output format.


 So a user might say they want a knitr vignette and a .html.rsp vignette.
 But perhaps in the meantime, Yihui added an engine that can handle .rsp
 files.  So the user would have to list both packages, and there would be an
 ambiguity as to which one should be run.  You might say that's the user's
 problem, but they wouldn't complain to themselves, they'd complain to me, so
 it's my problem.

As I understand it, currently the rule is that R will take a .Rnw /
.Rmd file, scan its content for \VignetteEngine{engine} to infer the
vignette engine, and then apply that vignette engine to the source
file.  If no \VignetteEngine{} is found, the default is to use Sweave
(as before).  The exact same strategy can be applied with support for
custom filename patterns, with the default being to give an error (or
alternatively to silently skip the file) if no \VignetteEngine{} is
found (*).  This would remove any ambiguities between an R.rsp and a
knitr 'rsp' engine, just as it does for *.Rnw currently.

(*) Ideally, I'd like the default to be inferred from the file's
content type, which in turn could be guessed from the filename
extension and possibly some content-type markup (e.g.
\VignetteContentType{...}), but I'm willing to step back from that.
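
For illustration, the lookup rule described above (scan the source for
\VignetteEngine{}, otherwise fall back to a default) could be sketched
roughly as below.  This is only a sketch and not R's actual
implementation; detectEngine() and its 'default' argument are
hypothetical names.  Passing default=NULL gives the "error if nothing is
declared" behaviour suggested for custom filename patterns.

```r
# Hypothetical helper (not part of R): infer the vignette engine from a
# \VignetteEngine{...} declaration found in the lines of a source file.
detectEngine <- function(lines, default = "Sweave") {
  pattern <- "\\\\VignetteEngine\\{([^}]*)\\}"
  # Keep only the matching substrings, e.g. "\VignetteEngine{knitr}"
  hits <- regmatches(lines, regexpr(pattern, lines))
  if (length(hits) > 0L) return(sub(pattern, "\\1", hits[1L]))
  # No declaration: either fail or fall back to the default engine
  if (is.null(default)) stop("no \\VignetteEngine{} declaration found")
  default
}

detectEngine(c("%\\VignetteEngine{knitr}", "\\documentclass{article}"))
detectEngine("\\documentclass{article}")
```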



 It would be possible to design all of this to work:  the engine could check
 the file and say "oops, that's not my kind of .rsp file, try the next
 engine".  I just don't think it's worth it.  I certainly don't have time to
 design and program it or even to check your offered patch before feature
 freeze.  I can make small tweaks, but big changes that need lots of testing
 aren't going to happen.

I definitely hear you and I fully understand.





  Conflicts would only happen if a

 package developer (e.g. PkgA) includes a pattern that either (A)
 overrides the built-in [.][RrSs](nw|tex)$ / [.]Rmd$ patterns, or
 (B) specifies two builders with the same patterns.  First of all, there
 are not that many builder packages, so this is something that could be
 negotiated among those to minimize conflicts.  Second, case (A) can be
 protected against by not allowing builder packages (e.g. knitr, rsp,
 ...) to add/register those patterns (tricky but possible to test for)


 I don't think it's feasible to check for overlap in regular expression
 patterns.

Here I was only thinking of testing for overlaps with
[.][RrSs](nw|tex)$ / [.]Rmd$, which can be done as:

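illegalPattern <- function(pattern) {
  # Filename extensions reserved by the built-in patterns
  files <- c(outer(c("R", "r", "S", "s"), c("nw", "tex"), FUN=paste,
                   sep=""), "Rmd")
  files <- paste(".", files, sep="")
  # TRUE if the pattern matches any of the reserved extensions
  any(regexpr(pattern, files) != -1L)
}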

But yes, checking for overlapping patterns in general would be very hard.




 (but only default to them if that is what they wish to use).  For case
 (B), the developer of package PkgA has the power to avoid conflicts.
 One could also imagine the ordering of packages listed in
 'VignetteBuilder' would provide a prioritization.


 Sure, but it would be confusing to get an error from knitr when you didn't
 know knitr was handling .rsp.

See above reply on \VignetteEngine{}.



 BTW, case (A) is basically what the new design is already providing;
 all builder packages use the same patterns.

 So, from a package building point of view, I don't see how this would
 make it messier.  I can see that when checking a package it is harder
 to validate matches between input and output formats (is that done?).
 Let me know if I'm simplifying things too much - then I'll read up on
 the 'R CMD *' source code.


- one package may request a pattern that another package uses for
 auxiliary files, e.g. .bib.  If a user has both types of vignette it
 would
 just be 

Re: [Rd] Suggestion: Custom filename patterns for non-Sweave vignettes

2013-02-15 Thread Duncan Murdoch

There are several reasons I decided against that:

  - two packages may request overlapping patterns, making it much 
messier to do the matching, checking etc, since the matching would have 
to depend on the package being processed.


  - one package may request a pattern that another package uses for 
auxiliary files, e.g. .bib.  If a user has both types of vignette it 
would just be a mess.


  - the extension is also used to determine the output format.  We only 
support LaTeX (which will be converted to PDF) and HTML output.  It 
would be reasonable to support direct PDF output, but I don't think any 
other output formats should be supported.


I understand that forcing you to use .Rmd instead of .html.rsp may look 
unsightly, but I think the extensions need to be fixed, not customizable.


Duncan Murdoch

On 13-02-14 10:29 PM, Henrik Bengtsson wrote:

Hi,

as far as I understand it, the new R devel feature of processing
non-Sweave vignettes will (a) locate any [.][RrSs](nw|tex)$ or
.Rmd files, (b) check for a registered vignette engine, (c) process
the file using the registered weave function, and (d) possibly
post-process the generated weave artifact (e.g. a *.tex file).
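
As a rough illustration of step (a), locating candidate files by the
fixed patterns could look something like the sketch below.
findVignetteSources() is a hypothetical helper, not a function in R;
only the two patterns are taken from the description above.

```r
# The fixed filename patterns as quoted above.
builtinPatterns <- c("[.][RrSs](nw|tex)$", "[.]Rmd$")

# Hypothetical helper: keep the filenames that match any of the patterns.
findVignetteSources <- function(files, patterns = builtinPatterns) {
  matches <- lapply(patterns, grepl, x = files)
  keep <- Reduce(`|`, matches, rep(FALSE, length(files)))
  files[keep]
}

files <- c("intro.Rnw", "report.Rmd", "refs.bib", "reportB.html.rsp")
findVignetteSources(files)
```

With the fixed patterns, refs.bib and reportB.html.rsp are never
considered as vignette sources.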

I'd like to propose to extend this non-Sweave mechanism to allow for
any filename patterns still using a very similar setup.  Here is how
I'd like to see it work with RSP vignettes (cf. the R.rsp package):

   tools::vignetteEngine("rsp", weave=rspWeave, tangle=rspTangle,
                         patterns="[.]rsp$")

Argument 'patterns' could default to patterns=c("[.][RrSs](nw|tex)$",
"[.]Rmd$").

This is just a sketch/mock up and it may be that there are better
solutions.  However, the idea is that when specifying 'VignetteBuilder:
R.rsp' in DESCRIPTION of a package, R locates all engines registered
by the builder package.  In this case it finds 'rsp'.  (An alternative
to this lookup would be to use a DESCRIPTION field 'VignetteEngines:
R.rsp:rsp, knitr:knitr'.)  It next looks for custom filename patterns
and uses those to scan for vignette source files.  With this approach,
the '%\VignetteEngine{knitr}' specifier would become optional.  (I can
see how R now scans for Rnw and Rmd files, checks them for a
\VignetteEngine{} markup, and then looks up the corresponding engine).
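
To make the idea a bit more concrete, here is a small self-contained
sketch of a registry in which each engine carries its own filename
patterns.  It deliberately does not use the tools package;
registerEngine(), enginesFor() and the 'registry' environment are all
hypothetical names, and the weave functions are dummies.

```r
# Hypothetical registry: engine name -> weave function and filename patterns.
registry <- new.env()

registerEngine <- function(name, weave,
                           patterns = c("[.][RrSs](nw|tex)$", "[.]Rmd$")) {
  assign(name, list(weave = weave, patterns = patterns), envir = registry)
  invisible(name)
}

# Names of all registered engines with a pattern matching a single 'file'.
enginesFor <- function(file) {
  ids <- sort(ls(registry))
  hit <- vapply(ids, function(id) {
    patterns <- get(id, envir = registry)$patterns
    any(vapply(patterns, grepl, logical(1), x = file))
  }, logical(1))
  ids[hit]
}

# Dummy weave functions, for illustration only.
registerEngine("rsp", weave = function(file) sub("[.]rsp$", "", file),
               patterns = "[.]rsp$")
registerEngine("knitr", weave = function(file) file)

enginesFor("reportB.html.rsp")
enginesFor("intro.Rnw")
```

If enginesFor() returns more than one name for a file, two engines claim
the same pattern and some rule is needed to choose between them.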

Continuing, the above would make it possible to process RSP vignettes
that have filenames:

   reportA.tex.rsp
   reportB.html.rsp
   reportC.md.rsp
   reportD.Rnw.rsp

where rspWeave() will produce the following files:

   reportA.tex
   reportB.html
   reportC.html
   reportD.tex

I included the last case just to illustrate a special case where
rspWeave() first generates a reportD.Rnw (Sweave or knitr), which is
then processed using the corresponding weaver to generate reportD.tex.
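
The filename mapping above can be written down as a small sketch.
rspOutputName() and the 'outputExt' lookup table are hypothetical and
only encode the four examples listed; they are not part of R.rsp.

```r
# Hypothetical lookup: inner extension -> extension of the final output,
# following the four examples above.
outputExt <- c(tex = "tex", html = "html", md = "html", Rnw = "tex")

rspOutputName <- function(file) {
  base  <- sub("[.]rsp$", "", file)     # reportC.md.rsp -> reportC.md
  inner <- sub("^.*[.]", "", base)      # reportC.md     -> md
  stem  <- sub("[.][^.]*$", "", base)   # reportC.md     -> reportC
  if (!inner %in% names(outputExt)) {
    stop("unknown inner extension: ", inner)
  }
  paste(stem, outputExt[[inner]], sep = ".")
}

inputs <- c("reportA.tex.rsp", "reportB.html.rsp",
            "reportC.md.rsp", "reportD.Rnw.rsp")
vapply(inputs, rspOutputName, character(1), USE.NAMES = FALSE)
```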

My point is that restricting vignette filenames to .[RrSs](nw|tex)$
or .Rmd is unnecessary and conceptually it would not be too hard to
extend it to handle any filename patterns.

I am aware that implementing this would require updates in several
places.  If R core would approve of the above extended functionality, I
would be happy to dig into the source code and provide minimal and
backward compatible patches.

Finally, without knowing the details of all the other report
generating packages, my guess is that this extended feature would be
useful also for some of those packages, which in the long run
hopefully results in more packages having more vignettes (regardless
of the vignette format).

All the best,

Henrik

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel





Re: [Rd] Suggestion: Custom filename patterns for non-Sweave vignettes

2013-02-15 Thread Henrik Bengtsson
Hi Duncan,

thank you for your prompt reply.


On Fri, Feb 15, 2013 at 1:15 AM, Duncan Murdoch
murdoch.dun...@gmail.com wrote:
 There are several reasons I decided against that:

   - two packages may request overlapping patterns, making it much messier to
 do the matching, checking etc, since the matching would have to depend on
 the package being processed.

So, isn't that somewhat already taken care of by the 'VignetteBuilder'
field in DESCRIPTION?  It specifies additional builders in addition to
the default/builtin Sweave builder.  Conflicts would only happen if a
package developer (e.g. PkgA) includes a pattern that either (A)
overrides the built-in [.][RrSs](nw|tex)$ / [.]Rmd$ patterns, or
(B) specifies two builders with the same patterns.  First of all, there
are not that many builder packages, so this is something that could be
negotiated among those to minimize conflicts.  Second, case (A) can be
protected against by not allowing builder packages (e.g. knitr, rsp,
...) to add/register those patterns (tricky but possible to test for)
(but only default to them if that is what they wish to use).  For case
(B), the developer of package PkgA has the power to avoid conflicts.
One could also imagine the ordering of packages listed in
'VignetteBuilder' would provide a prioritization.

BTW, case (A) is basically what the new design is already providing;
all builder packages use the same patterns.

So, from a package building point of view, I don't see how this would
make it messier.  I can see that when checking a package it is harder
to validate matches between input and output formats (is that done?).
Let me know if I'm simplifying things too much - then I'll read up on
the 'R CMD *' source code.


   - one package may request a pattern that another package uses for
 auxiliary files, e.g. .bib.  If a user has both types of vignette it would
 just be a mess.

I see your concern, but is there really a significant risk for this?
And if it would occur, (i) it would be contained to PkgA, (ii) the
developer of package PkgA would quickly detect it, and (iii) the
badly behaving builder package would rather soon be flagged as doing
something bad (and its developer would be informed and so on).


   - the extension is also used to determine the output format.  We only
 support LaTeX (which will be converted to PDF) and HTML output.  It would be
 reasonable to support direct PDF output, but I don't think any other output
 formats should be supported.

Yes, supporting PDF output makes sense.  One may also consider
generation of plain *.txt files (think README.txt and similar).  As I
see it, the restrictions on supported *output* formats are given by
what the R help system wishes to support (which is basically *.pdf and
*.html documents).  It's clear that the decision on what to support is
up to the maintainer of the R system (i.e. R core).

When it comes to input/source files for generating those output files,
it's harder to argue for restrictions.  As I understand it, the new
support for non-Sweave vignettes is moving away from such restrictions,
which is great.  Despite the restrictions on file extension, it is
possible to "hijack" (my words) any of the supported extensions for
whatever reason you want, as long as you produce a *.pdf or *.html
document in the end.  More below...


 I understand that forcing you to use .Rmd instead of .html.rsp may look
 unsightly, but I think the extensions need to be fixed, not customizable.

I still find it unfortunate that the R system opens up for processing
any type of input files but enforces those to have certain filename
extensions.

As a real example, today Sweave and knitr both use *.Rnw.  This means
that if I send someone a standalone *.Rnw file, they will not be able
to tell how to compile it without further instructions from me or by
inspecting the content type, or by trial and error.  I believe that
makes reproducible research a bit more tedious.  With unique filename
extensions, life is easier.  It's easy to imagine that if other
builder packages (e.g. R.rsp, brew, ...) also start using *.Rnw,
things are not going to become better.  The current rules are
pushing things in that direction.  To take an extreme stand, it's a
little bit like using *.txt for all your C, C++, Erlang, Fortran,
Simula, ... code, because at the end of the day they all compile to
binaries anyway.

One may argue that the Rnw/Rtex/Rmd extensions only apply to the R
package vignettes and you can still use other extensions when you work
with standalone vignette source files.  That's of course also
unfortunate, because that will add additional confusion, e.g. "You can
find the vignette in my package, but by the way you should really
rename it because ...".  The exact same source file will have
different extensions depending on context.  (In my own case, I found
*.tex.rsp, *.html.rsp, *.md.rsp, *.Rnw.rsp, ... to be much less
ambiguous and I prefer not to introduce ambiguity in mapping those to

Re: [Rd] Suggestion: Custom filename patterns for non-Sweave vignettes

2013-02-15 Thread Duncan Murdoch

On 13-02-15 1:53 PM, Henrik Bengtsson wrote:

Hi Duncan,

thank you for your prompt reply.


On Fri, Feb 15, 2013 at 1:15 AM, Duncan Murdoch
murdoch.dun...@gmail.com wrote:

There are several reasons I decided against that:

   - two packages may request overlapping patterns, making it much messier to
do the matching, checking etc, since the matching would have to depend on
the package being processed.


So, isn't that somewhat already taken care of by the 'VignetteBuilder'
field in DESCRIPTION?  It specifies additional builders in addition to
the default/builtin Sweave builder.


No, it specifies additional packages besides utils.  Packages may 
specify multiple engines.  For example, knitr can handle Sweave-like 
knitr vignettes, and markdown-based vignettes.  Yihui chose to use the 
same engine for both, but it might make more sense to specify different 
engines.


So a user might say they want a knitr vignette and a .html.rsp vignette. 
 But perhaps in the meantime, Yihui added an engine that can handle 
.rsp files.  So the user would have to list both packages, and there 
would be an ambiguity as to which one should be run.  You might say 
that's the user's problem, but they wouldn't complain to themselves, 
they'd complain to me, so it's my problem.


It would be possible to design all of this to work:  the engine could 
check the file and say "oops, that's not my kind of .rsp file, try the 
next engine".  I just don't think it's worth it.  I certainly don't have 
time to design and program it or even to check your offered patch before 
feature freeze.  I can make small tweaks, but big changes that need lots 
of testing aren't going to happen.
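
For readers wondering what such a "try the next engine" design might
look like, here is a minimal sketch.  Nothing like this exists in R;
pickEngine(), the 'candidates' list and both accepts() predicates are
hypothetical, and the content checks are deliberately crude.

```r
# Hypothetical candidates, listed in priority order.  Each one has an
# accepts() predicate that inspects the file content.
candidates <- list(
  knitr = list(accepts = function(lines) any(grepl("^<<.*>>=", lines))),
  rsp   = list(accepts = function(lines) any(grepl("<%", lines, fixed = TRUE)))
)

# Ask each candidate in turn; the first one that accepts the file wins.
pickEngine <- function(lines, engines) {
  for (name in names(engines)) {
    if (engines[[name]]$accepts(lines)) return(name)
  }
  stop("no engine accepts this file")
}

pickEngine(c("<<setup>>=", "x <- 1", "@"), candidates)
pickEngine("The answer is <%= 1 + 1 %>.", candidates)
```

Even in this toy version the result depends on the order of the
candidates and on how reliable each accepts() check is, which is where
the extra design and testing effort would go.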



 Conflicts would only happen if a

package developer (e.g. PkgA) includes a pattern that either (A)
overrides the built-in [.][RrSs](nw|tex)$ / [.]Rmd$ patterns, or
(B) specifies two builders with the same patterns.  First of all, there
are not that many builder packages, so this is something that could be
negotiated among those to minimize conflicts.  Second, case (A) can be
protected against by not allowing builder packages (e.g. knitr, rsp,
...) to add/register those patterns (tricky but possible to test for)


I don't think it's feasible to check for overlap in regular expression 
patterns.



(but only default to them if that is what they wish to use).  For case
(B), the developer of package PkgA has the power to avoid conflicts.
One could also imagine the ordering of packages listed in
'VignetteBuilder' would provide a prioritization.


Sure, but it would be confusing to get an error from knitr when you 
didn't know knitr was handling .rsp.



BTW, case (A) is basically what the new design is already providing;
all builder packages use the same patterns.

So, from a package building point of view, I don't see how this would
make it messier.  I can see that when checking a package it is harder
to validate matches between input and output formats (is that done?).
Let me know if I'm simplifying things too much - then I'll read up on
the 'R CMD *' source code.



   - one package may request a pattern that another package uses for
auxiliary files, e.g. .bib.  If a user has both types of vignette it would
just be a mess.


I see your concern, but is there really a significant risk for this?


If you look through CRAN, you'll see packages that do very weird things. 
 If it's legal, someone will try it.




And if it would occur, (i) it would be contained to PkgA, (ii) the
developer of package PkgA would quickly detect it, and (iii) the
badly behaving builder package would rather soon be flagged as doing
something bad (and its developer would be informed and so on).



   - the extension is also used to determine the output format.  We only
support LaTeX (which will be converted to PDF) and HTML output.  It would be
reasonable to support direct PDF output, but I don't think any other output
formats should be supported.


Yes, supporting PDF output makes sense.  One may also consider
generation of plain *.txt files (think README.txt and similar).  As I
see it, the restrictions on supported *output* formats are given by
what the R help system wishes to support (which is basically *.pdf and
*.html documents).  It's clear that the decision on what to support is
up to the maintainer of the R system (i.e. R core).

When it comes to input/source files for generating those output files,
it's harder to argue for restrictions.  As I understand it, the new
support for non-Sweave vignettes is moving away from such restrictions,
which is great.  Despite the restrictions on file extension, it is
possible to "hijack" (my words) any of the supported extensions for
whatever reason you want, as long as you produce a *.pdf or *.html
document in the end.  More below...


The issue is that the supplier of a custom input extension would also 
need to specify what kind of output it produced, so R knows how to 
handle it.  That makes it more complicated, harder to test, etc.




I understand