Re: [Rd] capture "->"

2024-03-02 Thread Adrian Dușa
That would have been an elegant solution, but it doesn't seem to work:

> `->` <- `+`
> 1 -> 3 # expecting 4
Error in 3 <- 1 : invalid (do_set) left-hand side to assignment

It is possible to reassign other multiple character operators:
> `%%` <- `+`
> 1 %% 3
[1] 4

The assignment operator `->` is so special for the R parser, that it seems
impossible to change.
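
(A minimal sketch of why the reassignment has no effect, assuming current R parser behaviour: the parser rewrites `->` into a call to `<-` before any evaluation takes place, so a function bound to the name `->` is never looked up. The variable names are illustrative only.)

```r
# The parse tree of a right assignment already has `<-` as its head
e <- quote(1 -> x)
head_is_left_arrow <- identical(e[[1]], as.name("<-"))
same_as_left_form  <- identical(e, quote(x <- 1))

# Binding a function to the name `->` is allowed, but it is never called:
# the next line is parsed as `y <- 1` and simply assigns
`->` <- function(lhs, rhs) stop("never reached")
1 -> y
```

By contrast, `%%` stays an ordinary function call in the parse tree, which is why rebinding it works.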

On Fri, Mar 1, 2024 at 11:30 PM  wrote:

> Adrian,
>
> That is indeed a specialized need albeit not necessarily one that cannot
> be done by requiring an alternate way of typing a formula that avoids being
> something the parser sees as needed to do at that level.
>
> In this case, my other questions become moot as I assume the global
> assignment operator and something like assign("xyz", 5) will not be in the
> way.
>
> What I was wondering about is what happens if you temporarily disable the
> meaning of the assignment operator <- and turn it back on after.
>
> In the following code, for no reason, I redefine + to mean - and then undo
> it:
>
>
> > temp <- `+`
> > `+` <- `-`
> > 5 + 3
> [1] 2
> > `+` <- temp
> > 5 + 3
> [1] 8
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] capture "->"

2024-03-01 Thread Adrian Dușa
I would also be interested in that.
For me, this is interesting for my QCA package, over which Dmitri and I
have exchanged a couple of messages.
The "<-" operator is used to denote necessity, and the "->" is used for
sufficiency.

Users often make use of Boolean expressions such as A*B + C -> Y
(to calculate if the expression A*B + C is sufficient for the outcome Y)

The parser inverts it into Y <- A*B + C, as if the outcome Y is necessary
for the expression A*B + C, which changes the nature of the expression.

Quoting such expressions is already possible and it works as expected. We
were trying to avoid the quotes, if at all possible, to simplify the
command use in the manuals.
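
(A minimal sketch of the quoted approach, assuming the expression arrives as a string: parsing with keep.source = TRUE keeps the original tokens, so the typed arrow can still be recovered from the parse data. arrow_direction() is a hypothetical helper, not a function from QCA.)

```r
# Hypothetical helper: report which arrow was typed in a quoted expression
arrow_direction <- function(text) {
  parsed <- parse(text = text, keep.source = TRUE)
  tokens <- utils::getParseData(parsed)$token
  if ("RIGHT_ASSIGN" %in% tokens) {
    "sufficiency"   # the user typed ->
  } else if ("LEFT_ASSIGN" %in% tokens) {
    "necessity"     # the user typed <-
  } else {
    "none"
  }
}

d1 <- arrow_direction("A*B + C -> Y")
d2 <- arrow_direction("Y <- A*B + C")
```

The parsed call is the same in both cases; only the token stream tells them apart.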

Best wishes,
Adrian

On Fri, Mar 1, 2024 at 4:33 PM  wrote:

> I am wondering what the specific need for this is or is it just an
> exercise?
>
> Where does it matter if a chunk of code assigns using "<-" beforehand or
> "->" after hand, or for that matter assigns indirectly without a symbol?
>
> And whatever you come up with, will it also support the global assignment
> of "->>" as compared to "<<-" too?
>
> I do wonder if you can re-declare the assignment operators or would that
> mess up the parser.
>
> -Original Message-
> From: R-devel  On Behalf Of Duncan Murdoch
> Sent: Friday, March 1, 2024 9:23 AM
> To: Dmitri Popavenko 
> Cc: r-devel 
> Subject: Re: [Rd] capture "->"
>
> On 01/03/2024 8:51 a.m., Dmitri Popavenko wrote:
> > On Fri, Mar 1, 2024 at 1:00 PM Duncan Murdoch wrote:
> >
> > ...
> > I was thinking more of you doing something like
> >
> >     parse(text = "A -> B", keep.source = TRUE)
> >
> > I forget what the exact rules are for attaching srcrefs to arguments of
> > functions, but I do remember they are a little strange, because not
> > every possible argument can accept a srcref attribute.  For example, you
> > can't attach one to NULL, or to a name.
> >
> > Srcrefs are also fairly big and building them is slow, so I think we
> > tried to limit them to where they were needed, we didn't try to attach
> > them to every subexpression, just one per statement.  Each expression
> > within {} is a separate statement, so we get srcrefs attached to the {.
> > But in "foo(A -> B)" probably you only get one on the foo call.
> >
> > In some circumstances you could get the srcref on that call by looking
> > at sys.call().  But then things are complicated again, because R doesn't
> > attach srcrefs to things typed at the console, only to things that are
> > sourced from files or text strings (and parsed with keep.source=TRUE).
> >
> > So I think you should probably require input from a string or a file, or
> > not expect foo(A -> B) to work without some decoration.
> >
> >
> > Indeed, the more challenging task is to identify "->" at the console
> > (from a script or a string, seems trivial now).
> >
> > I would be willing to decorate as much as it takes to make this work, I
> > am just empty on more ideas how to persuade the parser.
>
> By "decorate", I meant putting it in quotes and parsing it using
> parse(text=...), or putting it in braces as you found.  I think parsing
> a string is most likely to be reliable because someone might turn off
> `keep.source` and then the braced approach would fail.  But you have
> control over it when you call parse() yourself.
>
> Duncan Murdoch
>


Re: [R-pkg-devel] [Rd] static html vignette

2024-01-04 Thread Adrian Dușa
On Thu, Jan 4, 2024 at 10:44 PM Uwe Ligges 
wrote:

> On 04.01.2024 21:23, Duncan Murdoch wrote:[...]
> > Users aren't forced to install "Suggests" packages.  That's a choice
> > they make.  The default for `install.packages()` is `dependencies = NA`,
> > which says to install hard dependencies (Imports, Depends, LinkingTo).
> > Users have to choose a non-default setting to include Suggests.
>
> Also note that the maintainer builds the vignette when calling
> R CMD build.
> CRAN checks whether the vignette can be built.
> If a user installs a package, the already produced vignette (built on the
> maintainer's machine by R CMD build) is installed. There is no need for
> the user to install any extra package to be able to look at the
> vignettes.
>

I see... then I must have tested with dependencies = TRUE thinking this
refers to hard dependencies (one more reason to read the documentation
properly).

Thank you,
Adrian

__
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel


Re: [R-pkg-devel] [Rd] static html vignette

2024-01-04 Thread Adrian Dușa
(Moved here following Ivan's suggestion)

On Thu, Jan 4, 2024 at 12:55 PM Ivan Krylov  wrote:

> On Thu, 4 Jan 2024 11:57:15 +0200
> Adrian Dușa  wrote:
>
> > I wonder if it would be possible to include an html static vignette.
>
> I would say that static vignettes are against the spirit of vignettes:
> the idea is to provide another layer of unit testing to the package by
> providing a deeper executable example than is possible with just Rd
> examples. I think that Bioconductor will even refuse a package with a
> vignette with no executable code in it.
>

I understand that perfectly, but for instance my package declared already
has over 800 tests and 100% code coverage. More unit testing in the
vignettes really strikes me as unnecessary.

One other reason to use a static vignette, in my case, is that package
Sweave is not available for my version of R (on MacOS, M2 version)



> Still, you can use the R.rsp package to provide static vignettes in
> both PDF and HTML formats:
>
> https://cran.r-project.org/package=R.rsp/vignettes/R_packages-Static_PDF_and_HTML_vignettes.pdf
>
> This will add 6 packages to your total Suggests budget:
>
> setdiff(
>  unlist(package_dependencies('R.rsp', recursive=TRUE)),
>  unlist(standard_package_names())
> )
> # [1] "R.methodsS3" "R.oo"        "R.utils"     "R.cache"     "digest"


Yes indeed, I know about R.rsp.
To me at least, zero dependency means that users install that package and
that package alone, the reason for which I am now looking for static
(preferably html) vignettes.

I guess another question is why should the "Suggests" packages need to be
installed by end users. I understand CRAN checks need to make sure the
Vignettes can be processed and the code inside runs fine (just like the
examples in the Rd files) but it is very unlikely that end-users will want
to compile the vignettes themselves.

From my own experience of almost 20 years of using R, I never-ever build
the vignettes of a certain package because it is much simpler to read them
on CRAN. I wonder, then, why are end users forced to install
Vignette-building "Suggests" packages (with long dependency chains) when
they practically never do that.

Life would be much simpler if the Suggests packages would not be
(automatically) installed, or if CRAN provided a way to include static
Vignettes to avoid the heavy dependencies of building them.

Best regards,
Adrian



[Rd] static html vignette

2024-01-04 Thread Adrian Dușa
Dear All,

I learned how to include a static pdf vignette into an R package, using a
dummy .Rnw file to include an already produced "vignette.pdf" file:

\documentclass{article}
\usepackage{pdfpages}
\begin{document}
\includepdf[pages=-, fitpaper=true]{vignette.pdf}
\end{document}

I wonder if it would be possible to include an html static vignette. Such
Rmarkdown (to html) vignettes can be produced using package "knitr", which
users are forced to install (along with dozens of knitr-dependent packages
from tidyverse) despite having nothing to do with the package itself.

If at all possible, I would like to avoid having users install "knitr" via
the Suggests field. I love that package, but I don't like its dependency
chain.

Thank you for any suggestions,
Adrian



Re: [Rd] capture error messages from loading shared objects

2023-11-28 Thread Adrian Dușa
Thanks Henrik and Bill,

Indeed, but I do have a function called tryCatchWEM() in package admisc
that captures all that.

My use case was to test for different architectures (for instance, arm64 vs
Intel MacOS) embedding R in cross-platform applications.
I needed to test if the package could be loaded, and previously used
requireNamespace() being unaware, as Ivan pointed out, that internally that
function already does tryCatch() with loadNamespace().

That was the reason why my own function tryCatchWEM() could not capture
that specific error message.
I now do:
> admisc::tryCatchWEM(loadNamespace("foobar"))
$error
[1] "there is no package called ‘foobar’"

(and the error message is captured just fine).
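
(A minimal sketch of the difference using base R only; the package name is hypothetical and assumed not to be installed. requireNamespace() swallows the loading error and returns FALSE, whereas loadNamespace() signals it, so an outer tryCatch() can record the message.)

```r
pkg <- "noSuchPackageFoobar"   # hypothetical name, assumed not installed

# requireNamespace() handles the error internally and just reports failure
ok <- requireNamespace(pkg, quietly = TRUE)

# loadNamespace() throws, so the condition reaches our own handler
msg <- tryCatch({
  loadNamespace(pkg)
  NA_character_                # only reached if loading succeeded
}, error = function(e) conditionMessage(e))
```

Here `ok` is FALSE and `msg` holds the (possibly translated) text of the loading error.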

All the best,
Adrian

On Tue, Nov 28, 2023 at 7:45 PM Henrik Bengtsson 
wrote:

> Careful; tryCatch() on non-error conditions will break out of what's
> evaluated, e.g.
>
> res <- tryCatch({
>   cat("1\n")
>   message("2")
>   cat("3\n")
>   42
> }, message = identity)
>
> will output '1' but not '3', because it returns as soon as the first
> message() is called.
>
> To "record" messages (same for warnings), use withCallingHandlers()
> instead, e.g.
>
> msgs <- list()
> res <- withCallingHandlers({
>   cat("1\n")
>   message("2")
>   cat("3\n")
>   42
> }, message = function(m) {
>   msgs <<- c(msgs, list(m))
>   invokeRestart("muffleMessage")
> })
>
> This will output '1', muffle '2', output '3', and return 42, and 'msgs'
> holds
>
> > msgs
> [[1]]
> 
> /Henrik
>
> On Tue, Nov 28, 2023 at 10:34 AM Bill Dunlap 
> wrote:
> >
> > If you would like to save the error message instead of suppressing it, you
> > can use tryCatch(message=function(e)e, ...).
> >
> > -Bill
> >
> > On Tue, Nov 28, 2023 at 3:55 AM Adrian Dusa wrote:
> >
> > > Once again, Ivan, many thanks.
> > > Yes, that does solve it.
> > > Best wishes,
> > > Adrian
> > >
> > > On Tue, Nov 28, 2023 at 11:28 AM Ivan Krylov 
> > > wrote:
> > >
> > > > > On Tue, 28 Nov 2023 10:46:45 +0100
> > > > > Adrian Dusa wrote:
> > > > >
> > > > > > tryCatch(requireNamespace("foobar"), error = function(e) e)
> > > > >
> > > > > I think you meant loadNamespace() (which throws errors), not
> > > > > requireNamespace() (which internally uses tryCatch(loadNamespace(...))
> > > > > and may or may not print the error depending on the `quietly` argument).
> > > >
> > > > --
> > > > Best regards,
> > > > Ivan
> > > >
> > >


-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
https://adriandusa.eu



Re: [Rd] capturing multiple warnings in tryCatch()

2021-12-03 Thread Adrian Dușa
Dear John,

The logical argument capture is already in production use by other
packages, but I think this is easily solved by:

if (!is.null(output$value) && output$visible) {
    if (capture) {
        toreturn$output <- capture.output(output$value)
    }
    toreturn$value <- output$value
}

so that value is always part of the return list, if visible.

This is a very good suggestion, and I've already incorporated it into this
function.

All the best,
Adrian

On Fri, 3 Dec 2021 at 21:42, Fox, John  wrote:

> Dear Adrian,
>
> Here's my slightly modified version of your function, which serves my
> purpose:
>
> --- snip ---
>
> tryCatchWEM <- function(expr, capture = TRUE) {
>     toreturn <- list()
>     output <- withVisible(withCallingHandlers(
>         tryCatch(expr, error = function(e) {
>             toreturn$error <<- e$message
>             NULL
>         }),
>         warning = function(w) {
>             toreturn$warning <<- c(toreturn$warning, w$message)
>             invokeRestart("muffleWarning")
>         },
>         message = function(m) {
>             toreturn$message <<- paste(toreturn$message, m$message, sep = "")
>             invokeRestart("muffleMessage")
>         }
>     ))
>     if (capture && output$visible) {
>         if (!is.null(output$value)) {
>             toreturn$result <- output$value
>         }
>     }
>     if (length(toreturn) > 0) {
>         return(toreturn)
>     }
> }
>
> --- snip ---
>
> The two small modifications are to change the default of capture to TRUE
> and to return output$value rather than capture.output(output$value). So a
> suggestion would be to modify the capture argument to, say, capture=c("no",
> "output", "value") and then something like
>
> . . .
> capture <- match.arg(capture)
> . . .
> if (capture == "output") {
>     toreturn$output <- capture.output(output$value)
> } else if (capture == "value") {
>     toreturn$value <- output$value
> }
> . . .
>
> Best,
>  John
>
> On 2021-12-03, 1:56 PM, "R-devel on behalf of Adrian Dușa" <
> r-devel-boun...@r-project.org on behalf of dusa.adr...@gmail.com> wrote:
>
> On Fri, 3 Dec 2021 at 00:37, Fox, John  wrote:
>
> > Dear Henrik, Simon, and Adrian,
> >
> > As it turns out Adrian's admisc::tryCatchWEM() *almost* does what I want,
> > which is both to capture all messages and the result of the expression
> > (rather than the visible representation of the result). I was easily able
> > to modify tryCatchWEM() to return the result.
> >
>
> Glad it helps.
> I would be happy to improve the function, should you send a reprex with the
> desired final result.
>
> Best wishes,
> Adrian
>


Re: [Rd] capturing multiple warnings in tryCatch()

2021-12-03 Thread Adrian Dușa
On Fri, 3 Dec 2021 at 00:37, Fox, John  wrote:

> Dear Henrik, Simon, and Adrian,
>
> As it turns out Adrian's admisc::tryCatchWEM() *almost* does what I want,
> which is both to capture all messages and the result of the expression
> (rather than the visible representation of the result). I was easily able
> to modify tryCatchWEM() to return the result.
>

Glad it helps.
I would be happy to improve the function, should you send a reprex with the
desired final result.

Best wishes,
Adrian



Re: [Rd] capturing multiple warnings in tryCatch()

2021-12-02 Thread Adrian Dușa
Dear John,

I have a function in package admisc called tryCatchWEM (catches warnings,
errors and messages):

> tryCatchWEM(foo())
$warning
[1] "warning1" "warning2"
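
(For readers without admisc, a minimal sketch of the underlying idea in base R. collect_warnings() is a hypothetical helper and not the admisc implementation: a calling handler records each warning and then resumes evaluation, so later warnings are still reached.)

```r
# Hypothetical helper: evaluate expr, returning its value and all warnings
collect_warnings <- function(expr) {
  warnings <- character()
  value <- withCallingHandlers(
    expr,
    warning = function(w) {
      warnings <<- c(warnings, conditionMessage(w))  # record the message
      invokeRestart("muffleWarning")                 # resume after warning()
    }
  )
  list(value = value, warnings = warnings)
}

foo <- function() {
  warning("warning 1")
  warning("warning 2")
  "done"
}

res <- collect_warnings(foo())
```

tryCatch() would instead unwind at the first warning, which is why only one message is seen with it.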

Hope this helps,
Adrian

On Thu, 2 Dec 2021 at 23:04, Fox, John  wrote:

> Dear R-devel list members,
>
> Is it possible to capture more than one warning message using tryCatch()?
> The answer may be in ?conditions, but, if it is, I can't locate it.
>
> For example, in the following only the first warning message is captured
> and reported:
>
> > foo <- function(){
> +   warning("warning 1")
> +   warning("warning 2")
> + }
>
> > foo()
> Warning messages:
> 1: In foo() : warning 1
> 2: In foo() : warning 2
>
> > bar <- function(){
> +   tryCatch(foo(), warning=function(w) print(w))
> + }
>
> > bar()
> <simpleWarning in foo(): warning 1>
>
> Is there a way to capture "warning 2" as well?
>
> Any help would be appreciated.
>
> John
>
> --
> John Fox, Professor Emeritus
> McMaster University
> Hamilton, Ontario, Canada
> Web: http://socserv.mcmaster.ca/jfox/
>
>
>


-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
https://adriandusa.eu



Re: [Rd] substitute

2021-11-15 Thread Adrian Dușa
Thank you, I was given a deadline of two weeks to respond, hopefully this
will be settled by then.
Best wishes,
Adrian

On Mon, 15 Nov 2021 at 19:28, Duncan Murdoch 
wrote:

> This looks as though it is related to the recent patch in
>
>https://bugs.r-project.org/show_bug.cgi?id=18232
>
> I think you should probably wait until that settles down before worrying
> about it.
>
> Duncan Murdoch
>
> On 15/11/2021 12:18 p.m., Adrian Dușa wrote:
> > Dear R wizards,
> >
> > I have recently been informed about some build errors of my package QCA,
> > which I was able to trace down to the base function substitute(), with the
> > following replication example:
> >
> > foo <- function(x) return(substitute(x))
> >
> > In the stable R version 4.0.5, I get the expected result:
> >> foo(A + ~B + C~D)
> > A + ~B + C ~ D
> >
> > A different result (the culprit for the build error) occurs under Fedora
> > with R devel:
> >
> >> foo(A + ~B + C~D)
> > A + (~B + C) ~ D
> >
> > The Fedora machine is the rhub docker image from:
> > https://hub.docker.com/r/rhub/fedora-gcc-devel
> >
> > probably very similar to the one signalling the CRAN build error:
> > https://cran.r-project.org/web/checks/check_results_QCA.html
> >
> > The first (expected) command is from the stable R version installed on the
> > same Fedora machine, and I get an identical result on Windows and MacOS.
> >
> > For some reason, substitute() gives a different result on Debian using gcc,
> > and on both Fedora systems. I would be grateful for any hint, I am not
> > entirely certain what I should do about this.
> >
> > Thank you very much in advance,
> > Adrian
> >
>


-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
https://adriandusa.eu



[Rd] substitute

2021-11-15 Thread Adrian Dușa
Dear R wizards,

I have recently been informed about some build errors of my package QCA,
which I was able to trace down to the base function substitute(), with the
following replication example:

foo <- function(x) return(substitute(x))

In the stable R version 4.0.5, I get the expected result:
> foo(A + ~B + C~D)
A + ~B + C ~ D

A different result (the culprit for the build error) occurs under Fedora
with R devel:

> foo(A + ~B + C~D)
A + (~B + C) ~ D

The Fedora machine is the rhub docker image from:
https://hub.docker.com/r/rhub/fedora-gcc-devel

probably very similar to the one signalling the CRAN build error:
https://cran.r-project.org/web/checks/check_results_QCA.html

The first (expected) command is from the stable R version installed on the
same Fedora machine, and I get an identical result on Windows and MacOS.

For some reason, substitute() gives a different result on Debian using gcc,
and on both Fedora systems. I would be grateful for any hint, I am not
entirely certain what I should do about this.
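
(A minimal sketch, assuming the difference lies in how the call is deparsed for printing rather than in how it is parsed: walking the language object directly shows the structure without depending on the printed form. Code that compares printed strings could compare these pieces instead. The variable names are illustrative.)

```r
foo <- function(x) substitute(x)
e <- foo(A + ~B + C~D)

top   <- e[[1]]     # head of the whole call: the binary `~`
rhs   <- e[[3]]     # right-hand side of the formula: D
lhs   <- e[[2]]     # left-hand side: a call to `+`
inner <- lhs[[3]]   # second operand of `+`: a one-sided `~` call
```

Because `~` binds more loosely than `+`, the one-sided `~` covers `B + C`, which is the grouping the parenthesised output makes visible.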

Thank you very much in advance,
Adrian

-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
https://adriandusa.eu



Re: [Rd] [External] Re: 1954 from NA

2021-05-26 Thread Adrian Dușa
Yes, that is even better.
Best,
Adrian

On Wed, May 26, 2021 at 7:05 PM Duncan Murdoch 
wrote:

> After 5 minutes more thought:
>
> - code non-missing as missingKind = NA, not 0, so that missingKind could
> be a character vector, or missingKind = 0 could be supported.
>
> - print methods should return the main argument, so mine should be
>
> print.MultiMissing <- function(x, ...) {
>   vals <- as.character(x)
>   if (!is.character(x) || inherits(x, "noquote"))
>     print(noquote(vals))
>   else
>     print(vals)
>   invisible(x)
> }
>
> This still needs a lot of improvement to be a good print method, but
> I'll leave that to you.
>
> Duncan Murdoch
>
> On 26/05/2021 11:43 a.m., Duncan Murdoch wrote:
> > On 26/05/2021 10:22 a.m., Adrian Dușa wrote:
> >> Dear Duncan,
> >>
> >> On Wed, May 26, 2021 at 2:27 AM Duncan Murdoch
> >> <murdoch.dun...@gmail.com> wrote:
> >>
> >>     You've already been told how to solve this:  just add attributes to the
> >>     objects. Use the standard NA to indicate that there is some kind of
> >>     missingness, and the attribute to describe exactly what it is.  Stick a
> >>     class on those objects and define methods so that subsetting and
> >>     arithmetic preserves the extra info you've added. If you do some
> >>     operation that turns those NAs into NaNs, big deal:  the attribute will
> >>     still be there, and is.na(NaN) still returns TRUE.
> >>
> >>
> >> I've already tried the attributes way, it is not so easy.
> >
> > If you have specific operations that are needed but that you can't get
> > to work, post the issue here.
> >
> >> In the best case scenario, it unnecessarily triples the size of the
> >> data, but perhaps this is the only way forward.
> >
> > I don't see how it could triple the size.  Surely an integer has enough
> > values to cover all possible kinds of missingness.  So on integer or
> > factor data you'd double the size, on real or character data you'd
> > increase it by 50%.  (This is assuming you're on a 64 bit platform with
> > 32 bit integers and 64 bit reals and pointers.)
> >
> > Here's a tiny implementation to show what I'm talking about:
> >
> > asMultiMissing <- function(x) {
> >   if (isMultiMissing(x))
> >     return(x)
> >   missingKind <- ifelse(is.na(x), 1, 0)
> >   structure(x,
> >             missingKind = missingKind,
> >             class = c("MultiMissing", class(x)))
> > }
> >
> > isMultiMissing <- function(x)
> >   inherits(x, "MultiMissing")
> >
> > missingKind <- function(x) {
> >   if (isMultiMissing(x))
> >     attr(x, "missingKind")
> >   else
> >     ifelse(is.na(x), 1, 0)
> > }
> >
> > `missingKind<-` <- function(x, value) {
> >   class(x) <- setdiff(class(x), "MultiMissing")
> >   x[value != 0] <- NA
> >   x <- asMultiMissing(x)
> >   attr(x, "missingKind") <- value
> >   x
> > }
> >
> > `[.MultiMissing` <- function(x, i, ...) {
> >   missings <- missingKind(x)
> >   x <- NextMethod()
> >   missings <- missings[i]
> >   missingKind(x) <- missings
> >   x
> > }
> >
> > print.MultiMissing <- function(x, ...) {
> >   vals <- as.character(x)
> >   if (!is.character(x) || inherits(x, "noquote"))
> >     print(noquote(vals))
> >   else
> >     print(vals)
> > }
> >
> > `[<-.MultiMissing` <- function(x, i, value, ...) {
> >   missings <- missingKind(x)
> >   class(x) <- setdiff(class(x), "MultiMissing")
> >   x[i] <- value
> >   missings[i] <- missingKind(value)
> >   missingKind(x) <- missings
> >   x
> > }
> >
> > as.character.MultiMissing <- function(x, ...) {
> >   missings <- missingKind(x)
> >   result <- NextMethod()
> >   ifelse(missings != 0,
> >          paste0("NA.", missings), result)
> > }
> >
> > This is incomplete.  It doesn't do printing very well, and it doesn't
> > handle the case of assigning a MultiMissing value to a regular vector at
> > all.  (I think you'd need an S4 implementation if you want to support
> > that.)  But it does the basics:
> >
> >   > x <- 1:1

Re: [Rd] [External] Re: 1954 from NA

2021-05-26 Thread Adrian Dușa
On Wed, May 26, 2021 at 6:43 PM Duncan Murdoch 
wrote:

> [...]
> > In the best case scenario, it unnecessarily triples the size of the
> > data, but perhaps this is the only way forward.
>
> I don't see how it could triple the size.  Surely an integer has enough
> values to cover all possible kinds of missingness.  So on integer or
> factor data you'd double the size, on real or character data you'd
> increase it by 50%.  (This is assuming you're on a 64 bit platform with
> 32 bit integers and 64 bit reals and pointers.)


Apologies, that was supposed to be double the size not triple, 99% of the
survey data are integers.
But I suppose that is alright, space doesn't seem to be a problem.
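
(A minimal sketch of the size arithmetic, assuming the kind of missingness is stored as an integer vector in an attribute named missingKind, as in the quoted implementation. The helper size() and the vector length are illustrative.)

```r
n <- 1e6
ints  <- rep(1L, n)             # 4 bytes per element
reals <- rep(1, n)              # 8 bytes per element
kind  <- rep(NA_integer_, n)    # one integer code per element

# object.size() includes attributes, so the overhead is visible directly
size <- function(x) as.numeric(utils::object.size(x))

ratio_int  <- size(structure(ints,  missingKind = kind)) / size(ints)
ratio_real <- size(structure(reals, missingKind = kind)) / size(reals)
```

On a typical 64-bit build the ratios come out at roughly 2 for integer data and roughly 1.5 for double data, matching the estimate quoted above.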

Thank you very much for the examples, they do seem to cover the basics
indeed.
(that is what I meant when I wrote there might be a way without tagging
NAs).

Will take it from there, best wishes,
Adrian



Re: [Rd] [External] Re: 1954 from NA

2021-05-26 Thread Adrian Dușa
On Wed, May 26, 2021 at 4:13 AM Gregory Warnes  wrote:

> As a side note, for floating point values, the IEEE 754 standard provides
> for a large set of NaN values, making it possible to have multiple types of
> NAs for floating point values...
>

That is interesting, but how does one use different NaN values from within
R?
Tagging such values was already signaled as a dead end, is there another way?



Re: [Rd] [External] Re: 1954 from NA

2021-05-26 Thread Adrian Dușa
Dear Duncan,

On Wed, May 26, 2021 at 2:27 AM Duncan Murdoch 
wrote:

> You've already been told how to solve this:  just add attributes to the
> objects. Use the standard NA to indicate that there is some kind of
> missingness, and the attribute to describe exactly what it is.  Stick a
> class on those objects and define methods so that subsetting and
> arithmetic preserves the extra info you've added. If you do some
> operation that turns those NAs into NaNs, big deal:  the attribute will
> still be there, and is.na(NaN) still returns TRUE.
>

I've already tried the attributes way, it is not so easy.
In the best case scenario, it unnecessarily triples the size of the data,
but perhaps this is the only way forward.



> Base R doesn't need anything else.
>
> You complained that users shouldn't need to know about attributes, and
> they won't:  you, as the author of the package that does this, will
> handle all those details.  Working in your subject area you know all the
> different kinds of NAs that people care about, and how they code them in
> input data, so you can make it all totally transparent.  If you do it
> well, someone in some other subject area with a completely different set
> of kinds of missingness will be able to adapt your code to their use.
>

But that is the whole point: the package author does not define possible
NAs (the possibilities are infinite), users do that.
The package should only provide a simple method to achieve that.


> I imagine this has all been done in one of the thousands of packages on
> CRAN, but if it hasn't been done well enough for you, do it better.
>

If it were, I would have found it by now...

Best wishes,
Adrian



Re: [Rd] [External] Re: 1954 from NA

2021-05-25 Thread Adrian Dușa
Dear Avi,

That was quite a lengthy email...
What you write makes sense of course. I try hard not to deviate from the
base R, and thought my solution does just that but apparently no such luck.

I suspect, however, that something will have to eventually change: since
one of the R building blocks (such as an NA) is questioned by compilers, it
is serious enough to attract attention from the R core and maintainers.
And if that happens, my fingers are crossed the solution would allow users
to declare existing values as missing.

The importance of that, for the social sciences, cannot be stressed enough.

Best wishes, thanks once again to everyone,
Adrian

On Tue, May 25, 2021 at 10:03 PM Avi Gross via R-devel <
r-devel@r-project.org> wrote:

> That helps get more understanding of what you want to do, Adrian. Getting
> anyone to switch is always a challenge but changing R enough to tempt them
> may be a bigger challenge. This is an old story. I was the first adopter for
> C++ in my area and at first had to have my code be built with an all C
> project making me reinvent some wheels so the same “make” system knew how
> to build the two compatibly and link them. Of course, they all eventually
> had to join me in a later release but I had moved forward by then.
>
>
>
> I have changed (or more accurately added) lots of languages in my life and
> continue to do so. The biggest challenge is not to just adapt and use it
> similarly to the previous ones already mastered but to understand WHY
> someone designed the language this way and what kind of idioms are common
> and useful even if that means a new way of thinking. But, of course, any
> “older” language has evolved and often drifted in multiple directions. Many
> now borrow heavily from others even when the philosophy is different and
> often the results are not pretty. Making major changes in R might have
> serious impacts on existing programs including just by making them fail as
> they run out of memory.
>
>
>
> If you look at R, there is plenty you can do in base R, sometimes by
> standing on your head. Yet you see package after package coming along that
> offers not just new things but sometimes a reworking and even remodeling of
> old things. R has a base graphics system I now rarely use and another
> called lattice I have no reason to use again because I can do so much quite
> easily in ggplot. Similarly, the evolving tidyverse group of packages
> approaches things from an interesting direction to the point where many
> people mainly use it and not base R. So if they were to teach a class in
> how to gather your data and analyze it and draw pretty pictures, the
> students might walk away thinking they had learned R but actually have
> learned these packages.
>
>
>
> Your scenario seems related to a common scenario of how we can have values
> that signal beyond some range in an out-of-band manner. Years ago we had
> functions in languages like C that would return a -1 on failure when only
> non-negative results were otherwise possible. That can work fine but fails
> in cases when any possible value in the range can be returned. We have
> languages that deal with this kind of thing using error handling constructs
> like exceptions.  Sometimes you bundle up multiple items into a structure
> and return that with one element of the structure holding some kind of
> return status and another holding the payload. A variation on this theme,
> as in languages like Go, is to have functions that return multiple values
> with one of them containing nil on success and an error structure on
> failure.
>
>
>
> The situation we have here that seems to be of concern to you is that you
> would like each item in a structure to have attributes that are recognized
> and propagated as it is being processed. Older languages tended not to even
> have such a concept, so basic types simply existed and two instances of the
> number 5 might even be the same underlying one or two strings with the same
> contents and so on. You could of course play the game of making a struct,
> as mentioned above, but then you needed your own code to do all the
> handling as nothing else knew it contained multiple items and which ones
> had which purpose.
>
>
>
> R did add generalized attributes and some are fairly well integrated or at
> least partially. “Names” were discussed as not being easy to keep around.
> Factors used their own tagging method that seems to work fairly well but
> probably not everywhere. But what you want may be more general and not
> built on similar foundations.
>
>
>
> I look at languages like Python that are arguably more object-oriented now
> than R is and in some ways can be extended better, albeit not in others. If
> I wanted to create an object to hold the number 5 and I add methods to the
> object that allow it to participate in various ways with other objects
> using the hidden payload but also sometimes using the hidden payload, then
> I might pair it with the string “five” 

Re: [Rd] [External] Re: 1954 from NA

2021-05-25 Thread Adrian Dușa
On Tue, May 25, 2021 at 4:14 PM  wrote:

> [...]
>
> Yes, it should be discarded.
>
> You can of course do what you like in code you keep to yourself. But
> please do not distribute code that does this via CRAN or any other
> means. It will only create problems for those maintaining R.
>
> > After all, the NA is nothing but a tagged NaN.
>
> And we are now paying a price for what was, in hindsight, an
> unfortunate decision.
>

I (only now) understand that. That code is based on the R sources and (mind
you) an almost identical one from package haven.

Regardless, it was not the code I was trying to show, but the vignette: the
end result, the functionality of the software.
That is, automatically treat declared missing values as NAs, without users
being required to explicitly deal with attributes.

Now that I think about it, there might be a way to do this without tagging
NAs, so back to square one.

Best wishes,
Adrian



Re: [Rd] [External] Re: 1954 from NA

2021-05-25 Thread Adrian Dușa
calculation such as dividing by Zero (albeit maybe that
> might be a NaN) and so on. Maybe I could annotate integers with whether
> they are prime or even  versus odd  or a factor of 144 or anything else I
> can imagine. But at some point, the overhead from allowing all this can
> become substantial. I was amused at how python allows a function to be
> annotated including by itself since it is an object. So it can store such
> metadata perhaps in an attached dictionary so a complex costly calculation
> can have the results cached and when you ask for the same thing in the same
> session, it checks if it has done it and just returns the result in linear
> time. But after a while, how many cached results can there be?
>
> -----Original Message-----
> From: R-devel  On Behalf Of
> luke-tier...@uiowa.edu
> Sent: Monday, May 24, 2021 9:15 AM
> To: Adrian Dușa 
> Cc: Greg Minshall ; r-devel 
> Subject: Re: [Rd] [External] Re: 1954 from NA
>
> On Mon, 24 May 2021, Adrian Dușa wrote:
>
> > On Mon, May 24, 2021 at 2:11 PM Greg Minshall 
> wrote:
> >
> >> [...]
> >> if you have 500 columns of possibly-NA'd variables, you could have
> >> one column of 500 "bits", where each bit has one of N values, N being
> >> the number of explanations the corresponding column has for why the
> >> NA exists.
> >>
>
> PLEASE DO NOT DO THIS!
>
> It will not work reliably, as has been explained to you ad nauseam in this
> thread.
>
> If you distribute code that does this it will only lead to bug reports on
> R that will waste R-core time.
>
> As Alex explained, you can use attributes for this. If you need operations
> to preserve attributes across subsetting you can define subsetting methods
> that do that.
>
> If you are dead set on doing something in C you can try to develop an
> ALTREP class that provides augmented missing value information.
>
> Best,
>
> luke
>
>
>
> >
> > The mere thought of implementing something like that gives me shivers.
> > Not to mention such a solution should also be robust when subsetting,
> > splitting, column and row binding, etc. and everything can be lost if
> > the user deletes that particular column without realising its importance.
> >
> > Social science datasets are much more alive and complex than one might
> > first think: there are multi-wave studies with tens of countries, and
> > aggregating such data is already a complex process to add even more
> > complexity on top of that.
> >
> > As undocumented as they may be, or even subject to change, I think the
> > R internals are much more reliable than this.
> >
> > Best wishes,
> > Adrian
> >
> >
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa  Phone: 319-335-3386
> Department of Statistics and        Fax:   319-335-3017
> Actuarial Science
> 241 Schaeffer Hall  email:   luke-tier...@uiowa.edu
> Iowa City, IA 52242 WWW:  http://www.stat.uiowa.edu
>


-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
https://adriandusa.eu



Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
On Mon, May 24, 2021 at 5:47 PM Gabriel Becker 
wrote:

> Hi Adrian,
>
> I had the same thought as Luke. It is possible that you can develop an
> ALTREP that carries around the tagging information you're looking for in a
> way that is more persistent (in some cases) than R-level attributes and
> more hidden than additional user-visible columns.
>
> The downside to this, of course, is that you'll in some sense be doing
> the same "extra vector for each vector you want tagged NA-s within" under
> the hood, and that only custom machinery you write will recognize things as
> something other than bog-standard NAs/NaNs.  You'll also have some problems
> with the fact that data in ALTREPs isn't currently modifiable without
> losing ALTREPness. That said, ALTREPs are allowed to carry around arbitrary
> persistent information with them, so from that perspective making an ALTREP
> that carries around a "meaning of my NAs" vector of tags in its metadata
> would be pretty straightforward.
>

Oh... now that is extremely interesting.
It is the first time I have come across the ALTREP concept, so I need to study
the way it works before saying anything, but definitely something to
consider.

Thanks so much for the pointer,
Adrian



Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
On Mon, May 24, 2021 at 4:40 PM Bertram, Alexander via R-devel <
r-devel@r-project.org> wrote:

> Dear Adrian,
> SPSS and other packages handle this problem in a very similar way to what I
> described: they store additional metadata for each variable. You can see
> this in the way that SPSS organizes its file format: each "variable" has
> additional metadata that indicate how specific values of the variable,
> encoded as an integer or a floating point should be handled in analysis.
> Before you actually run a crosstab in SPSS, the metadata is (presumably)
> applied to the raw data to arrive at an in memory buffer on which the
> actual model is fitted, etc.
>

As far as I am aware, SAS and Stata use "very high" and "very low" values
to signal a missing value. Basically, the same solution using a different
sign bit (not creating attributes metadata, though).

Something similar to the IEEE-754 representation for the NaN:
0x7ff0

only using some other "high" word:
0x7fe0

If I understand this correctly, compilers are likely to mess around with
the payload from the 0x7ff0... stuff, which endangers even the most basic R
structure like a real NA.
Perhaps using a different high word such as 0x7fe would be stable, since
compilers won't confuse it with a NaN. And then any payload would be "safe"
for any specific purpose.
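
To see what such a value would actually be, here is a minimal C sketch. It
assumes the high 16 bits are 0x7fe0 and the payload sits in the low 32 bits;
make_high_marker and marker_payload are made-up names, not anything from R,
SAS or Stata. What it illustrates: with exponent bits 0x7fe the double is an
ordinary, very large finite number (about 2^1023), so plain loads and stores
leave the payload alone, but it is no longer a NaN and R's is.na() would not
treat it as missing.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its raw 64 bits (memcpy avoids aliasing problems). */
uint64_t raw_bits(double x)
{
    uint64_t u;
    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Build a double from raw 64 bits. */
double from_bits(uint64_t u)
{
    double x;
    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Hypothetical: high 16 bits 0x7fe0 (assumed), payload in the low 32 bits. */
double make_high_marker(uint32_t payload)
{
    return from_bits((UINT64_C(0x7FE0) << 48) | (uint64_t)payload);
}

/* Hypothetical: read the low 32 bits back. */
uint32_t marker_payload(double x)
{
    return (uint32_t)(raw_bits(x) & 0xFFFFFFFFu);
}
```

So make_high_marker(1954) is finite and not a NaN, and marker_payload() gives
back 1954; any code that sums or averages the column would have to filter such
values out explicitly.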

Not sure how SPSS manages its internals, but if they do it that way they
manage it in a standard procedural way. Now, since R's NA payload is at
risk, and if your solution is "good" for specific social science missing
data, would you recommend R creators to adopt it for a regular NA...?

We're looking for a general purpose solution that would create as little
additional work as possible for the end users. Your solution is already
implemented in the package "labelled" with the function user_na_to_na()
before doing any statistical analysis. That still requires users to pay
attention to details which the software should take care of automatically.

Best,
Adrian

> The 20 line solution in R looks like this:
>
>
> df <- data.frame(q1 = c(1, 10, 50, 999), q2 = c("Yes", "No", "Don't know",
> "Interviewer napping"), stringsAsFactors = FALSE)
> attr(df$q1, 'missing') <- 999
> attr(df$q2, 'missing') <- c("Don't know", "Interviewer napping")
>
> excludeMissing <- function(df) {
>   for (q in names(df)) {
>     v <- df[[q]]
>     mv <- attr(v, 'missing')
>     if (!is.null(mv)) {
>       df[[q]] <- ifelse(v %in% mv, NA, v)
>     }
>   }
>   df
> }
>
> table(excludeMissing(df))
>
> If you want to preserve the missing attribute when subsetting the vectors
> then you will have to take the example further by adding a class and
> `[.withMissing` functions. This might bring the whole project to a few
> hundred lines, but the rules that apply here are well defined and well
> understood, giving you a proper basis on which to build. And perhaps the
> vctrs package might make this even simpler, take a look.
>
> Best,
> Alex
>
>



Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
Hi Taras,

On Mon, May 24, 2021 at 4:20 PM Taras Zakharko 
wrote:

> Hi Adrian,
>
> Have a look at vctrs package — they have low-level primitives that might
> simplify your life a bit. I think you can get quite far by creating a
> custom type that stores NAs in an attribute and utilizes vctrs proxy
> functionality to preserve these attributes across different operations.
> Going that route will likely give you a much more flexible and robust
> solution.
>

Yes I am well aware of the primitives from package vctrs, since package
haven itself uses the vctrs_vctr class.
They're doing very interesting work, albeit not a solution for this
particular problem.

A.



Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
Dear Alex,

Thanks for piping in, I am learning with each new message.
The problem is clear, the solution escapes me though. I've already tried
the attributes route: it is going to triple the data size: along with the
additional (logical) variable that specifies which level is missing, one
also needs to store an index such that sorting the data would still
maintain the correct information.

One also needs to think about subsetting (subset the attributes as well),
splitting (the same), aggregating multiple datasets (even more attention),
creating custom vectors out of multiple variables... complexity quickly
grows towards infinity.

R factors are nice indeed, but:
- there are numerical variables which can hold multiple missing values (for
instance income)
- factors convert the original questionnaire values: if a missing value was
coded 999, turning that into a factor would convert that value into
something else

I really, and wholeheartedly, do appreciate all advice: but please be
assured that I have been thinking about this for more than 10 years and
still haven't found a satisfactory solution.

Which makes it even more intriguing, since other software like SAS or Stata
have solved this for decades: what is their implementation, and how come
they don't seem to be affected by the new M1 architecture?
When package "haven" introduced the tagged NA values I said: ah-haa... so
that is how it's done... only to learn that implementation is just as
fragile as the R internals.

There really should be a robust solution for this seemingly mundane
problem, but apparently it is far from mundane...

Best wishes,
Adrian


On Mon, May 24, 2021 at 3:29 PM Bertram, Alexander 
wrote:

> Dear Adrian,
> I just wanted to pipe in and underscore Thomas' point: the payload bits of
> IEEE 754 floating point values are no place to store data that you care
> about or need to keep. That is not only related to the R APIs, but also how
> processors handle floating point values and signaling and non-signaling
> NaNs. It is very difficult to reason about when and under which
> circumstances these bits are preserved. I spent a lot of time working on
> Renjin's handling of these values and I can assure you that any such scheme
> will end in tears.
>
> A far, far better option is to use R's attributes to store this kind of
> metadata. This is exactly what this language feature is for. There is
> already a standard 'levels' attribute that holds the labels of factors like
> "Yes", "No", "Refused", "Interviewer error" etc. In the past, I've worked
> on projects where we stored an additional attribute like "missingLevels"
> that stores extra metadata on which levels should be used in which kind of
> analysis. That way, you can preserve all the information, and then write a
> utility function which automatically applies certain logic to a whole
> dataframe just before passing the data to an analysis function. This is
> also important because in surveys like this, different values should be
> excluded at different times. For example, you might want to include all
> responses in a data quality report, but exclude interviewer error and
> refusals when conducting a PCA or fitting a model.
>
> Best,
> Alex
>
> On Mon, May 24, 2021 at 2:03 PM Adrian Dușa  wrote:
>
>> On Mon, May 24, 2021 at 1:31 PM Tomas Kalibera 
>> wrote:
>>
>> > [...]
>> >
>> > For the reasons I explained, I would be against such a change. Keeping
>> > the data on the side, as also recommended by others on this list, would
>> > allow you a reliable implementation. I don't want to support fragile
>> > package code building on unspecified R internals, and in this case
>> > particularly internals that themselves have not stood the test of time,
>> > so are at high risk of change.
>> >
>> I understand, and it makes sense.
>> We'll have to wait for the R internals to settle (this really is
>> surprising, I wonder how other software have solved this). In the
>> meantime, I will probably go ahead with NaNs.
>>
>> Thank you again,
>> Adrian
>>
>>
>
>
> --
> Alexander Bertram
> Technical Director
> *BeDataDriven BV*
>
> Web: http://bedatadriven.com
> Email: a...@bedatadriven.com
> Tel. Nederlands: +31(0)647205388
> Skype: akbertram
>



Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
On Mon, May 24, 2021 at 2:11 PM Greg Minshall  wrote:

> [...]
> if you have 500 columns of possibly-NA'd variables, you could have one
> column of 500 "bits", where each bit has one of N values, N being the
> number of explanations the corresponding column has for why the NA
> exists.
>

The mere thought of implementing something like that gives me shivers. Not
to mention such a solution should also be robust when subsetting,
splitting, column and row binding, etc. and everything can be lost if the
user deletes that particular column without realising its importance.

Social science datasets are much more alive and complex than one might
first think: there are multi-wave studies with tens of countries, and
aggregating such data is already a complex process to add even more
complexity on top of that.

As undocumented as they may be, or even subject to change, I think the R
internals are much more reliable than this.

Best wishes,
Adrian

-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
https://adriandusa.eu



Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
On Mon, May 24, 2021 at 1:31 PM Tomas Kalibera 
wrote:

> [...]
>
> For the reasons I explained, I would be against such a change. Keeping the
> data on the side, as also recommended by others on this list, would allow
> you a reliable implementation. I don't want to support fragile package
> code building on unspecified R internals, and in this case particularly
> internals that themselves have not stood the test of time, so are at high
> risk of change.
>
I understand, and it makes sense.
We'll have to wait for the R internals to settle (this really is
surprising, I wonder how other software have solved this). In the meantime,
I will probably go ahead with NaNs.

Thank you again,
Adrian



Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
Hmm...
If it was only one column then your solution is neat. But with 5-600
variables, each of which can contain multiple missing values, to double
this number of variables just to describe NA values seems to me excessive.
Not to mention we should be able to quickly convert / import / export from
one software package to another. This would imply maintaining some sort of
metadata reference of which explanatory additional factor describes which
original variable.

All of this strikes me as a lot of hassle compared to storing some
information within a tagged NA value... I just need a few more bits
to play with.

Best wishes,
Adrian

On Sun, May 23, 2021 at 10:21 PM Avi Gross via R-devel <
r-devel@r-project.org> wrote:

> Arguably, R was not developed to satisfy some needs in the way intended.
>
> When I have had to work with datasets from some of the social sciences I
> have had to adapt to subtleties in how they did things with software like
> SPSS in which an NA was done using an out of bounds marker like 999 or "."
> or even a blank cell. The problem is that R has a concept where data such
> as integers or floating point numbers is not stored as text normally but in
> their own formats and a vector by definition can only contain ONE data
> type. So the various forms of NA as well as NaN and Inf had to be grafted
> on to be considered VALID to share the same storage area as if they sort of
> were an integer or floating point number or text or whatever.
>
> It does strike me as possible to simply have a column that is something
> like a factor that can contain as many NA excuses as you wish such as "NOT
> ANSWERED" to "CANNOT READ THE SQUIGGLE" to "NOT SURE" to "WILL BE FILLED IN
> LATER" to "I DON'T SPEAK ENGLISH AND CANNOT ANSWER STUPID QUESTIONS". This
> additional column would presumably only have content when the other column
> has an NA. Your queries and other changes would work on something like a
> data.frame where both such columns coexisted.
>
> Note reading in data with multiple NA reasons may take extra work. If your
> error codes are text, it will all become text. If the errors are 999 and
> 998 and 997, it may all be treated as numeric and you may not want to
> convert all such codes to an NA immediately. Rather, you would use the
> first vector/column to make the second vector and THEN replace everything
> that should be an NA with an actual NA and reparse the entire vector to
> become properly numeric unless you like working with text and will convert
> to numbers as needed on the fly.
>
> Now this form of annotation may not be pleasing but I suggest that an
> implementation that does allow annotation may use up space too. Of course,
> if your NA values are rare and space is only used then, you might save
> space. But if you could make a factor column and have it use the smallest
> int it can get as a basis, it may be a way to save on space.
>
> People who have done work with R, especially those using the tidyverse,
> are quite used to using one column to explain another. So if you are asked
> to, say, tabulate what percent of missing values are due to reasons A/B/C,
> then the added column works fine for that calculation too.
>

-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
https://adriandusa.eu



Re: [Rd] 1954 from NA

2021-05-24 Thread Adrian Dușa
On Sun, May 23, 2021 at 10:14 PM Tomas Kalibera 
wrote:

> [...]
>
> Good, but unfortunately the delineation between computation and
> non-computation is not always transparent. Even if an operation doesn't
> look like "computation" on the high-level, it may internally involve
> computation - so, really, an R NA can become R NaN and vice versa, at any
> point (this is not a "feature", but it is how things are now).
>

I see.
Well, this is a risk we'll have to consider when the time comes. For the
moment, storing some metadata within the payload seems to work.
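
Whether it keeps working can at least be probed on a given machine. A minimal
sketch, with hypothetical helper names and assuming the NA bit pattern
discussed in this thread (high word 0x7ff00000, low word 1954): it adds an
ordinary number to a NaN carrying a payload and reports whether the low 32
bits survived. No expected result is given, since it may differ between CPUs,
compilers and optimisation levels.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its raw 64 bits (memcpy avoids aliasing problems). */
uint64_t raw_bits(double x)
{
    uint64_t u;
    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Build a double from raw 64 bits. */
double from_bits(uint64_t u)
{
    double x;
    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Hypothetical probe: 1 if x + y is a NaN whose low 32 bits equal those of x,
   0 otherwise. The outcome is platform-dependent by design. */
int payload_survives_add(double x, double y)
{
    volatile double vx = x, vy = y;  /* volatile discourages constant folding */
    double sum = vx + vy;
    return isnan(sum) &&
           (raw_bits(sum) & 0xFFFFFFFFu) == (raw_bits(x) & 0xFFFFFFFFu);
}
```

Calling payload_survives_add(from_bits(0x7FF00000000007A2), 1.0) on a few
build configurations is a cheap way to see how much (or how little) can be
relied upon; a 1 on one machine says nothing about the next.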



> [...]
>
> Ok, then I would probably keep the meta-data on the missing values on the
> side to implement such missing values in such code, and treat them
> explicitly in supported operations.
>
> But, in principle, you can use the floating-point NaN payloads, and you
> can pass such values to R. You just need to be prepared to lose not only
> your payloads/tags, but also the difference between R NA and R
> NaNs. Thanks to value semantics of R, you would not lose the tags in input
> values with proper reference counts (e.g. marked immutable), because those
> values will not be modified.
>
NaNs are fine of course, but then some (social science?) users might get
confused about the difference between NAs and NaNs, and for this reason
only I would still like to preserve the 1954 payload.
If at all possible, however, the extra 16 bits from this payload would make
a whole lot of a difference.

Please forgive my persistence, but would it be possible to use an unsigned
short instead of an unsigned int for the 1954 payload?
That is, if it doesn't break anything, but I don't really see what it
could. The corresponding check function seems to work just fine and it
doesn't need to be changed at all:

int R_IsNA(double x)
{
    if (isnan(x)) {
        ieee_double y;
        y.value = x;
        return (y.word[lw] == 1954);
    }
    return 0;
}

Best wishes,
Adrian



Re: [Rd] 1954 from NA

2021-05-23 Thread Adrian Dușa
Dear Tomas,

I understand that perfectly, but that is fine.
The payload is not going to be used in any computations anyways, it is
strictly an information carrier that differentiates between different types
of (tagged) NA values.

Having only one NA value in R is extremely limiting for the social
sciences, where multiple missing values may exist, because respondents:
- did not know what to respond, or
- did not want to respond, or perhaps
- the question did not apply in a given situation etc.

All of these need to be captured, stored, and most importantly treated as
if they would be regular missing values. Whether the payload might be lost
in computations makes no difference: they were supposed to be "missing
values" anyways.

The original question is how the payload is currently stored: as an
unsigned int of 32 bits, or as an unsigned short of 16 bits. If the R
internals would not be affected (and I see no reason why they would be), it
would allow an entire universe for the social sciences that is not
currently available and which all other major statistical packages do offer.

Thank you very much, your attention is greatly appreciated,
Adrian

On Sun, May 23, 2021 at 7:59 PM Tomas Kalibera 
wrote:

> TLDR: tagging R NAs is not possible.
>
> External software should not depend on how R currently implements NA,
> this may change at any time. Tagging of NA is not supported in R (if it
> were, it would have been documented). It would not be possible to
> implement such tagging reliably with the current implementation of NA in R.
>
> NaN payload propagation is not standardized. Compilers are free to and
> do optimize code not preserving/achieving any specific propagation.
> CPUs/FPUs differ in how they propagate in binary operations, some zero
> the payload on any operation. Virtualized environments, binary
> translations, etc, may not preserve it in any way, either. ?NA has
> disclaimers about this, an NA may become NaN (payload lost) even in
> unary operations and also in binary operations not involving other NaN/NAs.
>
> Writing any new software that would depend on anything specific
> happening to the NaN payloads would not be a good idea. One can only
> reliably use the NaN payload bits for storage, that is if one avoids any
> computation at all, avoids passing the values to any external code
> unaware of such tagging (including R), etc. If such software wants any
> NaN to be understood as NA by R, it would have to use the documented R
> API for this (so essentially translating) - but given the problems
> mentioned above, there is really no point in doing that, because such
> NAs become NaNs at any time.
>
> Best
> Tomas
>
> On 5/23/21 9:56 AM, Adrian Dușa wrote:
> > Dear R devs,
> >
> > I am probably missing something obvious, but still trying to understand
> > why the 1954 from the definition of an NA has to fill 32 bits when it
> > normally doesn't need more than 16.
> >
> > Wouldn't the code below achieve exactly the same thing?
> >
> > typedef union
> > {
> >     double value;
> >     unsigned short word[4];
> > } ieee_double;
> >
> >
> > #ifdef WORDS_BIGENDIAN
> > static CONST int hw = 0;
> > static CONST int lw = 3;
> > #else  /* !WORDS_BIGENDIAN */
> > static CONST int hw = 3;
> > static CONST int lw = 0;
> > #endif /* WORDS_BIGENDIAN */
> >
> >
> > static double R_ValueOfNA(void)
> > {
> >     volatile ieee_double x;
> >     x.word[hw] = 0x7ff0;
> >     x.word[lw] = 1954;
> >     return x.value;
> > }
> >
> > This question has to do with the tagged NA values from package haven, on
> > which I want to improve. Every available bit counts, especially if
> > multi-byte characters are going to be involved.
> >
> > Best wishes,
>
>



Re: [Rd] 1954 from NA

2021-05-23 Thread Adrian Dușa
On Sun, May 23, 2021 at 4:33 PM brodie gaslam via R-devel <
r-devel@r-project.org> wrote:

> I should add, I don't know that you can rely on this
> particular encoding of R's NA.  If I were trying to restore
> an NA from some external format, I would just generate an
> R NA via e.g NA_real_ in the R session I'm restoring the
> external data into, and not try to hand assemble one.
>

Thanks for your answer, Brodie, especially on Sunday (much appreciated).
The aim is not to reconstruct an NA, but to "tag" an NA (and yes, I was
referring to an NA_real_ of course), as seen in action here:
https://github.com/tidyverse/haven/blob/master/src/tagged_na.c

That code:
- preserves the first part 0x7ff0
- preserves the last part 1954
- adds one additional byte to store (tag) a character provided in the SEXP
vector
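
Those three points can be sketched in a few lines of C. This is not the haven
source, only an illustration under the assumptions stated in this thread: the
NA pattern has high word 0x7ff00000 and low word 1954, and the tag is assumed
to go into the lowest byte of the high word (bits 32 to 39). tag_na and na_tag
are hypothetical names.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Bit pattern assumed for NA_real_: high word 0x7ff00000, low word 1954.
   An implementation detail discussed in the thread, not a documented API. */
#define NA_BITS UINT64_C(0x7FF00000000007A2)

/* Reinterpret a double as its raw 64 bits (memcpy avoids aliasing problems). */
uint64_t raw_bits(double x)
{
    uint64_t u;
    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Build a double from raw 64 bits. */
double from_bits(uint64_t u)
{
    double x;
    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Hypothetical: keep 0x7ff0 and 1954, store one tag byte in bits 32..39. */
double tag_na(unsigned char tag)
{
    return from_bits(NA_BITS | ((uint64_t)tag << 32));
}

/* Hypothetical: read the tag byte back from a tagged value. */
unsigned char na_tag(double x)
{
    return (unsigned char)((raw_bits(x) >> 32) & 0xFFu);
}
```

A value built with tag_na('a') is still a NaN and still ends in 1954, and
na_tag() returns 'a' as long as nothing has touched the bits in between.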

That is precisely my understanding, that doubles starting with 0x7ff are
all NaNs. My question was related to the additional part 1954 from the low
bits: why does it need 32 bits?

The binary value of 1954 is 11110100010, which is represented by 11 bits
occupying at most 2 bytes... So why does it need 4 bytes?

Re. the possible overflow, I am not sure: 0x7ff0 is the decimal 32752, or
the binary 111111111110000.
That is just about enough to fit in the available 16 bits (actually 15 to
leave one for the sign bit), so I don't really understand why it would. And
in any case, the union definition uses an unsigned short which (if my
understanding is correct) should certainly not overflow:

typedef union
{
    double value;
    unsigned short word[4];
} ieee_double;

What is gained with this proposal: 16 additional bits to do something with.
For the moment, only 16 are available (from the lower part of the high 32
bits). If the value 1954 would be checked as a short instead of an int, the
other 16 bits would become available. And those bits could be extremely
valuable to tag multi-byte characters, for instance, but also higher
numbers than 32767.
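
The difference between the two checks can be shown with a small sketch. The
function names are hypothetical and the code is not taken from the R sources;
it only assumes the layout discussed above (NaN exponent in the high bits,
1954 in the low bits). A value carrying extra data in bits 16 to 31 passes the
16-bit comparison but fails the 32-bit one, which is exactly the room the
proposal would open up.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a double as its raw 64 bits (memcpy avoids aliasing problems). */
uint64_t raw_bits(double x)
{
    uint64_t u;
    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Build a double from raw 64 bits. */
double from_bits(uint64_t u)
{
    double x;
    assert(sizeof(double) == sizeof(uint64_t));
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Modelled on the 32-bit comparison: the whole low word must equal 1954. */
int is_na_low32(double x)
{
    return isnan(x) && (raw_bits(x) & 0xFFFFFFFFu) == 1954u;
}

/* As proposed: only the lowest 16 bits must equal 1954. */
int is_na_low16(double x)
{
    return isnan(x) && (raw_bits(x) & 0xFFFFu) == 1954u;
}
```

For the pattern 0x7FF0000000AB07A2, is_na_low16() returns 1 while
is_na_low32() returns 0; for the untagged pattern 0x7FF00000000007A2 both
return 1.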

Best wishes,
Adrian



[Rd] 1954 from NA

2021-05-23 Thread Adrian Dușa
Dear R devs,

I am probably missing something obvious, but still trying to understand why
the 1954 from the definition of an NA has to fill 32 bits when it normally
doesn't need more than 16.

Wouldn't the code below achieve exactly the same thing?

typedef union
{
    double value;
    unsigned short word[4];
} ieee_double;


#ifdef WORDS_BIGENDIAN
static CONST int hw = 0;
static CONST int lw = 3;
#else  /* !WORDS_BIGENDIAN */
static CONST int hw = 3;
static CONST int lw = 0;
#endif /* WORDS_BIGENDIAN */


static double R_ValueOfNA(void)
{
    volatile ieee_double x;
    x.word[hw] = 0x7ff0;
    x.word[lw] = 1954;
    return x.value;
}

This question has to do with the tagged NA values from package haven, on
which I want to improve. Every available bit counts, especially if
multi-byte characters are going to be involved.

Best wishes,
-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
https://adriandusa.eu



Re: [Rd] GCC warning

2020-05-23 Thread Adrian Dușa
On Sat, May 23, 2020 at 10:01 AM Prof Brian Ripley 
wrote:

> On 23/05/2020 07:38, Simon Urbanek wrote:
> > Adrian,
> >
> > newer compilers are better at finding bugs - you may want to read the
> > full trace of the error, it tells you that you likely have a memory
> > overflow when using strncpy() in your package. You should check whether it
> > is right. Unfortunately we can’t help you more specifically, because I
> > don't see any link to what you submitted so can’t look at the code involved.
>
> NB: debian-gcc on CRAN does have the latest version of gcc (10.1) and
> the link would likely have given fuller details (such links are provided
> on CRAN report pages but I do not know for submissions).
>
> gcc does sometimes give false alarms here (there is one for R with gcc
>  >= 9 and another for gcc >= 10) but see
>
> https://developers.redhat.com/blog/2018/05/24/detecting-string-truncation-with-gcc-8/
> .  Most can easily be worked around by cleaner code.
>

Oh, of course, apologies for the oversight, these are the links:

"package QCA_3.8.tar.gz does not pass the incoming checks automatically,
please see the following pre-tests:
Windows: <
https://win-builder.r-project.org/incoming_pretest/QCA_3.8_20200521_185504/Windows/00check.log
>
Status: 1 NOTE
Debian: <
https://win-builder.r-project.org/incoming_pretest/QCA_3.8_20200521_185504/Debian/00check.log
>
Status: 1 WARNING, 1 NOTE"

I only now realised the most relevant link was down below, after "last
released version's CRAN status" (where I mistakenly stopped reading
further):

"More details are given in the directory:
"

This is indeed very informative, and points to the specific parts of the code
that seem to be responsible for the warning.
Thank you both very much for the pointers, I now at least know what I should
try to fix.

Best wishes,
Adrian

-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
https://adriandusa.eu



[Rd] GCC warning

2020-05-22 Thread Adrian Dușa
I am trying to submit a package on CRAN, and everything passes ok on all 
platforms but Debian, where CRAN responds with an automatic "significant" 
warning:

* checking whether package ‘QCA’ can be installed ... [35s/35s] WARNING
Found the following significant warnings:
  /usr/include/x86_64-linux-gnu/bits/string_fortified.h:106:10: warning: 
‘__builtin_strncpy’ output may be truncated copying 12 bytes from a string of 
length 79 [-Wstringop-truncation]
See ‘/srv/hornik/tmp/CRAN/QCA.Rcheck/00install.out’ for details.


I know the cause of this: using a customized version of some external C 
library, coupled with  in the Description.

But I do not know how to get past this warning, since it refers to a builtin 
GCC function strncpy. As far as I read, this should be solved by a simple GCC 
upgrade to the newest version, but that is something outside my code base, 
since GCC resides on the CRAN servers.

In the meantime, to get the package published, has anyone encountered a similar 
problem? If so, is there a workaround?

—
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
https://adriandusa.eu



Re: [Rd] detect ->

2020-04-15 Thread Adrian Dușa


> On 15 Apr 2020, at 18:13, William Dunlap  wrote:
> 
> You are right.  >= is not as evocative as =>.  Perhaps > and < would do?  
> %=>% and %<=% would work.

I thought about > and < too, but as you rightly observed > is way less 
evocative than =>.
There is a certain level of clarity which an arrow-like sign offers, and people 
consistently use an arrow in quoted expressions.

In addition, anything else, whether > or %=>%, would make an unquoted expression 
different from the standard quoted one, and that could potentially create 
confusion.

Best wishes,
Adrian

—
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
https://adriandusa.eu



Re: [Rd] detect ->

2020-04-15 Thread Adrian Dușa
> On 15 Apr 2020, at 13:20, Ivan Krylov  wrote:
> 
> On Wed, 15 Apr 2020 10:41:41 +0300
> Adrian Dușa  wrote:
> 
>> Now, if I could find a way to define "=>" as a standalone operator,
>> and convince the R parser to bypass that error, it would solve
>> everything. If this is not possible, I am back to detecting "->".
> 
> Just to confirm, are you avoiding custom %operators% because of two
> extra percent characters one would have to type per operator?

Yes, that's right. The "->" operator is the standard way to signal sufficiency 
in a SOP (sum of products) expression, something like:

A~BC + BC~D + E -> Y

This is consistently used in many books and articles, and is currently the 
standard when quoted:

"A~BC + BC~D + E -> Y"

Requiring the %% notation in unquoted expressions would make the SOP 
expression different from the quoted one, which is likely to create confusion:

A~BC + BC~D + E %->% Y

If detecting "->" proves to be impossible, then I will probably have no choice 
but to stick with quoted expressions. I am just hoping the collective knowledge 
here would make this possible, though.
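
To make the quoted route concrete, here is a minimal sketch of how a quoted expression can be split at the arrow. It is not the code used by the QCA package: relation_type is a hypothetical helper, and it assumes the expression contains a single arrow.

```r
# Hypothetical helper, not part of the QCA package: split a quoted
# expression at the arrow and report which relation it encodes.
relation_type <- function(expr) {
  stopifnot(is.character(expr), length(expr) == 1L)
  for (arrow in c("->", "<-")) {
    if (grepl(arrow, expr, fixed = TRUE)) {
      parts <- trimws(strsplit(expr, arrow, fixed = TRUE)[[1]])
      return(list(
        relation = if (arrow == "->") "sufficiency" else "necessity",
        lhs = parts[1],
        rhs = parts[2]
      ))
    }
  }
  stop("no '->' or '<-' found in expression")
}

res <- relation_type("A~BC + BC~D + E -> Y")
print(res$relation)
print(res$lhs)
print(res$rhs)
```

Because the string never reaches the R parser, the direction of the arrow is preserved exactly as typed.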

Best,
Adrian

—
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
https://adriandusa.eu



Re: [Rd] detect ->

2020-04-15 Thread Adrian Dușa
Dear Bill,

I already tried this, and it would have been great as (currently) the 
sufficiency relation is precisely "=>"... but:

foo <- function(x) return(substitute(x))
foo(A => B)
Error: unexpected '>' in "foo(A =>"

It seems that "=>" is a syntactic error for the R parser, while "<=" is not 
because it denotes less than or equal.

Now, if I could find a way to define "=>" as a standalone operator, and 
convince the R parser to bypass that error, it would solve everything. If this 
is not possible, I am back to detecting "->".

Best,
Adrian


> On 13 Apr 2020, at 19:19, William Dunlap  wrote:
> 
> Using => and <= instead of -> and <- would make things easier, although the 
> precedence would be different.
> 
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
> 
> 
> On Mon, Apr 13, 2020 at 1:43 AM Adrian Dușa  wrote:
> Thank you for your replies, this actually has little to do with the regular R 
> code but more to signal what in my package QCA is referred to as a necessity 
> relation A <- B (A is necessary for B) and sufficiency A -> B (A is 
> sufficient for B).
> 
> If switched by the parser, A -> B becomes B <- A which makes B necessary for 
> A, while the intention is to signal sufficiency for B.
> 
> Capturing in a quoted string is trivial, but I am now experimenting with 
> substitute() to allow unquoted expressions.
> 
> This is especially useful when selecting A and B from the columns of a data 
> frame, using: c(A, B) instead of c("A", "B") with a lot more quotes for more 
> complex expressions using more columns.
> 
> I would be grateful for any pointer to a project that processes the code 
> while it is still raw text. I could maybe learn from their code and adapt to 
> my use case.
> 
> Best wishes,
> Adrian
> 
> > On 13 Apr 2020, at 11:23, Gabriel Becker  wrote:
> > 
> > Adrian,
> > 
> > Indeed, this has come up in a few places, but as Gabor says, there is no 
> > such thing as right hand assignment at any point after parsing is complete.
> > 
> > This means the only feasible way to detect it, which a few projects do I 
> > believe, is to process the code while it is still raw text, before it goes 
> > into the parser, and have clever enough regular expressions.
> > 
> > The next question, then, is why are you trying to detect right assignment. 
> > Doing so can arguably be useful for linting, it's true. Otherwise, though, 
> > because it's not really a "real thing" when the R code is being executed, 
> > it's not something that's generally meaningful to detect in most cases.
> > 
> > Best,
> > ~G
> > 
> > On Mon, Apr 13, 2020 at 12:52 AM Gábor Csárdi  
> > wrote:
> > That parser already flips -> to <- before creating the parse tree.
> > 
> > Gabor
> > 
> > On Mon, Apr 13, 2020 at 8:39 AM Adrian Dușa  wrote:
> > >
> > > I searched and tried for hours, to no avail although it looks simple.
> > >
> > > (function(x) substitute(x))(A <- B)
> > > #A <- B
> > >
> > > (function(x) substitute(x))(A -> B)
> > > # B <- A
> > >
> > > In the first example, A occurs on the LHS, but in the second example A is 
> > > somehow evaluated as if it occurred on the RHS, despite my understanding 
> > > that substitute() returns the unevaluated parse tree.
> > >
> > > Is there any way, or is it even possible to detect the right hand 
> > > assignment, to determine whether A occurs on the LHS?
> > >
> > > Thanks in advance for any hint,
> > > Adrian
> > >

—
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
https://adriandusa.eu



Re: [Rd] detect ->

2020-04-13 Thread Adrian Dușa
Thank you for your replies, this actually has little to do with the regular R 
code but more to signal what in my package QCA is referred to as a necessity 
relation A <- B (A is necessary for B) and sufficiency A -> B (A is sufficient 
for B).

If switched by the parser, A -> B becomes B <- A which makes B necessary for A, 
while the intention is to signal sufficiency for B.

Capturing in a quoted string is trivial, but I am now experimenting with 
substitute() to allow unquoted expressions.

This is especially useful when selecting A and B from the columns of a data 
frame, using: c(A, B) instead of c("A", "B") with a lot more quotes for more 
complex expressions using more columns.

I would be grateful for any pointer to a project that processes the code while 
it is still raw text. I could maybe learn from their code and adapt to my use 
case.
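
One possible starting point, offered only as a sketch: when the expression is available as text, the parser's token table still records which arrow was typed, even though the parse tree does not. The helper name typed_right_assign is illustrative; utils::getParseData() and the RIGHT_ASSIGN token are what R itself provides.

```r
# Returns TRUE if the source text contains a right-hand assignment arrow.
typed_right_assign <- function(txt) {
  # keep.source = TRUE makes the parser retain its token table.
  exprs <- parse(text = txt, keep.source = TRUE)
  tokens <- utils::getParseData(exprs)$token
  any(tokens == "RIGHT_ASSIGN")
}

print(typed_right_assign("A*B + C -> Y"))
print(typed_right_assign("Y <- A*B + C"))
```

This works on text only, so it avoids hand-written regular expressions but does not by itself solve the unquoted case.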

Best wishes,
Adrian

> On 13 Apr 2020, at 11:23, Gabriel Becker  wrote:
> 
> Adrian,
> 
> Indeed, this has come up in a few places, but as Gabor says, there is no such 
> thing as right hand assignment at any point after parsing is complete.
> 
> This means the only feasible way to detect it, which a few projects do I 
> believe, is to process the code while it is still raw text, before it goes into 
> the parser, and have clever enough regular expressions.
> 
> The next question, then, is why are you trying to detect right assignment. 
> Doing so can arguably be useful for linting, it's true. Otherwise, though, 
> because it's not really a "real thing" when the R code is being executed, it's 
> not something that's generally meaningful to detect in most cases.
> 
> Best,
> ~G
> 
> On Mon, Apr 13, 2020 at 12:52 AM Gábor Csárdi  wrote:
> That parser already flips -> to <- before creating the parse tree.
> 
> Gabor
> 
> On Mon, Apr 13, 2020 at 8:39 AM Adrian Dușa  wrote:
> >
> > I searched and tried for hours, to no avail although it looks simple.
> >
> > (function(x) substitute(x))(A <- B)
> > #A <- B
> >
> > (function(x) substitute(x))(A -> B)
> > # B <- A
> >
> > In the first example, A occurs on the LHS, but in the second example A is 
> > somehow evaluated as if it occurred on the RHS, despite my understanding 
> > that substitute() returns the unevaluated parse tree.
> >
> > Is there any way, or is it even possible to detect the right hand 
> > assignment, to determine whether A occurs on the LHS?
> >
> > Thanks in advance for any hint,
> > Adrian
> >

—
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
https://adriandusa.eu



[Rd] detect ->

2020-04-13 Thread Adrian Dușa
I searched and tried for hours, to no avail although it looks simple.

(function(x) substitute(x))(A <- B)
#A <- B

(function(x) substitute(x))(A -> B)
# B <- A

In the first example, A occurs on the LHS, but in the second example A is 
somehow evaluated as if it occurred on the RHS, despite my understanding that 
substitute() returns the unevaluated parse tree.

Is there any way, or is it even possible to detect the right hand assignment, 
to determine whether A occurs on the LHS?
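
The flip can be confirmed on the language object itself. A small sketch (the variable name e is illustrative): the call stored for A -> B has `<-` as its function and the operands already swapped, so nothing about the original direction survives in it.

```r
e <- quote(A -> B)

# The call head is `<-`, with B as first argument and A as second.
print(as.character(e[[1]]))
print(as.character(e[[2]]))
print(as.character(e[[3]]))

# Indistinguishable from the expression typed with a left arrow.
print(identical(e, quote(B <- A)))
```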

Thanks in advance for any hint,
Adrian

—
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
https://adriandusa.eu



Re: [Rd] importing namespaces from base packages

2018-03-13 Thread Adrian Dușa
On Mon, Mar 12, 2018 at 2:18 PM, Martin Maechler 
wrote:
> [...]
> Is that so?   Not according to my reading of the 'Writing R
> Extensions' manual, nor according to what I have been doing in
> all of my packages for ca. 2 years:
>
> The rule I have in my mind is
>
>  1) NAMESPACE Import(s|From) \
>   <==>  DESCRIPTION -> 'Imports:'
>  2) .. using "::" in  R code /
>
>
> If you really found that you did not have to import from say
> 'utils', I think this was a *un*lucky coincidence.

Of course, the importFrom() is mandatory in NAMESPACE otherwise the package
does not pass the checks.
The question was related to the relation between the packages mentioned in
the NAMESPACE and the packages mentioned in the Imports: field from
DESCRIPTION.

For instance, the current version 3.1 of package QCA on CRAN mentions in
the DESCRIPTION:

Imports: venn (>= 1.2), shiny, methods, fastdigest

while the NAMESPACE file has:

import(shiny)
import(venn)
import(fastdigest)
importFrom("utils", "packageDescription", "remove.packages",
"capture.output")
importFrom("stats", "glm", "predict", "quasibinomial", "binom.test",
"cutree", "dist", "hclust", "na.omit", "dbinom", "setNames")
importFrom("grDevices", "dev.cur", "dev.new", "dev.list")
importFrom("graphics", "abline", "axis", "box", "mtext", "par", "title",
"text")
importFrom("methods", "is")

There are functions from packages utils, stats, grDevices and graphics for
which the R checks do not require a specific entry in the Imports: field.
I suspect because all of these packages are part of the base R, but so is
package methods. The question is why is it not mandatory for those packages
to be mentioned in the Imports: field from DESCRIPTION, while removing
package methods from that field runs into an error, despite maintaining the
package in the NAMESPACE's importFrom().
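
That all five packages ship with base R can be checked from their own metadata. A minimal sketch; the vector name pkgs is illustrative, and the Priority field is read from each installed package's DESCRIPTION:

```r
pkgs <- c("utils", "stats", "grDevices", "graphics", "methods")

# Priority is "base" for packages that are part of the R distribution.
priority <- vapply(
  pkgs,
  function(p) utils::packageDescription(p)$Priority,
  character(1)
)
print(priority)
```

As far as this field goes, methods is not marked differently from the other four, so the difference in the checks must come from somewhere else.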



> [...]
> There are places in the R source where it is treated specially,
> indeed, part of 'methods' may be needed when it is neither
> loaded nor attached (e.g., when R runs with only base, say, and
> suddenly encounters an S4 object), and there still are
> situations where 'methods' needs to be in the search() path and
> not just loaded, but these cases should be unrelated to the
> above DESCRIPTION-Imports vs NAMESPACE-Imports correspondence.

This is what I had expected myself, then the above behavior has to have
another explanation.
It is just a curiosity, there is naturally nothing wrong with maintaining
package methods in the Imports: field. It is only odd that some base R packages
are treated differently from other base R packages at the package checks
stage.

Thank you,
Adrian

--
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
https://adriandusa.eu



[Rd] importing namespaces from base packages

2018-03-09 Thread Adrian Dușa
Dear All,

I understand that R CMD check runs with only the base package attached, so
everything else (including the other packages bundled with base R)
should be imported and, most importantly, declared in the Imports field of
the DESCRIPTION file.

However, I do use functions from other packages than base (utils,
grDevices, stats, graphics), for which it is sufficient to declare
importFrom() in the NAMESPACE file.

For instance, it is not required to specify utils in the Imports: field
from DESCRIPTION, when using importFrom("utils", "packageDescription") in
NAMESPACE.

The opposite happens for importFrom("methods", "is"), which ends up in the
error: Namespace dependency not required: ‘methods’

Is there a special reason for which package methods is treated differently
from all other packages bundled with the base R?

Thank you,
Adrian

--
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania
https://adriandusa.eu



Re: [Rd] paste strings in C

2017-06-27 Thread Adrian Dușa
Hi Michael,

On Tue, Jun 27, 2017 at 5:31 PM, Michael Lawrence 
wrote:
>
> To do this in C, it would probably be easier and faster to just do the
> string manipulation directly. Luckily, there are already packages that
> have done this for you. See an example below using the S4Vectors
> package.

Thank you for your reply.
The goal is to obtain those strings in C, not in R...

The result is returned to R just for printing purposes, but I need the
strings in C.
And if at all possible, using C functions directly rather than calling R
functions from C.

My first instinct is to try using the do_paste() C function in the
src/main/ directory from the R sources, but I haven't been successful so
far.
Adrian

--
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania



[Rd] paste strings in C

2017-06-27 Thread Adrian Dușa
Dear R-devs,

Below is a small example of what I am trying to achieve, that is trivial in
R and I would like to learn how to do in C, for very large matrices:

> (mymat <- matrix(c(1,0,0,2,2,1), nrow = 2))
     [,1] [,2] [,3]
[1,]    1    0    2
[2,]    0    2    1

And I would like to produce:
[1] "a*C" "B*c"


Which can be trivially done in R via something like:

foo <- function(mymat, colnms, tilde = FALSE) {
apply(mymat, 1, function(x) {
if (tilde) {
colnms[x == 1] <- paste0("~", colnms[x == 1])
} else {
colnms[x == 1] <- tolower(colnms[x == 1])
}
paste(colnms[x > 0], collapse = "*")
})
}

> foo(mymat, LETTERS[1:3])
[1] "a*C" "B*c"

> foo(mymat, LETTERS[1:3], tilde = TRUE)
[1] "~A*C" "B*~C"
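
As an aside, the same strings can be assembled column by column rather than row by row, which avoids apply() and mirrors the loop a direct C implementation might use. This is only a sketch: foo_by_column is a hypothetical name and the function has not been benchmarked against foo().

```r
foo_by_column <- function(mymat, colnms, tilde = FALSE) {
  negated <- if (tilde) paste0("~", colnms) else tolower(colnms)
  out <- character(nrow(mymat))
  for (j in seq_len(ncol(mymat))) {
    # Literal contributed by column j to each row: negated for 1,
    # plain for any other positive value, empty for 0.
    literal <- ifelse(mymat[, j] == 1, negated[j],
                      ifelse(mymat[, j] > 0, colnms[j], ""))
    # Insert "*" only between two non-empty pieces.
    sep <- ifelse(nzchar(out) & nzchar(literal), "*", "")
    out <- paste0(out, sep, literal)
  }
  out
}

mymat <- matrix(c(1, 0, 0, 2, 2, 1), nrow = 2)
print(foo_by_column(mymat, LETTERS[1:3]))
print(foo_by_column(mymat, LETTERS[1:3], tilde = TRUE))
```

Each pass works on one whole column, so the number of R-level iterations is the number of columns (25 in the large case) rather than the number of rows.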


I know that strings in C are far from trivial (encodings being one
important issue), and this is the sort of thing much easier to do in R. On
the other hand I found that, for a large matrix of say 1 million rows and
25 columns, setting the rownames or colnames in R copies the matrix and
costs a lot of memory and time in the process.

Having all necessary headers in C, the solution I came up with involves
calling the function foo() from within C:

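SEXP test(SEXP mymat, SEXP colnms, SEXP tilde) {

    /* Build the call foo(mymat, colnms, tilde) as a language object. */
    SEXP call = PROTECT(LCONS(install("foo"),
                        LCONS(mymat,
                        LCONS(colnms,
                        LCONS(tilde, R_NilValue)))));

    /* Evaluate it in the global environment, where foo() is defined. */
    SEXP out = PROTECT(eval(call, R_GlobalEnv));

    UNPROTECT(2);
    return(out);
}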


After compilation, say in a file called test.c, back in R I get:

> dyn.load("test.so")

> .Call("test", mymat, LETTERS[1:3], FALSE)
[1] "a*C" "B*c"

> .Call("test", mymat, LETTERS[1:3], TRUE)
[1] "~A*C" "B*~C"


In my real situation, the matrix I am working on is produced in the C code
(and it's much larger).
I don't know for sure, when calling the R function foo(), if the matrix is
copied: if not, this might be the best solution for me.

Otherwise I know there is a function do_paste() in C, and wondered whether
I could use that directly instead of calling R from C.

I hope this explains what I would like to do, many thanks in advance for
any hint,
Adrian

-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr. 90-92
050663 Bucharest sector 5
Romania



Re: [Rd] The use of match.fun

2016-09-06 Thread Adrian Dușa
I am not able to replicate this:

> center <- function(x,FUN) FUN(x)
> center(1:10, mean)
[1] 5.5
> mean <- 4
> center(1:10, mean)
Error in center(1:10, mean) : could not find function "FUN"

Using a fresh install of version 3.3.1 under MacOS, and tested before with
3.3.0 with the same result.
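
For comparison, a sketch of the variant that match.fun() is meant to enable (center2 is an illustrative name): match.fun() looks up the symbol passed as FUN in function mode, so a non-function object called mean in the caller's environment is skipped.

```r
center2 <- function(x, FUN) {
  # Resolve FUN to a function, ignoring non-function objects
  # that happen to share its name.
  FUN <- match.fun(FUN)
  FUN(x)
}

mean <- 4                      # masks mean() with a number
result <- center2(1:10, mean)  # match.fun() still finds base::mean
print(result)
rm(mean)                       # remove the masking variable again
```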


On Tue, Sep 6, 2016 at 4:25 PM, Joris Meys  wrote:

> Dear gurus,
>
> I was utterly surprised to learn that one of my examples illustrating the
> need of match.fun() doesn't give me the expected result.
>
> center <- function(x,FUN) FUN(x)
> center(1:10, mean)
> mean <- 4
> center(1:10, mean)
>
> Used to give me the error message "could not find function FUN". Now it
> just works, even though I didn't expect it to. I believe this is at least
> partially linked to a change in how R finds functions.
>
> Now I'm not sure any more whether match.fun() actually has any use any
> longer, and if so, in which cases it prevents things going wrong.
>
> I've tried to find an example where this went wrong, but couldn't find one.
> Any pointer to what happened here is greatly appreciated. I've checked the
> NEWS, but I'm not smart enough to find the relevant bits and piece it
> together.
>
> Thank you in advance
> Cheers
> Joris
>
> --
> Joris Meys
> Statistical consultant
>
> Ghent University
> Faculty of Bioscience Engineering
> Department of Mathematical Modelling, Statistics and Bio-Informatics
>
> tel :  +32 (0)9 264 61 79
> joris.m...@ugent.be
> ---
> Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
>



-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania



Re: [Rd] (no) circular dependency

2016-04-09 Thread Adrian Dușa
On Fri, Apr 8, 2016 at 10:34 PM, Hadley Wickham  wrote:

> In that scenario, I would expect that QCA would suggest Venn and Venn
> would suggest QCA. Then there's no circular dependency problem.
>

Right, this is exactly what I was pointing out myself in the first email:

- make package A dependent on package B (so that the namespace of B is
automatically available when loading package A)
- make package B "Suggest" package A (not "Depend" which leads to circular
dependency), which, if I am not mistaken, will lead to package A being
installed automatically when package B is installed
- use requireNamespace("A") inside the function(s) of package B which uses
functions of package A
- directly use A::foo() inside those functions

The only trouble with "Suggest" is the namespace of A is not automatically
loaded with package B (the reverse would work because package A depends on
package B).
So the only other option that I found was to make use of requireNamespace()
and use A::foo() inside the functions of B.

Or, as Hadley advises, make both packages A and B suggest each other and use
requireNamespace() inside the functions of both. That would also work.
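
The requireNamespace() pattern described above might look as follows. This is a sketch, not code from QCA or venn: call_suggested is a hypothetical helper, and stats::median merely stands in for a function from the suggested package so that the example runs anywhere. Note that install.packages() by default installs only the Depends, Imports and LinkingTo of a package; packages listed under Suggests are installed only on request (for instance with dependencies = TRUE), so the guard is what keeps the function safe when the other package is absent.

```r
call_suggested <- function(pkg, fun, ...) {
  # Load the namespace without attaching it; fail clearly if absent.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop("Package '", pkg, "' is needed for this function; please install it.",
         call. = FALSE)
  }
  # Equivalent to pkg::fun(...), but with the names given as strings.
  getExportedValue(pkg, fun)(...)
}

print(call_suggested("stats", "median", c(1, 3, 5)))
```

Inside a real package the last line of the helper would normally be written directly as A::foo(...), as in the list above.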

Adrian

-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania



Re: [Rd] (no) circular dependency

2016-04-08 Thread Adrian Dușa
Hi Greg,

That's interesting but I assume those are self-contained functions.
In my case, the truthTable() function from package QCA depends on numerous
other functions in the QCA package so I'm not sure how feasible it is to
copy everything from each package to every other package.

Best,
Adrian

On Fri, Apr 8, 2016 at 8:04 PM, Gregory Warnes <g...@warnes.net> wrote:

> A third possibility, which I use in my gtools and gdata packages, is to
> use soft-links to create a copy of the relevant functions from one package
> in the other.  I make sure these functions are *not* exported, so no
> conflicts are created, and the use of soft-links mean the code never gets
> out of sync.
>
> -Greg
>
> *--  *
> *Change your thoughts and you change the world.*
> --Dr. Norman Vincent Peale
>
> On Apr 8, 2016, at 11:37 AM, Gabriel Becker <gmbec...@ucdavis.edu> wrote:
>
> Another, perhaps slightly off the wall reframing of the 3-package
> possibility:
>
> Have packages B, a, and UserFacingA, as follows
>
> *a* contains all the functionality in your A package that
> *does not depend on B*
> *B* *imports from* *a* and is essentially unchanged
> *UserFacingA* *Depends* on *a* and *imports from* *B*, it implements all
> functionality from your package A that *does depend on* *B*, and gets the
> rest from package *a*
>
>
> Users, then would only ever install or load B and UserFacingA. They
> wouldn't need to care much,if at all, about package a.
>
> ~G
>
> On Fri, Apr 8, 2016 at 7:36 AM, Dmitri Popavenko
> <dmitri.popave...@gmail.com> wrote:
>
> Thanks all, I don't know either (for the moment).
> It's all in the design phase still. Generally, I would also like to keep
> specific functions in specific packages, if at all possible.
>
> On Fri, Apr 8, 2016 at 3:03 PM, Mark van der Loo <mark.vander...@gmail.com>
> wrote:
>
> Well, I'm not saying that Dmitri _should_ do it. I merely mention it as an
> option that I think is worth thinking about -- it is easy to overlook the
> obvious :-). Since we have no further info on the package's structure we
> can't be sure..
>
> On Fri, Apr 8, 2016 at 13:59, Adrian Dușa <dusa.adr...@unibuc.ro> wrote:
>
> Hi Mark,
>
> Uhm... sometimes this is not always possible.
> For example I have a package QCA which produces truth tables (all
> combinations of presence / absence of causal conditions), and it uses the
> venn package to draw a Venn diagram.
> It is debatable if one should assimilate the "venn" package into the QCA
> package (other people might want Venn diagrams but not necessarily the
> other QCA functions).
>
> On the other hand, the package venn would like to use the QCA package to
> demonstrate its abilities to plot Venn diagrams based on truth tables
> produced by the QCA package. Both have very different purposes, yet both
> use functions from each other.
>
> So I'm with Bill Dunlap here that several smaller packages are preferable
> to one larger one, but on the other hand I can't separate those functions
> into a third package: the truth table production is very specific to the
> QCA package, while plotting Venn diagrams is very specific to the venn
> package. I don't see how to separate those functions from their main
> packages and create a third one that each would depend on.
>
> This is just an example, there could be others as well, reason for which
> I am (still) looking for a solution to:
> - preserve the current functionalities in packages A and B (to follow
> Dmitri's original post)
> - be able to use functions from each other
> - yet avoid circular dependency
>
> I hope this explains it,
> Adrian
>
> On Thu, Apr 7, 2016 at 11:36 PM, Mark van der Loo
> <mark.vander...@gmail.com> wrote:
>
> At the risk of stating the over-obvious: there's also the option of
> creating just a single package containing all functions. None of the
> functions that create the interdependencies need to be exported that way.
>
> Btw, his question is probably better at home at the r-package-devel list.
>
> Best,
> M
>
> On Thu, Apr 7, 2016, 22:24 Dmitri Popavenko <dmitri.popave...@gmail.com>
> wrote:
>
> Hi Thierry,
>
> Thanks for that, the trouble is functions are package specific so moving
> from one package to another could be a solution, but I would rather save

Re: [Rd] (no) circular dependency

2016-04-08 Thread Adrian Dușa
Hi Mark,

Uhm... sometimes this is not always possible.
For example I have a package QCA which produces truth tables (all
combinations of presence / absence of causal conditions), and it uses the
venn package to draw a Venn diagram.
It is debatable if one should assimilate the "venn" package into the QCA
package (other people might want Venn diagrams but not necessarily the
other QCA functions).

On the other hand, the package venn would like to use the QCA package to
demonstrate its abilities to plot Venn diagrams based on truth tables
produced by the QCA package. Both have very different purposes, yet both
use functions from each other.

So I'm with Bill Dunlap here that several smaller packages are preferable
to one larger one, but on the other hand I can't separate those functions
into a third package: the truth table production is very specific to the
QCA package, while plotting Venn diagrams is very specific to the venn
package. I don't see how to separate those functions from their main
packages and create a third one that each would depend on.

This is just an example, there could be others as well, reason for which I
am (still) looking for a solution to:
- preserve the current functionalities in packages A and B (to follow
Dmitri's original post)
- be able to use functions from each other
- yet avoid circular dependency

I hope this explains it,
Adrian


On Thu, Apr 7, 2016 at 11:36 PM, Mark van der Loo 
wrote:

> At the risk of stating the over-obvious: there's also the option of
> creating just a single package containing all functions. None of the
> functions that create the interdependencies need to be exported that way.
>
> Btw, his question is probably better at home at the r-package-devel list.
>
>
> Best,
>
> M
>
>
>
>
> On Thu, Apr 7, 2016, 22:24 Dmitri Popavenko 
> wrote:
>
>> Hi Thierry,
>>
>> Thanks for that, the trouble is functions are package specific so moving
>> from one package to another could be a solution, but I would rather save
>> that as a last resort.
>>
>> As mentioned, creating a package C with all the common functions could also
>> be an option, but this strategy quickly inflates the number of packages on
>> CRAN. If no other option is possible, that could be the way but I was still
>> thinking about a more direct solution if possible.
>>
>> Best,
>> Dmitri
>>
>> On Thu, Apr 7, 2016 at 3:47 PM, Thierry Onkelinx <
>> thierry.onkel...@inbo.be>
>> wrote:
>>
>> > Dear Dmitri,
>> >
>> > If it's only a small number of functions, then move the relevant
>> > functions from A to B so that B works without A. Then import these
>> > functions from B in A. Hence A depends on B but B is independent of A.
>> >
>> > If it requires moving a lot of functions, then you had better create a
>> > package C with all the common functions. Then A and B import those
>> > functions from C.
>> >
>> > Best regards,
>> >
>> > ir. Thierry Onkelinx
>> >
>> > 2016-04-06 8:42 GMT+02:00 Dmitri Popavenko:
>> >
>> >> Hello all,
>> >>
>> >> I would like to build two packages (say A and B), for two different
>> >> purposes.
>> >> Each of them needs one or two functions from the other, which leads to
>> >> the problem of circular dependency.
>> >>
>> >> Is there a way for package A to import a function from package B, and
>> >> package B to import a function from package A, without ending up with a
>> >> circular dependency?
>> >> Other suggestions in the archive mention building a third package that
>> >> both A and B should depend on, but this seems less attractive.
>> >>
>> >> I read about importFrom() in the NAMESPACE file, but I don't know how
>> >> to relate this to the information in the DESCRIPTION file (other than
>> >> adding each package to the Depends: field).
>> >>
>> >> Thank you,
>> >> Dmitri
>> >>
>> >> [[alternative HTML version deleted]]
>> >>
>> >> __
>> >> R-devel@r-project.org mailing list
>> >> https://stat.ethz.ch/mailman/listinfo/r-devel
>> >>
>> >
>> >
>>
>> [[alternative HTML version deleted]]
>>
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] (no) circular dependency

2016-04-07 Thread Adrian Dușa
Hi Dmitri,

I was thinking about something similar for my packages. There might be
other (more clever) ways, but one way is to:
- make package A depend on package B (so that the namespace of B is
automatically available when loading package A)
- make package B "Suggest" package A (not "Depend", which leads to a circular
dependency); note that, as far as I know, install.packages() installs
suggested packages only when called with dependencies = TRUE, so package A
is not guaranteed to be present when package B is installed
- use requireNamespace("A") inside the function(s) of package B which use
functions of package A
- directly use A::foo() inside those functions
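
The last two steps of the list can be sketched as follows. This is only a
minimal illustration, not tested inside a real pair of packages: the function
name extOrNA is made up, and the base package "tools" stands in for package A
purely so that the code runs as-is (inside package B the call would read
A::foo() instead).

```r
# Sketch of a function in a hypothetical package B that uses a function
# from a package listed under Suggests. "tools" is a stand-in for package A.
extOrNA <- function(path) {
    # requireNamespace() loads the namespace without attaching it and
    # returns FALSE (instead of failing) when the package is not installed
    if (!requireNamespace("tools", quietly = TRUE)) {
        return(NA_character_)   # or stop() with an informative message
    }
    # fully qualified call, the equivalent of A::foo()
    tools::file_ext(path)
}

extOrNA("report.pdf")
```

The guard matters because a suggested package may be absent on the user's
machine, so the function has to decide what to do in that case.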

Didn't try this yet, but in theory it should work (I might try this
approach myself actually). I would also be curious if there are more clever
ways to deal with this.

I hope it helps,
Adrian




-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] author field in Rd documentation

2016-02-15 Thread Adrian Dușa
Apologies if this is a non question, but I am trying to document some
functions, where the author of the function itself is different from the
author of the Rd file (one person did the programming, the other dealt with
the documentation).

In the Writing R Extensions document, section 2.1.1 Documenting functions,
it is written:

\author{...} Information about the author(s) of the Rd file.

Is there another field for the author of the function? Usually, they are
one and the same, but (as in this case) they might be different.

Any suggestion is welcome,
Adrian


-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania



[Rd] integer

2015-12-17 Thread Adrian Dușa
In the help page for ?is.integer, there is this function

is.wholenumber <-
function(x, tol = .Machine$double.eps^0.5)  abs(x - round(x)) < tol

A quick question: is there a case where this alternative function will not
work?
function(x) x %% 1 == 0
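
One case where the two differ, as a small sketch: a value that is a whole
number mathematically but carries floating point rounding error. The name
is.whole2 is made up here for the alternative function.

```r
is.wholenumber <-
    function(x, tol = .Machine$double.eps^0.5)  abs(x - round(x)) < tol
is.whole2 <- function(x) x %% 1 == 0

# mathematically 2, but the double precision result is slightly above 2
x <- sqrt(2)^2

is.wholenumber(x)   # TRUE, the difference is within the tolerance
is.whole2(x)        # FALSE, the remainder is tiny but not exactly zero
```

The tolerance is what makes the help page version robust to results of
arithmetic; the exact comparison is only safe for values known to be stored
exactly.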

Best,
Adrian

-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania



Re: [Rd] authorship and citation

2015-10-06 Thread Adrian Dușa
Hi Gabriel,

On Tue, Oct 6, 2015 at 10:59 PM, Gabriel Becker 
wrote:

> [...]
>
> At the very least, this seems to be a flagrant violation of the
> *spirit* of the CRAN policy, which AFAIK is intended to enforce
> acknowledgement of the contributions of all copyright holders in the
> package. The fact that you are trying to bypass the policy by suggesting
> users use an unofficial citation which would not comply with the policy
> while maintaining an official one which complies, but which you don't want
> users to see  is probably a suggestion that you shouldn't do that.
>


But that is the very point: I read the CRAN policies twice, and there is no
official guideline on how to compile the citation.
Regarding the Source packages, the policies mention:

##
The ownership of copyright and intellectual property rights of all
components of the package must be clear and unambiguous (including from the
authors specification in the DESCRIPTION file). Where code is copied (or
derived) from the work of others (including from R itself), care must be
taken that any copyright/license statements are preserved and authorship is
not misrepresented.
Preferably, an ‘Authors@R’ would be used with ‘ctb’ roles for the authors
of such code. Alternatively, the ‘Author’ field should list these authors
as contributors.

Where copyrights are held by an entity other than the package authors, this
should preferably be indicated via ‘cph’ roles in the ‘Authors@R’ field, or
using a ‘Copyright’ field (if necessary referring to an inst/COPYRIGHTS
file).

Trademarks must be respected.
##

Now, that requirement is already met: the former author is still in the
authors' list. So the contribution of the former author is duly
acknowledged, but the fundamental issue of my question relates to the
citation file, for which the CRAN policies don't offer any other
information.
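
For reference, the roles mentioned in the quoted policy map onto person()
objects roughly as below. This is only a sketch: all names and the e-mail
address are placeholders, not the actual authors of any package, and the
comment text is just one possible wording.

```r
# Placeholder people; only the roles ("aut", "cre", "ctb", "cph") come
# from the policy text quoted above.
authors <- c(
    person("First", "Author", role = c("aut", "cre"),
           email = "first.author@example.org"),
    person("Former", "Coauthor", role = "ctb",
           comment = "up to version 1.1-4"),
    person("Some Institution", role = "cph")
)

# how the entries are rendered, roles and comments included
format(authors, include = c("given", "family", "role", "comment"))
```

In a DESCRIPTION file the same c(person(...), ...) expression would be the
value of the Authors@R field.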

If the spirit of the CRAN policies is to enforce citing each and every one
of the authors, then I don't understand why the citation from package Rcmdr
meets this spirit, while my suggestion doesn't.

I apologize for pushing this topic to the limit, but I haven't got an
answer to this question yet...

Best wishes,
Adrian

PS: @Thierry: I did take a look at RODBC, but the citation information is
generated automatically upon package installation (no other special file on
CRAN)

-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania



Re: [Rd] authorship and citation

2015-10-06 Thread Adrian Dușa
On Tue, Oct 6, 2015 at 11:58 PM, Adrian Dușa <dusa.adr...@unibuc.ro> wrote:
>
> [...]
> If the spirit of the CRAN policies is to enforce citing each and every
> one of the authors, then I don't understand why the citation from package
> Rcmdr meets this spirit, while my suggestion doesn't.
>
> I apologize for pushing this topic to the limit, but I haven't got an
> answer to this question yet...


Out of curiosity, upon random checks there seem to be many other packages
in similar situations (which have multiple authors, but cite only a subset):
SamplingStrata
sandwich
SAVE
seawave

What all of these cases seem to have in common is an older published
journal article, and the citation adhered to that article irrespective of
how many authors subsequently contributed to that package.

Would it suffice to provide such an article, then?
Adrian

--
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania



Re: [Rd] authorship and citation

2015-10-06 Thread Adrian Dușa
Dear Gabriel,

On Wed, Oct 7, 2015 at 12:39 AM, Gabriel Becker 
wrote:

> [...]
>
>>
>> I apologize for pushing this topic to the limit, but I haven't got an
>> answer to this question yet...
>>
>
> With respect, not receiving the answer you wanted isn't the same as not
> receiving an answer.
>

I very much appreciate your patience with me, and I am grateful for it.
The question is, I believe, very important, for I would like to avoid
submitting a new version of the package only to be told that I did
something wrong.

As so many other packages seem to have a lot of flexibility in compiling
the citation file, what I am asking is: will I be prosecuted for
submitting a new version which doesn't include all the authors in the
citation file, especially since the other author is no longer contributing?
(let's say I will provide a single-author, published journal article,
referring specifically to this package).

The work of the other author is duly acknowledged in his position in the
authors' list.
As I previously wrote, citing Dusa and Other (2015) implies equal citation
rights for unequal work, a thing that I am uncomfortable with. There is a
huge amount of work involved in this subsequent version, to which the
former author didn't contribute anything at all...

Best wishes,
Adrian

-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania



Re: [Rd] authorship and citation

2015-10-06 Thread Adrian Dușa
On Wed, Oct 7, 2015 at 1:07 AM, Gabriel Becker  wrote:

>
> [...]
>
>>
>> The work of the other author is duly acknowledged in his position in the
>> authors' list.
>> As I previously wrote, citing Dusa and Other (2015) implies equal
>> citation rights for unequal work, a thing that I am uncomfortable with.
>> There is a huge amount of work being involved in this subsequent version,
>> to which the former author didn't contribute anything at all...
>>
>
> It really doesn't imply this at all, at least to me (and I don't think I'm
> alone here).  In most authorship-listing schemes first author is the one
> who did the most direct work (wrote the draft, in the case of an article).
> On the other hand, citing Dusa (2015) implies NO work by Other in the
> entity being cited. That is clearly and concretely not the case.
>

That is another way of looking at things, but I don't necessarily agree
with you. It doesn't imply NO work, since that is duly recognised by the
presence of the Other in the authors' list.
And I have nevertheless changed my suggestion from Dusa (2015) to a
citation of a previous (older, 2007) article, one that laid the very
foundations for the package, and an article to which the Other also made
no contribution at all.

Given that many other packages seem to have the liberty to do so, the
question is under what conditions I have this liberty as well.

Best wishes,
Adrian


-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania



Re: [Rd] authorship and citation

2015-10-06 Thread Adrian Dușa
On Tue, Oct 6, 2015 at 3:06 AM, Simon Urbanek 
wrote:

>
> [...]
>
> To clarify, legally, you can fork a standard GPL package and make any
> changes you want, including changing authors fields etc. If you don't own
> copyright for the entire work then you cannot change the license without
> consent from the other copyright holders, otherwise you have all the rights
> as anyone else granted by the license.
>
> However, CRAN policies go beyond that and say
>
> "Where code is copied (or derived) from the work of others (including from
> R itself), care must be taken that any copyright/license statements are
> preserved and authorship is not misrepresented.
> Preferably, an ‘Authors@R’ would be used with ‘ctb’ roles for the authors
> of such code. Alternatively, the ‘Author’ field should list these authors
> as contributors.
> Where copyrights are held by an entity other than the package authors,
> this should preferably be indicated via ‘cph’ roles in the ‘Authors@R’
> field, or using a ‘Copyright’ field (if necessary referring to an
> inst/COPYRIGHTS file)."
>
> This means that CRAN will not accept a package where you did not list all
> copyright holders in one of the Author roles, although it is legal for you
> to do so outside of CRAN.
>


Please pardon my delay, I am writing from California and it's still morning
here.
I understand very well that I need to keep the previous co-author in the
list of authors, and duly acknowledge his contribution.
I would still be interested in the formal rules of compiling the citation
file (example package Rcmdr), but for the moment it can be automatically
generated via citation("QCA").

Both of these are perfectly compliant with the CRAN policies.

As another attempt to solve the matter, I wonder if any rules would be
broken if I used the .onAttach(...) function to print a message along the
lines of:

> library(QCA)

Users are encouraged to cite this package as:

  Dusa, Adrian (2015). QCA: Qualitative Comparative Analysis. R Package
  Version 1.2-0, URL: http://cran.r-project.org/package=QCA

This is just an encouragement, not a requirement, and the official citation
file meets the CRAN policies. Would that be acceptable?
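
For concreteness, such a hook could look roughly like the sketch below. It
only illustrates the mechanism and is not the code shipped in any released
version of the package; the message text is the one quoted above.

```r
# Sketch of a startup hook; in a package this would live in an R file
# such as R/zzz.R (the file name is only a convention).
.onAttach <- function(libname, pkgname) {
    # packageStartupMessage() rather than cat() or message(), so that
    # users can silence it with suppressPackageStartupMessages()
    packageStartupMessage(
        "Users are encouraged to cite this package as:\n\n",
        "  Dusa, Adrian (2015). QCA: Qualitative Comparative Analysis. ",
        "R Package\n",
        "  Version 1.2-0, URL: http://cran.r-project.org/package=QCA"
    )
}
```

R calls .onAttach() when the package is attached with library(), which is why
the message would appear right after library(QCA).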

Best wishes,
Adrian


-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania



[Rd] authorship and citation

2015-10-05 Thread Adrian Dușa
Dear R developers,

This is a rather peculiar question, but one for which I nevertheless need
an answer.
It is about an R package which I created (namely QCA), and from versions
1.0-0 to 1.1-4 I had a co-author.
The co-author recently withdrew from the package development, but still
demands to be kept in the authors list and be cited for the package in the
CITATION file.

Obviously, one could not require citations for further developments, but
I don't know how exactly to proceed (I would like to be fair and comply
with the rules).

I have three options:

1. Since the co-author withdrew from the package development, erase his
name from the list of authors (but duly recognise his past contribution
in the package description file)

2. Preserve his name in the list of authors (with the comment "up to
version 1.1-4"), but erase his name from the citation file

3. Keep his name both in the authors list and in the citation file
indefinitely, even though he doesn't do any development work anymore (I
have been threatened with a legal process for plagiarism if I did
otherwise).

My gut feeling is, since his name is related to the previous versions,
anyone using those versions would cite him as well, but otherwise I don't
feel comfortable citing my former co-author for the current work he hasn't
contributed to.

At this point, I could really use some advice, as on the other hand I
wouldn't want to break any regulation I might not be aware of.

Best wishes,
Adrian


-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania



[Rd] single quotes in strings, example block

2014-07-31 Thread Adrian Dușa
Dear R-devel,

In the example block of the documentation for a package, I need to use
a single quote in a string:

foo <- "Don't know"

After building the package, it gets printed as:

foo <- "Dont know"


I read the Writing R Extensions and Parsing Rd files from top to
bottom, but didn't find any solution.

Using \verb{} doesn't help, as:
Tag \verb is invalid in a \examples block

\sQuote doesn't help either, as:
'\s' is an unrecognized escape in character string starting "Don\s"

I found the only macros which are interpreted within text are \var and
\link ... therefore I am stuck.

Any hint to other documentation I might read?
Thank you,
Adrian

-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania



Re: [Rd] single quotes in strings, example block

2014-07-31 Thread Adrian Dușa
On Thu, Jul 31, 2014 at 12:55 PM, Duncan Murdoch
murdoch.dun...@gmail.com wrote:
> On 31/07/2014, 4:27 AM, Adrian Dușa wrote:
>> Dear R-devel,
>> [...]
>
> I don't see this.  Can you give more details, i.e. R version, how you
> printed it, etc.?
>
> It may be that you're not using an ascii single quote character, your
> editor has slipped in something else.

It must be related to my computer, then... it appears in the PDF file
created after the R CMD check.
That helps a lot: now that I know it is possibly a MacOS matter, I will
re-post this on R-SIG-Mac.

Just in case, here are the details:

> R.version
               _
platform       x86_64-apple-darwin13.1.0
arch           x86_64
os             darwin13.1.0
system         x86_64, darwin13.1.0
status         RC
major          3
minor          1.1
year           2014
month          07
day            08
svn rev        66100
language       R
version.string R version 3.1.1 RC (2014-07-08 r66100)
nickname       Sock it to Me

Thank you,
Adrian


-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania



Re: [Rd] single quotes in strings, example block

2014-07-31 Thread Adrian Dușa
On Thu, Jul 31, 2014 at 2:01 PM, Duncan Murdoch
murdoch.dun...@gmail.com wrote:
> On 31/07/2014, 6:40 AM, Adrian Dușa wrote:
>> [...]
> Okay, now I see it in the PDF output of R CMD Rd2pdf on MacOS.
>
> This is an Inconsolata font issue.  If you look at the .tex file
> (available with R CMD Rd2pdf --no-clean) you'll see the quote there,
> but LaTeX doesn't display it.
>
> I don't remember what the fix is...

The good news is that CRAN built packages display the correct
version... it's only a local (hence not critical) issue.

Best wishes,
Adrian

-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Soseaua Panduri nr.90
050663 Bucharest sector 5
Romania



Re: [Rd] reserving a package name

2014-07-11 Thread Adrian Dușa
On Fri, Jul 11, 2014 at 1:59 AM, Uwe Ligges
lig...@statistik.tu-dortmund.de wrote:
> On 10.07.2014 23:46, Adrian Dușa wrote:
>> Dear All,
>> [...]
>
> Well, you cannot reserve a package name. Actually you can choose any legal
> name.
>
> The story is different if you want to submit it to BioConductor or CRAN, then
> the name must be unique in the BioC + CRAN world. More details in the CRAN
> policies, for example.
> And yes, it works on a first come, first served basis there.


Thanks Uwe, this is just as I had expected.
Best wishes,
Adrian

-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Sos. Panduri nr.90
050663 Bucharest sector 5
Romania



Re: [Rd] grouping list of objects in the help system

2014-07-11 Thread Adrian Dușa
On Fri, Jul 11, 2014 at 10:01 PM, Duncan Murdoch
murdoch.dun...@gmail.com wrote:
> On 11/07/2014 12:11 PM, Adrian Dușa wrote:
>> [...]
>
> You have a little bit of control of the order of topics in the PDF reference
> manual (e.g. see the lattice package), but the HTML help page is produced by
> R, not by you, and is presented in a standard alphabetical order.  I would
> object quite strongly to people messing with that.

I understand...
Thanks Gábor as well, already knew about Hadley's (very useful) set of
packages but I thought there was something I was missing in the base R
package structure.

Best,
Adrian

-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Sos. Panduri nr.90
050663 Bucharest sector 5
Romania



[Rd] reserving a package name

2014-07-10 Thread Adrian Dușa
Dear All,

While wanting to create a package using the name DDI (which stands for Data
Documentation Initiative), I sent an email to the DDI Alliance and asked if
they would mind, knowing they are now in the process of copyrighting this
brand.

The answer I got was negative (due to possible infringements on copyright
issues), but they thanked me for raising this matter and are now asking if a
package name can actually be reserved.

As far as I know, it works on a first come, first served basis, but I
think it is still worth asking.
Since this is not an r-help issue, I have used this list instead.

Thanks in advance for any answer,
Adrian


-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
Sos. Panduri nr.90
050663 Bucharest sector 5
Romania



Re: [Rd] index.search

2014-06-16 Thread Adrian Dușa
Oh my... this is so simple, why didn't I think of that...?
Thanks a lot Martin, beautiful,
Adrian


On Mon, Jun 16, 2014 at 10:32 AM, Martin Maechler
maech...@stat.math.ethz.ch wrote:
 Adrian Dușa dusa.adr...@unibuc.ro
 on Mon, 16 Jun 2014 08:33:59 +0300 writes:

  On Mon, Jun 16, 2014 at 6:37 AM, Gabriel Becker
  gmbec...@ucdavis.edu wrote:
  [...]  You can. This is valid R source, so the parser
  will understand it
 
  expr = parse(text= example(deMorgan, package="QCA",
  give.lines=TRUE))
 
  You can then evaluate some or all of that expression
  using either R's own eval package or, e.g. Hadley
  Wickham's evaluate package (for your particular usecase
  evaluate will be easier I think).

  Oh, I see...! In that case I can use it, of course.  Did
  install the evaluate package, although one would expect
  some better documentation (no examples at all, especially
  at the main evaluate function).


  [...]
  index.search is an unexported function, which means that
  it is subject to change in how it behaves without notice
  or even externally available reasons. You can get it via
  :::, but again, it's really not the right tool here, and
  not safe to use in general in code you expect to keep
  working.
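
The point about unexported functions can be checked directly. A minimal
sketch using only base R; whether index.search is still present in a given R
version is not guaranteed, which is precisely the caveat made above.

```r
# exported functions are listed by getNamespaceExports()
"example" %in% getNamespaceExports("utils")
"index.search" %in% getNamespaceExports("utils")

# an unexported object, if present at all, lives only inside the
# namespace, which is why a plain call to index.search() is not found
exists("index.search", envir = asNamespace("utils"), inherits = FALSE)
```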

  Yes, I figured that much.  Of course it's not meant to be
  used in any decently working code, but I learn heavily by
  simply looking at these sort of (hidden) R functions.

  Thanks again, Adrian

 Apropos not the right tool.  I'm a bit astonished that nobody
 mentioned the fact R already provides the tool to
 automatically compare all example outputs with a previous
 version (of the packages example outputs):

 *THE* manual (every package writer should know about,
  re-read/browse about once a year, and search in for such questions):

 Writing R Extensions, section Package subdirectories
 
 (e.g. on the CRAN master in Vienna,
  http://cran.r-project.org/doc/manuals/R-exts.html#Package-subdirectories )
 says

 |If directory 'tests' has a subdirectory 'Examples' containing a file
 |'PKG-Ex.Rout.save', this is compared to the output file for running the
 |examples when the latter are checked.

 So: After an 'R CMD check PKG' you only need to take and
 keep the  PKG-Ex.Rout  file that is produced (in the
 PKG.Rcheck/ directory), and save it into PKG/tests/Examples/PKG-Ex.Rout.save
 and from then on, every time you run R CMD check PKG  the
 comparison will be made.
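
The copy step could be scripted along these lines. This is a rough sketch
under stated assumptions: saveExampleOutput and its arguments are made-up
names, and the target location follows the tests/Examples subdirectory named
in the manual passage quoted above.

```r
# Hypothetical helper: 'pkg' is the package name, 'src_dir' the package
# source directory and 'check_dir' the PKG.Rcheck directory left behind
# by R CMD check.
saveExampleOutput <- function(pkg, src_dir, check_dir) {
    from <- file.path(check_dir, paste0(pkg, "-Ex.Rout"))
    to_dir <- file.path(src_dir, "tests", "Examples")
    # create tests/Examples if it does not exist yet
    dir.create(to_dir, recursive = TRUE, showWarnings = FALSE)
    # returns TRUE when the reference output was copied
    file.copy(from, file.path(to_dir, paste0(pkg, "-Ex.Rout.save")),
              overwrite = TRUE)
}
```

It would be called once after a check run whose example output has been
reviewed, for instance saveExampleOutput("QCA", "QCA", "QCA.Rcheck").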

 Martin Maechler, ETH Zurich



-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
+40 21 3120210 / int.101
Fax: +40 21 3158391



Re: [Rd] index.search

2014-06-16 Thread Adrian Dușa
On Mon, Jun 16, 2014 at 10:32 AM, Martin Maechler
maech...@stat.math.ethz.ch wrote:
> [...]
>
> Apropos not the right tool.  I'm a bit astonished that nobody
> mentioned the fact R already provides the tool to
> automatically compare all example outputs with a previous
> version (of the packages example outputs):

As appealing as this is, while trying to figure out a solution of my
own (until Martin's email), I think I've succeeded in creating a
rather useful function which allows fine grained control over each and
every line of code in the examples sections:

#
# names of the help files whose examples are to be run
helpfiles <- c(
"allExpressions",
"calibrate",
"createMatrix",
"deMorgan",
"demoChart",
"eqmcc",
"factorize",
"findSubsets",
"findSupersets",
"findTh",
"getRow",
"pof",
"solveChart",
"superSubset",
"truthTable"
)

testQCAmaybe <- function() {
    results <- vector(mode="list", length=length(helpfiles))
    names(results) <- helpfiles

    for (i in seq(length(helpfiles))) {
        # extract the example code from the Rd file and parse it
        Rdfile <- file.path(find.package("QCA"), paste(helpfiles[i],
            ".Rd", sep=""))
        commands <- parse(text=capture.output(tools::Rd2ex(Rdfile)))

        results[[i]] <- vector(mode="list", length=length(commands))
        names(results[[i]]) <- commands
        # evaluate each command separately and keep its printed output
        for (j in seq(length(commands))) {
            results[[i]][[j]] <-
                suppressWarnings(capture.output(eval(commands[j])))
        }
    }
    return(results)
}
#

Using all.equal(), over the entire list or sequentially over parts of
it quickly identifies sources of difference.
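
As an illustration of that comparison, here is a small sketch with two
made-up result lists standing in for the output of the function above under
the old and the new package version (the captured strings are invented):

```r
# Invented stand-ins for two runs of the examples
old <- list(deMorgan = list("[1] 1", "[1] \"a\""), pof = list("[1] 0.5"))
new <- list(deMorgan = list("[1] 1", "[1] \"b\""), pof = list("[1] 0.5"))

# all.equal() returns TRUE on a match and a description of the
# differences otherwise, hence the isTRUE() wrapper
isTRUE(all.equal(old, new))

# compare help file by help file to locate where the output changed
same <- mapply(function(a, b) isTRUE(all.equal(a, b)), old, new)
changed <- names(old)[!same]
changed
```

Comparing element by element narrows a failure down to a single help file
before looking at individual commands.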

I hope this helps anyone,
Adrian


-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
+40 21 3120210 / int.101
Fax: +40 21 3158391



[Rd] index.search

2014-06-15 Thread Adrian Dușa
Dear r-devel,

I am trying to automatically check if two successive versions of a
package have the same results (i.e. code not broken), by parsing the
example sections for each function against a previously tested
version.

While trying to replicate the code from example(), I am facing an
error related to the index.search function (line 7 in the example()
code).
This is the code I am using:

example2 <- function (topic, package = NULL, lib.loc = NULL,
    character.only = FALSE,
    give.lines = FALSE, local = FALSE, echo = TRUE, verbose =
    getOption("verbose"),
    setRNG = FALSE, ask = getOption("example.ask"), prompt.prefix =
    abbreviate(topic, 6), run.dontrun = FALSE)
{
    if (!character.only) {
        topic <- substitute(topic)
        if (!is.character(topic))
            topic <- deparse(topic)[1L]
    }
    pkgpaths <- find.package(package, lib.loc, verbose = verbose)
    file <- index.search(topic, pkgpaths, TRUE)
    return(file)
}

> example2(deMorgan, package="QCA")
Error in example2(deMorgan, package = "QCA") :
  could not find function "index.search"


I've tried an explicit library(utils), with the same result.
?index.search doesn't yield anything better...

Could anyone point me in the right direction, please?
Thank you,
Adrian


-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
+40 21 3120210 / int.101
Fax: +40 21 3158391



Re: [Rd] index.search

2014-06-15 Thread Adrian Dușa
Hi Gabriel,

Actually, I am not going to use multiple versions of the same package,
but always the last version. What I would like to do is:
- to run all examples from all functions in the package, saving the
results into a list
- compare the list to an already saved one (from the previous version
of the package), possibly using all.equal()

I tried using the example() function, like this:
example(deMorgan, package="QCA", give.lines=TRUE)

That returns the commands from the examples section, but if a command
is split over multiple rows I cannot use it automatically.

I could also use:
capture.output(example(deMorgan, package="QCA", ask=FALSE))

That would indeed work for the printed output, but I would also like
to compare the objects saved by the deMorgan() function. If that is
not possible, I'll probably be happy with the printed output.

I'd still be curious as to why the index.search() function cannot be
used... (it seems useful for other purposes).

Best wishes,
Adrian


On Mon, Jun 16, 2014 at 5:46 AM, Gabriel Becker gmbec...@ucdavis.edu wrote:
> Adrian,
>
> R isn't really designed to use multiple versions of the same package in the
> same R session. To do what you want you'll need to unload one version of the
> package before you load the next, which will work some percentage of the
> time between 50 and 100 (usually), but when it can be done it is
> relatively easy to do.
>
> Packages with C code will give you problems, or at least they used to. I
> haven't tried recently.  See Prof Ripley's response here:
> https://stat.ethz.ch/pipermail/r-devel/2009-February/052229.html
>
> For packages that can be unloaded/reloaded safely, is there a reason you
> can't just use the existing example function with two different library
> locations (lib.loc argument) with the two package versions installed?
>
> ~G


 On Sun, Jun 15, 2014 at 6:22 PM, Adrian Dușa dusa.adr...@unibuc.ro wrote:

 Dear r-devel,

 I am trying to automatically check if two successive versions of a
 package have the same results (i.e. code not broken), by parsing the
 example sections for each function against a previously tested
 version.

 While trying to replicate the code from example(), I am facing an
 error related to the index.search function (line 7 in the example()
 code).
 This is the code I am using:

 example2 <- function (topic, package = NULL, lib.loc = NULL,
     character.only = FALSE, give.lines = FALSE, local = FALSE,
     echo = TRUE, verbose = getOption("verbose"), setRNG = FALSE,
     ask = getOption("example.ask"),
     prompt.prefix = abbreviate(topic, 6), run.dontrun = FALSE)
 {
     if (!character.only) {
         topic <- substitute(topic)
         if (!is.character(topic))
             topic <- deparse(topic)[1L]
     }
     pkgpaths <- find.package(package, lib.loc, verbose = verbose)
     file <- index.search(topic, pkgpaths, TRUE)
     return(file)
 }

 > example2(deMorgan, package="QCA")
 Error in example2(deMorgan, package = "QCA") :
   could not find function "index.search"


 I've tried an explicit library(utils), with the same result.
 ?index.search doesn't yield anything better...

 Could anyone point me in the right direction, please?
 Thank you,
 Adrian






 --
 Gabriel Becker
 Graduate Student
 Statistics Department
 University of California, Davis



-- 
Adrian Dusa
University of Bucharest
Romanian Social Data Archive
1, Schitu Magureanu Bd.
050025 Bucharest sector 5
Romania
Tel.:+40 21 3126618 \
+40 21 3120210 / int.101
Fax: +40 21 3158391



Re: [Rd] index.search

2014-06-15 Thread Adrian Dușa
On Mon, Jun 16, 2014 at 6:36 AM, Brian Lee Yung Rowe r...@muxspace.com wrote:
 Adrian,

 You might consider using a unit testing framework such as RUnit or testthat,
 which does this but in a more structured manner. Essentially you codify the
 behavior in a set of tests as opposed to comparing with a previous version.

Right... I knew there was a better way to handle this.
I'll take a look at those and try to build something on them.

Best,
Adrian



Re: [Rd] index.search

2014-06-15 Thread Adrian Dușa
On Mon, Jun 16, 2014 at 6:37 AM, Gabriel Becker gmbec...@ucdavis.edu wrote:
 [...]
 You can. This is valid R source, so the parser will understand it

 expr = parse(text= example(deMorgan, package="QCA", give.lines=TRUE))

 You can then evaluate some or all of that expression using either R's own
 eval function or, e.g. Hadley Wickham's evaluate package (for your particular
 use case evaluate will be easier I think).

Oh, I see...! In that case I can use it, of course.
I did install the evaluate package, although one would expect
better documentation (no examples at all, especially for the main
evaluate function).


 [...]
 index.search is an unexported function, which means that it is subject to
 change in how it behaves without notice or even externally available
 reasons. You can get it via :::, but again, it's really not the right tool
 here, and not safe to use in general in code you expect to keep working.

Yes, I figured that much.
Of course it's not meant to be used in any decently working code, but
I learn a lot by simply looking at these sorts of (hidden) R
functions.

Thanks again,
Adrian



Re: [Rd] SEXPTYPEs

2014-05-17 Thread Adrian Dușa
On Fri, May 16, 2014 at 2:15 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:

 On 16/05/2014 11:06, Duncan Murdoch wrote:

 [...]

 I've looked at all SEXPTYPEs in the R internals, and the only one
 specific to integer vectors is INTSXP. Can this be used for vectors
 of length larger than 32-bit?


 Yes, see the section on Long Vectors in chapter 12.


 ('Yes on a 64-bit platform.')  The type for a vector length is R_xlen_t,
 e.g. allocVector is declared as

 SEXP Rf_allocVector(SEXPTYPE, R_xlen_t);

 That is never 'long long int' 
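
To illustrate the distinction drawn in this reply (the element type stays int, only the length type gets wider), here is a minimal standalone C sketch. It does not use the R API: xlen_sketch_t is a hypothetical stand-in for R_xlen_t, assumed here to be ptrdiff_t as on a 64-bit build, and fits_in_int and sum_ints are illustrative helper names that do not come from the thread.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for R_xlen_t (assumption: ptrdiff_t, 64-bit build). */
typedef ptrdiff_t xlen_sketch_t;

/* Returns 1 if a vector length still fits in a plain C int, 0 otherwise. */
static int fits_in_int(xlen_sketch_t n) {
    return n >= 0 && n <= (xlen_sketch_t)INT_MAX;
}

/* Sums a buffer of int elements. The element pointer is still int *;
   only the length and the loop index use the wide type. */
static long long sum_ints(const int *p, xlen_sketch_t n) {
    long long total = 0;
    for (xlen_sketch_t i = 0; i < n; i++) {
        total += p[i];
    }
    return total;
}
```

In package code the same pattern would presumably apply to the pointer returned by INTEGER(): it remains int *, so there is no need for a long long int pointer, and only the length and index variables need the wider type.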


Oh yes, of course.
I now have all the information I need.

Thanks and best wishes,
Adrian



[Rd] SEXPTYPEs

2014-05-16 Thread Adrian Dușa
Dear list,

On a follow up from my previous email, I am now trying to allocate vectors
of length larger than 32-bit in C.

From the R internals documentation, I read that:
"The sxpinfo header is defined as a 32-bit C structure..."
and
"A SEXPREC is a C structure containing the 32-bit header..."

The question is: does the INTSXP allow vectors larger than 32-bit?


A test example:

//###
int *p_temp;

SEXP root = PROTECT(allocVector(VECSXP, 5));

long long int verylargeinteger;
// something to compute verylargeinteger, > 32-bit

SEXP temp = SET_VECTOR_ELT(root, 0, allocVector(INTSXP, verylargeinteger));
p_temp = INTEGER(temp);
//###

temp should be a vector of length > 32-bit, and p_temp should be the
pointer to that vector.

I also tried declaring:
long long int *p_temp;
but the compiler throws an incompatible pointer type warning
(because the pointer should be to integers, of course).

I've looked at all SEXPTYPEs in the R internals, and the only one specific
to integer vectors is INTSXP. Can this be used for vectors of length larger
than 32-bit?

Thank you again, in advance,
Adrian




[Rd] large integer values

2014-05-14 Thread Adrian Dușa
Dear devels,

I need to create a (short) vector in C, which contains potentially very
large numbers: successive powers of 2.

This is my test example:

lgth = 35;
int power[lgth];
power[lgth - 1] = 1;
for (j = 1; j < lgth; j++) {
    power[lgth - j - 1] = 2*power[lgth - j];
}

Everything works ok until it reaches the limit of 2^32:

power: 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192,
16384, 32768, 65536, 131072, 262144, 524288, 1048576, 2097152, 4194304,
8388608, 16777216, 33554432, 67108864, 134217728, 268435456, 536870912,
1073741824, -2147483648, 0, 0, 0

How should I declare the power vector, in order to accept integer values
larger than 2^32?

Thanks very much in advance,
Adrian




Re: [Rd] large integer values

2014-05-14 Thread Adrian Dușa
Dear Prof. Ripley,

Once again, thank you for your replies.
I must confess not being a genuine C programmer, having learned how to use
C only in connection to R (and the macros provided are almost a separate
language to learn).

I'll try to read more about the types you've indicated, and will keep
trying. So far, most certainly I am not doing it right, because all of them
have the same result. Tried declaring:

uint64_t power[lgth];
and
uint_fast64_t power[lgth];
and
uintmax_t power[lgth];

but still the top threshold appears at the limit of 32-bit in all cases.

Will keep reading about these...
Best wishes,
Adrian



On Wed, May 14, 2014 at 2:45 PM, Prof Brian Ripley rip...@stats.ox.ac.uk wrote:

 On 14/05/2014 10:37, Adrian Dușa wrote:

 Dear devels,

 I need to create a (short) vector in C, which contains potentially very
 large numbers, exponentially to the powers of 2.


 This isn't an R question, except in so far that R mandates the usual
 convention of C int being 32-bit.  However

 1) You should use an unsigned integer type.
 2) Most compilers have uint64_t but C99/C11 do not require it.  They
 require uint_fast64_t and uintmax_t (which is the widest unsigned int)
 types.
 3) double will hold much larger powers, and functions like pow_di (where
 supported) or pow will compute them efficiently for you.  And R has
 R_pow_di in Rmath.h.
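
Point 3 can be sketched in standalone C: a double represents every power of 2 in this range exactly, so no wide integer type is needed just to hold the values. The helper below, pow_int_sketch, is a hypothetical stand-in written for this example; it is not R_pow_di (whose implementation is not shown in this thread) and simply raises a double to an integer power by repeated squaring.

```c
/* Hypothetical helper (not part of R or the C library):
   computes x^n for an integer exponent by repeated squaring. */
static double pow_int_sketch(double x, int n) {
    int negative = n < 0;
    /* Use an unsigned magnitude so that n == INT_MIN is handled safely. */
    unsigned int k = negative ? 0u - (unsigned int)n : (unsigned int)n;
    double result = 1.0;

    while (k > 0) {
        if (k & 1u) {
            result *= x;   /* this bit of the exponent is set */
        }
        x *= x;            /* square the base for the next bit */
        k >>= 1;
    }
    return negative ? 1.0 / result : result;
}
```

With this sketch, pow_int_sketch(2.0, 34) gives 17179869184 as an exact double, well past the range of a 32-bit int. In code that links against R, calling R_pow_di as suggested above would be the natural choice instead of a hand-written helper.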



 [...]




 --
 Brian D. Ripley,  rip...@stats.ox.ac.uk
 Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
 University of Oxford, Tel:  +44 1865 272861 (self)
 1 South Parks Road, +44 1865 272866 (PA)
 Oxford OX1 3TG, UKFax:  +44 1865 272595






Re: [Rd] large integer values

2014-05-14 Thread Adrian Dușa
On Wed, May 14, 2014 at 5:35 PM, Simon Urbanek
simon.urba...@r-project.org wrote:

 [...]

 How do you print them? It seems like you're printing 32-bit value instead
 ... (powers of 2 are simply shifts of 1).


I am simply using Rprintf():

long long int power[lgth];
power[lgth - 1] = 1;
Rprintf("power: %d", power[lgth - 1]);
for (j = 1; j < lgth; j++) {
    power[lgth - j - 1] = 2*power[lgth - j];
    Rprintf(", %d", power[lgth - j - 1]);
}


Basically, I need them in reversed order (hence the inverse indexing), but
the values are nonetheless the same.
Adrian

PS: also tried long long int, same result...



Re: [Rd] large integer values

2014-05-14 Thread Adrian Dușa
On Wed, May 14, 2014 at 6:24 PM, Martyn Plummer plumm...@iarc.fr wrote:

 [...]

 Your numbers are being coerced to int when you print them. Try the
 format ", %lld" instead.


Oh my goodness, this was a printing issue...!
(feeling embarrassed, but learned something new)
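
For the record, the fix can be sketched in standalone C as follows. The array is filled exactly as in the loop from this thread; the only real change is that the value is formatted with a conversion that matches its 64-bit type. This sketch uses uint64_t with the PRIu64 macro from inttypes.h and snprintf instead of Rprintf so that it runs without R; fill_powers, format_u64 and LGTH are illustrative names, not code from the package. With long long int and Rprintf, the matching conversion is %lld, as suggested above.

```c
#include <inttypes.h>  /* PRIu64 */
#include <stdint.h>
#include <stdio.h>

#define LGTH 35

/* Fills power[] in reversed order: power[LGTH - 1] = 1 and every
   earlier element is twice the one after it, so power[0] = 2^34. */
static void fill_powers(uint64_t power[LGTH]) {
    power[LGTH - 1] = 1;
    for (int j = 1; j < LGTH; j++) {
        power[LGTH - j - 1] = 2 * power[LGTH - j];
    }
}

/* Formats one value with a conversion that matches its 64-bit type.
   Using "%d" here would read the argument as a 32-bit int. */
static int format_u64(char *buf, size_t size, uint64_t value) {
    return snprintf(buf, size, "%" PRIu64, value);
}
```

The stored values were correct all along in the 64-bit declarations; only the "%d" conversion made them look truncated.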

Problem solved, thanks very much all,
Adrian



Re: [Rd] C headers

2014-02-11 Thread Adrian Dușa
Apologies for my late reply, I've been away for a few days.
Everything is working fine now, thank you again for your advice.
Best wishes,
Adrian


On Thu, Feb 6, 2014 at 10:49 AM, Adrian Dușa dusa.adr...@unibuc.ro wrote:

 [...]






[Rd] C headers

2014-02-06 Thread Adrian Dușa
Dear list,

I just upgraded to MacOS Mavericks, with a fresh install of R 3.0.2, and I
am trying to install a previous version of my QCA package (the most recent
source file, which passed R CMD check --as-cran with R 3.0.1).

I seem to have some difficulties in the C code, apparently it doesn't find
some headers (please see below):


$ R CMD INSTALL --no-multiarch QCA.history/QCA_1.1-1.tar.gz
* installing to library
‘/Library/Frameworks/R.framework/Versions/3.0/Resources/library’
* installing *source* package ‘QCA’ ...
** libs
llvm-gcc-4.2 -arch x86_64 -std=gnu99
-I/Library/Frameworks/R.framework/Resources/include -DNDEBUG
 -I/usr/local/include -fPIC  -mtune=core2 -g -O2  -c allSol.c -o allSol.o
In file included from allSol.c:1:
/Library/Frameworks/R.framework/Resources/include/R.h:28:20: error:
stdlib.h: No such file or directory
/Library/Frameworks/R.framework/Resources/include/R.h:29:73: error:
stdio.h: No such file or directory
In file included from
/Library/Frameworks/R.framework/Resources/include/R.h:30,
 from allSol.c:1:
/usr/llvm-gcc-4.2/bin/../lib/gcc/i686-apple-darwin11/4.2.1/include/limits.h:15:25:
error: limits.h: No such file or directory
In file included from allSol.c:1:
/Library/Frameworks/R.framework/Resources/include/R.h:31:18: error: math.h:
No such file or directory
In file included from
/Library/Frameworks/R.framework/Resources/include/R.h:44,
 from allSol.c:1:
/Library/Frameworks/R.framework/Resources/include/R_ext/RS.h:26:47: error:
string.h: No such file or directory
In file included from allSol.c:2:
/Library/Frameworks/R.framework/Resources/include/Rinternals.h:862: error:
expected declaration specifiers or ‘...’ before ‘FILE’
/Library/Frameworks/R.framework/Resources/include/Rinternals.h:865: error:
expected declaration specifiers or ‘...’ before ‘FILE’
allSol.c: In function ‘allSol’:
allSol.c:62: warning: implicit declaration of function ‘div’
allSol.c:62: error: request for member ‘quot’ in something not a structure
or union
allSol.c:62: error: request for member ‘rem’ in something not a structure
or union
allSol.c:213: error: request for member ‘quot’ in something not a structure
or union
allSol.c:213: error: request for member ‘rem’ in something not a structure
or union
allSol.c:279: error: request for member ‘quot’ in something not a structure
or union
allSol.c:279: error: request for member ‘rem’ in something not a structure
or union
make: *** [allSol.o] Error 1
ERROR: compilation failed for package ‘QCA’
* removing
‘/Library/Frameworks/R.framework/Versions/3.0/Resources/library/QCA’
* restoring previous
‘/Library/Frameworks/R.framework/Versions/3.0/Resources/library/QCA’


It doesn't find stdlib.h, stdio.h, limits.h, math.h, string.h (etc,
basically all important ones).
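
A note on the 'quot' and 'rem' errors in the log: they are a knock-on effect of the missing stdlib.h rather than a separate problem. div() and its return type div_t are declared in stdlib.h, so when that header is not found the compiler implicitly declares div() as returning int, and accessing .quot or .rem on an int fails. The sketch below shows the call compiling cleanly once the header is available; split_index is a hypothetical helper name, not code from allSol.c.

```c
#include <stdlib.h>  /* div, div_t */

/* Hypothetical helper: splits a flat index into quotient and
   remainder with respect to ncols, using the standard div(). */
static div_t split_index(int index, int ncols) {
    return div(index, ncols);
}
```

For example, split_index(17, 5) yields quot = 3 and rem = 2.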

I have Xcode version 5.0.2 installed, do I need anything else installed in
the system?

Thank you,
Adrian
