Re: [Rd] Underscores in package names

2019-08-15 Thread Abby Spurdle
> While
> package names are not functions, using dots in package names
> encourages the use of dots in functions, a dangerous practice.

"dangerous"...?
I can't understand why people affiliated with RStudio and the Tiny-Verse
repeatedly resort to subjective and unscientific phrasing.

Elegant, Advanced, Dangerous...
At UseR, there was even "Advanced Use of your Favorite IDE".

This is not science.
This is marketing.

There's nothing dangerous about it other than your belief that it's
dangerous.
I note that many functions in the stats package use dots in function names.
Your statement implies that the stats package is badly designed, which it
is not.
Out of 14,800-ish packages on CRAN, very few of them are even close to the
standard set by the stats package, in my opinion.

And as noted by other people in this thread, changing naming policies could
interfere with a lot of software "out there", which is dangerous.

> Dots in
> names is also one of the common stones cast at R as a language, as
> dots are used for object oriented method dispatch in other common
> languages.

I don't think the goal is to copy other OOP systems.
Furthermore, some shells use dot as the current working directory and Java
uses dots in package namespaces.
And then there's regular expressions...

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Feature request: non-dropping regmatches/strextract

2019-08-15 Thread William Dunlap via R-devel
Using a non-capturing group, "(?:...)" instead of "(...)", simplifies my
example a bit:

> x <- c("Groucho <grou...@marx.com>", "<ch...@marx.com>", "Harpo")
> strcapture("([[:alpha:]]+)?(?: *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?", x,
proto=data.frame(Name=character(), Address=character(),
stringsAsFactors=FALSE))
     Name          Address
1 Groucho grou...@marx.com
2           ch...@marx.com
3   Harpo

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Thu, Aug 15, 2019 at 1:04 PM William Dunlap wrote:

> I don't care much for regmatches and haven't tried strextract, but I think
> replacing the character(0) by NA_character_ is almost always inappropriate
> if the match information comes from gregexpr.
>
> I think strcapture() does a pretty good job of what I think you are trying
> to do.  Perhaps adding an argument to map no match to NA instead of ""
> would give you just what you wanted.
>
> > x <- c("Groucho <grou...@marx.com>", "<ch...@marx.com>", "Harpo")
> > d <- strcapture("([[:alpha:]]+)?( *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?",
> x, proto=data.frame(Name=character(), Junk=character(),
> Address=character(), stringsAsFactors=FALSE))
> > d[c("Name", "Address")]
>      Name          Address
> 1 Groucho grou...@marx.com
> 2           ch...@marx.com
> 3   Harpo
> > str(.Last.value)
> 'data.frame':   3 obs. of  2 variables:
>  $ Name   : chr  "Groucho" "" "Harpo"
>  $ Address: chr  "grou...@marx.com" "ch...@marx.com" ""
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Thu, Aug 15, 2019 at 11:31 AM Cyclic Group Z_1 <
> cyclicgroup...@yahoo.com> wrote:
>
>> I do think keeping the default behavior is desirable for backwards
>> compatibility; my suggestion is not to change default behavior but to add
>> an optional argument that allows a different behavior. Although this can be
>> implemented in a user-defined function, retaining empty matches facilitates
>> programmatic use, and seems to be something that should be available in
>> base R. It is available, for example, in MATLAB, a comparable array
>> language.
>>
>> Alternatively, perhaps a nomatch (or maybe emptymatch) argument in the
>> spirit of `[.data.table`? That is, an argument nomatch where nomatch = NULL
>> (the default) results in drops for vector outputs and character(0) for list
>> outputs and nomatch = NA results in insertion of NA_character_, and nomatch
>> = '' results in insertion of empty string.
>>
>> I can submit proposed patch code if others think this is a good idea.
>>
>> What are your thoughts on the proposed alteration to (currently
>> nonexported) strextract? I assume (maybe wrongly) that the plan is to
>> eventually export that function.
>>
>> Thank you,
>> CG
>>
>


Re: [Rd] Feature request: non-dropping regmatches/strextract

2019-08-15 Thread William Dunlap via R-devel
I don't care much for regmatches and haven't tried strextract, but I think
replacing the character(0) by NA_character_ is almost always inappropriate
if the match information comes from gregexpr.

I think strcapture() does a pretty good job of what I think you are trying
to do.  Perhaps adding an argument to map no match to NA instead of ""
would give you just what you wanted.

> x <- c("Groucho <grou...@marx.com>", "<ch...@marx.com>", "Harpo")
> d <- strcapture("([[:alpha:]]+)?( *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?",
x, proto=data.frame(Name=character(), Junk=character(),
Address=character(), stringsAsFactors=FALSE))
> d[c("Name", "Address")]
     Name          Address
1 Groucho grou...@marx.com
2           ch...@marx.com
3   Harpo
> str(.Last.value)
'data.frame':   3 obs. of  2 variables:
 $ Name   : chr  "Groucho" "" "Harpo"
 $ Address: chr  "grou...@marx.com" "ch...@marx.com" ""
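
Until such an argument exists, the mapping from "" to NA can be done after the
fact. A minimal sketch, assuming made-up example addresses and a hypothetical
helper name (blank_to_na is not part of utils or of strcapture itself):

```r
# Hypothetical helper: turn empty captures into NA_character_ after strcapture().
blank_to_na <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.character(col)) col[!nzchar(col)] <- NA_character_
    col
  })
  df
}

# Made-up inputs with the same shape as the example above.
x <- c("Ann <ann@example.org>", "<bob@example.org>", "Cy")
d <- strcapture("([[:alpha:]]+)?(?: *<([[:alpha:]. ]+@[[:alpha:]. ]+)>)?", x,
                proto = data.frame(Name = character(), Address = character(),
                                   stringsAsFactors = FALSE))
d <- blank_to_na(d)
```

The helper leaves existing NA values alone, since nzchar(NA) is TRUE, so it
only touches the empty strings produced by unmatched optional groups.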
Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Thu, Aug 15, 2019 at 11:31 AM Cyclic Group Z_1 wrote:

> I do think keeping the default behavior is desirable for backwards
> compatibility; my suggestion is not to change default behavior but to add
> an optional argument that allows a different behavior. Although this can be
> implemented in a user-defined function, retaining empty matches facilitates
> programmatic use, and seems to be something that should be available in
> base R. It is available, for example, in MATLAB, a comparable array
> language.
>
> Alternatively, perhaps a nomatch (or maybe emptymatch) argument in the
> spirit of `[.data.table`? That is, an argument nomatch where nomatch = NULL
> (the default) results in drops for vector outputs and character(0) for list
> outputs and nomatch = NA results in insertion of NA_character_, and nomatch
> = '' results in insertion of empty string.
>
> I can submit proposed patch code if others think this is a good idea.
>
> What are your thoughts on the proposed alteration to (currently
> nonexported) strextract? I assume (maybe wrongly) that the plan is to
> eventually export that function.
>
> Thank you,
> CG
>


Re: [Rd] Feature request: non-dropping regmatches/strextract

2019-08-15 Thread Cyclic Group Z_1 via R-devel
I do think keeping the default behavior is desirable for backwards 
compatibility; my suggestion is not to change default behavior but to add an 
optional argument that allows a different behavior. Although this can be 
implemented in a user-defined function, retaining empty matches facilitates 
programmatic use, and seems to be something that should be available in base R. 
It is available, for example, in MATLAB, a comparable array language.

Alternatively, perhaps a nomatch (or maybe emptymatch) argument in the spirit 
of `[.data.table`? That is, an argument nomatch where nomatch = NULL (the 
default) results in drops for vector outputs and character(0) for list outputs 
and nomatch = NA results in insertion of NA_character_, and nomatch = '' 
results in insertion of empty string.
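
To make the proposed semantics concrete, here is a rough sketch for the
regexpr() case only. The wrapper name regmatches_first and its nomatch
argument are hypothetical; this is not the proposed patch and not an existing
base R function:

```r
# Hypothetical wrapper around regexpr()/regmatches(); not part of base R.
regmatches_first <- function(pattern, x, nomatch = NULL, ...) {
  m <- regexpr(pattern, x, ...)
  if (is.null(nomatch)) {
    return(regmatches(x, m))          # current behaviour: non-matches dropped
  }
  hit <- !is.na(m) & m != -1L         # positions that actually matched
  out <- rep(as.character(nomatch), length(x))
  out[hit] <- regmatches(x, m)        # fill matched positions only
  out
}

x <- c("a1", "b", "c3")
dropped <- regmatches_first("[0-9]", x)                # non-matches dropped
with_na <- regmatches_first("[0-9]", x, nomatch = NA)  # NA_character_ for "b"
with_empty <- regmatches_first("[0-9]", x, nomatch = "")  # "" for "b"
```

With nomatch supplied, the result always has the same length as x, which is
the property needed for column assignment.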

I can submit proposed patch code if others think this is a good idea.

What are your thoughts on the proposed alteration to (currently nonexported) 
strextract? I assume (maybe wrongly) that the plan is to eventually export that 
function.

Thank you,
CG


[Rd] Rf_defineVar(symbol, R_UnboundValue, environment) questions

2019-08-15 Thread William Dunlap via R-devel
While poking around the C++ code in the dplyr package I ran across the idiom
   Rf_defineVar(symbol, R_UnboundValue, environment)
to [sort of] remove 'symbol' from 'environment'.

Using it makes the R-level functions objects(), exists(), and get()
somewhat inconsistent, and I was wondering if that was intended.  E.g., use
R CMD SHLIB to build a shared object from the following C code that
dyn.load can load into R:

% cat defineVarAsUnboundValue.c
#include <R.h>
#include <Rinternals.h>

/* Bind 'name' to R_UnboundValue in 'envir' and return NULL. */
SEXP defineVarAsUnboundValue(SEXP name, SEXP envir)
{
    Rf_defineVar(name, R_UnboundValue, envir);
    return R_NilValue;
}
erratic:bill:292% R-3.6.1 CMD SHLIB defineVarAsUnboundValue.c
gcc -std=gnu99 -I"/home/R/R-3.6.1/lib64/R/include" -DNDEBUG
-I/usr/local/include  -fpic  -g -O2  -c defineVarAsUnboundValue.c -o
defineVarAsUnboundValue.o
gcc -std=gnu99 -shared -L/home/R/R-3.6.1/lib64/R/lib -L/usr/local/lib64 -o
defineVarAsUnboundValue.so defineVarAsUnboundValue.o
-L/home/R/R-3.6.1/lib64/R/lib -lR
erratic:bill:293% R-3.6.1 --quiet --vanilla
> dyn.load("defineVarAsUnboundValue.so")
> envir <- list2env(list(One=1, Two=2))
> objects(envir)
[1] "One" "Two"
>
> .Call("defineVarAsUnboundValue", quote(Two), envir)
NULL
> objects(envir)
[1] "One"
> objects(envir, all.names=TRUE) # is "Two" a 'hidden' object?
[1] "One" "Two"
> exists("Two", envir=envir, inherits=FALSE)
[1] TRUE
> get("Two", envir=envir, inherits=FALSE) # get fails when exists says ok
Error in get("Two", envir = envir, inherits = FALSE) :
  object 'Two' not found

Should Rf_defineVar(sym, R_UnboundValue, envir) remove sym from envir?
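
For comparison, this is what the same R-level functions report when the
binding is removed with rm(). It is a small sketch for contrast only and says
nothing about what Rf_defineVar is intended to do:

```r
envir <- list2env(list(One = 1, Two = 2))
rm("Two", envir = envir)

visible <- objects(envir)                       # names without a leading dot
all_names <- objects(envir, all.names = TRUE)   # every name in the environment
still_there <- exists("Two", envir = envir, inherits = FALSE)
value <- get0("Two", envir = envir, inherits = FALSE)  # NULL when absent
```

Here objects(), exists(), and get0() all agree that "Two" is gone.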

Bill Dunlap
TIBCO Software
wdunlap tibco.com


Re: [Rd] Feature request: non-dropping regmatches/strextract

2019-08-15 Thread William Dunlap via R-devel
Changing the default behavior of regmatches would break its use with
gregexpr, where the number of matches per input element varies, so a
zero-length character vector makes more sense than NA_character_.

> x <- c("John Doe", "e e cummings", "Juan de la Madrid")
> m <- gregexpr("[A-Z]", x)
> regmatches(x,m)
[[1]]
[1] "J" "D"

[[2]]
character(0)

[[3]]
[1] "J" "M"

> vapply(.Last.value, function(x)paste(paste0(x, "."),collapse=""), "")
[1] "J.D." "."    "J.M."

(We don't want e e cummings' initials mapped to "NA.")
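
To illustrate the concern, here is a sketch of what the same pipeline would
produce if each zero-length match were replaced by NA_character_. The
replacement step is written by hand here; it is not an existing regmatches
option:

```r
x <- c("John Doe", "e e cummings", "Juan de la Madrid")
hits <- regmatches(x, gregexpr("[A-Z]", x))

# Hand-rolled stand-in for a hypothetical "insert NA" behaviour.
with_na <- lapply(hits, function(h) if (length(h)) h else NA_character_)

initials <- vapply(with_na, function(h) paste(paste0(h, "."), collapse = ""), "")
# The second element is now "NA." because paste0() turns NA into the text "NA"
```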

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Thu, Aug 15, 2019 at 12:15 AM Cyclic Group Z_1 via R-devel <
r-devel@r-project.org> wrote:

> A very common use case for regmatches is to extract regex matches into a
> new column in a data.frame (or data.table, etc.) or otherwise use the
> extracted strings alongside the input. However, the default behavior is to
> drop empty matches, which results in mismatches in column length if
> reassignment is done without subsetting.
>
> For consistency with other R functions and compatibility with this use
> case, it would be nice if regmatches did not automatically drop empty
> matches and would instead insert an NA_character_ value (similar to
> stringr::str_extract). This alternative regmatches could be implemented
> through an optional drop argument, a new function, or mentioned in the
> documentation (a la resample in ?sample).
>
> Alternatively, at the moment, there is a non-exported function strextract
> in utils which is very similar to stringr::str_extract. It would be great
> if this function, once exported, were to include a drop argument to prevent
> dropping positions with no matches.
>
> An example solution (last option):
>
> strextract <- function(pattern, x, perl = FALSE, useBytes = FALSE,
>                        drop = TRUE) {
>   m <- regexec(pattern, x, perl = perl, useBytes = useBytes)
>   result <- regmatches(x, m)
>
>   if (isTRUE(drop)) {
>     unlist(result)
>   } else if (isFALSE(drop)) {
>     unlist({result[lengths(result) == 0] <- NA_character_; result})
>   } else {
>     stop("Invalid argument for `drop`")
>   }
> }
>
> Based on Ricardo Saporta's response to "How to prevent regmatches drop non
> matches?"
>
> --CG
>


Re: [Rd] Underscores in package names

2019-08-15 Thread Jim Hester
Martin,

Thank you for discussing this amongst R-core and for detailing the
R-core discussion here.

Here are some specific examples where having underscores available would
have been useful.

1. My primerTree package (2013) was originally primer_tree, but I had
to change the name to camelCase to comply with the check requirements.
Using camelCase in the package name makes reading code jarring, as the
functions all use snake_case.
2. The widely used testthat package would likely be called test_that,
like the corresponding function within the package. This also
highlights one of the drawbacks of the current situation: without
separators the package name is more difficult to read. Does it have
two t's or three?
3. The assertive suite of packages uses `.` for separation, e.g.
`assertive.base`, `assertive.datetimes` etc., but all functions within
the packages use `_` separators; again, this was likely done out of
necessity rather than desire.

There are many more, I am sure; these were some that came immediately
to mind. More important than the specific examples is the opportunity
cost of having this restriction, which we cannot really quantify.

Using dots for separators has a number of practical problems.
Functions using dots are ambiguous, e.g. is `as.data.frame()` a
regular function, an `as.data()` method for a `frame` object, or an
`as()` method for a `data.frame` object? In fact, regular functions
can be accidentally promoted to S3 methods by defining an S3 generic,
which does actually happen in real life, confusing users [1]. While
package names are not functions, using dots in package names
encourages the use of dots in functions, a dangerous practice. Dots in
names are also one of the common stones cast at R as a language, as
dots are used for object-oriented method dispatch in other common
languages.
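
A small sketch of the accidental promotion described above. All names here
(summarise.frame, summarise, the class "frame") are made up for illustration
and do not refer to any existing package:

```r
# Written as an ordinary helper whose name happens to contain a dot.
summarise.frame <- function(x, ...) "helper for data frames"

# Later, someone defines an S3 generic called summarise().
summarise <- function(x, ...) UseMethod("summarise")
summarise.default <- function(x, ...) "default method"

# An unrelated object that happens to have class "frame".
obj <- structure(list(), class = "frame")

from_obj <- summarise(obj)  # dispatches to summarise.frame, the old helper
from_num <- summarise(1)    # falls through to summarise.default
```

Nothing in the definition of summarise.frame marks it as a method; the dotted
name alone is enough for S3 dispatch to pick it up.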

Dotted function names are the only major naming convention whose
prevalence is steadily decreasing over time. They now account for only
around 15% of all function names across the 94 million lines
of code currently available on CRAN (see Figure 2 in Yen et al.
[2]).

Thanks again for the public discussion,

Jim

[1]: https://twitter.com/_ColinFay/status/1105579764797108230
[2]: https://osf.io/preprints/socarxiv/ts2wq/

On Wed, Aug 14, 2019 at 5:16 AM Martin Maechler wrote:
>
> > Duncan Murdoch
> > on Fri, 9 Aug 2019 20:23:28 -0400 writes:
>
> > On 09/08/2019 4:37 p.m., Gabriel Becker wrote:
> >> Duncan,
> >>
> >>
> >> On Fri, Aug 9, 2019 at 1:17 PM Duncan Murdoch wrote:
> >>
> >> On 09/08/2019 2:41 p.m., Gabriel Becker wrote:
> >> > Note that this proposal would make mypackage_2.3.1 a valid
> >> *package name*,
> >> > whose corresponding tarball name might be mypackage_2.3.1_2.3.2
> >> after a
> >> > patch. Yes, it's a silly example, but why allow that kind of ambiguity?
> >> >
> >> CRAN already has a package named "FuzzyNumbers.Ext.2", whose tarball is
> >> FuzzyNumbers.Ext.2_3.2.tar.gz, so I think we've already lost that game.
> >>
> >>
> >> I suppose technically 2 is a valid version number for a package (?) so I
> >> suppose you have me there. But as Ben pointed out while I was writing
> >> this, all I can really say is that in practice they read to me (as
> >> someone who has administered R on a large cluster and written
> >> build-system software for it) as substantially different levels of
> >> ambiguity. I do acknowledge, as Ben does, that yes a more complex
> >> regular expression/splitting algorithm can be written that would handle
> >> the more general package names. I just don't personally see a motivation
> >> that justifies changing something this fundamental (even if it is both
> >> narrow and was initially more or less arbitrarily chosen) about R at
> >> this late date.
> >>
> >> I guess at the end of the day, what I'm saying is that breaking
> >> and changing things is sometimes good, but if we're going to rock the
> >> boat personally I'd want to do so going after bigger wins than this one.
> >> That's just my opinion though.
>
> > Sorry, I wasn't clear.  I agree with you.  I was just saying that the
> > particular argument based on ugly tarball names isn't the reason.
>
> > Duncan Murdoch
>
> Thank you (and Gabe).
>
> We have had some R core internal "talk" about Jim Hester's
> suggestion (of adding underscores to the allowed characters in
> package names).
> Duncan had already given a good reason why such a change would be problematic
> (the underscore being used as unique separator of package name
>  and version in source and binary package archives),
> and with Jim's offer to find and provide patches for all places
> this is used in the R sources, we've convinced ourselves that
> there is much more code "out there", notably 'devops' code in
> scripts, which currently 

Re: [Bioc-devel] Bioconductor Developers Forum

2019-08-15 Thread Mike Smith
Just a reminder/confirmation that this is happening today (August 15th) at
09:00 PDT/ 12:00 EDT / 18:00 CEST using BlueJeans and can be joined via:

https://bluejeans.com/136043474?src=join_info (Meeting ID: 136 043 474)

The call should last no more than one hour.

We also have a slack channel for more discussion, sharing of slides, etc at:

https://app.slack.com/client/T35G93A5T/CLUJWDQF4/details/info

Today's agenda will be a bit flexible based on who joins and how things
proceed, but I will present recent changes to & discussion on biomaRt and
hopefully Aaron Lun will give an update on the status of some of his recent
work.  If you have anything burning to discuss or present, please let me
know.

Cheers,
Mike

On Thu, 8 Aug 2019 at 10:31, Mike Smith wrote:

> Dear all,
>
> I am excited to announce a new initiative within the Bioconductor project
> - the Bioconductor Developers' Forum.  This monthly teleconference is
> intended as a platform for Bioconductor developers to describe existing
> software infrastructure to other members of the BioC community, to present
> plans for future developments, and to discuss changes that may impact other
> developers or software tools within Bioconductor.
>
> The intended audience is anyone interested in software development and
> infrastructure, whether you're a member of the BioC core team with
> responsibility for multiple packages, or you're just getting started with
> creating a Bioconductor package.
>
> Our first meeting will take place on Thursday 15th August at 09:00 PDT/
> 12:00 EDT / 18:00 CEST using BlueJeans and can be joined via:
>
> https://bluejeans.com/136043474?src=join_info (Meeting ID: 136 043 474)
>
> More details on the intentions for this initiative, including a list of
> proposed topics, can be found at:
>
>
> https://www.huber.embl.de/users/msmith/Bioconductor-Developers-Forum-Proposal.pdf
>
>
> The agenda for the first meeting is still open, so if you have a proposal
> or a particular topic you wish to prioritise please reach out to me.
>
> Best wishes,
>
> Mike
>
>
>

___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Rd] Feature request: non-dropping regmatches/strextract

2019-08-15 Thread Cyclic Group Z_1 via R-devel
A very common use case for regmatches is to extract regex matches into a new 
column in a data.frame (or data.table, etc.) or otherwise use the extracted 
strings alongside the input. However, the default behavior is to drop empty 
matches, which results in mismatches in column length if reassignment is done 
without subsetting.
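
A short illustration of the mismatch and of the subsetting idiom currently
needed to work around it. The data frame and column names are made up for the
example:

```r
df <- data.frame(id = c("a1", "b", "c3"), stringsAsFactors = FALSE)
m <- regexpr("[0-9]", df$id)

digits <- regmatches(df$id, m)
n_digits <- length(digits)  # shorter than nrow(df), so df$digit <- digits fails

# Subsetting idiom: start from NA and assign only at positions that matched.
df$digit <- NA_character_
df$digit[m != -1L] <- digits
```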

For consistency with other R functions and compatibility with this use case, it 
would be nice if regmatches did not automatically drop empty matches and would 
instead insert an NA_character_ value (similar to stringr::str_extract). This
alternative regmatches could be implemented through an optional drop argument
or a new function, or described in the documentation (a la resample in ?sample).

Alternatively, at the moment, there is a non-exported function strextract in 
utils which is very similar to stringr::str_extract. It would be great if this 
function, once exported, were to include a drop argument to prevent dropping 
positions with no matches. 

An example solution (last option):

strextract <- function(pattern, x, perl = FALSE, useBytes = FALSE, drop = TRUE) {
  m <- regexec(pattern, x, perl = perl, useBytes = useBytes)
  result <- regmatches(x, m)

  if (isTRUE(drop)) {
    # Default: elements with no match are dropped
    unlist(result)
  } else if (isFALSE(drop)) {
    # Keep positions: replace empty matches with NA before flattening
    unlist({result[lengths(result) == 0] <- NA_character_; result})
  } else {
    stop("Invalid argument for `drop`")
  }
}

Based on Ricardo Saporta's response to "How to prevent regmatches drop non
matches?"

--CG
