Re: [Rd] Feature request: non-dropping regmatches/strextract

2019-08-16 Thread Cyclic Group Z_1 via R-devel
Using strcapture seems like a great workaround for use cases of this kind, at 
least in base R. I also agree that filling with NA for regmatches(..., 
gregexpr(...)) makes less sense, given that the outputs are lists and 
non-matches are therefore retained as empty list elements.  Also, I suppose in 
the meantime the stringr package can be used when non-dropping vector outputs 
are desired.

However, I do think that non-dropping regex string extraction/matching in 
vector outputs from regmatches(..., regexpr(...)) or strextract would be a 
great (optional) design feature to have in base R for sake of consistency with 
the rest of the language (missing values, denoted by NA, are generally not 
dropped from vectors elsewhere and seem to agree conceptually with empty 
matches) and would help R to reach greater feature parity with MATLAB and 
Pandas in this respect (granted, Pandas is not technically a language on its 
own).
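
For anyone following along, here is a minimal sketch of the dropping behaviour and of the two base R workarounds mentioned above. The helper name regmatches_keep is hypothetical (it is not part of base R), and the input vector is made up for illustration.

```r
x <- c("apple1", "banana", "cherry22")
m <- regexpr("[0-9]+", x)

# Base behaviour: the non-matching element ("banana") is dropped,
# so the result is shorter than the input.
dropped <- regmatches(x, m)

# Hypothetical wrapper (an assumption, not a base R function):
# keep one slot per input element and fill non-matches with NA.
regmatches_keep <- function(x, m) {
  out <- rep(NA_character_, length(x))
  hit <- !is.na(m) & m > -1L
  out[hit] <- regmatches(x, m)
  out
}
kept <- regmatches_keep(x, m)

# utils::strcapture() returns one row per input element, with NA in
# every column for elements that do not match the pattern.
captured <- utils::strcapture(
  "([0-9]+)", x,
  proto = data.frame(num = character(), stringsAsFactors = FALSE)
)

print(dropped)
print(kept)
print(captured)
```

The wrapper relies only on the fact that regexpr() reports -1 for a non-match, so it should behave the same for any pattern; whether NA or "" is the better filler is exactly the design question raised above.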

Although I have written personal wrappers and used stringr to accomplish the 
non-dropping behavior in the past, I have nevertheless found the behavior of 
base R string operations mildly astonishing (in the sense of POLA) and think 
others may have as well. As the stringr documentation puts it, "they lag behind 
the string operations in other programming languages, so that some things that 
are easy to do in languages like Ruby or Python are rather hard to do in R." 
Since consistent, robust string operations are often a standard base feature of 
other data science and scientific programming languages, I think this minor 
change would be a great improvement to the language and hopefully help promote 
adoption of R, especially given the surge in text-based data analysis in recent 
years.

Alternatively, although I generally don't use the Tidyverse packages very 
often, stringr seems like a great candidate for inclusion in base or 
recommended R if the R Core team and the package developer see it fitting (just 
a suggestion and probably a long shot). 

However, I will try not to belabor this point further. In any case, thank you!

Best,
CG

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Documenting else's greed

2019-08-16 Thread Duncan Murdoch

On 16/08/2019 12:36 p.m., Hugh Parsonage wrote:

I was initially pretty shocked by the result in this question:
https://stackoverflow.com/questions/57527434/when-do-i-need-parentheses-around-an-if-statement-to-control-the-sequence-of-a-f

Briefly, the following returns 0, not 3 as might be expected:

if (TRUE) {
 0
} else {
 2
} + 3

At first I thought the question was simply one of syntax
precedence, but I believe the result is too surprising to not warrant
note in the documentation of Control. I believe the documentation
should highlight that the `alt.expr` is demarcated by a semicolon or
newline and the end of a *statement*, not a closing brace per se.

Perhaps the paragraph starting 'Note that it is a common mistake to
forget to put braces...' should end with: "Note too that it is the
end of a *statement*, not a closing brace per se, that determines
where `alt.expr` ends. Thus if (cond) {0} else {2} + 2 means if (cond)
{0} else {2 + 2} not {if (cond) {0} else {2}} + 2."




I agree this is surprising, and should perhaps be pointed out in the 
docs, but I don't think your suggestion is quite right.  { 2 } + 3 is a 
legal expression.  It doesn't have to be the end of a statement that 
limits the alt.expr, e.g. this could be one big statement:



 { if (TRUE) {
  0
} else {
  2
} + 3 }

What ends alt.expr is a token that isn't collected as part of alt.expr, 
not just a semicolon (which separates statements) or a newline.  I don't 
know how many of those there are, but the list would include at least

semicolon, newline, }, ), ], and maybe others.
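
A small sketch that makes the grouping visible, assuming nothing beyond base R. The variable names are only for illustration; the parse tree is inspected with quote() rather than relying on printed output.

```r
# The else branch swallows "+ 3", so the whole expression is 0.
greedy <- if (TRUE) { 0 } else { 2 } + 3

# Parenthesising the if/else limits alt.expr to "{ 2 }".
grouped <- (if (TRUE) { 0 } else { 2 }) + 3

# Inspect the parse tree: element 4 of the `if` call is alt.expr,
# and it is itself a call to `+`, i.e. `{ 2 } + 3`.
e <- quote(if (TRUE) { 0 } else { 2 } + 3)
alt_is_sum <- identical(e[[4]][[1]], as.name("+"))

print(greedy)      # 0
print(grouped)     # 3
print(alt_is_sum)  # TRUE
```

The closing parenthesis in the second form is one of the tokens that cannot be collected as part of alt.expr, which is why it ends the else branch there.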

Duncan Murdoch



Re: [Bioc-devel] mixOmics issue url

2019-08-16 Thread Turaga, Nitesh
Please push to both branches, "master" and "RELEASE_3_9", on the remote 
"git@git.bioconductor.org:packages/mixOmics".

Do you have access to the Bioconductor git server? 

Have you followed this documentation and help page? 
http://bioconductor.org/developers/how-to/git/push-to-github-bioc/

Best,

Nitesh 

> On Aug 16, 2019, at 3:57 PM, Bioconductor Seattle wrote:
> 
> hi Nitesh -- can you help Abolfazl? I don't think he's updated the devel 
> repository, probably just the github repo...
> 
> On 8/15/19, 10:42 PM, "Abolfazl JalalAbadi" wrote:
> 
> 
>Hi,
> 
> 
>I am one of the developers in mixOmics team.
> 
> 
>We have updated our BugReports URL for the devel version, but we would also 
> like to change it on Bioconductor's page, as we are no longer active on 
> Bitbucket (we should have done it with the release but unfortunately it was 
> missed). It would help users avoid duplication of issues as well. Is there a 
> way to have it changed to https://github.com/mixOmicsTeam/mixOmics/issues on 
> RELEASE_3_9?
> 
> 
>Best,
> 
> 
>Al
> 
> 
>Al J Abadi
> 
>Research Fellow in Computational Genomics
> 
> 
>Melbourne Integrative Genomics Bldg 184 (Old Microbiology Building)
>The University of Melbourne
> 
> 
> 
> 
> 
> 
> 
> 
> 



___
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


[Rd] Documenting else's greed

2019-08-16 Thread Hugh Parsonage
I was initially pretty shocked by the result in this question:
https://stackoverflow.com/questions/57527434/when-do-i-need-parentheses-around-an-if-statement-to-control-the-sequence-of-a-f

Briefly, the following returns 0, not 3 as might be expected:

if (TRUE) {
0
} else {
2
} + 3

At first I thought the question was simply one of syntax
precedence, but I believe the result is too surprising to not warrant
note in the documentation of Control. I believe the documentation
should highlight that the `alt.expr` is demarcated by a semicolon or
newline and the end of a *statement*, not a closing brace per se.

Perhaps the paragraph starting 'Note that it is a common mistake to
forget to put braces...' should end with: "Note too that it is the
end of a *statement*, not a closing brace per se, that determines
where `alt.expr` ends. Thus if (cond) {0} else {2} + 2 means if (cond)
{0} else {2 + 2} not {if (cond) {0} else {2}} + 2."



Re: [Rd] ALTREP wrappers and factors

2019-08-16 Thread Bemis, Kylie
Using R_tryWrap() at the C-level works perfectly and does what I need. Thanks, 
Gabe!

Yes, my reference count is maxed (I assume) because I am using 
MARK_NOT_MUTABLE().

Which makes me think I may want to return a wrapped matter/ALTREP object by 
default, so the user can set the names() and dim(), etc., without triggering a 
potentially-costly duplication. The data payload is intended to be immutable, 
but the attributes aren’t.

Decoupling the attributes and other metadata from the data payload seems like a 
good thing to have generally.

Are there any potential drawbacks of using R_tryWrap() that I should know 
about, besides an additional method dispatch happening somewhere?

Thanks again!

~~~
Kylie Ariel Bemis
Khoury College of Computer Sciences
Northeastern University
kuwisdelu.github.io










On Jul 19, 2019, at 4:00 AM, Gabriel Becker <gabembec...@gmail.com> wrote:

Hi Jiefei and Kylie,

Great to see people engaging with the ALTREP framework and identifying places 
we may need more tooling. Comments inline.

On Thu, Jul 18, 2019 at 12:22 PM King Jiefei <szwj...@gmail.com> wrote:

If that is the case and you are 100% sure the reference number should be 1
for your variable *y*, my solution is to call *SET_NAMED *in C++ to reset
the reference number. Note that you need to unbind your local variable
before you reset the number. To return an unbound SEXP,  the C++ function
should be placed at the end of your *matter:::as.altrep *function. I don't
know if there is any simpler way to do that and I'll be happy to see any
opinion.

So as far as I know, manually setting the NAMED value on any SEXP the garbage 
collector is aware of is a direct violation of the C-API contract and not 
something that package code should ever be doing.

It's not at all clear to me that you can ever be 100% sure that the reference 
number should be 1 when it is not currently 1 for an R object that exists at 
the R level (as opposed to only in pure C code). Sure, maybe the object is 
created within the body of your R function instead of being passed in, but what 
if someone is debugging your function and assigns the value to the global 
environment using <<- for later inspection? Now you have an invalidly low 
NAMED value, i.e. you have a segfault coming. I know of no way for you to 
prevent this or even know it has happened.



On Thu, Jul 18, 2019 at 3:28 AM Bemis, Kylie <k.be...@northeastern.edu> wrote:

> Hello,
>
> I’m experimenting with ALTREP and was wondering if there is a preferred
> way to create an ALTREP wrapper vector without using
> .Internal(wrap_meta(…)), which R CMD check doesn’t like since it uses an
> .Internal() function.

So there is the .doSortWrap (and its currently inexplicably identical clone 
.doWrap) function in base, which is an R-level function that calls down to 
.Internal(wrap_meta(...)), which you can use, but it doesn't look general 
enough for what I think you need (it was written for things that have just 
been sorted, thus the name). Specifically, it's not able to indicate that 
things are of unknown sortedness as currently written.  If matter vectors are 
guaranteed to be sorted for some reason, though, you can use this. I'll talk to 
Luke about whether we want to generalize this; it would be easy to have this 
support the full space of metadata for wrappers and be a general-purpose 
wrapper-maker, but that isn't what it is right now.

At the C-level, it looks like we do make R_tryWrap available (it appears in 
Rinternals.h, and not within a USE_RINTERNALS section), so you can call that 
from your own C(++) code. This creates a wrapper that has no metadata on it (or 
rather it has metadata but the metadata indicates that no special info is 
known about the vector).

>
> I was trying to create a factor that used an ALTREP integer, but
> attempting to set the class and levels attributes always ended up
> duplicating and materializing the integer vector. Using the wrapper avoided
> this issue.
>
> Here is my initial ALTREP integer vector:
>
> > fc0 <- factor(c("a", "a", "b"))
> >
> > y <- matter::as.matter(as.integer(fc0))
> > y <- matter:::as.altrep(y)
> >
> > .Internal(inspect(y))
> @7fb0ce78c0f0 13 INTSXP g0c0 [NAM(7)] matter vector (mode=3, len=3, mem=0)
>
> Here is what I get without a wrapper:
>
> > fc1 <- structure(y, class="factor", levels=levels(x))
> > .Internal(inspect(fc1))
> @7fb0cae66408 13 INTSXP g0c2 [OBJ,NAM(2),ATT] (len=3, tl=0) 1,1,2
> ATTRIB:
>   @7fb0ce771868 02 LISTSXP g0c0 []
> TAG: @7fb0c80043d0 01 SYMSXP g1c0 [MARK,LCK,gp=0x4000] "class" (has
> value)
> @7fb0c9fcbe90 16 STRSXP g0c1 [NAM(7)] (len=1, tl=0)
>   @7fb0c80841a0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached]
> "factor"
> TAG: @7fb0c8004050 01 SYMSXP g1c0 [MARK,NAM(7),LCK,gp=0x4000] "levels"
> (has value)
> @7fb0d1dd58c8 16 STRSXP g0c2 [MARK,NAM(7)] (len=2, tl=0)
>   @7fb0c81bf4c0 09 CHARSXP g1c1 [MARK,gp=0x61] [ASCII] [cached] "a"

Re: [Rd] Any plans for ALTREP lists (VECSXP)?

2019-08-16 Thread Bemis, Kylie
Thanks for the suggestions, everyone.

It is not a pressing issue requiring alternatives, since the ‘matter_list’ 
object already behaves like a list, and I am just looking for a way to present 
a native R list (VECSXP) when a regular list is required.

In this case (in my typical use case), the ‘matter_list’ is homogeneous and I 
use it like a ragged array; however, in general each element could be a 
different atomic vector type (specifically raw, logical, integer, or double).

Here, as.altrep() is an S4 method for converting my custom ‘matter’-class 
out-of-memory objects into their native R representations using ALTREP.

Seems to work well for the ‘matter’ vectors, matrices, and arrays, where it 
just .Call()s my C function for making the corresponding ALTREP object, but the 
lists were giving me trouble because there I use lapply() to extract and 
uncompress the ‘matter_list’ metadata for each list element into a separate S4 
‘matter_vec’ out-of-memory vector, each of which is then used to create an 
ALTREP object for the corresponding list element. So it gets costly...

The cost is mostly in re-creating all of the metadata as regular R objects that 
end up occupying the R_altrep_data1() spot for all of the individual list 
elements. If I could make an ALTREP list, I could leave the metadata as-is and 
avoid all of that.

Anyway, not a pressing issue for me either, just something I noticed where 
having an ALTREP list could be useful, so I was wondering if it was in the 
plans, which Luke answered.

Thanks,

-Kylie

On Jul 23, 2019, at 8:27 PM, Gabriel Becker <gabembec...@gmail.com> wrote:

Hi Kylie,

Is it a list with only numerics in it? (I only see REALSXPs there, but 
obviously inspect isn't showing all of them). If so, you could load it up into 
one big vector and then also keep partitioning information around. Bioconductor 
does this (see ?IRanges::CompressedList ). The potential benefit here being 
that the underlying large vector could then be a big out-of-memory altrep. How 
helpful this would be depends somewhat on what you want to do with it, of 
course, but it is something that comes to mind.
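
To make the "one big vector plus partitioning information" idea concrete, here is a minimal base R sketch. It is not how IRanges::CompressedList is implemented; the names flat, starts, len and get_elt are hypothetical, and the small in-memory list stands in for the out-of-memory data.

```r
# Toy ragged list standing in for the real (file-backed) data.
lst <- list(c(1.5, 2.5), numeric(0), c(3, 4, 5))

# One flat payload vector plus per-element partitioning information.
len    <- lengths(lst)
flat   <- unlist(lst, use.names = FALSE)
starts <- cumsum(len) - len   # zero-based offset of each element

# Hypothetical accessor: materialise element i only when asked for.
get_elt <- function(i) flat[starts[i] + seq_len(len[i])]

# Rebuilding every element recovers the original list.
rebuilt <- lapply(seq_along(len), get_elt)
print(identical(rebuilt, lst))
```

In this layout only `flat` would need to be an ALTREP (or otherwise out-of-memory) vector; the offsets are two ordinary integer-like vectors, so no per-element metadata objects have to be created up front.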

Also, I would expect some overhead but that seems like a lot (without having 
done super much in the way of benchmarking). What exactly is as.altrep doing?

Best,
~G

On Tue, Jul 23, 2019 at 9:54 AM Michael Lawrence via R-devel <r-devel@r-project.org> wrote:
Hi Kylie,

As an alternative in the short term, you could consider deriving from
S4Vector's List class, implementing the getListElement() method to
lazily create the objects.

Michael

On Tue, Jul 23, 2019 at 9:09 AM Bemis, Kylie <k.be...@northeastern.edu> wrote:
>
> Hello,
>
> I was wondering if there were any plans for ALTREP lists (VECSXP)?
>
> It seems to me that they could be supported in a similar way to how ALTSTRING 
> works, with Elt() and Set_elt() methods, or would there be some problems with 
> that I’m not seeing due to lists not being atomic vectors?
>
> I was taking an approach of converting each list element (of a file-based 
> list data structure) to an ALTREP representation to build up an “ALTREP list”.
>
> This seems fine for shorter lists with large elements, but I noticed that for 
> longer lists with smaller elements, this could be far more time-consuming 
> than simply reading the entire list into memory and returning a non-ALTREP 
> list:
>
> > x
> <34840 length> matter_list :: out-of-memory list
> (1.1 MB real | 543.3 MB virtual)
>
> > system.time(y <- as.list(x))
>user  system elapsed
>   1.116   2.175   5.053
>
> > system.time(z <- as.altrep(x))
>user  system elapsed
>  36.295   4.717  41.216
>
> > .Internal(inspect(y))
> @108255000 19 VECSXP g1c7 [MARK,NAM(7)] (len=34840, tl=0)
>   @7f9044d9fc00 14 REALSXP g1c7 [MARK] (len=1129, tl=0) 
> 404.093,404.096,404.099,404.102,404.105,...
>   @7f9044d25e00 14 REALSXP g1c7 [MARK] (len=890, tl=0) 
> 409.924,409.927,409.931,409.934,409.937,...
>   @7f9044da6000 14 REALSXP g1c7 [MARK] (len=1878, tl=0) 
> 400.3,400.303,400.306,400.309,400.312,...
>   @7f9031a6b000 14 REALSXP g1c7 [MARK] (len=2266, tl=0) 
> 402.179,402.182,402.185,402.188,402.191,...
>   @7f9031a77a00 14 REALSXP g1c7 [MARK] (len=1981, tl=0) 
> 403.021,403.024,403.027,403.03,403.033,...
>   ...
>
> > .Internal(inspect(z))
> @10821 19 VECSXP g1c7 [MARK,NAM(7)] (len=34840, tl=0)
>   @7f904eea7660 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, 
> len=1129, mem=0)
>   @7f9050347498 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, len=890, 
> mem=0)
>   @7f904d286b20 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, 
> len=1878, mem=0)
>   @7f904fd38820 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, 
> len=2266, mem=0)
>   @7f904c75ce90 14 REALSXP g1c0 [MARK,NAM(7)] matter vector (mode=4, 
> len=1981, mem=0)
>   ...
>
> In this situation, it would be much faster and simpler for me to return a 
> theoretical ALTREP list that serves SEXP elements on-demand, similar to 

Re: [Rd] Underscores in package names

2019-08-16 Thread Kevin Wright
I've heard the arguments against dots in names many times. The t.test and
data.frame examples have been repeated so often that they have become
accepted as gospel.  In my experience, evidence of any actual problems is
fairly limited (almost non-existent).  I've been happily using dots in
function names for 20 (sigh) years and only once had an unanticipated S3
method kick in.  I find the "." much easier to type than "_" because of the
proximity of the keys to the home-row on the keyboard.
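
For reference, the kind of unanticipated dispatch being debated can be reproduced with a short sketch. The function name summary.report and the class "report" are hypothetical and chosen only for illustration.

```r
# A plain helper whose name happens to contain a dot. The author
# never intended it to be an S3 method.
summary.report <- function(x, ...) "plain helper called"

# Someone later creates an object of class "report" ...
obj <- structure(list(), class = "report")

# ... and the summary() generic now dispatches to the helper, because
# its name has the form <generic>.<class>.
result <- summary(obj)
print(result)
```

This only bites when a generic and a class with matching names both exist, which is consistent with the observation above that it rarely happens in practice.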

On Thu, Aug 15, 2019 at 8:00 AM Jim Hester wrote:

> Martin,
>
> Thank you for discussing this amongst R-core and for detailing the
> R-core discussion here.
>
> Some specific examples where having underscores available would have
> been useful.
>
> 1. My primerTree package (2013) was originally primer_tree, but I had
> to change the name to camelCase to comply with the check requirements.
> Using camelCase in the package name makes reading code jarring, as the
> functions all use snake_case.
> 2. The widely used testthat package would likely be called test_that,
> like the corresponding function within the package. This also
> highlights one of the drawbacks of the current situation, without
> separators the package name is more difficult to read, does it have
> two t's or three?
> 3. The assertive suite of packages use `.` for separation, e.g.
> `assertive.base`, `assertive.datetimes` etc. but all functions within
> the packages use `_` separators, again likely this was done out of
> necessity rather than desire.
>
> There are many more I am sure, these were some that came immediately
> to mind. More important than the specific examples is the opportunity
> cost of having this restriction, which we cannot really quantify.
>
> Using dots for separators has a number of practical problems.
> Functions using dots are ambiguous, e.g. is `as.data.frame()` a
> regular function, an `as.data()` method for a `frame` object, or an
> `as()` method for a `data.frame` object? And in fact regular functions
> can be accidentally promoted to S3 methods by defining a S3 generic,
> which does actually happen in real life, confusing users [1]. While
> package names are not functions, using dots in package names
> encourages the use of dots in functions, a dangerous practice. Dots in
> names is also one of the common stones cast at R as a language, as
> dots are used for object oriented method dispatch in other common
> languages.
>
> Dotted function names are the only major naming convention whose
> prevalence is steadily decreasing over time. It now accounts for only
> around 15% of all function names when looking at all 94 million lines
> of code currently available on CRAN (see Figure 2 from Yen et al.
> [2]).
>
> Thanks again for the public discussion,
>
> Jim
>
> [1]: https://twitter.com/_ColinFay/status/1105579764797108230
> [2]: https://osf.io/preprints/socarxiv/ts2wq/
>
> On Wed, Aug 14, 2019 at 5:16 AM Martin Maechler
>  wrote:
> >
> > > Duncan Murdoch
> > > on Fri, 9 Aug 2019 20:23:28 -0400 writes:
> >
> > > On 09/08/2019 4:37 p.m., Gabriel Becker wrote:
> > >> Duncan,
> > >>
> > >>
> > >> On Fri, Aug 9, 2019 at 1:17 PM Duncan Murdoch
> > >> <murdoch.dun...@gmail.com> wrote:
> > >>
> > >> On 09/08/2019 2:41 p.m., Gabriel Becker wrote:
> > >> > Note that this proposal would make mypackage_2.3.1 a valid
> > >> > *package name*, whose corresponding tarball name might be
> > >> > mypackage_2.3.1_2.3.2 after a patch. Yes it's a silly example,
> > >> > but why allow that kind of ambiguity?
> > >> >
> > >> CRAN already has a package named "FuzzyNumbers.Ext.2", whose
> > >> tarball is FuzzyNumbers.Ext.2_3.2.tar.gz, so I think we've
> > >> already lost that game.
> > >>
> > >>
> > >> I suppose technically 2 is a valid version number for a package
> > >> (?) so I suppose you have me there. But as Ben pointed out while
> > >> I was writing this, all I can really say is that in practice they
> > >> read to me (as someone who has administered R on a large cluster
> > >> and written build-system software for it) as substantially
> > >> different levels of ambiguity. I do acknowledge, as Ben does,
> > >> that yes a more complex regular expression/splitting algorithm
> > >> can be written that would handle the more general package names.
> > >> I just don't personally see a motivation that justifies changing
> > >> something this fundamental (even if it is both narrow and was
> > >> initially more or less arbitrarily chosen) about R at this late
> > >> date.
> > >>
> > >> I guess at the end of the day, what I'm saying is that breaking
> > >> and changing things is sometimes good, but if we're going to rock
> > >> the boat personally I'd want to do so going after bigger wins
> > >> than this one. That's just my opinion though.
> >
> > > 

[Rd] Deparsing raw vectors with names

2019-08-16 Thread Stepan

Hello,

deparse(structure(as.raw(1), .Names=c('a'))) gives "as.raw(c(a = 0x01))" 
in 3.5.1 and later (actually tested on 3.5.1, 3.6.1 and devel). If you 
execute as.raw(c(a = 0x01)), you get a raw vector without the names.


If the stripping of the names is the correct behavior of as.raw (I would 
think it is), then perhaps deparse should use the old behavior for raw 
vectors. On R-3.4.1 it gives: "structure(as.raw(0x01), .Names = 'a')".
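
A short sketch for checking this on any given R version. It deliberately does not assume what deparse() prints, since that is the version-dependent part; it only tests whether the deparsed code round-trips.

```r
# A named raw vector, as in the report above.
x <- structure(as.raw(1), names = "a")

# as.raw() itself drops names from its argument.
stripped <- as.raw(c(a = 0x01))

# Deparse, re-parse and evaluate, then compare with the original.
code      <- deparse(x)
roundtrip <- eval(parse(text = code))

cat("deparsed code:", code, "\n")
cat("names survive as.raw():", !is.null(names(stripped)), "\n")
cat("round trip is identical:", identical(x, roundtrip), "\n")
```

If the last line prints FALSE, the installed version shows the behaviour described above, where the deparsed code loses the names when evaluated.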


Regards,
Stepan



Re: [Rd] Underscores in package names

2019-08-16 Thread Jan Gorecki
Thanks Abby and Martin,

In every company where I have worked with R (3 in total) there was at
least one process (and up to ~10) designed and implemented to depend on
the current package naming scheme, with an underscore separating the
package name from its version. From my experience I believe this is a
(very?) common practice. I also use it myself.
Arguments for having underscore in package names are simply weak.
Dot in function names is an entirely different issue caused by S3
dispatch. No need to look at other OOP languages, it is R.
Package name is not a function name.
There are no practical gains.
There is nothing wrong in having package "a.pkg" and function "a_pkg()".
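
As a sketch of the kind of process I mean (the helper name split_tarball is hypothetical, and real pipelines will differ), splitting a tarball name on the underscore is a one-liner today and becomes ambiguous once underscores are allowed in package names:

```r
# Hypothetical helper: split "<package>_<version>.tar.gz" on "_".
split_tarball <- function(f) {
  base  <- sub("\\.tar\\.gz$", "", f)
  parts <- strsplit(base, "_", fixed = TRUE)[[1]]
  list(package = parts[1], version = parts[2], pieces = length(parts))
}

# Works for current names, including dotted ones already on CRAN.
ok <- split_tarball("FuzzyNumbers.Ext.2_3.2.tar.gz")

# With an underscore in the package name (the example from earlier in
# the thread) the same split yields three pieces and the wrong version.
ambiguous <- split_tarball("mypackage_2.3.1_2.3.2.tar.gz")

print(ok)
print(ambiguous)
```

Splitting on the last underscore instead would fix this case, but every existing script that assumes a single separator would have to be found and changed.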

Regards,
Jan Gorecki


On Fri, Aug 16, 2019 at 1:20 AM Abby Spurdle wrote:
>
> > While
> > package names are not functions, using dots in package names
> > encourages the use of dots in functions, a dangerous practice.
>
> "dangerous"...?
> I can't understand the necessity of RStudio and Tiny-Verse affiliated
> persons to repeatedly use subjective and unscientific phrasing.
>
> Elegant, Advanced, Dangerous...
> At UseR, there was even "Advanced Use of your Favorite IDE".
>
> This is not science.
> This is marketing.
>
> There's nothing dangerous about it other than your belief that it's
> dangerous.
> I note that many functions in the stats package use dots in function names.
> Your statement implies that the stats package is badly designed, which it
> is not.
> Out of 14,800-ish packages on CRAN, very few of them are even close to the
> standard set by the stats package, in my opinion.
>
> And as noted by other people in this thread, changing naming policies could
> interfere with a lot of software "out there", which is dangerous.
>
> > Dots in
> > names is also one of the common stones cast at R as a language, as
> > dots are used for object oriented method dispatch in other common
> > languages.
>
> I don't think the goal is to copy other OOP systems.
> Furthermore, some shells use dot as the current working directory and Java
> uses dots in package namespaces.
> And then there's regular expressions...