Re: [R] dplyr, group_by and selective action according to each group

2024-05-24 Thread avi.e.gross
Although there may well be many ways to do what is being asked for with the 
tidyverse, sometimes things are simple enough to do the old-fashioned way.

The request seems to have been to do something to all rows in ONE specific 
group but was phrased in the sense of wanting to know which group your 
functionality is being called in.

What grouping gains you is more worthwhile if you are interested in doing
things groupwise across all groups, such as getting a count of how many rows
are in each group, or some vectorized operation like taking the mean or SD of
a column, or whatever.

But for the purposes mentioned here, consider a lower-tech alternative such as 
this.

Instead of group_by(gr), which here is a trivial grouping, consider using other
dplyr verbs like mutate() to act on all rows that meet a condition, such as gr
having a value of 3, as in:

mutate(DATAFRAME, result = ifelse(gr == 3, f(), whatever))

The above is not a full-blown example but something similar can be tailored to 
do quite a bit. As an example, if gr specified whether the measure in another
column was in meters or feet, you could convert that column to meters where
gr == "feet" and, on a second line of code, change the "gr" value in those
rows to "meters" so that in the end, they are all in meters.

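To make that concrete, here is a minimal sketch of the meters/feet idea (the
data frame and column names are invented for illustration):

```r
library(dplyr)

# Hypothetical data: 'measure' holds a value whose unit is recorded in 'gr'
df <- data.frame(measure = c(10, 3, 20), gr = c("feet", "meters", "feet"))

df <- df %>%
  mutate(measure = ifelse(gr == "feet", measure * 0.3048, measure),
         gr      = ifelse(gr == "feet", "meters", gr))

df  # every row is now expressed in meters
```
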
Of course if you have a more complex use case such as grouping by multiple 
variables, and having the same (or different) logic for multiple values, this 
can get more complex.  But if you want to get working code sooner, consider 
using methods you understand rather than seeing if someone in the tidyverse 
universe has already created exactly what you want.

There are other things you can access. For example, if you want to keep only
the first record in each group, you can filter on row_number() == 1, or use the
do() function.

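For instance, keeping the first record of each group can be sketched like this
(example data invented here):

```r
library(dplyr)

df <- data.frame(x = 1:6, gr = rep(c("gr1", "gr2", "gr3"), each = 2))

firsts <- df %>%
  group_by(gr) %>%
  filter(row_number() == 1) %>%  # keep only the first row within each group
  ungroup()

firsts$x  # 1 3 5
```

slice_head(n = 1) in current dplyr does the same job more directly.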
The dplyr (and related packages) keep evolving and functionality may be 
deprecated, but check this page for ideas:

https://dplyr.tidyverse.org/reference/group_data.html

Some of those may give you access to which rows are in each group and to other 
ways to approach the problem somewhat from outside after grouping so you can 
apply your function to the subset of the rows you want.

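As a sketch of what those accessors expose (toy grouped data invented here):

```r
library(dplyr)

g <- data.frame(x = 1:6, gr = rep(c("a", "b"), each = 3)) %>% group_by(gr)

group_keys(g)  # one row per group: the distinct gr values
group_rows(g)  # a list of integer row indices, one vector per group

# e.g. apply a change only to the rows belonging to group "b"
idx <- group_rows(g)[[which(group_keys(g)$gr == "b")]]
g$x[idx] <- g$x[idx] * 10
```
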





-Original Message-
From: R-help  On Behalf Of Bert Gunter
Sent: Friday, May 24, 2024 6:52 PM
To: Laurent Rhelp 
Cc: r-help@r-project.org
Subject: Re: [R] dplyr, group_by and selective action according to each group

Laurent:
As I don't use dplyr, this won't help you, but I hope you and others may
find it entertaining anyway.

If I understand you correctly (and ignore this if I have not), there are a
ton of ways to do this in base R, including using switch() along the lines
you noted in your post. However, when the functions get sufficiently
complicated or numerous, it may be useful to store them in a named list and
use the names to call them in some sort of loop. Here I have just used your
anonymous functions in the list, but of course you could have used already
existing functions instead.

## your example
df_test <- data.frame( x1=1:9, x2=1:9, gr=rep(paste0("gr",1:3),each=3))

## function list with the relevant names
funcs <- list(gr1 = \(x)x+1, gr2 = \(x)0, gr3 = \(x)x+2)
## Alternatively you could do this if you had many different functions:
## funcs <- list(\(x)x+1, \(x)0,  \(x)x+2)
## names(funcs) <- sort(unique(df_test$gr))
## note that sort() is unnecessary in your example, but I think that it would
## be helpful if you had a lot of different groups and corresponding functions
## to track.

##Now the little loop to call the functions
df_test$x1 <- with(df_test,{
   for(nm in names(funcs))
  x1[gr == nm] <- funcs[[nm]](x1[gr == nm])
   x1}
)

#
Note that the above uses one of the features that I really like about R --
functions are full first-class objects that can be thrown around and
handled just like any other "variables". So funcs[[nm]](whatever) seems to
me to be a natural way to choose and call the function you want. You may
disagree, of course.

Caveat: I make no claims about the efficiency or lack thereof of the above.

Cheers,
Bert

On Fri, May 24, 2024 at 12:35 PM Laurent Rhelp  wrote:

> Dear RHelp-list,
>
> Using dplyr and the group_by approach on a dataframe, I want to be
> able to apply a specific action according to the group name. The code
> below works, but I am not able to write it in a more aesthetic way using
> dplyr. Can somebody help me to find a better solution ?
>
> Thank you
>
> Best regards
>
> Laurent
>
> df_test <- data.frame( x1=1:9, x2=1:9, gr=rep(paste0("gr",1:3),each=3))
> df_test  <-  df_test %>% dplyr::group_by(gr) %>%
>group_modify(.f=function(.x,.y){
>  print(paste0("Nom du groupe : ",.y[["gr"]]))
>  switch(as.character(.y[["gr"]])
> , gr1 = {.x[,"x1"] <- .x[,"x1"]+1}
> , gr2 = {.x[,"x1"] <- 0}
> , gr3 = {.x[,"x1"] <- .x[,"x1"]+2}
> , {stop(paste0('The group ',.y[["gr"]]," is not taken into
> 

Re: [R] Listing folders on One Drive

2024-05-20 Thread avi.e.gross
Nick,

As Jeff said, we don't know what you tried and what did not work.

There are built-in and probably package versions but have you tried
something like list.files()?

You can tweak it to get the files you want by doing something like:

- change directory to HERE
- here.files <- list.files(recursive=TRUE)
- change directory to THERE
- there.files <- list.files(recursive=TRUE)

Now compare what you have in the two places. There are many ways, but if all
the files (and, with recursive=TRUE, everything deeper) are the same, you have
them all. Of course this does not test whether the files' contents are
identical. Or you could sort and compare to isolate what is missing, or use
set operations that test for intersection, or something like:

Missing <- setdiff(here.files, there.files)

And in that case, also test the reverse.

The function setequal() tests for equality but won't tell you what is
missing.

Obviously, if your method generates full, not relative file names, you could
process the names to remove a fixed prefix.

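A self-contained sketch of the comparison, using temporary directories to
stand in for the real OneDrive and laptop folders (the file names are
illustrative only):

```r
# Temporary stand-ins for the two real folders
here  <- tempfile("onedrive"); dir.create(here)
there <- tempfile("laptop");   dir.create(there)
file.create(file.path(here,  c("Tay.csv", "Dee.csv")))
file.create(file.path(there, "Tay.csv"))

here.files  <- list.files(here,  recursive = TRUE)
there.files <- list.files(there, recursive = TRUE)

missing <- setdiff(here.files, there.files)
missing  # "Dee.csv" -- present in the first folder, absent from the second
```
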
-Original Message-
From: R-help  On Behalf Of Nick Wray
Sent: Monday, May 20, 2024 9:37 AM
To: r-help@r-project.org
Subject: [R] Listing folders on One Drive

Hello I have lots of folders of individual Scottish river catchments on my
uni One Drive.  Each folder is labelled with the river name eg "Tay" and
they are all in a folder named "Scotland"
I want to list the folders on One Drive so that I can cross check that I
have them all against a list of folders on my laptop.
Can I somehow use list.files() - I've tried various things but none seem to
work...
Any help appreciated
Thanks Nick Wray

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] x[0]: Can '0' be made an allowed index in R?

2024-04-23 Thread avi.e.gross
I think it might be fair to say that the discussion is becoming a tad wider
than whether you want your data structures indexed starting from 0 or 1.

Programming languages have added functionality to do many things on top of
the simple concept of accessing or changing the nth element one at a time.
If someone wants to make a parallel way to handle things, it may work for
some uses but not others and that is fine as long as the user of the package
knows the limits and is careful.

In R, as has been pointed out, there are behaviors associated with indexing
that are NOT the only way they could have been done. These are design
choices made long ago.

Negated numbers are allowed in some contexts and have meaning. Ranges of
numbers can be specified. Rows and columns both must be accommodated. And
since quite a bit of the language is written in languages like C/C++ that
may not anticipate such a change, it MIGHT BE that there will be errors if
zero-based indexing is not introduced in a way that works well.

I have not looked at the packages mentioned and it may well be that they are
very carefully crafted in a way that takes everything into account and even
behaves well if you have operations that work jointly on both 0 and 1 based
objects. They may even have a way to identify which way a particular object
is numbered.

As pointed out, programming languages differ. I believe JavaScript at one
point had a data structure that was sort of like a dictionary that treated
integer indexes as a sort of indexed array and did interesting things if
some numbers were missing or added and sort of reindexed it when used as an
array. Features can be very powerful and at the same time be potentially
dangerous. 

But the original request was not only reasonable but also something others
had worked on and produced packages. Chances are good that many of the
questions people had were considered and either implemented well or
documented as something to avoid.

I would like to add that R is evolving as is the way people use it. An
example might be how you can make and use data.frames (including new
variants) with variable list components and do transformations. I have seen
problems when using such features because some operations on them may not do
what is expected. In some cases, I suspect that ideas that are accepted and
moved back into the base after careful consideration are safer. You could
have a global setting for whether ALL operations are  0-based or 1-based or
have several data types to choose from that work as you want. But some of it
is simply ingrained in people's minds: programmers sometimes call a
function and index the result, as in range(x)[2], where the digit 2 is
hardwired in with the assumption that they are getting the max value stored in
the second position and ignoring the min value stored in the first position.
Flipping a switch at the top of the program leaves such errors in place.
Many packages include all kinds of shortcuts like that and a global switch
idea may leave many dangling packages.

For the many of us with experiences in many programming languages, all kinds
of mental stumbling blocks abound unless we accept each language as a world
of its own in which different design and implementation choices were made.
If you take a language like Python and ask how to support 1-based objects,
you might well come up with a rather different solution including using
hooks such as the dunder variables  they made easily accessible, making a
subclass, using a decorator and so on.

Having said that, consider that  there are many successful examples in R
where packages have augmented doing things in ways different than base R and
different does not have to be better or worse but can have advantages for
some people in some situations.

An example often discussed here is a group of packages called the
tidyverse. Some aspects of it are more appealing to me than the parts of
base R I might use to do the same kinds of things and some don't. Within
reason, many parts can be mixed and matched. There do seem to be places the
match is not so great and places where they added too much functionality
that many users do not need and which complicated earlier ways of doing
simpler things. And, of course, it can divide a community when others cannot
understand your code without lots of additional education and new ways of
thinking.

A well-designed way to designate individual data structures to be 0-based
would indeed be nice to have, and apparently it is available. I have seen
other packages mentioned here where I worked a bit with the package's author
and saw how many hooks had to be dealt with to handle
having multiple ways of being "missing" as is available in some programs
outside R. Many implementations only work with known cases and can break
when combined, perhaps with other newer changes to R or packages. But, if
used within a carefully designed environment they may do what you need and
preserve 

Re: [R] x[0]: Can '0' be made an allowed index in R?

2024-04-21 Thread avi.e.gross
Hans,

It is a good question albeit R made a conscious decision to have indices
that correspond to things like row numbers and thus start with 1. Some
others have used a start of zero but often for reasons more related to
making use of all combinations of the implementation of integers on many
machines where starting with 1 would only allow use of the 255 of the 256
combinations available in 8 bits and so on.

My solution when I needed to start with zero is simply to do things like
x[n+1] or have a small function that does an increment like x[inc(n)] that
makes very clear what is happening.

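A minimal sketch of that helper (the name inc() is my own):

```r
# inc() shifts a zero-based index to R's one-based indexing
inc <- function(n) n + 1L

x <- c(10, 20, 30)
x[inc(0)]  # 10 -- "element 0" in zero-based terms
x[inc(2)]  # 30
```
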
You have been given several possible ways closer to what you want and that
may work for you but may confuse anyone else ever looking at your code so I
would add some comments or documentation explaining your non-standard use.

But do note the possibility of issues with any solution if you use other
indexing methods like x[5:8] which might not be done consistently. 

And if you were using a 2-D structure like a matrix or data.frame, would
your columns also be starting with column 0 as in mydata[0,0] to get the
first item in the first row, or are columns still 1-based while rows are
not?

Beware some solutions may be incomplete and may result in subtle errors.
Just routinely adding 1 seems safer as you know what you will get.


-Original Message-
From: R-help  On Behalf Of Hans W
Sent: Sunday, April 21, 2024 3:56 AM
To: R help project 
Subject: [R] x[0]: Can '0' be made an allowed index in R?

As we all know, in R indices for vectors start with 1, i.e, x[0] is not a
correct expression. Some algorithms, e.g. in graph theory or combinatorics,
are much easier to formulate and code if 0 is an allowed index pointing to
the first element of the vector.

Some programming languages, for instance Julia (where the index for normal
vectors also starts with 1), provide libraries/packages that allow the user
to define an index range for its vectors, say 0:9 or 10:20 or even negative
indices.

Of course, this notation would only be feasible for certain specially
defined vectors. Is there a library that provides this functionality?
Or is there a simple trick to do this in R? The expression 'x[0]' must
be possible, does this mean the syntax of R has to be twisted somehow?

Thanks, Hans W.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] any and all

2024-04-13 Thread avi.e.gross
Yes, Lennart, I have been looking at doing something like you say by using the 
vectorized ways the tidyverse is now offering. 

For my application, if the naming were consistent, an approach like yours is
good, albeit one that has to be typed carefully. When I cannot control the
names but have to lump them into multiple groups, each requiring at least one
column to be non-NA, I would probably need to spell them out rather than
checking what each name ends with.

Since the default for filter() is to do an AND when it sees a comma and another 
condition, I can simply repeat the if_any() with changes several times without 
using an if_all() but I have concerns over handing over fairly complex code to 
anyone who may modify it a bit later and have problems.

So, I am tempted to just use things they already know such as one of the 
ifelse() variations that are vectorized.

The tidyverse keeps evolving and regularly replacing old functionality that 
seemed to work fine with new and improved but extremely abstract functionality 
that is both very powerful and at the same time can be a pain to use or even 
explain when you just want to do something fairly simple. 

And I notice how some packages have been trying to move away from using delayed 
interpretation features or removing functions people use (or deprecating them) 
as so many things in R were cobbled together and then constantly changed. Much 
of the tidyverse is an example of functionality which might have been designed 
into the base portion of a new language as compared to add-ons to a language 
they want to keep simpler and more stable. It took a long while just to add a 
native pipe to R but once done, I wonder if many other ideas and functions 
people use regularly through packages, might also enter the mainstream.

Your code reminds me of the importance of choosing names, as in column names, 
that have patterns built-in to allow some abstract operations. In your example, 
applied to the kind of data I am being given, I can even imagine a step that 
re-arranges the order of the columns in such a way that the groupings I am 
talking about are adjacent. (I mean a group of columns where at least one is 
non-NA.) Such groups can use methods of specifying all at once as in first:last 
even when I have no control over the names.

Thanks for the feedback.

Avi

-Original Message-
From: Lennart Kasserra  
Sent: Saturday, April 13, 2024 3:17 AM
To: avi.e.gr...@gmail.com; murdoch.dun...@gmail.com; toth.de...@kogentum.hu; 
r-help@r-project.org
Subject: Re: [R] any and all

Hi Avi,


As Dénes Tóth has rightly diagnosed, you are building an "all or 
nothing" filter. However, you do not need to explicitly spell out all 
columns that you want to filter for; the "tidy" way would be to use a 
helper function like `if_all()` or `if_any()`. Consider this example (I 
hope I understand your intentions correctly):

```

library(dplyr)


data <- tribble(
   ~first.a, ~first.b, ~first.c,
   1L,1L,   0L,
   NA,   1L,   0L,
   1L,0L,   NA,
   NA,   NA,   1L
)

```

Let's say we only want to keep rows that have a non-missing value for 
either `first.a` or `first.b` (or hypothetical later generations like 
`second.a` and `second.b` etc.):

```

data |>
   filter(if_any(ends_with(c(".a", ".b")), \(x) !is.na(x)))

```

So: `filter()` (keep observations) `if_any` of the columns ending with 
.a or .b is not `NA` (we have to wrap `!is.na` into an anonymous 
function for it to be a valid argument type). This would yield

```

# A tibble: 3 × 3
   first.a first.b first.c
   
1   1   1   0
2  NA   1   0
3   1   0  NA

```

Discarding only the row where both of them are missing. Another way of 
writing this would be

```

data |>
   filter(!if_all(ends_with(c(".a", ".b")), is.na))

```

i.e. don't keep rows where all columns ending in .a or .b are `NA`, 
which returns the same result. Hope this helps,

Lennart Kasserra

Am 12.04.24 um 21:52 schrieb avi.e.gr...@gmail.com:
> Base R has generic functions called any() and all() that I am having trouble
> using.
>   
> It works fine when I play with it in a base R context as in:
>   
>> all(any(TRUE, TRUE), any(TRUE, FALSE))
> [1] TRUE
>> all(any(TRUE, TRUE), any(FALSE, FALSE))
> [1] FALSE
>   
> But in a tidyverse/dplyr environment, it returns wrong answers.
>   
> Consider this example. I have data I have joined together with pairs of
> columns representing a first generation and several other pairs representing
> additional generations. I want to consider any pair where at least one of
> the pair is not NA as a success. But in order to keep the entire row, I want
> all three pairs to have some valid data. This seems like a fairly common
> reasonable thing often needed when evaluating data.
>   
> So to make it very general, I chose to do something a bit like this:
>   
> result <- filter(mydata,
>   all(
> any(!is.na(first.a), 

Re: [R] Just for your (a|be)musement.

2024-04-13 Thread avi.e.gross
Richard,

The code you show is correct, and it does not include the part where you say
ChatGPT explained it was 33/216 rather than the correct 42/216.

I gather it gave the proper fraction for the other two scenarios.

So what would cause such a localized error?

The method chosen is to try all possible combinations and count how many
times they add up to a winning combo and then divide by the possible
combinations. Why would that go wrong?

As a human, at least most days, I would calculate using the first part of
the formula it used to just get a numerator.

> sum(rowSums(expand.grid(faces, faces, faces)) %in% c(7,11))
[1] 42

So is this perhaps a case where ChatGPT, instead, did some kind of reverse
engineering and used something else to estimate what 0.194 might be as a
fraction with an integral numerator and denominator? There are often many
ways to do this including some that allow an approximation of 7/36 or 42/170
or so. Add to that the fact that floating-point representations are not exact,
and it may be that it used an algorithm in which it neglected to require a
fraction with a denominator of 216.

If my guess is correct, you could argue this is partially an issue of
comprehension. Humans, to a limited extent, can look at a problem and see
that a solution to a second issue is already almost visible in what they did
to solve the first. A machine who searches data they were fed, may see your
question as having several parts and solves them sequentially using advice
from two places and has an imperfect understanding of what it read in the
second place, or perhaps there was an error there.

This reminds me a bit of the in-line assignment operator some computer
languages have added, such as Python's walrus operator, which
allows part of an expression to be evaluated and the result stored in a
variable for use elsewhere in that expression or later. So if you have an
expression that effectively needs to do some calculation two or more times,
such as the sum of some numbers or their average, you do it once and ask for
the number to be saved and then just state you want it elsewhere in the
expression.

Or consider something like the quadratic formula, where you calculate the
square root part twice because the two answers are + or - the same thing.

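In R, which has no walrus operator, the same idea is just an ordinary local
variable; a sketch using the quadratic formula:

```r
# Roots of a*x^2 + b*x + c = 0, evaluating the square-root term only once
quad_roots <- function(a, b, c) {
  s <- sqrt(b^2 - 4 * a * c)  # shared subexpression, computed a single time
  c((-b + s) / (2 * a), (-b - s) / (2 * a))
}

quad_roots(1, -3, 2)  # 2 1
```
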
It is often easy for humans to see and extract such commonalities, but
programs written so far are often not really designed to examine things
this way.

I note that the above method can be a tad slow and expensive for very large
cases like rolling a hundred dice as you end up making a huge data structure
in which all the entries must sum above 11 if the minimum roll is 1. Again,
a human may realize this and skip using the method. The chances of rolling
100 dice and getting a 7 or 11, or even a 99, are absolutely zero. For some
other problems, such as rolling 8 dice, there are only solutions for 11, not
for 7. And, rather than generating all possible combinations in advance,
there may be an algorithm that builds a tree with pruning, so that any partial
roll where even setting all the remaining dice to 1 would push the total past
the target makes it skip any further exploration in that direction. If the
current sum is such that the only valid completion is all ones, again, you can
declare that result and prune any further progress along the tree.

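One way such a pruned tree search might look (my own sketch, not anything
ChatGPT produced): count the ordered rolls of n six-sided dice that hit a
target sum, abandoning any branch the remaining dice cannot rescue.

```r
# Count ordered rolls of n six-sided dice summing to 'target', pruning
# branches where the remaining dice cannot reach (or must exceed) the target
count_ways <- function(n, target) {
  if (n == 0) return(as.numeric(target == 0))
  if (target < n || target > 6 * n) return(0)  # prune: sum is unreachable
  sum(vapply(1:6, function(face) count_ways(n - 1, target - face), numeric(1)))
}

(count_ways(3, 7) + count_ways(3, 11)) / 6^3  # 42/216, about 0.194
```
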
But would ChatGPT be flexible enough to suggest using such an algorithm or
know to switch to it for dice above some level?

Can anyone explain better what went wrong? I have heard statements before
about how some of these pseudo-AI make simple mathematical errors and this
sounds like one.



-Original Message-
From: R-help  On Behalf Of Richard O'Keefe
Sent: Saturday, April 13, 2024 5:54 AM
To: R Project Help 
Subject: [R] Just for your (a|be)musement.

I recently had the chance to read a book explaining how to use
ChatGPT with a certain programming language.  (I'm not going
to describe the book any more than that because I don't want to
embarrass whoever wrote it.)

They have appendix material showing three queries to ChatGPT
and the answers.  Paraphrased, the queries are "if I throw 2 (3, 4)
fair dice, what is the probability I get 7 or 11?  Show the reasoning."
I thought those questions would make a nice little example,
maybe something for Exercism or RosettaCode.  Here's the R version:

> faces <- 1:6
> sum(rowSums(expand.grid(faces, faces)) %in% c(7,11))/6^2
[1] 0.222
> sum(rowSums(expand.grid(faces, faces, faces)) %in% c(7,11))/6^3
[1] 0.194
> sum(rowSums(expand.grid(faces, faces, faces, faces)) %in% c(7,11))/6^4
[1] 0.09567901

Here's where it gets amusing.  ChatGPT explained its answers with
great thoroughness.  But its answer to the 3 dice problem, with what
was supposedly a list of success cases, was quite wrong.  ChatGPT
claimed the answer was 33/216 instead of 42/216.

Here's where it gets bemusing.  Whoever wrote the book included
the interaction in the book WITHOUT CHECKING the 

Re: [R] any and all

2024-04-12 Thread avi.e.gross
Thanks everyone and any/all reading this. I think I got my answer. And, no, I 
suspected I did not need to provide a very specific example, at least not yet.

The answer is that my experiment was not vectorized while using dplyr verbs 
like mutate do their work implicitly in a vectorized way. 

This is in some ways similar to the difference between using an if/else type of 
statement or using the ifelse() function in base R that works on all elements 
of a vector at once. Some changes to R have been looking at not allowing a 
vector of length greater than 1 to be used in contexts where formerly only the 
first element was read and used and the rest ignored.

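The contrast is easy to show on a toy vector:

```r
x <- c(-2, 5, -7, 9)

# if/else examines one condition; ifelse() decides elementwise for a vector
ifelse(x < 0, 0, x)  # 0 5 0 9
```
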
Dénes asked some other questions about dplyr that I can reply to in private 
(and if he wishes in Hungarian or other languages we share) as this forum is 
mainly focused on base R and not on various packages and apparently especially 
not on the tidyverse that some see as being closely related to a company. 
Speaking for myself, I see no reason to be wedded to base R and use what I like.

Thanks again. I knew it was simple. And, if anyone cares, I can now look more 
carefully for functions that do what any/all do but are vectorized because that 
is basically what I did in my example code where I primitively created new 
columns in vectorized fashion to impact all rows "at once" as that is one major 
style of doing things in R. 

Having said that, it is indeed an issue to be cautious with in R as sometimes 
vectors being used may not be the same length and may even be automatically 
extended to be so. I also often program in Python and we had a discussion there 
of what exactly some modules should do if given multiple vectors (or lists or 
other data structures including generators) and zip the results into tuples 
when one or another runs out first.

I note that using | versus || and similarly & and && often messes up programs 
if used wrong. A vectorized any/all and other such verbs as at_least_n() can be 
very useful but only when used carefully.

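For example (at_least_n() below is the hypothetical helper named above, not an
existing function):

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(FALSE, FALSE, TRUE)

a | b    # elementwise: TRUE FALSE TRUE -- what row-wise filtering needs
# a || b # || expects length-1 operands; recent R versions raise an error

# A hypothetical vectorized "at least n of these are TRUE" verb
at_least_n <- function(n, ...) Reduce(`+`, list(...)) >= n
at_least_n(2, a, b, c(TRUE, TRUE, FALSE))  # TRUE FALSE TRUE
```
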


-Original Message-
From: Dénes Tóth  
Sent: Friday, April 12, 2024 6:43 PM
To: Duncan Murdoch ; avi.e.gr...@gmail.com; 
r-help@r-project.org
Subject: Re: [R] any and all

Hi Avi,

As Duncan already mentioned, a reproducible example would be helpful to 
assist you better. Having said that, I think you misunderstand how 
`dplyr::filter` works: it performs row-wise filtering, so the filtering 
expression shall return a logical vector of the same length as the 
data.frame, or must be a single boolean value meaning "keep all" (TRUE) 
or "drop all" (FALSE). If you use `any()` or `all()`, they return a 
single boolean value, so you have an all-or-nothing filter in the end, 
which is probably not what you want.

Note also that you do not need to use `mutate` to use `filter` (read 
?dpylr::filter carefully):
```
filter(
   .data = mydata,
   !is.na(first.a) | !is.na(first.b),
   !is.na(second.a) | !is.na(second.b),
   !is.na(third.a) | !is.na(third.b)
)
```

Or you can use `base::subset()`:
```
subset(
   mydata,
   (!is.na(first.a) | !is.na(first.b))
   & (!is.na(second.a) | !is.na(second.b))
   & (!is.na(third.a) | !is.na(third.b))
)
```

Regards,
Denes

On 4/12/24 23:59, Duncan Murdoch wrote:
> On 12/04/2024 3:52 p.m., avi.e.gr...@gmail.com wrote:
>> Base R has generic functions called any() and all() that I am having 
>> trouble
>> using.
>> It works fine when I play with it in a base R context as in:
>>> all(any(TRUE, TRUE), any(TRUE, FALSE))
>> [1] TRUE
>>> all(any(TRUE, TRUE), any(FALSE, FALSE))
>> [1] FALSE
>> But in a tidyverse/dplyr environment, it returns wrong answers.
>> Consider this example. I have data I have joined together with pairs of
>> columns representing a first generation and several other pairs 
>> representing
>> additional generations. I want to consider any pair where at least one of
>> the pair is not NA as a success. But in order to keep the entire row, 
>> I want
>> all three pairs to have some valid data. This seems like a fairly common
>> reasonable thing often needed when evaluating data.
>> So to make it very general, I chose to do something a bit like this:
> 
> We can't really help you without a reproducible example.  It's not 
> enough to show us something that doesn't run but is a bit like the real 
> code.
> 
> Duncan Murdoch
> 
>> result <- filter(mydata,
>>   all(
>> any(!is.na(first.a), !is.na(first.b)),
>> any(!is.na(second.a), !is.na(second.b)),
>> any(!is.na(third.a), !is.na(third.b))))
>> I apologize if the formatting is not seen properly. The above logically
>> should work. And it should be extendable to scenarios where you want at
>> least one of M columns to contain data as a group with N such groups 
>> of any
>> size.
>> But since it did not work, I tried a plan that did work and feels 
>> silly. I
>> used mutate() to make new columns such as:
>> result <-
>>mydata |>
>>

[R] any and all

2024-04-12 Thread avi.e.gross
Base R has generic functions called any() and all() that I am having trouble
using.
 
It works fine when I play with it in a base R context as in:
 
> all(any(TRUE, TRUE), any(TRUE, FALSE))
[1] TRUE
> all(any(TRUE, TRUE), any(FALSE, FALSE))
[1] FALSE
 
But in a tidyverse/dplyr environment, it returns wrong answers.
 
Consider this example. I have data I have joined together with pairs of
columns representing a first generation and several other pairs representing
additional generations. I want to consider any pair where at least one of
the pair is not NA as a success. But in order to keep the entire row, I want
all three pairs to have some valid data. This seems like a fairly common
reasonable thing often needed when evaluating data.
 
So to make it very general, I chose to do something a bit like this:
 
result <- filter(mydata,
 all(
   any(!is.na(first.a), !is.na(first.b)),
   any(!is.na(second.a), !is.na(second.b)),
   any(!is.na(third.a), !is.na(third.b))))
 
I apologize if the formatting is not seen properly. The above logically
should work. And it should be extendable to scenarios where you want at
least one of M columns to contain data as a group with N such groups of any
size.
 
But since it did not work, I tried a plan that did work and feels silly. I
used mutate() to make new columns such as:
 
result <-
  mydata |>
  mutate(
usable.1 = (!is.na(first.a) | !is.na(first.b)),
usable.2 = (!is.na(second.a) | !is.na(second.b)),
usable.3 = (!is.na(third.a) | !is.na(third.b)),
usable = (usable.1 & usable.2 & usable.3)
  ) |>
  filter(usable == TRUE)
 
The above wastes time and effort making new columns so I can check the
calculations then uses the combined columns to make a Boolean that can be
used to filter the result.
 
I know this is not the place to discuss dplyr. I want to check first whether I
am doing anything wrong in how I use any/all. One guess is that the generic is
masked by dplyr or another package I have loaded.
 
And, of course, some aspects of delayed evaluation can interfere in subtle
ways.
 
I note I have had other problems with these base R functions before and
generally solved them by not using them, as shown above. I would much rather
use them, or something similar.
 
 
Avi
 
 


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Exceptional slowness with read.csv

2024-04-10 Thread avi.e.gross
Dave,

Your method works for you and seems to be a one-time fix of a corrupted data 
file, so please accept what I write not as criticism but as an explanation of my 
alternate reasoning, which I suspect may run faster in some situations.

Here is my understanding of what you are doing:

You have a file in CSV format containing N rows with commas making M columns. 
A few rows have a glitch: a double quote character at the beginning or end of a 
field (meaning adjacent to a comma, or perhaps at the beginning or end of the 
line of text) that messes things up. This may be in a specific known column or 
in several.

So your algorithm is to read the entire file in, or alternately one line at a 
time. Note the types of the columns may not be apparent to you when you start, 
as you are not allowing read.csv() to see what it needs to or to perform all 
kinds of processing, like dealing with comments.
You then call functions such as read.csv() millions of times (N). Argh!

You do that by setting up an error-catching environment N times. Of course, 
most lines are fine and raise no error.

Only on error lines do you apply a regular expression that checks for quotes 
not immediately adjacent to a comma. I am not sure exactly what you used, albeit 
I imagine spaces could sometimes intervene. You fix any such lines and 
re-evaluate.

It seems your goal was to rewrite a corrected file so you are doing so while 
appending to it a row/line at a time.

My strategy was a bit different.

- Call read.csv() just once with no error checking but an option to not treat a 
quote specially. Note if the quoted region may contain commas, this is a bad 
strategy. If all it has is spaces or other non-comma items, it may be fine. 

If it works, there is now a data.frame or similar data structure in memory with 
N rows and M columns.

- Pick only columns that may have this issue, meaning the ones containing say 
text as compared to numbers or logical values.
- Using those columns, perhaps one at a time, evaluate all their entries at once 
with a regular expression that tests each entry for the presence of exactly one 
quote, either at the start or the end (the commas you used as anchors are not 
needed in this version). So you are looking for something like:

"words perhaps including, commas
Or
words perhaps including, commas"

but not for:

words perhaps including, commas
"words perhaps including, commas"

You can save the query as a Boolean vector of TRUE/FALSE as one method, to mark 
which rows need fixing. Or you might use an ifelse() or the equivalent in which 
you selectively apply a fix to the rows. One method is to use something like 
sub() to both match all text except an initial or terminal quote and replace it 
with a quote followed by the match followed by a quote, if any quotes were 
found.
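A hedged sketch of that vectorized fix (the column name `txt`, the toy data, and the exact regular expression are my assumptions, not from the original post): flag entries carrying exactly one quote at the start or end, then repair just those entries in one pass:

```r
df <- data.frame(
  txt = c('plain text', '"stray opening quote', 'stray closing quote"', '"balanced"'),
  stringsAsFactors = FALSE
)

# TRUE where the entry starts or ends with a quote and contains no other quote
needs_fix <- grepl('^"[^"]*$|^[^"]*"$', df$txt)

# one possible repair: double the lone quote so read.csv() treats it as escaped
df$txt[needs_fix] <- gsub('"', '""', df$txt[needs_fix])
```

Only the flagged rows are touched, and the whole column is tested in one vectorized call rather than once per line.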

Whatever you choose can be done in a vectorized manner that may be more 
efficient. You do not need to check for failures, let alone N times. And you 
only need to process the columns that require it.

When done, you may want to make sure all the columns are of the type you want 
as who knows if read.csv() made a bad choice on those columns, or others.

Note again, this is only a suggestion and it fails if commas can be part of the 
quoted parts or even misquoted parts.

-Original Message-
From: R-help  On Behalf Of Dave Dixon
Sent: Wednesday, April 10, 2024 12:20 PM
To: Rui Barradas ; r-help@r-project.org
Subject: Re: [R] Exceptional slowness with read.csv

That's basically what I did

1. Get text lines using readLines
2. use tryCatch to parse each line using read.csv(text=...)
3. in the catch, use gregexpr to find any quotes not adjacent to a comma 
(gregexpr("[^,]\"[^,]",...)
4. escape any quotes found by adding a second quote (using str_sub from 
stringr)
5. parse the patched text using read.csv(text=...)
6. write out the parsed fields as I go along using write.table(..., 
append=TRUE) so I'm not keeping too much in memory.

I went directly to tryCatch because there were 3.5 million records, and 
I only expected a few to have errors.

I found only 6 bad records, but it had to be done to make the datafile 
usable with read.csv(), for the benefit of other researchers using these 
data.


On 4/10/24 07:46, Rui Barradas wrote:
> Às 06:47 de 08/04/2024, Dave Dixon escreveu:
>> Greetings,
>>
>> I have a csv file of 76 fields and about 4 million records. I know 
>> that some of the records have errors - unmatched quotes, 
>> specifically. Reading the file with readLines and parsing the lines 
>> with read.csv(text = ...) is really slow. I know that the first 
>> 2459465 records are good. So I try this:
>>
>>  > startTime <- Sys.time()
>>  > first_records <- read.csv(file_name, nrows = 2459465)
>>  > endTime <- Sys.time()
>>  > cat("elapsed time = ", endTime - startTime, "\n")
>>
>> elapsed time =   24.12598
>>
>>  > startTime <- Sys.time()
>>  > second_records <- read.csv(file_name, skip = 2459465, nrows = 5)
>>  > endTime <- Sys.time()
>>  > cat("elapsed 

Re: [R] Exceptional slowness with read.csv

2024-04-10 Thread avi.e.gross
It sounds like the discussion is now on how to clean your data, with a twist. 
You want to clean it before you can properly read it in using standard methods.

Some of those standard methods already do quite a bit as they parse the data 
such as looking ahead to determine the data type for a column.

The specific problem being discussed seems to be related to a lack of balance 
in individual lines of a CSV file related to double quotes that then mess up 
that row and following rows for a while. I am not clear on the meaning of the 
quotes to the user but wonder if they can simply not be viewed as quotes. 
Functions like read.csv() or the tidyverse variant of read_csv() allow you to 
specify the quote character or disable it.

So what would happen to the damaged line/row in your case, or any row with both 
quotes intact if you tried reading it in with an argument disabling processing 
quoted regions? It may cause problems but in your case, maybe it won't.

If so, after reading in the file, you can march through it and make fixes, as 
discussed. The other alternative seems to be to read the lines in the 
old-fashioned way, do some surgery on whole lines rather than individual 
row/column entries, and perhaps feed the huge amount of data back to read.csv() 
via its text= argument, or write it out to another file and read that in again.

And, of course, if there is just one bad line, then you might just open the file 
with a program such as Excel or anything that lets you edit it once, ...




-Original Message-
From: R-help  On Behalf Of Rui Barradas
Sent: Wednesday, April 10, 2024 9:46 AM
To: Dave Dixon ; r-help@r-project.org
Subject: Re: [R] Exceptional slowness with read.csv

Às 06:47 de 08/04/2024, Dave Dixon escreveu:
> Greetings,
> 
> I have a csv file of 76 fields and about 4 million records. I know that 
> some of the records have errors - unmatched quotes, specifically. 
> Reading the file with readLines and parsing the lines with read.csv(text 
> = ...) is really slow. I know that the first 2459465 records are good. 
> So I try this:
> 
>  > startTime <- Sys.time()
>  > first_records <- read.csv(file_name, nrows = 2459465)
>  > endTime <- Sys.time()
>  > cat("elapsed time = ", endTime - startTime, "\n")
> 
> elapsed time =   24.12598
> 
>  > startTime <- Sys.time()
>  > second_records <- read.csv(file_name, skip = 2459465, nrows = 5)
>  > endTime <- Sys.time()
>  > cat("elapsed time = ", endTime - startTime, "\n")
> 
> This appears to never finish. I have been waiting over 20 minutes.
> 
> So why would (skip = 2459465, nrows = 5) take orders of magnitude longer 
> than (nrows = 2459465) ?
> 
> Thanks!
> 
> -dave
> 
> PS: readLines(n=2459470) takes 10.42731 seconds.
> 
Hello,

Can the following function be of help?
After reading the data with the argument quote = "", call a function 
applying gregexpr to its character columns, then transform the output 
into a two-column data.frame with columns

  Col - the column processed;
  Unbalanced - the rows with unbalanced double quotes.

I am assuming the quotes are double quotes. It shouldn't be difficult to 
adapt it to other cases: single quotes, or both.




unbalanced_dquotes <- function(x) {
  char_cols <- sapply(x, is.character) |> which()
  lapply(char_cols, \(i) {
    y <- x[[i]]
    # count double quotes per entry (gregexpr returns -1 when there are none)
    Unbalanced <- gregexpr('"', y) |>
      sapply(\(m) if (m[1L] == -1L) 0L else length(m)) |>
      {\(n) (n %% 2L) == 1L}() |>   # an odd count means unbalanced quotes
      which()
    data.frame(Col = i, Unbalanced = Unbalanced)
  }) |>
    do.call(rbind, args = _)
}

# read the data disregarding quoted strings
df1 <- read.csv(fl, quote = "")
# determine which strings have unbalanced quotes and
# where
unbalanced_dquotes(df1)


Hope this helps,

Rui Barradas





Re: [R] Question regarding reservoir volume and water level

2024-04-07 Thread avi.e.gross
John,

My original reaction was the same as yours, until I realized I had to find out
what a DEM file was, and that it contains enough of the kind of depth-dimension
data you describe, albeit for what may be a very irregular cross section, to
calculate areas and thence volumes.

If I read it correctly, this can be a very real-world problem worthy of a
solution, such as in places like California where they had a tad more rain
than usual and some reservoirs may overflow. Someone else provided what
sounds like a mathematical algorithm, but my guess is that what is needed here
is less analytic, since there may be no trivial way to create formulas and
take integrals and so on; instead, an approximate method can calculate
incremental volumes for each horizontal "slice" and keep adding or
subtracting them until you reach a target, then read off another variable
at that point, such as depth.
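The slice idea can be sketched in a few lines. Everything here is an assumption for illustration: the DEM is a plain matrix of ground elevations in metres, each cell covers `cell_area` square metres, and `step` is the slice thickness:

```r
# find the water level at which the impounded volume first reaches target_vol
level_for_volume <- function(dem, target_vol, cell_area = 1, step = 0.1) {
  lev <- min(dem)
  repeat {
    # volume of water above the terrain at this trial level
    vol <- sum(pmax(lev - dem, 0)) * cell_area
    if (vol >= target_vol) return(lev)
    lev <- lev + step
  }
}

# e.g. a flat 10 x 10 basin at elevation 0: 100 m3 of water sits 1 m deep
level_for_volume(matrix(0, 10, 10), target_vol = 100, cell_area = 1, step = 0.5)
# returns 1
```

A real DEM would give a different, irregular volume at each slice, which is exactly what the summation handles.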

Some care must be taken as water level has to be relative to something and
many natural reservoirs have no unique bottom level. Some water may also be
stored underground and to the side, and it may pour in if the level lowers or
escape if the level rises.


-Original Message-
From: R-help  On Behalf Of Sorkin, John
Sent: Sunday, April 7, 2024 3:08 PM
To: Rui Barradas ; javad bayat ;
R-help 
Subject: Re: [R] Question regarding reservoir volume and water level

Aside from the fact that the original question might well be a class
exercise (or homework), the question is unanswerable given the data given by
the original poster. One needs to know the dimensions of the reservoir,
above and below the current waterline. Are the sides, above and below the
waterline smooth? Is the region currently above the waterline that can store
water a mirror image of the region below the waterline? Does the region above
the reservoir include a flood plain? Will the additional water go into the
flood plain?

The lack of required detail in the question posed by the original poster
suggests that there are strong assumptions, assumptions that typically would
be made in a class-room example or exercise.

John

John David Sorkin M.D., Ph.D.
Professor of Medicine, University of Maryland School of Medicine;
Associate Director for Biostatistics and Informatics, Baltimore VA Medical
Center Geriatrics Research, Education, and Clinical Center;
PI Biostatistics and Informatics Core, University of Maryland School of
Medicine Claude D. Pepper Older Americans Independence Center;
Senior Statistician University of Maryland Center for Vascular Research;

Division of Gerontology and Palliative Care,
10 North Greene Street
GRECC (BT/18/GR)
Baltimore, MD 21201-1524
Cell phone 443-418-5382





From: R-help  on behalf of Rui Barradas

Sent: Sunday, April 7, 2024 10:53 AM
To: javad bayat; R-help
Subject: Re: [R] Question regarding reservoir volume and water level

Às 13:27 de 07/04/2024, javad bayat escreveu:
> Dear all;
> I have a question about the water level of a reservoir, when the volume
> changed or doubled.
> There is a DEM file with the highest elevation 1267 m. The lowest
elevation
> is 1230 m. The current volume of the reservoir is 7,000,000 m3 at 1240 m.
> Now I want to know what would be the water level if the volume rises to
> 1250 m? or what would be the water level if the volume doubled (14,000,000
> m3)?
>
> Is there any way to write codes to do this in R?
> I would be more than happy if anyone could help me.
> Sincerely
>
>
>
>
>
>
>
>
Hello,

This is a simple rule of three.
If you know the level, l, the argument doesn't need to be named, but if you
know the volume, v, then it must be named (otherwise it would be matched to l
by position).


water_level <- function(l, v, level = 1240, volume = 7e6) {
   if(missing(v)) {
 volume * l / level
   } else level * v / volume
}

lev <- 1250
vol <- 14e6

water_level(l = lev)
#> [1] 7056452
water_level(v = vol)
#> [1] 2480


Hope this helps,

Rui Barradas





Re: [R] Question regarding reservoir volume and water level

2024-04-07 Thread avi.e.gross
Chris, since it does indeed look like homework, albeit a deeper look suggests
it may not be, I think we can safely answer the question:

>Is there any way to write codes to do this in R?

The answer is YES.

And before you ask, it can be done in Python, Java, C++, Javascript, BASIC,
FORTRAN and probably even COBOL and many forms of assembler.

And, it can be done even without a computer using your mind and pencil and
paper.

I have seen similar problems discussed using a search and wonder if that is
where you should go, or perhaps consult your textbook or class notes.

OK, levity aside, what is the real question? 

If you want help designing an algorithm that solves the problem, that is
outside the scope of this forum and may indeed count as helping someone for
free with their homework or other work.

If this were a place for tutoring, you might be asked to show some work and
point out where one step seems stuck. You might get answers.

Perhaps a better question is to look at your problem and see what it might
need and ask if someone knows of one or more R packages that handle your
needs.

But as you write your message, assume people reading it have no idea what a DEM
file is. I looked it up: it is a Digital Elevation Model. I then searched to
see if anyone discussed how to bring the contents of such a file into an R
session and found some suggestions, but note that I have not tried, nor plan
to try, any of them.

Your request does not specify any particular shape for the containment of the
existing water or of what is above it. If it comes from a DEM file, it would
have info about what are likely quite irregular surfaces that vary with depth.
That is not as simple a calculation as asking what happens if the container
is a cylinder or cone. It depends on the data we cannot see. It sounds well
beyond basic R, as it likely involves working with 3-D matrices or something
similar.

So I looked for packages you can search for too and I see one called,
appropriately, DEM. I see other packages called Terra and CopernicusDEM  and
whitebox and you may want to do searching and see if anything solves parts
of your problem.

And, of course, you may find something that does it easily for you if someone
has provided, say, a module for Python.

Good Luck.


-Original Message-
From: R-help  On Behalf Of Chris Ryan via
R-help
Sent: Sunday, April 7, 2024 9:26 AM
To: r-help@r-project.org; javad bayat ; R-help

Subject: Re: [R] Question regarding reservoir volume and water level

Homework?
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

On April 7, 2024 8:27:18 AM EDT, javad bayat  wrote:
>Dear all;
>I have a question about the water level of a reservoir, when the volume
>changed or doubled.
>There is a DEM file with the highest elevation 1267 m. The lowest elevation
>is 1230 m. The current volume of the reservoir is 7,000,000 m3 at 1240 m.
>Now I want to know what would be the water level if the volume rises to
>1250 m? or what would be the water level if the volume doubled (14,000,000
>m3)?
>
>Is there any way to write codes to do this in R?
>I would be more than happy if anyone could help me.
>Sincerely
>
>
>
>
>
>
>
>



Re: [R] Printout and saved results

2024-03-26 Thread avi.e.gross
Just FYI, the R interpreter typically saves the last value returned briefly
in a variable called .Last.value that can be accessed before you do anything
else.

> sin(.5)
[1] 0.4794255
> temp <- .Last.value
> print(temp)
[1] 0.4794255
> sin(.666)
[1] 0.6178457
> .Last.value
[1] 0.6178457
> temp
[1] 0.4794255
> invisible(sin(0.2))
> .Last.value
[1] 0.1986693

So perhaps if you grab it in time, you can call your function and let the
REPL display it (or not) and yet save the value.
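(For completeness, the pattern discussed elsewhere in this thread also gives both behaviours at once: print inside the function and return the full result invisibly. A minimal sketch, not the poster's exact dstat4:)

```r
dstat4 <- function(data, digits = 3) {
  Mean    <- apply(data, 2, mean, na.rm = TRUE)
  Std.dev <- apply(data, 2, sd,   na.rm = TRUE)
  out <- round(cbind(Mean, Std.dev), digits)
  print(out)                                       # always show the table
  invisible(list(Mean = Mean, Std.dev = Std.dev))  # and also return the pieces
}

v <- dstat4(data.frame(x1 = 1:5))  # prints the table
v$Mean                             # components remain accessible afterwards
```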



-Original Message-
From: R-help  On Behalf Of Jeff Newmiller via
R-help
Sent: Tuesday, March 26, 2024 1:03 AM
To: r-help@r-project.org
Subject: Re: [R] Printout and saved results

Your desire is not unusual among novices... but it is really not a good idea
for your function to be making those decisions. Look at how R does things:

The lm function prints nothing... it returns an object containing the result
of a linear regression. If you happen to call it directly from the R command
prompt and don't assign it to a variable, then the command interpreter
notices that return value and prints it. Since the lm object has a dedicated
print method associated with it, that output looks different than a plain
list object would... but the fact that it has a special print method
(?print.lm) is just window dressing unrelated to your request.

The important part is that the lm function doesn't even consider printing
anything out... it is the code that calls the function that determines
whether it will get printed. So...

lm( hp ~ disp, data = mtcars )  # printed by command interpreter
z <- lm( hp ~ disp, data = mtcars ) # assignment operator returns the value
of z to the command processor, but invisibly
( z <- lm( hp ~ disp, data = mtcars ) ) # strips off the invisible marking
so the value gets printed

Another example:

f <- function() {
  x <- 4
  x  # doesn't print
  invisible( 5 ) # return invisible result
}

f()  # doesn't print 4 because there is no command prompt looking at x alone
on a line... it is inside f
# command prompt doesn't print 5 because that 5 has been marked as invisible
(f()) # command interpreter prints 5

Leaving it up to the calling code to decide whether to print gives you the
option of calling your analysis function possibly thousands of times and
figuring out some slick way to summarize all those runs without thousands of
printouts that you are not going to wade through anyway and would only slow
the computer down (printing really does slow the computer down!)


On March 25, 2024 9:00:49 PM PDT, Steven Yen  wrote:
>I just like the subroutine to spit out results (Mean, Std.dev, etc.) and
also be able to access the results for further processing, i.e.,
>
>v$Mean
>
>v$Std.dev
>
>On 3/26/2024 11:24 AM, Richard O'Keefe wrote:
>> Not clear what you mean by "saved".
>> If you call a function and the result is printed, the result is
>> remembered for a wee while in
>> the variable .Last.value, so you can do
>>> function.with.interesting.result(...)
>>> retained.interesting.result <- .Last.value
>> or even
>>> .Last.value -> retained.interesting.result
>> If you know before you start writing the expression that you want to
>> save the value,
>> you can wrap the assignment in parentheses, making it an expression:
>> 
>>> (retained.interesting.result <-
function.with.interesting.result(..))
>> 
>> On Tue, 26 Mar 2024 at 15:03, Steven Yen  wrote:
>>> How can I have both printout and saved results at the same time.
>>> 
>>> The subroutine first return "out" and the printout gets printed, but not
>>> saved.
>>> 
>>> I then run the "invisible" line. Results got saved and accessible but no
>>> printout.
>>> 
>>> How can I have both printout and also have the results saved? Thank you!
>>> 
>>>   > dstat4 <- function(data,digits=3){
>>> +   Mean<- apply(data,2,mean,na.rm=TRUE)
>>> +   Std.dev <- apply(data,2,sd,  na.rm=TRUE)
>>> +   Min <- apply(data,2,min,na.rm=TRUE)
>>> +   Max <- apply(data,2,max,na.rm=TRUE)
>>> +   Obs <- dim(data)[1]
>>> +   out <-round(cbind(Mean,Std.dev,Min,Max,Obs),digits)
>>> +   out
>>> + # invisible(list(Mean=Mean,Std.dev=Std.dev,Min=Min,Max=Max))
>>> + }
>>>   > x1<-rnorm(n=5,mean=5, sd=1)
>>>   > x2<-rnorm(n=5,mean=10,sd=2)
>>>   > w<-rnorm(n=5,mean=2,sd=0.3)
>>>   > mydata<-data.frame(cbind(x1,x2))
>>>   > v<-dstat4(mydata); v
>>>Mean Std.dev   MinMax Obs
>>> x1  5.000   0.922 3.900  6.282   5
>>> x2 10.769   1.713 9.209 13.346   5
>>>   > v$Mean
>>> Error in v$Mean : $ operator is invalid for atomic vectors
>>>   > dstat4 <- function(data,digits=3){
>>> +   Mean<- apply(data,2,mean,na.rm=TRUE)
>>> +   Std.dev <- apply(data,2,sd,  na.rm=TRUE)
>>> +   Min <- apply(data,2,min,na.rm=TRUE)
>>> +   Max <- apply(data,2,max,na.rm=TRUE)
>>> +   Obs <- dim(data)[1]
>>> +   out <-round(cbind(Mean,Std.dev,Min,Max,Obs),digits)
>>> + # out
>>> +   invisible(list(Mean=Mean,Std.dev=Std.dev,Min=Min,Max=Max))
>>> + }
>>> 
>>>   > v<-dstat4(mydata)
>>>   > v$Mean
>>> x1   

Re: [R] [External] Re: Building Packages. (fwd)

2024-03-21 Thread avi.e.gross
Thank you Duncan, you explained quite a bit.

I am unclear how this change causes the problem the OP mentioned.

It is an example of people using a clever trick to get what they think they
want that could be avoided if the original program provided a hook. Of
course the hook could be used more maliciously by others.

-Original Message-
From: R-help  On Behalf Of Duncan Murdoch
Sent: Thursday, March 21, 2024 8:28 AM
To: luke-tier...@uiowa.edu; r-help@r-project.org
Subject: Re: [R] [External] Re: Building Packages. (fwd)

If you are wondering why RStudio did this, you can see their substitute 
function using

   (parent.env(environment(install.packages)))$hook

They appear to do these things:

  - Allow package installation to be disabled.

  - Check if a package to be installed is already loaded, so that 
RStudio can restart R for the install.

  - Add Rtools to the PATH if necessary.

  - Trigger an event to say that something is about to be changed about 
the installed packages, presumably so that they can mark a cached list 
of installed packages as stale.

  - Call the original function.

I think all of these things could be done  if install.packages() called 
a hook at the start, as library() does (via attachNamespace()) when a 
package is attached.  It might be that putting the wrapper code into 
tools:rstudio would cause confusion for users when there were two 
objects of the same name on the search list, though I don't see how.

Duncan Murdoch


On 21/03/2024 7:44 a.m., luke-tierney--- via R-help wrote:
> [forgot to copy to R-help so re-sending]
> 
> -- Forwarded message --
> Date: Thu, 21 Mar 2024 11:41:52 +
> From: luke-tier...@uiowa.edu
> To: Duncan Murdoch 
> Subject: Re: [External] Re: [R] Building Packages.
> 
> At least on my installed version (which tells me it is out of date)
> they appear to just be modifying the "package:utils" parent frame of
> the global search path.
> 
> There seem to be a few others:
> 
> checkUtilsFun <- function(n)
>   identical(get(n, "package:utils"), get(n, getNamespace("utils")))
> names(which(! sapply(ls("package:utils", all = TRUE), checkUtilsFun)))
> ## [1] "bug.report"       "file.edit"        "help.request"
> ## [4] "history"          "install.packages" "remove.packages"
> ## [7] "View"
> 
> I don't know why they don't put these overrides in the tools:rstudio
frame.
> At least that would make them more visible.
> 
> You can fix all of these with something like
> 
> local({
> up <- match("package:utils", search())
> detach("package:utils")
> library(utils, pos = up)
> })
> 
> or just install.packages with
> 
> local({
>   up <- match("package:utils", search())
>   unlockBinding("install.packages", pos.to.env(up))
>   assign("install.packages", utils::install.packages, "package:utils")
>   lockBinding("install.packages", pos.to.env(up))
> })
> 
> Best,
> 
> luke
> 
> On Thu, 21 Mar 2024, Duncan Murdoch wrote:
> 
>> Yes, you're right.  The version found in the search list entry for
>> "package:utils" is the RStudio one; the ones found with two or three
colons
>> are the original.
>>
>> Duncan Murdoch
>>
>> On 21/03/2024 5:48 a.m., peter dalgaard wrote:
>>> Um, what's with the triple colon? At least on my install, double seems
to
>>> suffice:
>>>
 identical(utils:::install.packages, utils::install.packages)
>>> [1] TRUE
 install.packages
>>> function (...)
>>> .rs.callAs(name, hook, original, ...)
>>> 
>>>
>>> -pd
>>>
 On 21 Mar 2024, at 09:58 , Duncan Murdoch 
wrote:

 The good news for Jorgen (who may not be reading this thread any more)
is
 that one can still be sure of getting the original install.packages()
by
 using

  utils:::install.packages( ... )

 with *three* colons, to get the internal (namespace) version of the
 function.

 Duncan Murdoch


 On 21/03/2024 4:31 a.m., Martin Maechler wrote:
>> "Duncan Murdoch on Wed, 20 Mar 2024 13:20:12 -0400 writes:
>   > On 20/03/2024 1:07 p.m., Duncan Murdoch wrote:
>   >> On 20/03/2024 12:37 p.m., Ben Bolker wrote:
>   >>> Ivan, can you give more detail on this? I've heard this
>   >>> issue mentioned, but when I open RStudio and run
>   >>> find("install.packages") it returns
>   >>> "utils::install.packages", and running dump() from
>   >>> within RStudio console and from an external "R
>   >>> --vanilla" gives identical results.
>   >>>
>   >>> I thought at one point this might only refer to the GUI
>   >>> package-installation interface, but you seem to be
>   >>> saying it's the install.packages() function as well.
>   >>>
>   >>> Running an up-to-date RStudio on Linux, FWIW -- maybe
>   >>> weirdness only happens on other OSs?
>   >>
>   >> On MacOS, I see this:
>   >>
>   >> > install.packages function (...)  

Re: [R] Building Packages.

2024-03-21 Thread avi.e.gross
With all this discussion, I shudder to ask this. I may have missed the
answers, but the discussion seems to have been about identifying and solving
the problem rapidly rather than about what may be best going forward if all
parties agree.

What was the motivation for what RStudio did with their version, and for the
decision to replace what came with utils unless someone very explicitly
overrode it by asking for the original? Is their version better in other
ways? Is there a possibility the two implementations may someday merge into
something that meets several sets of needs, or are they incompatible?

Is there agreement that what broke with the substitution is a valid use or
is it something that just happens to work on the utils version if not
patched?



-Original Message-
From: R-help  On Behalf Of Duncan Murdoch
Sent: Thursday, March 21, 2024 5:53 AM
To: peter dalgaard 
Cc: Jorgen Harmse ; r-help@r-project.org; Martin Maechler

Subject: Re: [R] Building Packages.

Yes, you're right.  The version found in the search list entry for 
"package:utils" is the RStudio one; the ones found with two or three 
colons are the original.

Duncan Murdoch

On 21/03/2024 5:48 a.m., peter dalgaard wrote:
> Um, what's with the triple colon? At least on my install, double seems to
suffice:
> 
>> identical(utils:::install.packages, utils::install.packages)
> [1] TRUE
>> install.packages
> function (...)
> .rs.callAs(name, hook, original, ...)
> 
> 
> -pd
> 
>> On 21 Mar 2024, at 09:58 , Duncan Murdoch 
wrote:
>>
>> The good news for Jorgen (who may not be reading this thread any more) is
that one can still be sure of getting the original install.packages() by
using
>>
>> utils:::install.packages( ... )
>>
>> with *three* colons, to get the internal (namespace) version of the
function.
>>
>> Duncan Murdoch
>>
>>
>> On 21/03/2024 4:31 a.m., Martin Maechler wrote:
 "Duncan Murdoch on Wed, 20 Mar 2024 13:20:12 -0400 writes:
>>>  > On 20/03/2024 1:07 p.m., Duncan Murdoch wrote:
>>>  >> On 20/03/2024 12:37 p.m., Ben Bolker wrote:
>>>  >>> Ivan, can you give more detail on this? I've heard this
>>>  >>> issue mentioned, but when I open RStudio and run
>>>  >>> find("install.packages") it returns
>>>  >>> "utils::install.packages", and running dump() from
>>>  >>> within RStudio console and from an external "R
>>>  >>> --vanilla" gives identical results.
>>>  >>>
>>>  >>> I thought at one point this might only refer to the GUI
>>>  >>> package-installation interface, but you seem to be
>>>  >>> saying it's the install.packages() function as well.
>>>  >>>
>>>  >>> Running an up-to-date RStudio on Linux, FWIW -- maybe
>>>  >>> weirdness only happens on other OSs?
>>>  >>
>>>  >> On MacOS, I see this:
>>>  >>
>>>  >> > install.packages function (...)  .rs.callAs(name, hook,
>>>  >> original, ...)  
>>>  >>
>>>  >> I get the same results as you from find().  I'm not sure
>>>  >> what RStudio is doing to give a different value for the
>>>  >> function than what find() sees.
>>>  > Turns out that RStudio replaces the install.packages
>>>  > object in the utils package.
>>>  > Duncan Murdoch
>>> Yes, and this has been the case for several years now, and I
>>> have mentioned this several times, too  (though some of it
>>> possibly not in a public R-* mailing list).
>>> And yes, that they modify the package environment
>>>as.environment("package:utils")
>>> but leave the
>>>namespace  asNamespace("utils")
>>> unchanged, makes it harder to see what's
>>> going on (but also has less severe consequences; if they kept to
>>> the otherwise universal *rule* that the namespace and package must have
the same objects
>>> apart from those only in the namespace,
>>> people would not even have access to R's true install.packages()
>>> but only see the RStudio fake^Hsubstitute..
>>> We are still not happy with their decision. Also
>>> help(install.packages) goes to R's documentation of R's
>>> install.packages, so there's even more misleading of useRs.
>>> Martin
>>>
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>


[R] Rtools and things dependent on it

2024-02-23 Thread avi.e.gross
This may be a dumb question and the answer may make me feel dumber.
 
I have had trouble for years with R packages wanting Rtools on my machine
and not being able to use it. Many packages are fine as binaries are
available. I have loaded Rtools and probably need to change my PATH or
something.
 
But I recently suggested to someone that they might want to use the tabyl()
function in the janitor package that I find helpful. I get a warning when I
install it about Rtools but it works fine. When they install it, it fails. I
assumed they would get it from CRAN the same way I did as we are both using
Windows and from within RSTUDIO.
 
In the past, I have run into other packages I could not use and just moved
on but it seems like time to see if this global problem has a work-around.
 
And, in particular, I have the latest versions of both R and RSTUDIO which
can be a problem when other things are not as up-to-date.
 
Or, maybe some people with R packages could be convinced to make binaries
available in the first place?
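A couple of low-tech checks related to the PATH question above (a sketch; the binary-only install is one possible workaround when a binary exists on CRAN, not a guaranteed fix):

```r
# On Windows, R finds Rtools through the PATH; an empty string here
# means the build tools are not visible to R:
Sys.which("make")

# Asking for a binary build explicitly avoids needing Rtools at all,
# provided CRAN has a binary for your platform:
# install.packages("janitor", type = "binary")
```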
 
Avi
 



Re: [R] Looping

2024-02-18 Thread avi.e.gross
Steven,

It depends what you want to do. What you are showing seems to replace the 
values stored in "data" each time.

Many kinds of loops will do that, with one simple way being to store all the 
filenames in a list and loop on the contents of the list as arguments to 
read.csv.

Since you show filenames as having a number from 1 to 24 in the middle, you can 
make such a vector using paste().

A somewhat related question is if you want to concatenate all the data into one 
larger data.frame. 
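A minimal sketch of that approach (the first two lines just manufacture 24 small files in a scratch directory so the sketch is self-contained; in practice the files would already exist):

```r
# For illustration only: create 24 small files in a scratch directory
setwd(tempdir())
for (i in 1:24) write.csv(data.frame(x = i), sprintf("data%d.csv", i), row.names = FALSE)

# The loop itself: build the names with paste0(), then read each file into a list
files <- paste0("data", 1:24, ".csv")
data_list <- lapply(files, read.csv)

# Optional: stack everything into one larger data frame
# (assumes all files share the same columns)
all_data <- do.call(rbind, data_list)
nrow(all_data)  # 24
```

Keeping the per-file results in a list (rather than overwriting `data` each time) means nothing is lost, and the rbind step is only needed if one combined data frame is actually wanted.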


-Original Message-
From: R-help  On Behalf Of Steven Yen
Sent: Sunday, February 18, 2024 10:28 PM
To: R-help Mailing List 
Subject: [R] Looping

I need to read csv files repeatedly, named data1.csv, data2.csv,… data24.csv, 
24 altogether. That is, 

data<-read.csv(“data1.csv”)
…
data<-read.csv(“data24.csv”)
…

Is there a way to do this in a loop? Thank you.

Steven from iPhone


Re: [R] Truncated plots

2024-01-09 Thread avi.e.gross


Nick, obviously figuring out the problem is best but you may want to deal
with the symptom.

RSTUDIO lets you adjust the sizes of the various windows and enlarging the
window (lower right normally) where the graph is shown may be a first
attempt if the problem is display space.

And note that RSTUDIO lets you zoom the graph in a pop-out window. It also
lets you save the graph into files in various formats.

And, as someone else pointed out, you can change the program to choose a
device to save the output in.
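If the display pane itself is the bottleneck, writing straight to a file device sidesteps it entirely; a minimal sketch:

```r
# The pdf() device ignores the RStudio pane size completely,
# so axis labels cannot be clipped by the interface window
out <- file.path(tempdir(), "myplot.pdf")
pdf(out, width = 7, height = 5)
plot(1:10, xlab = "x axis label", ylab = "y axis label")
dev.off()
file.exists(out)  # TRUE
```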

Of course, if the real problem is in some aspect of the setup that causes
the clipping, that has to be looked at.

If the problem is in RSTUDIO, not just R, this is not the right forum to
deal with it.

Avi

-Original Message-
From: R-help  On Behalf Of Nick Wray
Sent: Tuesday, January 9, 2024 11:43 AM
To: r-help@r-project.org
Subject: [R] Truncated plots

Hello As a postgrad I have been helping an undergraduate student with R
coding but she has a problem with R studio on her laptop which I can't fix
- basically when she runs a plot it appears without a y axis label with the
black line plot frame hard against the plot window and the bottom of the
plot, where you would expect to see the horizontal axis and the x axis
label etc is completely "chopped off" by the bottom edge of the R studio
interface window.  I can't find anything on the net detailing this problem
- can anyone help?  I have a screenshot which could email if anyone needs
to see what it looks like.

Thanks Nick Wray



Re: [R] Function with large nested list

2023-12-18 Thread avi.e.gross
Emily,

I too copied/pasted your code in and it worked fine. I then asked for the
function definition and got it.

Did you put the entire text in? I mean nothing extra above or below except
maybe whitespace or comments?

What sometimes happens to make the code incomplete is leaving out a matching
parenthesis, brace, or bracket, or sometimes quotes, or using the wrong kind
of quote, as when copying from a program like Microsoft Word.
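One way to locate this kind of problem is to let the parser check the code directly; a small sketch using parse() on a deliberately incomplete snippet:

```r
# parse() raises an error pointing at the spot where input ends unexpectedly,
# which is exactly what the "+" continuation prompt is silently waiting for
incomplete <- "f <- function(x) { if (x > 0) { 'positive' }"  # one brace missing
msg <- tryCatch(
  { parse(text = incomplete); "parsed cleanly" },
  error = function(e) conditionMessage(e)
)
msg  # mentions "unexpected end of input"
```

The same check works on a whole script file, e.g. parse("myscript.R") (a hypothetical filename), which reports the line where the parser gave up.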


-Original Message-
From: R-help  On Behalf Of Emily Bakker
Sent: Monday, December 18, 2023 4:56 AM
To: r-help@r-project.org
Subject: [R] Function with large nested list

Hello list,

I want to make a large rulebased algorithm, to provide decision support for
drug prescriptions. I have defined the algorithm in a function, with a for
loop and many if statements. The structure should be as follows:
1. Iterate over a list of drug names. For each drug:
2. Get some drug related data (external dataset). Row of a dataframe.
3.  Check if adaptions should be made to standard dosage and safety
information in case of contraindications. If patient has an indication,
update current dosage and safety information with the value from the
dataframe row. 
4. Save dosage and safety information in some lists and continue to the next
drug. 
5. When the iteration over all drugs is done, return the lists.

ISSUE:
So it is a very large function with many nested if statements. I have
checked the code structure multiple times, but I run into some issues. When
I try to run the function definition, the command never "completes" in the
console. Instead of ">", the console shows "+". No errors are raised.

As I said, I have checked the structure multiple times, but can't find an
error. I have tried rebuilding it and testing each time I add a part. Each
part functions in isolation, but not together in the same function. I can't
find any infinite loops either.
I suspect the function may be too large, and I may have to define functions
for each part separately. That isn't an issue necessarily, but I would still
like to know why my code won't run, and whether there are any downsides or
considerations for using many small functions.

Below is my code. I have left part of it out. There are six more parts like
the diabetes part that are similar.
I also use a lot of data/variables not included here, to try to keep
things compact. But I can provide additional information if helpful.
Thanks in advance for thinking along!
Kind regards,
Emily

The code:

decision_algorithm <- function(AB_list, dataset_ab = data.frame(), diagnose
= 'cystitis', diabetes_status = "nee", katheter_status = "nee", 
   lang_QT_status = "nee", obesitas_status =
"nee", zwangerschap_status = "nee", 
   medicatie_actief =
data.frame(dict[["med_AB"]]), geslacht = "man", gfr=90){
  
  
  
  # vars
  list_AB_status <- setNames(as.list(rep("green", length(AB_list))),
names(AB_list)) #make a dict of all AB's and assign status green as default
for status
  list_AB_remarks <- setNames(as.list(rep("Geen opmerkingen",
length(AB_list))), names(AB_list)) #make a dict of all AB's and assign
"Geen" as default for remarks #Try empty list
  list_AB_dosering <- setNames(as.list(rep("Geen informatie",
length(AB_list))), names(AB_list)) # make named list of all AB's and assign
"Geen informatie", will be replaced with actual information in algorithm
  list_AB_duur <- setNames(as.list(rep("Geen informatie", length(AB_list))),
names(AB_list)) # make named list of all AB's and assign "Geen informatie",
will be replaced with actual information in algorithm
  
  # CULTURES #
  for (i in names(AB_list)) {

ab_data <- dataset_ab[dataset_ab$middel == i,] #get info for this AB
from dataset_ab

# Extract and split the diagnoses, dosering, and duur info for the
current antibiotic
ab_diagnoses <- str_split(ab_data$diagnoses, pattern = " \\| ")[[1]]
ab_diagnose_dosering <- str_split(ab_data$`diagnose dosering`, pattern =
" \\| ")[[1]]
ab_diagnose_duur <- str_split(ab_data$`diagnose duur`, pattern = " \\|
")[[1]]

# Find the index of the current diagnose in the ab_diagnoses list
diagnose_index <- match(diagnose, ab_diagnoses)

# Determine dosering and duur based on the diagnose_index
if (!is.na(diagnose_index)) {
  dosering <- ifelse(ab_diagnose_dosering[diagnose_index] ==
"standaard", ab_data$dosering, ab_diagnose_dosering[diagnose_index])
  duur <- ifelse(ab_diagnose_duur[diagnose_index] == "standaard",
ab_data$duur, ab_diagnose_duur[diagnose_index])
} else {
  # Use general dosering and duur as fallback if diagnose is not found
  dosering <- ab_data$dosering
  duur <- ab_data$duur
}

list_AB_dosering[[i]] <- dosering
list_AB_duur[[i]] <- duur

if ((!is.null(AB_list[[i]]) && AB_list[[i]] == "I")) {
  list_AB_status[[i]] <- "yellow"
list_AB_remarks[[i]] <- "Kweek verminderd gevoelig"
} else if 

Re: [R] adding "Page X of XX" to PDFs

2023-12-02 Thread avi.e.gross
Having read all of the replies, it seems there are solutions for the
question and the OP points out that some solutions such as making the
document twice will affect the creation date.

I suspect the additional time to do so is seconds or at most minutes so it
may not be a big deal.

But what about the idea of creating a PDF with a placeholder like "Page N of
XXX"? After the file has been created, dates and all, you could edit it
programmatically and replace all instances of XXX with something of the same
length, like " 23", since tools like the pdftools package let you get the
number of pages. I have no idea whether some program, perhaps external, can
do that and retain the date you want.
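The two-pass idea can also be kept entirely in base R by counting pages on a throwaway first pass; a sketch (the five-page loop and margin text are illustrative stand-ins, not the OP's actual script):

```r
# Render the report once without totals (counting pages), then again with them
render_report <- function(file, total = NA) {
  pdf(file)
  pages <- 0
  for (k in 1:5) {                       # stand-in for the real report loop
    plot(rnorm(10))
    pages <- pages + 1
    if (!is.na(total))
      mtext(sprintf("Page %d of %d", pages, total), side = 1, line = 4)
  }
  dev.off()
  pages
}

# Pass 1: throwaway file, just to learn the page count
n <- render_report(tempfile(fileext = ".pdf"))

# Pass 2: the real file, with "Page X of XX" in the margin
render_report(file.path(tempdir(), "report.pdf"), total = n)
```

This still renders twice, so it does not address the timestamp concern by itself, but the extra pass is usually a matter of seconds.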

-Original Message-
From: R-help  On Behalf Of Dennis Fisher
Sent: Friday, December 1, 2023 3:53 PM
To: r-help@r-project.org
Subject: [R] adding "Page X of XX" to PDFs

OS X
R 4.3.1

Colleagues

I often create multipage PDFs [pdf()] in which the text "Page X" appears in
the margin.  These PDFs are created automatically using a massive R script.

One of my clients requested that I change this to:
Page X of XX 
where XX is the total number of pages.  

I don't know the number of expected pages so I can't think of any clever way
to do this.  I suppose that I could create the PDF, find out the number of
pages, then have a second pass in which the R script was fed the number of
pages.  However, there is one disadvantage to this -- the original PDF
contains a timestamp on each page -- the new version would have a different
timestamp -- so I would prefer to not use this approach.

Has anyone thought of some terribly clever way to solve this problem?

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone / Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com



Re: [R] I need to create new variables based on two numeric variables and one dichotomize conditional category variables.

2023-11-04 Thread avi.e.gross
There are many techniques, Calum, and yours is an interesting twist I had not 
considered.
 
Yes, you can specify what integer a factor uses to represent things but not 
what I meant. Of course your trick does not work for some other forms of data 
like real numbers in double format. There is a cost to converting a column to a 
factor that is recouped best if it speeds things up multiple times.
 
The point I was making was that when you will be using group_by, especially if 
done many times, it might speed things up if the column is already a normal 
factor, perhaps just indexed from 1 onward. My guess is that underneath the 
covers, some programs implicitly do such a factor conversion if needed. An 
example might be aspects of the ggplot program where you may get a mysterious 
order of presentation in the graph unless you create a factor with the order 
you wish to have used and avoid it making one invisibly.
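The ordering point can be seen in miniature like this (a sketch; the `sizes` vector is made up for illustration):

```r
# Setting factor levels explicitly controls ordering in later grouping/plotting;
# the default is alphabetical, which is often not what you want
sizes <- c("small", "large", "medium", "small")

f_default <- factor(sizes)                                     # alphabetical
f_ordered <- factor(sizes, levels = c("small", "medium", "large"))

levels(f_default)  # "large"  "medium" "small"
levels(f_ordered)  # "small"  "medium" "large"
```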
 
From: CALUM POLWART  
Sent: Saturday, November 4, 2023 7:14 PM
To: avi.e.gr...@gmail.com
Cc: Jorgen Harmse ; r-help@r-project.org; mkzama...@gmail.com
Subject: Re: [R] I need to create new variables based on two numeric variables 
and one dichotomize conditional category variables.
 
I might have factored the gender.
 
I'm not sure it would in any way be quicker.  But might be to some extent 
easier to develop variations of. And is sort of what factors should be doing... 
 
# make dummy data
gender <- c("Male", "Female", "Male", "Female")
WC <- c(70,60,75,65)
TG <- c(0.9, 1.1, 1.2, 1.0)
myDf <- data.frame( gender, WC, TG )
 
# label a factor (levels sort alphabetically by default, so set them explicitly)
myDf$GF <- factor(myDf$gender, levels = c("Male", "Female"), labels = c(65, 58))
 
# do the maths (as.numeric() on a factor returns the underlying integer codes,
# so convert via as.character() first)
myDf$LAP <- (myDf$WC - as.numeric(as.character(myDf$GF))) * myDf$TG
 
#show results
head(myDf)
 
  gender WC  TG GF  LAP
1   Male 70 0.9 65  4.5
2 Female 60 1.1 58  2.2
3   Male 75 1.2 65 12.0
4 Female 65 1.0 58  7.0
 
 
(Reality: I'd have probably used case_when in tidy to create a new numeric 
column)
 
 
 
 
The equation to
calculate LAP is different for male and females. I am giving both equations
below.

LAP for male = (WC-65)*TG
LAP for female = (WC-58)*TG

My question is: how can I calculate the LAP and create a single new column?



Re: [R] Sum data according to date in sequence

2023-11-04 Thread avi.e.gross
There may be a point to consider about the field containing dates in the 
request below. Yes, much code will "work" just fine if the column is seen 
as text, as you can group by that too. The results will perhaps not be in the 
row order you expected, but you can re-sort, perhaps even more efficiently, 
after your summarise(), either by converting the fewer remaining rows to a 
date type or by transforming the text dates into a year/month/day form that 
sorts properly in forward or reverse order as needed.

Converting lots of rows to date is not a cheap process and grouping by that 
more complex date data structure may be harder. Heck, it may even make sense to 
use the text form of dates organized as a factor as the grouping becomes sort 
of pre-done.

The above comments are not saying any other solutions offered are wrong but 
simply discussing whether, especially for larger data sets, there are ways that 
could be more efficient.
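The "group on text, convert afterwards" idea looks like this in base R (a sketch using a cut-down version of the poster's data):

```r
# A few rows in the same shape as the dput() output below
dt1 <- data.frame(
  date      = c("1/14/2016", "1/14/2016", "1/15/2016"),
  EnergykWh = c(4.680496, 6.272414, 11.004884)
)

# Group and sum while date is still plain text
sums <- aggregate(EnergykWh ~ date, data = dt1, FUN = sum)

# Only now convert the (far fewer) summary rows to real dates and sort
sums$date <- as.Date(sums$date, format = "%m/%d/%Y")
sums <- sums[order(sums$date), ]
```

The as.Date() call runs once per distinct date rather than once per raw row, which is where the saving comes from on large data.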

-Original Message-
From: R-help  On Behalf Of Rui Barradas
Sent: Saturday, November 4, 2023 12:56 PM
To: roslinazairimah zakaria ; jim holtman 

Cc: r-help mailing list 
Subject: Re: [R] Sum data according to date in sequence

Às 01:49 de 03/11/2023, roslinazairimah zakaria escreveu:
> Hi all,
> 
> This is the data:
> 
>> dput(head(dt1,20))structure(list(StationName = c("PALO ALTO CA / CAMBRIDGE 
>> #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1", "PALO ALTO CA / CAMBRIDGE #1",
> "PALO ALTO CA / CAMBRIDGE #1"), date = c("1/14/2016", "1/14/2016",
> "1/14/2016", "1/15/2016", "1/15/2016", "1/15/2016", "1/15/2016",
> "1/16/2016", "1/16/2016", "1/16/2016", "1/16/2016", "1/16/2016",
> "1/16/2016", "1/16/2016", "1/17/2016", "1/17/2016", "1/17/2016",
> "1/17/2016", "1/17/2016", "1/18/2016"), time = c("12:09", "19:50",
> "20:22", "8:25", "14:23", "18:17", "21:46", "10:19", "12:12",
> "14:12", "16:22", "19:16", "19:19", "20:24", "9:54", "12:16",
> "13:53", "19:03", "22:00", "8:58"), EnergykWh = c(4.680496, 6.272414,
> 1.032782, 11.004884, 10.096824, 6.658797, 4.808874, 1.469384,
> 2.996239, 0.303222, 4.988339, 8.131804, 0.117156, 3.285669, 1.175608,
> 3.677487, 1.068393, 8.820755, 8.138583, 9.0575)), row.names = c(NA,
> 20L), class = "data.frame")
> 
> 
> I would like to sum EnergykW data by the date. E.g. all values for
> EnergykWh on 1/14/2016
> 
> 
> On Fri, Nov 3, 2023 at 8:10 AM jim holtman  wrote:
> 
>> How about send a 'dput' of some sample data.  My guess is that your date
>> is 'character' and not 'Date'.
>>
>> Thanks
>>
>> Jim Holtman
>> *Data Munger Guru*
>>
>>
>> *What is the problem that you are trying to solve?Tell me what you want to
>> do, not how you want to do it.*
>>
>>
>> On Thu, Nov 2, 2023 at 4:24 PM roslinazairimah zakaria <
>> roslina...@gmail.com> wrote:
>>
>>> Dear all,
>>>
>>> I have this set of data. I would like to sum the EnergykWh according date
>>> sequences.
>>>
 head(dt1,20)   StationName  date  time EnergykWh
>>> 1  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 12:09  4.680496
>>> 2  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 19:50  6.272414
>>> 3  PALO ALTO CA / CAMBRIDGE #1 1/14/2016 20:22  1.032782
>>> 4  PALO ALTO CA / CAMBRIDGE #1 1/15/2016  8:25 11.004884
>>> 5  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 14:23 10.096824
>>> 6  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 18:17  6.658797
>>> 7  PALO ALTO CA / CAMBRIDGE #1 1/15/2016 21:46  4.808874
>>> 8  PALO ALTO CA / CAMBRIDGE #1 1/16/2016 10:19  1.469384
>>> 9  PALO ALTO CA / CAMBRIDGE #1 1/16/2016 12:12  2.996239
>>> 10 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 14:12  0.303222
>>> 11 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 16:22  4.988339
>>> 12 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:16  8.131804
>>> 13 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 19:19  0.117156
>>> 14 PALO ALTO CA / CAMBRIDGE #1 1/16/2016 20:24  3.285669
>>> 15 PALO ALTO CA / CAMBRIDGE #1 1/17/2016  9:54  1.175608
>>> 16 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 12:16  3.677487
>>> 17 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 13:53  1.068393
>>> 18 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 19:03  8.820755
>>> 19 PALO ALTO CA / CAMBRIDGE #1 1/17/2016 22:00  8.138583
>>> 20 PALO ALTO CA / CAMBRIDGE #1 1/18/2016  8:58  9.057500
>>>
>>> I have tried this:
>>> library(dplyr)
>>> sums <- dt1 %>%
>>>group_by(date) %>%
>>>summarise(EnergykWh = sum(EnergykWh))
>>>
>>> head(sums,20)
>>>
>>> The date is not by daily sequence but by year sequence.
>>>
 head(sums,20)# A tibble: 20 × 2
>>> date  EnergykWh
>>> 

Re: [R] Adding columns to a tibble based on a value in a different tibble

2023-11-04 Thread avi.e.gross
Yes, Bert. At first glance I thought it was one of the merge/joins and then 
wondered at the wording that made it sound like the ids may not be one per 
column.

IFF the need is the simpler case, it is a straightforward enough and common 
need. An example might make it clear enough so actual code can be shared as 
compared to talking about a first and second tibble.

Here is one reference to consider:

https://r4ds.hadley.nz/joins.html


A left_join may be what works, and of course more basic R includes the merge() 
function:

https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/merge

If the column were to contain multiple IDs, that changes things and a more 
complex approach could be needed.
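For the simple one-id-per-row case, base merge() already gives the "NA when not found" behaviour described below; a toy sketch:

```r
main   <- data.frame(id = 1:4, info = letters[1:4])
lookup <- data.frame(id = c(2, 4), y = c("yes", "no"))

# all.x = TRUE makes this a left join: every row of 'main' is kept,
# and ids absent from 'lookup' get NA in y
merged <- merge(main, lookup, by = "id", all.x = TRUE)
```

dplyr's left_join(main, lookup, by = "id") does the same thing and preserves the original row order, which merge() does not guarantee.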

-Original Message-
From: R-help  On Behalf Of Bert Gunter
Sent: Saturday, November 4, 2023 10:35 AM
To: Alessandro Puglisi 
Cc: r-help@r-project.org
Subject: Re: [R] Adding columns to a tibble based on a value in a different 
tibble

I think a simple reproducible example ("reprex") may be necessary for you
to get a useful reply. Questions with vague specifications such as yours
often result in going round and round with attempts to clarify what you
mean without a satisfactory answer. Clarification at the outset with a
reprex may save you and others a lot of frustration.

Cheers,
Bert

On Sat, Nov 4, 2023 at 1:41 AM Alessandro Puglisi <
alessandro.pugl...@gmail.com> wrote:

> Hi everyone,
>
> I have a tibble with various ids and associated information.
>
> I need to add a new column to this tibble that retrieves a specific 'y'
> value from a different tibble that has some of the mentioned ids in the
> first column and a 'y' value in the second one. If the id, and so the 'y'
> value is found, it will be included; otherwise, 'NA' will be used.
>
> Could you please help me?
>
> Thanks,
> Alessandro
>


Re: [R] [EXTERNAL] RE: I need to create new variables based on two numeric variables and one dichotomize conditional category variables.

2023-11-03 Thread avi.e.gross
To be fair, Jorgen, R does evaluate function arguments lazily in general, but
ifelse() ends up evaluating both the yes and no vectors in full before merging
them elementwise. So my suggestion merely has typographic implications and
some aspects of clarity and minor amounts of less memory and parsing needed.
 
But ifelse() is currently implemented somewhat too complexly for my taste.
Just type "ifelse" at the prompt and you will see many lines of code that
handle various scenarios.
 
If you KNOW you have a certain situation such as a data.frame with multiple
rows and are sure a simpler solution works, there may well be faster ways to
do this. Obviously you could write a function that can be called once per
line and returns the answer, or a vectorized version that returns a vector
of 65 and 58 entries. Or you could add a few lines of code creating a
vector, perhaps as a temporary new column that looks like:
 
logic_male <- df$G == "male"
 
age <- numeric(length(logic_male))
age[logic_male] <- 65
age[!logic_male] <- 58
 
Then use the age column in a formula directly as it contains the part of the
ifelse needed. You can then delete "age" whether stand-alone or as a column.
 
What is more efficient depends on your data.
 
Do note though that an advantage of using ifelse() is when you have nested
conditions which cannot trivially be written out along the lines above, but
I find that sometimes such nested expressions may be easier to read using
other techniques such as the dplyr function case_when().
Here is an example of code where some entries are NA or not categorized:
 
library(tidyverse)
WC <- 100
TG <- 2
Gender <- c("male", "female", "no comment", NA, "female")
 
result <- TG * (WC -
  case_when(
    is.na(Gender) ~ NA,
    Gender == "male" ~ 65,
    Gender == "female" ~ 58,
    .default = NA
  ))
 
The result for the above example is a result:
 
> result
[1] 70 84 NA NA 84
 
 
If you later want to add categories such as "transgender" with a value of 61
or have other numbers for groups like "Hispanic male", you can amend the
instructions as long as you put your conditions in an order so that they are
tried until one of them matches, or it takes the default. Yes, in a sense
the above is doable using a deeply nested ifelse(), but this form is easier
for me to read, write, and evaluate. It may or may not be more efficient, as
some of dplyr is compiled code.
 
Please note some here prefer discussions about base-R functionality and some
have qualms about the tidyverse for various reasons. I don't and find much
of their functionality more easy to use.
 
 
 
From: Jorgen Harmse  
Sent: Friday, November 3, 2023 6:27 PM
To: avi.e.gr...@gmail.com; r-help@r-project.org; mkzama...@gmail.com
Subject: Re: [EXTERNAL] RE: [R] I need to create new variables based on two
numeric variables and one dichotomize conditional category variables.
 
Yes, that will halve the number of multiplications.
 
If you're looking for such optimisations then you can also consider
ifelse(G=='male', 65L, 58L). That will definitely use less time & memory if
WC is integer, but the trade-offs are more complicated if WC is floating
point.
 
Regards,
Jorgen Harmse.


 
From: avi.e.gr...@gmail.com 
mailto:avi.e.gr...@gmail.com> >
Date: Friday, November 3, 2023 at 16:12
To: Jorgen Harmse mailto:jhar...@roku.com> >,
r-help@r-project.org   mailto:r-help@r-project.org> >, mkzama...@gmail.com
  mailto:mkzama...@gmail.com> >
Subject: [EXTERNAL] RE: [R] I need to create new variables based on two
numeric variables and one dichotomize conditional category variables.
Just a minor point in the suggested solution:

df$LAP <- with(df, ifelse(G=='male', (WC-65)*TG, (WC-58)*TG))

since WC and TG are not conditional, would this be a slight improvement?

df$LAP <- with(df, TG*(WC - ifelse(G=='male', 65, 58)))



-Original Message-
From: R-help mailto:r-help-boun...@r-project.org> > On Behalf Of Jorgen Harmse via
R-help
Sent: Friday, November 3, 2023 11:56 AM
To: r-help@r-project.org  ; mkzama...@gmail.com
 
Subject: Re: [R] I need to create new variables based on two numeric
variables and one dichotomize conditional category variables.

df$LAP <- with(df, ifelse(G=='male', (WC-65)*TG, (WC-58)*TG))

That will do both calculations and merge the two vectors appropriately. It
will use extra memory, but it should be much faster than a 'for' loop.

Regards,
Jorgen Harmse.

--

Message: 8
Date: Fri, 3 Nov 2023 11:10:49 +1030
From: "Md. Kamruzzaman" mailto:mkzama...@gmail.com> >
To: r-help@r-project.org  
Subject: [R] I need to create new variables based on two numeric
variables and one dichotomize conditional category variables.
Message-ID:

Re: [R] I need to create new variables based on two numeric variables and one dichotomize conditional category variables.

2023-11-03 Thread avi.e.gross
Just a minor point in the suggested solution:

df$LAP <- with(df, ifelse(G=='male', (WC-65)*TG, (WC-58)*TG))

since WC and TG are not conditional, would this be a slight improvement?

df$LAP <- with(df, TG*(WC - ifelse(G=='male', 65, 58)))



-Original Message-
From: R-help  On Behalf Of Jorgen Harmse via
R-help
Sent: Friday, November 3, 2023 11:56 AM
To: r-help@r-project.org; mkzama...@gmail.com
Subject: Re: [R] I need to create new variables based on two numeric
variables and one dichotomize conditional category variables.

df$LAP <- with(df, ifelse(G=='male', (WC-65)*TG, (WC-58)*TG))

That will do both calculations and merge the two vectors appropriately. It
will use extra memory, but it should be much faster than a 'for' loop.

Regards,
Jorgen Harmse.

--

Message: 8
Date: Fri, 3 Nov 2023 11:10:49 +1030
From: "Md. Kamruzzaman" 
To: r-help@r-project.org
Subject: [R] I need to create new variables based on two numeric
variables and one dichotomize conditional category variables.
Message-ID:

Content-Type: text/plain; charset="utf-8"

Hello Everyone,
I have three variables: Waist circumference (WC), serum triglyceride (TG)
level and gender. Waist circumference and serum triglyceride is numeric and
gender (male and female) is categorical. From these three variables, I want
to calculate the "Lipid Accumulation Product (LAP) Index". The equation to
calculate LAP is different for male and females. I am giving both equations
below.

LAP for male = (WC-65)*TG
LAP for female = (WC-58)*TG

My question is: how can I calculate the LAP and create a single new column?

Your cooperation will be highly appreciated.

Thanks in advance.

With Regards

**

*Md Kamruzzaman*

*PhD **Research Fellow (**Medicine**)*
Discipline of Medicine and Centre of Research Excellence in Translating
Nutritional Science to Good Health
Adelaide Medical School | Faculty of Health and Medical Sciences
The University of Adelaide
Adelaide SA 5005



Re: [R] [Tagged] Re: col.names in as.data.frame() ?

2023-10-28 Thread avi.e.gross
Jef, your terse reply was so constructive that you converted me! LOL!

That is an interesting point though that I remain a bit unclear on. 

Both data.frame and as.data.frame can be used in some ways similarly as in:

> data.frame(matrix(1:12, nrow=3))
  X1 X2 X3 X4
1  1  4  7 10
2  2  5  8 11
3  3  6  9 12

> as.data.frame(matrix(1:12, nrow=3))
  V1 V2 V3 V4
1  1  4  7 10
2  2  5  8 11
3  3  6  9 12

But yes, the constructor accepts many arguments, while the converter normally
handles a single object.

Where do some other things like cbind() fit, not to mention a dplyr function 
like tribble()?

I do wonder though, why asking a converter to convert a matrix to a data.frame 
and perhaps adding column names, is considered changing the object contents. 
The manual page for as.data.frame seems to include quite a few options to 
specify the name of a single column, truncate column names, deal with row 
names, and more, as well as whether to convert strings to factors. Are those 
things different enough from what we are discussing?

Of course, we may indeed be experiencing mission creep where something simple 
keeps being improved with new features until the original simplicity and 
clarity gets lost.


-Original Message-
From: R-help  On Behalf Of Jeff Newmiller via 
R-help
Sent: Saturday, October 28, 2023 2:54 PM
To: r-help@r-project.org; Boris Steipe ; R. Mailing 
List 
Subject: Re: [R] [Tagged] Re: col.names in as.data.frame() ?

as.data.frame is a _converter_, while data.frame is a _constructor_.  Changing 
the object contents is not what a conversion is for.

On October 28, 2023 11:39:22 AM PDT, Boris Steipe  
wrote:
>Thanks Duncan and Avi!
>
>That you could use NULL in a matrix() dimnames = list(...) argument wasn't 
>clear to me. I thought that would be equivalent to a one-element list - and 
>thereby define rownames. So that's good to know.
>
>The documentation could be more explicit - but it is probably more work to do 
>that than just patch the code to honour a col.names argument. (At least I 
>can't see a reason not to.)
>
>Thanks again!
>:-)
>
>
>
>
>> On Oct 28, 2023, at 14:24, avi.e.gr...@gmail.com wrote:
>> 
>> Борис,
>> 
>> Try this where you tell matrix the column names you want:
>> 
>> nouns <- as.data.frame(
>>  matrix(c(
>>"gaggle",
>>"geese",
>> 
>>"dule",
>>"doves",
>> 
>>"wake",
>>"vultures"
>>  ), 
>>  ncol = 2, 
>>  byrow = TRUE, 
>>  dimnames=list(NULL, c("collective", "category"))))
>> 
>> Result:
>> 
>>> nouns
>>   collective category
>> 1     gaggle    geese
>> 2       dule    doves
>> 3       wake vultures
>> 
>> 
>> The above simply names the columns earlier when creating the matrix.
>> 
>> There are other ways and the way you tried LOOKS like it should work but
>> fails for me with a message about it weirdly expecting three rows versus two
>> which seems to confuse rows and columns. My version of R is recent and I
>> wonder if there is a bug here.
>> 
>> Consider whether you really need the data.frame created in a single
>> statement or can you change the column names next as in:
>> 
>> 
>>> nouns
>>       V1       V2
>> 1 gaggle    geese
>> 2   dule    doves
>> 3   wake vultures
>>> colnames(nouns)
>> [1] "V1" "V2"
>>> colnames(nouns) <- c("collective", "category")
>>> nouns
>>   collective category
>> 1     gaggle    geese
>> 2       dule    doves
>> 3       wake vultures
>> 
>> Is there a known bug here or is the documentation wrong?
>> 
>> -Original Message-
>> From: R-help  On Behalf Of Boris Steipe
>> Sent: Saturday, October 28, 2023 1:54 PM
>> To: R. Mailing List 
>> Subject: [R] col.names in as.data.frame() ?
>> 
>> I have been trying to create a data frame from some structured text in a
>> single expression. Reprex:
>> 
>> nouns <- as.data.frame(
>>  matrix(c(
>>"gaggle",
>>"geese",
>> 
>>"dule",
>>"doves",
>> 
>>"wake",
>>"vultures"
>>  ), ncol = 2, byrow = TRUE),
>>  col.names = c("collective", "category")
>> )
>> 
>> But ... :
>> 
>>> str(nouns)
>> 'data.frame': 3 obs. of  2 variables:
>> $ V1: chr  "gaggle" "dule" "wake"
>> $ V2: chr  "geese" "doves" "vultures"
>> 
>> i.e. the col.names argument does nothing. From my reading of ?as.data.frame,
>> my example should have worked.
>> 
>> I know how to get the required result with colnames(), but I would like to
>> understand why the idiom as written didn't work, and how I could have known
>> that from the help file.
>> 
>> 
>> Thanks!
>> Boris
>> 
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>> 
>
>__
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting 

Re: [R] col.names in as.data.frame() ?

2023-10-28 Thread avi.e.gross
Борис,

Try this where you tell matrix the column names you want:

nouns <- as.data.frame(
  matrix(c(
"gaggle",
"geese",

"dule",
"doves",

"wake",
"vultures"
  ), 
  ncol = 2, 
  byrow = TRUE, 
  dimnames=list(NULL, c("collective", "category"))))

Result:

> nouns
  collective category
1     gaggle    geese
2       dule    doves
3       wake vultures


The above simply names the columns earlier when creating the matrix.

There are other ways and the way you tried LOOKS like it should work but
fails for me with a message about it weirdly expecting three rows versus two
which seems to confuse rows and columns. My version of R is recent and I
wonder if there is a bug here.

Consider whether you really need the data.frame created in a single
statement or can you change the column names next as in:


> nouns
      V1       V2
1 gaggle    geese
2   dule    doves
3   wake vultures
> colnames(nouns)
[1] "V1" "V2"
> colnames(nouns) <- c("collective", "category")
> nouns
  collective category
1     gaggle    geese
2       dule    doves
3       wake vultures

Is there a known bug here or is the documentation wrong?
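If you would rather keep a single expression but not touch the matrix() call at
all, one possible workaround is setNames(), which sets the data.frame's column
names in the same statement (a sketch using the example data above):

```r
nouns <- setNames(
  as.data.frame(matrix(c("gaggle", "geese",
                         "dule",   "doves",
                         "wake",   "vultures"),
                       ncol = 2, byrow = TRUE)),
  c("collective", "category"))
```

Since names() on a data.frame are its column names, setNames() replaces the
default V1/V2 without needing a separate assignment line.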

-Original Message-
From: R-help  On Behalf Of Boris Steipe
Sent: Saturday, October 28, 2023 1:54 PM
To: R. Mailing List 
Subject: [R] col.names in as.data.frame() ?

I have been trying to create a data frame from some structured text in a
single expression. Reprex:

nouns <- as.data.frame(
  matrix(c(
"gaggle",
"geese",

"dule",
"doves",

"wake",
"vultures"
  ), ncol = 2, byrow = TRUE),
  col.names = c("collective", "category")
)

But ... :

> str(nouns)
'data.frame':   3 obs. of  2 variables:
 $ V1: chr  "gaggle" "dule" "wake"
 $ V2: chr  "geese" "doves" "vultures"

i.e. the col.names argument does nothing. From my reading of ?as.data.frame,
my example should have worked.

I know how to get the required result with colnames(), but I would like to
understand why the idiom as written didn't work, and how I could have known
that from the help file.


Thanks!
Boris

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to Reformat a dataframe

2023-10-28 Thread avi.e.gross
Paul,

I have snipped away your long message and want to suggest another approach
or way of thinking to consider.

You have received other good suggestions and I likely would have used
something like that, probably within the dplyr/tidyverse but consider
something simpler.

You seem to be viewing a data.frame as similar to a matrix that you want to
reformat. There are similarities, but a data.frame is also different. A
matrix may actually be the right way for you to deal with your data. Can you
read it in as a matrix, or must it be a data.frame?

The thing about a matrix is that underneath, it is just a linear vector
which you really seem to want. All your columns seem to be the same kind of
numeric and perhaps the order does not matter whether it is row major or
column major. So consider my smaller example. I am making a data.frame that
is smaller for illustration:

> small <- data.frame(A=1:4, B=5:8, C=9:12)
> small
  A B  C
1 1 5  9
2 2 6 10
3 3 7 11
4 4 8 12

Now I am making it a matrix and keeping the columns the same:

> small.mat <- as.matrix(small)
> small.mat
 A B  C
[1,] 1 5  9
[2,] 2 6 10
[3,] 3 7 11
[4,] 4 8 12

This can be linearized into a vector in many ways such as this:

> small.vec <- as.vector(small.mat)
> small.vec
 [1]  1  2  3  4  5  6  7  8  9 10 11 12

You can make that into a data.frame if you like:

> revised <- data.frame(colname=small.vec)
> revised
   colname
1        1
2        2
3        3
4        4
5        5
6        6
7        7
8        8
9        9
10      10
11      11
12      12

Of course, the above can be combined into more of a one-liner or made more
efficient. But in some cases, if you know the exact details of your
data.frame, you can spell out a way to combine the columns trivially. In my
example, I have three columns that can simply be concatenated into a vector
like so:

> small.onecol <- data.frame(onecol=c(small$A, small$B, small$C))
> small.onecol
   onecol
1       1
2       2
3       3
4       4
5       5
6       6
7       7
8       8
9       9
10     10
11     11
12     12

This is not a generalized solution but is simple enough even with the number
of columns you have. You are simply consolidating the vectors into one
bigger one. If you want to connect many, there are shorter loops that can do
it as in:

> cols <- colnames(small)
> cols
[1] "A" "B" "C"
> 
> new <- vector(mode="numeric", length=0)
> for (col in cols) {
+   new <- append(new, small[[col]])
+ }
> new
 [1]  1  2  3  4  5  6  7  8  9 10 11 12
> 
> new.df <- data.frame(newname=new)
> new.df
   newname
1        1
2        2
3        3
4        4
5        5
6        6
7        7
8        8
9        9
10      10
11      11
12      12

The number of ways to do what you want is huge. You can pick a way that
makes more sense to you, especially the ones others have supplied, or one
that seems more efficient. As noted, all methods may also need to deal with
your NA issue at some stage.
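For completeness, the loop above can probably be collapsed into one generalized
line with unlist(), since a data.frame is internally a list of column vectors
(a sketch on the same small example):

```r
small <- data.frame(A = 1:4, B = 5:8, C = 9:12)

# unlist() concatenates the columns in order; use.names = FALSE drops
# the synthetic A1, A2, ... names it would otherwise build
new.df <- data.frame(newname = unlist(small, use.names = FALSE))
```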

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Best way to test for numeric digits?

2023-10-20 Thread avi.e.gross
Leonard,

Since it now seems a main consideration you have is speed/efficiency, maybe a 
step back might help.

Are there simplifying assumptions that are valid or can you make it simpler, 
such as converting everything to the same case?

Your sample data was this and I assume your actual data is similar and far 
longer.

c("Li", "Na", "K",  "2", "Rb", "Ca", "3")

So rather than use complex and costly regular expressions, or other full 
searches, can you just assume all entries start with either an uppercase letter 
or a numeral, and test for those using something simple like:
> substr(c("Li", "Na", "K",  "2", "Rb", "Ca", "3"), 1, 1)
[1] "L" "N" "K" "2" "R" "C" "3"

If you save that in a variable you can check if that is greater than or equal 
to "A" or perhaps "0" and also perhaps if it is less than or equal to "Z" or 
perhaps "9" and see if such a test is faster.

orig <- c("Li", "Na", "K",  "2", "Rb", "Ca", "3")
initial <- substr(orig, 1, 1)
elements_bool <- initial >= "A" & initial <= "Z"

The latter contains a Boolean vector you can use to index your original and 
toss away the ones with digits, or any lower case letter versions or any other 
UNICODE symbols.

orig_elements <- orig[elements_bool]

> orig
[1] "Li" "Na" "K"  "2"  "Rb" "Ca" "3" 
> orig_elements
[1] "Li" "Na" "K"  "Rb" "Ca"
> orig[!elements_bool]
[1] "2" "3"

Other approaches you might consider depending on your needs is to encapsulate 
your data as a column in a data.frame or tibble or other such construct and 
generate additional columns along the way that keep your information 
consolidated in what could be an efficient way especially if you shift some of 
your logic to using faster compiled functionality and perhaps using packages 
that fit your needs better such as data.table or dplyr and other things in the 
tidyverse. And note if using pipelines, for many purposes, the new built-in 
pipelines may be faster.
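If what is wanted is a warning-free, per-element test (the original question),
one hedged sketch is a small regular-expression predicate; the function name
looks_numeric is made up here:

```r
# TRUE where the whole string is an (optionally signed, decimal) number
looks_numeric <- function(x) grepl("^-?[0-9]+(\\.[0-9]+)?$", x)

looks_numeric(c("Li", "Na", "K", "2", "Rb", "Ca", "3"))
# FALSE FALSE FALSE TRUE FALSE FALSE TRUE
```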


-Original Message-
From: R-help  On Behalf Of Leonard Mada via R-help
Sent: Wednesday, October 18, 2023 10:59 AM
To: R-help Mailing List 
Subject: [R] Best way to test for numeric digits?

Dear List members,

What is the best way to test for numeric digits?

suppressWarnings(as.double(c("Li", "Na", "K",  "2", "Rb", "Ca", "3")))
# [1] NA NA NA  2 NA NA  3
The above requires the use of the suppressWarnings function. Are there 
any better ways?

I was working to extract chemical elements from a formula, something 
like this:
split.symbol.character = function(x, rm.digits = TRUE) {
 # Perl is partly broken in R 4.3, but this works:
 regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
 # stringi::stri_split(x, regex = regex);
 s = strsplit(x, regex, perl = TRUE);
 if(rm.digits) {
 s = lapply(s, function(s) {
 isNotD = is.na(suppressWarnings(as.numeric(s)));
 s = s[isNotD];
 });
 }
 return(s);
}

split.symbol.character(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"))


Sincerely,


Leonard


Note:
# works:
regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)


# broken in R 4.3.1
# only slightly "erroneous" with stringi::stri_split
regex = "(?<=[A-Z])(?![a-z]|$)|(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Best way to test for numeric digits?

2023-10-18 Thread avi.e.gross
Rui,

The problem with searching for elements, as with many kinds of text, is that 
the optimal search order may depend on the probabilities of what is involved. 
There can be more elements added such as Unobtainium in the future with 
whatever abbreviations that may then change the algorithm you may have chosen 
but then again, who actually looks for elements with a negligible half-life?

If you had an application focused on Organic Chemistry, a relatively few of the 
elements would normally be present while for something like electronics 
components of some kind, a different overlapping palette with probabilities can 
be found.

Just how important is the efficiency for you? If this was in a language like 
python, I would consider using a dictionary or set and I think there are 
packages in R that support a version of this.  In your case, one solution can 
be to pre-create a dictionary of all the elements, or just a set, and take your 
word tokens and check if they are in the dictionary/set or not. Any that aren't 
can then be further examined as needed and if your data is set a specific way, 
they may all just end up to be numeric. The cost is the hashing and of course 
memory used. Your corpus of elements is small enough that this may not be as 
helpful as parsing text that can contain many thousands of words.

Even in plain R, you can probably also use something like:

elements = c("H", "He", "Li", ...)
if (text %in% elements) ...

Something like the above may not be faster but can be quite a bit more readable 
than the regular expressions

But plenty of the solutions others offered may well be great for your current 
need.
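As a hedged sketch of that set idea, with only a handful of symbols spelled out
(a real table would list all of them):

```r
# A deliberately tiny element set for illustration
elements <- c("H", "He", "C", "N", "O", "P", "S", "Cl", "Li", "Al", "Si")

tokens <- c("C", "Cl", "3", "Li", "4")
is_element <- tokens %in% elements

tokens[is_element]    # recognized element symbols
tokens[!is_element]   # leftovers, e.g. digit counts
```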

Some may even work with Handwavium.

-Original Message-
From: R-help  On Behalf Of Leonard Mada via R-help
Sent: Wednesday, October 18, 2023 12:24 PM
To: Rui Barradas ; R-help Mailing List 

Subject: Re: [R] Best way to test for numeric digits?

Dear Rui,

Thank you for your reply.

I do have actually access to the chemical symbols: I have started to 
refactor and enhance the Rpdb package, see Rpdb::elements:
https://github.com/discoleo/Rpdb

However, the regex that you have constructed is quite heavy, as it needs 
to iterate through all chemical symbols (in decreasing nchar). Elements 
like C, and especially O, P or S, appear late in the regex expression - 
but are quite common in chemistry.

The alternative regex is (in this respect) simpler. It actually works 
(once you know about the workaround).

Q: My question focused if there is anything like is.numeric, but to 
parse each element of a vector.

Sincerely,


Leonard


On 10/18/2023 6:53 PM, Rui Barradas wrote:
> Às 15:59 de 18/10/2023, Leonard Mada via R-help escreveu:
>> Dear List members,
>>
>> What is the best way to test for numeric digits?
>>
>> suppressWarnings(as.double(c("Li", "Na", "K",  "2", "Rb", "Ca", "3")))
>> # [1] NA NA NA  2 NA NA  3
>> The above requires the use of the suppressWarnings function. Are there
>> any better ways?
>>
>> I was working to extract chemical elements from a formula, something
>> like this:
>> split.symbol.character = function(x, rm.digits = TRUE) {
>>   # Perl is partly broken in R 4.3, but this works:
>>   regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
>>   # stringi::stri_split(x, regex = regex);
>>   s = strsplit(x, regex, perl = TRUE);
>>   if(rm.digits) {
>>   s = lapply(s, function(s) {
>>   isNotD = is.na(suppressWarnings(as.numeric(s)));
>>   s = s[isNotD];
>>   });
>>   }
>>   return(s);
>> }
>>
>> split.symbol.character(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"))
>>
>>
>> Sincerely,
>>
>>
>> Leonard
>>
>>
>> Note:
>> # works:
>> regex = "(?<=[A-Z])(?![a-z]|$)|(?<=.)(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
>> strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)
>>
>>
>> # broken in R 4.3.1
>> # only slightly "erroneous" with stringi::stri_split
>> regex = "(?<=[A-Z])(?![a-z]|$)|(?=[A-Z])|(?<=[a-z])(?=[^a-z])";
>> strsplit(c("CCl3F", "Li4Al4H16", "CCl2CO2AlPO4SiO4Cl"), regex, perl = T)
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
> Hello,
>
> If you want to extract chemical elements symbols, the following might work.
> It uses the periodic table in GitHub package chemr and a package stringr
> function.
>
>
> devtools::install_github("paleolimbot/chemr")
>
>
>
> split_chem_elements <- 

Re: [R] transform a list of arrays to tibble

2023-10-17 Thread avi.e.gross
Arnaud,


Short answer may be that the tibble data structure will not be supporting row 
names and you may want to simply save those names in an additional column or 
externally.

My first thought was to simply save the names you need and then put them back 
on the tibble. In your code, something like this:

save.names <- names(my.ret.lst)
result.tib <- as_tibble_col(unlist(my.ret.lst), column_name = 'return')
rownames(result.tib) <- save.names

Unfortunately, I got an error message:

> save.names
[1] "BTCUSDT" "ETHUSDT" "TRXUSDT"
> rownames(result.tib) <- save.names
Warning message:
Setting row names on a tibble is deprecated. 
Error in exists(cacheKey, where = .rs.WorkingDataEnv, inherits = FALSE) : 
  invalid first argument
Error in assign(cacheKey, frame, .rs.CachedDataEnv) : 
  attempt to use zero-length variable name

If a tibble deprecates row names, it may not be the ideal storage for you. A 
plain data.frame works:

> result.df <- as.data.frame(result.tib)
> rownames(result.df) <- save.names
> result.df
        return
BTCUSDT  15.36
ETHUSDT   4.06
TRXUSDT  10.90

Trying to convert it to a tibble, as anticipated, is not working for me:

> as.tibble(result.df)
# A tibble: 3 × 1
  return
   <dbl>
1  15.4 
2   4.06
3  10.9 
Warning message:
`as.tibble()` was deprecated in tibble 2.0.0.
ℹ Please use `as_tibble()` instead.
ℹ The signature and semantics have changed, see `?as_tibble`.
This warning is displayed once every 8 hours.
Call `lifecycle::last_lifecycle_warnings()` to see where this
warning was generated.

You can, instead, create a matrix and assign the row and column names you save 
or create:

result.mat <- matrix(my.ret.lst)
colnames(result.mat) <- c("return")
rownames(result.mat) <- save.names

> result.mat
return
BTCUSDT 15.36 
ETHUSDT 4.06  
TRXUSDT 10.9

But saving a matrix to reuse has other considerations.

So, if I may make a suggestion, if you really want a tibble that allows you to 
know what each row is for, consider one of many methods for saving the previous 
row names as a new column. I used that to take the data.frame version I made 
above and got:

> temp <- as_tibble(result.df, rownames="rows")
> temp
# A tibble: 3 × 2
  rows    return
  <chr>    <dbl>
1 BTCUSDT  15.4 
2 ETHUSDT   4.06
3 TRXUSDT  10.9 

Note the above uses as_tibble with an underscore, but many other ways to make a 
column exist.
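Another route worth considering, assuming the list can be flattened first, is
tibble::enframe(), which turns a named vector directly into a two-column tibble
(the column names asset and return are chosen here, not required):

```r
library(tibble)

# unlist() keeps the asset names; enframe() promotes them to a column
enframe(unlist(my.ret.lst), name = "asset", value = "return")
```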


-Original Message-
From: R-help  On Behalf Of arnaud gaboury
Sent: Tuesday, October 17, 2023 4:30 AM
To: r-help 
Subject: [R] transform a list of arrays to tibble

I work with a list of crypto assets daily closing prices in a xts
class. Here is a limited example:

asset.xts.lst <- list(BTCUSDT = structure(c(26759.63, 26862, 26852.48, 27154.15,
27973.45), dim = c(5L, 1L), index = structure(c(1697068800, 1697155200,
1697241600, 1697328000, 1697414400), tzone = "UTC", tclass = "Date"),
class = c("xts",
"zoo")), ETHUSDT = structure(c(1539.61, 1552.16, 1554.94, 1557.77,
1579.73), dim = c(5L, 1L), index = structure(c(1697068800, 1697155200,
1697241600, 1697328000, 1697414400), tzone = "UTC", tclass = "Date"),
class = c("xts",
"zoo")), TRXUSDT = structure(c(0.08481, 0.08549, 0.08501, 0.08667,
0.08821), dim = c(5L, 1L), index = structure(c(1697068800, 1697155200,
1697241600, 1697328000, 1697414400), tzone = "UTC", tclass = "Date"),
class = c("xts",
"zoo")))

I will compute some function from PerformanceAnalytics package and
write all results in a tibble. Let's apply a first function,
Return.annualized() (at first I computed returns from daily prices). I
have now a list of arrays named my.ret.lst:

my.ret.lst <- list(BTCUSDT = structure(15.36, dim = c(1L, 1L), dimnames = list(
"Annualized Return", NULL)), ETHUSDT = structure(4.06, dim = c(1L,
1L), dimnames = list("Annualized Return", NULL)), TRXUSDT =
structure(10.9, dim = c(1L,
1L), dimnames = list("Annualized Return", NULL)))

Now I can't find how to build a  tibble in a specific format (asset
names as row names and observations as column names) .
I can of course run:
> mytb <- as_tibble(unlist(my.ret.lst)
but I loose row and column names.
> as_tibble_col(unlist(my.ret.lst), column_name = 'return')
will give me the wanted column name but row names (in my case asset
names) are missing.


Thank you for help

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Question about R software and output

2023-10-03 Thread avi.e.gross
Charity,

As some of the answers I have seen show, your question is not clear.

You need to be clear on what you mean about R software and other concepts
before an answer makes sense.

The Base version of R may come on your computer already but likely has been
installed from some external source, or updated and the CRAN mirrors are one
such installation. As far as I know, it downloads the main R software as
well as packages considered important as part of that distribution and from
then on, you can disconnect your computer from any network and it will work
fine for any programs that operate only locally. 

Of course any program you run that wishes to access external resources can
do so. In particular, it is quite common for you at the prompt, or within an
R program, to ask for a package stored externally to be placed within your
computer but once done, it remains there until and unless you want to get a
newer version or something.

There are packages that include data within themselves and may well be some
that dynamically go out and get info from somewhere but I know of no simple
way to find out other than reading the source code or seeing what happens if
you disconnect from the internet.

Your mention of EXCEL also needs clarification. EXCEL has absolutely nothing
to do with R. It is an unrelated product and you do not need to have it
installed to run R and vice versa. That does not mean your program cannot
interact with EXCEL or the files it makes, just that it is not part of base
R. R does allow importing data (locally or not) in lots of formats including
some that EXCEL can save data in. But generally, internal to R there are
storage methods such as the data.frame that hold data and manipulate it and
a typical R program may read in some data from files like .CSV that can be
saved from EXCEL or many other sources or with the proper packages, you can
read directly from more native EXCEL formats such as XLSX files. And, you
can write out results in many ways using packages that include those
formats.

Once data has been received, I know of no way in R that tags it with the
source of the data as being internal or external. Often the fact that
something is external is well hidden as many R packages have ways to access
external data as if it were local.

Perhaps you can explain more clearly what your concerns are. But note R is
not atypical among computer languages and many others might share the same
issues that concern you. 

Avi

-Original Message-
From: R-help  On Behalf Of Ferguson Charity
(CEMINFERGUSON)
Sent: Monday, October 2, 2023 3:49 AM
To: r-help@r-project.org
Subject: [R] Question about R software and output

To whom it may concern,



My understanding is that the R software is downloaded from a CRAN network
and data is imported into it using Microsoft Excel for example. Could I
please just double check whether any data or results from the output is held
on external servers or is it just held on local files on the computer?



Many thanks,



Charity



*

The information contained in this message and or attachments is intended
only for the
person or entity to which it is addressed and may contain confidential
and/or 
privileged material. Unless otherwise specified, the opinions expressed
herein do not
necessarily represent those of Guy's and St Thomas' NHS Foundation Trust or
any of its subsidiaries. The information contained in this e-mail may be
subject to 
public disclosure under the Freedom of Information Act 2000. Unless the
information 
is legally exempt from disclosure, the confidentiality of this e-mail and
any replies
cannot be guaranteed.

Any review, retransmission,dissemination or other use of, or taking of any
action in 
reliance upon, this information by persons or entities other than the
intended
recipient is prohibited. If you received this in error, please contact the
sender
and delete the material from any system and destroy any copies.

We make every effort to keep our network free from viruses. However, it is
your
responsibility to ensure that this e-mail and any attachments are free of
viruses as
we can take no responsibility for any computer virus which might be
transferred by 
way of this e-mail.


*

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] How to fix this problem

2023-09-25 Thread avi.e.gross
David,

This may just be the same as your earlier problem. When the type of a column is 
guessed by looking at the early entries, any non-numeric entry forces the 
entire column to be character.

Suggestion: fix your original EXCEL file, or edit your CSV to remove the trailing 
entries that look like bare commas.


-Original Message-
From: R-help  On Behalf Of Parkhurst, David
Sent: Sunday, September 24, 2023 2:06 PM
To: r-help@r-project.org
Subject: [R] How to fix this problem

I have a matrix, KD6, and I'm trying to get a correlation matrix from it.  When 
I enter cor(KD6), I get the message "Error in cor(KD6) : 'x' must be numeric".
Here are some early lines from KD6:
   Flow E..coli      TN    SRP     TP    TSS
1  38.8   2,420 1.65300 0.0270 0.0630  66.80
2 133.0   2,420 1.39400 0.0670 0.1360   6.80
3  86.2      10 1.73400 0.0700 0.1720  97.30
4   4.8   5,390 0.40400 0.0060 0.0280   8.50
5   0.3   2,490 0.45800 0.0050 0.0430  19.75
6   0.0     186 0.51200 0.0040 0.0470  12.00
7  11.1   9,835 1.25500 0.0660 0.1450  12.20

Why are these not numeric?
There are some NAs later in the matrix, but I get this same error if I ask for 
cor(KD6[1:39,]) to leave out the lines with NAs.  Are they a problem anyway?

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Odd result

2023-09-24 Thread avi.e.gross
David,

You have choices depending on your situation and plans.

Obviously the ideal solution is to make any CSV you save your EXCEL data in to 
have exactly what you want. So if your original EXCEL file contains things like 
a blank character down around row 973, get rid of it or else all lines to there 
may be picked up and made into an NA. I suggest deleting all extra lines as a 
first try.

The other method to try is simply to read in the file and only keep complete 
cases. But your data shows you can have an NA in some columns, such as for 7/25 
so using complete.cases() is not a good choice.

So since your first column  (or maybe second) seems to be a date and I think 
that is not optional, simply filter your data.frame to remove all rows where 
is.na(DF$COL) is TRUE or some similar stratagem such as checking if all columns 
are NA.
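A minimal sketch of that filter, assuming the date column in KurtzData is named
Date (adjust to the real name):

```r
# Drop rows whose date is missing or empty after read.csv()
keep <- !is.na(KurtzData$Date) & KurtzData$Date != ""
KurtzData <- KurtzData[keep, ]
```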

My guess is you may have re-used an EXCEL file and put new shorter data in it, 
or that the file has been edited and something was left where it should not be, 
perhaps something non-numeric. 

Another idea is to NOT use the CSV route and use one of many packages carefully 
to read the data from a native EXCEL format such as an XSLX file where you can 
specify which tab you want and where on the page you want to read from. You can 
point it at the precise rectangular area you want.

And, of course, there are an assortment of cut/paste ways to get the data into 
your R program, albeit if the data can change and you need to run the analysis 
again, these are less useful. Here is an example making use of the fact that on 
Windows, the copied text is tab separated.

text="A B
1   0
2   1
3   2
4   3
5   4
6   5
7   6
8   7
9   8
10  9
"
df=read.csv(text=text, sep="\t")
df

    A B
1   1 0
2   2 1
3   3 2
4   4 3
5   5 4
6   6 5
7   7 6
8   8 7
9   9 8
10 10 9

-Original Message-
From: R-help  On Behalf Of Parkhurst, David
Sent: Saturday, September 23, 2023 6:55 PM
To: r-help@r-project.org
Subject: [R] Odd result

With help from several people, I used file.choose() to get my file name, and 
read.csv() to read in the file as KurtzData.  Then when I print KurtzData, the 
last several lines look like this:
39   5/31/22  16.0  3411.75525 0.0201 0.0214   7.00
40   6/28/22  2:00 PM  0.0  2150.67950 0.0156 0.0294 NA
41   7/25/22 11:00 AM  11.9   1943.5NA NA 0.0500   7.80
42   8/31/22  0220.5NA NA 0.0700  30.50
43   9/28/22  0.067 10.9NA NA 0.0700  10.20
44  10/26/22  0.086  237NA NA 0.1550  45.00
45   1/12/23  1:00 PM 36.2624196NA NA 0.7500 283.50
46   2/14/23  1:00 PM 20.71   55NA NA 0.0500   2.40
47  NA NA NA NA
48  NA NA NA NA
49  NA NA NA NA

Then the NA's go down to one numbered 973.  Where did those extras likely come 
from, and how do I get rid of them?  I assume I need to get rid of all the 
lines after #46, to do calculations and graphics, no?

David

[[alternative HTML version deleted]]



__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Numerical stability of: 1/(1 - cos(x)) - 2/x^2

2023-08-18 Thread avi.e.gross
This discussion is sooo familiar.

If you want arbitrary-precision arithmetic, feel free to use a language and 
data type that supports it.

Otherwise, only do calculations that fit in a safe zone.

This is not just about this scenario. Floating point can work well when adding 
two numbers of about the same magnitude. But if one number is 
.123456789... and another is the same except scaled by ten to the -45th power, 
then adding them effectively throws away the smaller number.
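A quick illustration in R (nothing package-specific; this is just 64-bit double behavior):

```r
big   <- 0.123456789
small <- big * 1e-45   # same digits, 45 orders of magnitude smaller

# The sum rounds back to 'big': 'small' is far below the precision of a
# double, which carries only about 16 significant decimal digits
(big + small) == big   # TRUE
```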

This is a well-known problem for any finite binary representation of numbers. 
In the example given, yes, the smaller the number is, the worse the behavior in 
this case tends to be.

There are many solutions and some are fairly expensive in terms of computation 
time and sometimes memory usage. 

Are there any good arbitrary (or much higher) precision packages out there 
that not only supply the needed data type but can also be passed properly 
through the functions used to do complex calculations? No, I am not asking for 
arbitrary-precision complex numbers, though generally that would just be a 
tuple of two such numbers.


-Original Message-
From: R-help  On Behalf Of Bert Gunter
Sent: Friday, August 18, 2023 7:06 PM
To: Leonard Mada 
Cc: R-help Mailing List ; Martin Maechler 

Subject: Re: [R] Numerical stability of: 1/(1 - cos(x)) - 2/x^2

"The ugly thing is that the error only gets worse as x decreases. The
value neither drops to 0, nor does it blow up to infinity; but it gets
worse in a continuous manner."

If I understand you correctly, this is wrong:

> x <- 2^(-20) ## considerably less then 1e-4 !!
> y <- 1 - x^2/2;
> 1/(1 - y) - 2/x^2
[1] 0

It's all about the accuracy of the binary approximation of floating point
numbers (and their arithmetic)

Cheers,
Bert


On Fri, Aug 18, 2023 at 3:25 PM Leonard Mada via R-help <
r-help@r-project.org> wrote:

> I have added some clarifications below.
>
> On 8/18/2023 10:20 PM, Leonard Mada wrote:
> > [...]
> > After more careful thinking, I believe that it is a limitation due to
> > floating points:
> > [...]
> >
> > The problem really stems from the representation of 1 - x^2/2 as shown
> > below:
> > x = 1E-4
> > print(1 - x^2/2, digits=20)
> > print(0.5, digits=20) # fails
> > # 0.50003039
>
> The floating point representation of 1 - x^2/2 is the real culprit:
> # 0.50003039
>
> The 3039 at the end is really an error due to the floating point
> representation. However, this error blows up when inverting the value:
> x = 1E-4;
> y = 1 - x^2/2;
> 1/(1 - y) - 2/x^2
> # 1.215494
> # should be 1/(x^2/2) - 2/x^2 = 0
>
>
> The ugly thing is that the error only gets worse as x decreases. The
> value neither drops to 0, nor does it blow up to infinity; but it gets
> worse in a continuous manner. At least the reason has become now clear.
>
>
> >
> > Maybe some functions of type cos1p and cos1n would be handy for such
> > computations (to replace the manual series expansion):
> > cos1p(x) = 1 + cos(x)
> > cos1n(x) = 1 - cos(x)
> > Though, I do not have yet the big picture.
> >
>
> Sincerely,
>
>
> Leonard
>
> >
> >
> > On 8/17/2023 1:57 PM, Martin Maechler wrote:
> >>> Leonard Mada
> >>>  on Wed, 16 Aug 2023 20:50:52 +0300 writes:
> >>  > Dear Iris,
> >>  > Dear Martin,
> >>
> >>  > Thank you very much for your replies. I add a few comments.
> >>
> >>  > 1.) Correct formula
> >>  > The formula in the Subject Title was correct. A small glitch
> >> swept into
> >>  > the last formula:
> >>  > - 1/(cos(x) - 1) - 2/x^2
> >>  > or
> >>  > 1/(1 - cos(x)) - 2/x^2 # as in the subject title;
> >>
> >>  > 2.) log1p
> >>  > Actually, the log-part behaves much better. And when it fails,
> >> it fails
> >>  > completely (which is easy to spot!).
> >>
> >>  > x = 1E-6
> >>  > log(x) -log(1 - cos(x))/2
> >>  > # 0.3465291
> >>
> >>  > x = 1E-8
> >>  > log(x) -log(1 - cos(x))/2
> >>  > # Inf
> >>  > log(x) - log1p(- cos(x))/2
> >>  > # Inf => fails as well!
> >>  > # although using only log1p(cos(x)) seems to do the trick;
> >>  > log1p(cos(x)); log(2)/2;
> >>
> >>  > 3.) 1/(1 - cos(x)) - 2/x^2
> >>  > It is possible to convert the formula to one which is
> >> numerically more
> >>  > stable. It is also possible to compute it manually, but it
> >> involves much
> >>  > more work and is also error prone:
> >>
> >>  > (x^2 - 2 + 2*cos(x)) / (x^2 * (1 - cos(x)))
> >>  > And applying L'Hospital:
> >>  > (2*x - 2*sin(x)) / (2*x * (1 - cos(x)) + x^2*sin(x))
> >>  > # and a 2nd & 3rd & 4th time
> >>  > 1/6
> >>
> >>  > The big problem was that I did not expect it to fail for x =
> >> 1E-4. I
> >>  > thought it is more robust and works maybe until 1E-5.
> >>  > x = 1E-5
> >>  > 2/x^2 - 2E+10
> >>  > # -3.814697e-06
> >>
> >>  > This is the reason why I believe that there is room for
> >> improvement.
> 

Re: [R] Stacking matrix columns

2023-08-06 Thread avi.e.gross
This topic is getting almost funny, as there is a seemingly endless, ever-sillier
set of ways to perform the action, and even more if you include packages like
purrr.

If mymat is a matrix, several variants work such as:

> mymat
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> unlist(as.list(mymat))
 [1]  1  2  3  4  5  6  7  8  9 10 11 12
> as.vector(unlist(as.data.frame(mymat)))
 [1]  1  2  3  4  5  6  7  8  9 10 11 12

But as noted repeatedly, since underneath it all, mymat is a vector with a
dim attribute, the trivial solution is to set that to NULL or make it 1-D so
the internal algorithm works.
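For instance, a small demonstration of that dim trick:

```r
mymat <- matrix(1:12, nrow = 3)   # 3 x 4, filled column by column
v <- mymat
dim(v) <- NULL                    # drop the dim attribute; v is now a plain vector

identical(v, as.vector(mymat))    # TRUE: same column-major order either way
```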

Still, if brute-force programming, as done in earlier languages, is
important to someone, and you do not want to do it in a language like C,
this would work too.


You could use a for loop in a brute-force approach. Here is an example of a
function that I think does what you want and accepts not just matrices of any
kind (2D only, no higher arrays) but also data.frame or tibble objects as well
as row or column vectors.

mat2vec <- function(mat) {
  # Accept a matrix of any type and return it as
  # a vector stacked by columns.
  
  # Do it the dumb way for illustration.
  
  # If fed a data.frame or tibble, convert it to a matrix first.
  # And handle row or column vectors
  if (is.data.frame(mat)) mat <- as.matrix(mat)
  if (is.vector(mat)) mat <- as.matrix(mat)
  
  # Calculate the rows and columns and the typeof
  # to initialize a vector to hold the result.
  rows <- dim(mat)[1L]
  cols <- dim(mat)[2L]
  type <- typeof(mat)
  result <- vector(length=rows*cols, mode=type)
  
  index <- 1
  
  # Double loop to laboriously copy the items to the result
  
  for (col in 1:cols) {
for (row in 1:rows) {
  result[index] <- mat[row, col]
  index <- index + 1
} # end inner loop on rows
  } # end outer loop on cols
  
  return (result)
} # end function daffynition mat2vec

Checking if it works on many cases:

> mymat
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> mat2vec(mymat)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12

> mydf
  V1 V2 V3 V4
1  1  4  7 10
2  2  5  8 11
3  3  6  9 12
> mat2vec(mydf)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12

> mytib
# A tibble: 3 × 4
     V1    V2    V3    V4
  <int> <int> <int> <int>
1     1     4     7    10
2     2     5     8    11
3     3     6     9    12
> mat2vec(mytib)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12

> myboolmat <- mymat <= 5
> myboolmat
 [,1]  [,2]  [,3]  [,4]
[1,] TRUE  TRUE FALSE FALSE
[2,] TRUE  TRUE FALSE FALSE
[3,] TRUE FALSE FALSE FALSE
> mat2vec(myboolmat)
 [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

> mypi <- pi * mymat
> mat2vec(mypi)
 [1]  3.141593  6.283185  9.424778 12.566371 15.707963 18.849556 21.991149
25.132741 28.274334 31.415927
[11] 34.557519 37.699112

> myletters
 [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
[1,] "A"  "E"  "I"  "M"  "Q"  "U"  "Y"  "c"  "g"  "k"   "o"   "s"   "w"  
[2,] "B"  "F"  "J"  "N"  "R"  "V"  "Z"  "d"  "h"  "l"   "p"   "t"   "x"  
[3,] "C"  "G"  "K"  "O"  "S"  "W"  "a"  "e"  "i"  "m"   "q"   "u"   "y"  
[4,] "D"  "H"  "L"  "P"  "T"  "X"  "b"  "f"  "j"  "n"   "r"   "v"   "z"  
> mat2vec(myletters)
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R"
"S" "T" "U" "V" "W" "X" "Y" "Z" "a"
[28] "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
"t" "u" "v" "w" "x" "y" "z"

> myNA <- matrix(c(NA, Inf, 0), 4, 6)
> myNA
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]   NA  Inf    0   NA  Inf    0
[2,]  Inf    0   NA  Inf    0   NA
[3,]    0   NA  Inf    0   NA  Inf
[4,]   NA  Inf    0   NA  Inf    0
> mat2vec(myNA)
 [1]  NA Inf   0  NA Inf   0  NA Inf   0  NA Inf   0  NA Inf   0
[16]  NA Inf   0  NA Inf   0  NA Inf   0

> myvec <- 1:8
> mat2vec(myvec)
[1] 1 2 3 4 5 6 7 8
> mat2vec(t(myvec))
[1] 1 2 3 4 5 6 7 8

> mylist <- list("lions", "tigers", "bears")
> mylist
[[1]]
[1] "lions"

[[2]]
[1] "tigers"

[[3]]
[1] "bears"

> mat2vec(mylist)
[1] "lions"  "tigers" "bears" 

> myohmy <- list("lions", 1, "tigers", 2, "bears", 3, list("oh", "my"))
> myohmy
[[1]]
[1] "lions"

[[2]]
[1] 1

[[3]]
[1] "tigers"

[[4]]
[1] 2

[[5]]
[1] "bears"

[[6]]
[1] 3

[[7]]
[[7]][[1]]
[1] "oh"

[[7]][[2]]
[1] "my"

> mat2vec(myohmy)
[1] "lions"  "1"  "tigers" "2"  "bears"  "3"  "oh" "my"

Is it bulletproof? Nope. I could make sure it is not some other type, or a
type that cannot be coerced into something like a matrix, or perhaps is of
higher dimensionality.

But as noted, if you already have a valid matrix object, there is a trivial
way to get it by row.
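For the record, the trivial routes look like this (c() also drops attributes such as dim, so it works as well as setting dim to NULL):

```r
m <- matrix(1:12, nrow = 3)

c(m)      # column-major order: 1 2 3 ... 12
c(t(m))   # row-major order: transpose first, then flatten
```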


-Original Message-
From: R-help  On Behalf Of Ebert,Timothy Aaron
Sent: Sunday, August 6, 2023 11:06 PM
To: Rui Barradas ; Iris Simmons ;
Steven Yen 
Cc: R-help Mailing List 
Subject: Re: [R] Stacking matrix columns



-Original 

Re: [R] Stacking matrix columns

2023-08-06 Thread avi.e.gross
Based on a private communication, it sounds like Steven is asking the question 
again because he wants a different solution, perhaps resembling the way this 
might be done in another language. I think he wants to use loops explicitly, and 
I suspect this may be along the lines of a homework problem for him.

R has loops and if you loop over some variable moving "col" from 1 to the 
number of columns, then if your matrix is called "mat" you can use mat[,col] to 
grab a whole column at a time and concatenate the results gradually into one 
longer vector.

If you want a more loopy solution, simply reserve a vector big enough to hold 
an M by N matrix (size M*N) and loop over both "row" and "col", adding 
mat[row,col] to your growing vector.
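The first, column-at-a-time version might be sketched as follows (growing the result with c() is simple but not efficient for large matrices; the function name here is just illustrative):

```r
stack_cols <- function(mat) {
  # start with an empty vector of the matrix's own storage type
  result <- vector(mode = typeof(mat), length = 0L)
  for (col in seq_len(ncol(mat))) {
    result <- c(result, mat[, col])   # append one whole column at a time
  }
  result
}

stack_cols(matrix(1:20, 5, 4))   # 1 2 3 ... 20
```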

As stated, this is not the way many people think in R, especially those of us 
who know a matrix is just a vector with a bonus.

-Original Message-
From: R-help  On Behalf Of Steven Yen
Sent: Saturday, August 5, 2023 8:08 PM
To: R-help Mailing List 
Subject: [R] Stacking matrix columns

I wish to stack columns of a matrix into one column. The following 
matrix command does it. Any other ways? Thanks.

 > x<-matrix(1:20,5,4)
 > x
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

 > matrix(x,ncol=1)
      [,1]
 [1,]    1
 [2,]    2
 [3,]    3
 [4,]    4
 [5,]    5
 [6,]    6
 [7,]    7
 [8,]    8
 [9,]    9
[10,]   10
[11,]   11
[12,]   12
[13,]   13
[14,]   14
[15,]   15
[16,]   16
[17,]   17
[18,]   18
[19,]   19
[20,]   20
 >

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Stacking matrix columns

2023-08-06 Thread avi.e.gross
Eric,

I fully agree with you that anyone doing serious work in projects such as 
machine learning, which make heavy use of mathematical data structures, would 
do well to find decent, well-designed and possibly efficient packages to do 
much of the work rather than re-inventing their own way. As you note, it is 
quite common to flatten a matrix (or higher-order array) into a vector and do 
tasks like image recognition using the vector form.

Mind you, there are other packages that might be a better entry point than 
rTensor and which incorporate that package or similar ones for the user but 
supply other high-level ways to do things like set up neural networks. 

I think the primary focus of this group has been to deal with how to do things 
within a main R distribution and when people start suggesting ways using other 
packages, such as in the tidyverse, that not everyone knows or uses, then there 
is sometimes feedback suggesting a return to more standard topics. Personally, 
I am happy to look at any well-known and appreciated packages even if 
maintained by another company as long as it helps get work done easily. 

I downloaded rTensor just to take a look and see how it implements vec():

> vec
nonstandardGenericFunction for "vec" defined from package "rTensor"

function (tnsr) 
{
standardGeneric("vec")
}

So, no, they do not directly use the trick of setting the dim attribute and 
that may be because they are not operating on a naked matrix/vector but on an 
enhanced version that wraps it as a tensor. 

I experimented with making a small matrix and calling as.tensor() on it and it 
is quite a bit more complex:

> attributes(mymat)
$dim
[1] 3 4

> attributes(mytens)
$num_modes
[1] 2

$modes
[1] 3 4

$data
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

$class
[1] "Tensor"
attr(,"package")
[1] "rTensor"

This makes sense given that it has to store something more complex. Mind you, 
an R array does it more simply. I am confident this design has advantages for 
how the rTensor package does many other activities. It is more than is needed 
if the OP really has a simple use case.

I do note the vec() function produces the same result as one of the mentioned 
solutions of setting the dim attribute to NULL. 

It would not surprise me if other packages like TensorFlow, or ones built on top 
of it like Keras, also have their own ways to do this simple task. The OP may 
want to choose a specific package instead, or as well, to meet their other 
needs.

-Original Message-
From: Eric Berger  
Sent: Sunday, August 6, 2023 11:59 AM
To: avi.e.gr...@gmail.com
Cc: R-help Mailing List 
Subject: Re: [R] Stacking matrix columns

Avi,

I was not trying to provide the most economical solution. I was trying
to anticipate that people (either the OP or others searching for how
to stack columns of a matrix) might be motivated by calculations in
multilinear algebra, in which case they might be interested in the
rTensor package.


On Sun, Aug 6, 2023 at 6:16 PM  wrote:
>
> Eric,
>
> I am not sure your solution is particularly economical albeit it works for 
> arbitrary arrays of any dimension, presumably. But it seems to involve 
> converting a matrix to a tensor just to undo it back to a vector. Other 
> solutions offered here, simply manipulate the dim attribute of the data 
> structure.
>
> Of course, the OP may have uses in mind which the package might make easier. 
> We often get fairly specific questions here without the additional context 
> that may help guide a better answer.
>
> -Original Message-
> From: R-help  On Behalf Of Eric Berger
> Sent: Sunday, August 6, 2023 3:34 AM
> To: Bert Gunter 
> Cc: R-help Mailing List ; Steven Yen 
> Subject: Re: [R] Stacking matrix columns
>
> Stacking columns of a matrix is a standard operation in multilinear
> algebra, usually written as the operator vec().
> I checked to see if there is an R package that deals with multilinear
> algebra. I found rTensor, which has a function vec().
> So, yet another way to accomplish what you want would be:
>
> > library(rTensor)
> > vec(as.tensor(x))
>
> Eric
>
>
> On Sun, Aug 6, 2023 at 5:05 AM Bert Gunter  wrote:
> >
> > Or just dim(x) <- NULL.
> > (as matrices in base R are just vectors with a dim attribute stored in
> > column major order)
> >
> > ergo:
> >
> > > x
> >  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> > > x<- 1:20  ## a vector
> > > is.matrix(x)
> > [1] FALSE
> > > dim(x) <- c(5,4)
> > > is.matrix(x)
> > [1] TRUE
> > > attributes(x)
> > $dim
> > [1] 5 4
> >
> > > ## in painful and unnecessary detail as dim() should be used instead
> > > attr(x, "dim") <- NULL
> > > is.matrix(x)
> > [1] FALSE
> > > x
> >  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> >
> > ## well, you get it...
> >
> > -- Bert
> >
> > On Sat, Aug 5, 2023 at 5:21 PM Iris Simmons  wrote:
> > >
> > > You could also do
> > 

Re: [R] Stacking matrix columns

2023-08-06 Thread avi.e.gross
Eric,

I am not sure your solution is particularly economical albeit it works for 
arbitrary arrays of any dimension, presumably. But it seems to involve 
converting a matrix to a tensor just to undo it back to a vector. Other 
solutions offered here, simply manipulate the dim attribute of the data 
structure.

Of course, the OP may have uses in mind which the package might make easier. We 
often get fairly specific questions here without the additional context that 
may help guide a better answer. 

-Original Message-
From: R-help  On Behalf Of Eric Berger
Sent: Sunday, August 6, 2023 3:34 AM
To: Bert Gunter 
Cc: R-help Mailing List ; Steven Yen 
Subject: Re: [R] Stacking matrix columns

Stacking columns of a matrix is a standard operation in multilinear
algebra, usually written as the operator vec().
I checked to see if there is an R package that deals with multilinear
algebra. I found rTensor, which has a function vec().
So, yet another way to accomplish what you want would be:

> library(rTensor)
> vec(as.tensor(x))

Eric


On Sun, Aug 6, 2023 at 5:05 AM Bert Gunter  wrote:
>
> Or just dim(x) <- NULL.
> (as matrices in base R are just vectors with a dim attribute stored in
> column major order)
>
> ergo:
>
> > x
>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> > x<- 1:20  ## a vector
> > is.matrix(x)
> [1] FALSE
> > dim(x) <- c(5,4)
> > is.matrix(x)
> [1] TRUE
> > attributes(x)
> $dim
> [1] 5 4
>
> > ## in painful and unnecessary detail as dim() should be used instead
> > attr(x, "dim") <- NULL
> > is.matrix(x)
> [1] FALSE
> > x
>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
>
> ## well, you get it...
>
> -- Bert
>
> On Sat, Aug 5, 2023 at 5:21 PM Iris Simmons  wrote:
> >
> > You could also do
> >
> > dim(x) <- c(length(x), 1)
> >
> > On Sat, Aug 5, 2023, 20:12 Steven Yen  wrote:
> >
> > > I wish to stack columns of a matrix into one column. The following
> > > matrix command does it. Any other ways? Thanks.
> > >
> > >  > x<-matrix(1:20,5,4)
> > >  > x
> > >      [,1] [,2] [,3] [,4]
> > > [1,]    1    6   11   16
> > > [2,]    2    7   12   17
> > > [3,]    3    8   13   18
> > > [4,]    4    9   14   19
> > > [5,]    5   10   15   20
> > >
> > >  > matrix(x,ncol=1)
> > >       [,1]
> > >  [1,]    1
> > >  [2,]    2
> > >  [3,]    3
> > >  [4,]    4
> > >  [5,]    5
> > >  [6,]    6
> > >  [7,]    7
> > >  [8,]    8
> > >  [9,]    9
> > > [10,]   10
> > > [11,]   11
> > > [12,]   12
> > > [13,]   13
> > > [14,]   14
> > > [15,]   15
> > > [16,]   16
> > > [17,]   17
> > > [18,]   18
> > > [19,]   19
> > > [20,]   20
> > >  >
> > >
> > > __
> > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide
> > > http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> > >
> >
> > [[alternative HTML version deleted]]
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Stacking matrix columns

2023-08-05 Thread avi.e.gross
Steve,

As Iris pointed out, some implementations of a matrix are actually a vector 
with special qualities. There are sometimes choices about whether to store it a 
row at a time or a column at a time.

In R, your data consisted of the integers from 1 to 20 and they clearly are 
stored a column at a time:

> x<-matrix(1:20,5,4)
> 
> x
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

Your method involved creating a second matrix. But you could just as easily ask 
for a vector that gives you back the info in that order:

> as.vector(x)
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20

Iris mentioned the fact that the version of built-in matrices you are using in 
R is actually a vector with an attribute:

> attributes(x)
$dim
[1] 5 4

This means that when you work with the matrix and ask for x[3,4], R calculates 
where in the vector to look. Since you want column 4, three columns of five 
items each sit ahead of it, meaning 15 items. The third item in that fourth 
column is then 15 + 3 = 18, so R looks at the 18th item in the vector:

> x[3,4]
[1] 18
> x[18]
[1] 18
> as.vector(x)[18]
[1] 18

The latter two approaches get to look at the pure vector implementation. So 
your data is already in the order you want and it is just a question of how to 
access it.

And just FYI, you can make multidimensional arrays using the array() function; 
as far as I know they are just extensions of the same analysis, with a dim 
attribute containing the dimensions.

Rather than copying the matrix, if not needed for another purpose, changing the 
attribute lets you reshape it in many ways, including a 2x10 matrix but also 
into a vertical or horizontal matrix of 1x20 or 20x1. Of course, if you will be 
passing the object around to places that expect a vector, converting would be 
safer. In this case, it does seem to be treated like any other vector.

> length(x)
[1] 20
> str(x)
 int [1:5, 1:4] 1 2 3 4 5 6 7 8 9 10 ...

> attr(x, "dim") <- c(length(x), 1)
> dim(x)
[1] 20  1
> str(x)
 int [1:20, 1] 1 2 3 4 5 6 7 8 9 10 ...
> typeof(x)
[1] "integer"

But please note my comments above do not apply the same if you make the matrix 
by rows as in:

> x <- matrix(data=1:20, nrow=5, ncol=4, byrow=TRUE)
> x
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
[4,]   13   14   15   16
[5,]   17   18   19   20

This is now stored differently:

> x[18]
[1] 12

> as.vector(x)
 [1]  1  5  9 13 17  2  6 10 14 18  3  7 11 15 19  4  8 12 16 20


-Original Message-
From: R-help  On Behalf Of Steven Yen
Sent: Saturday, August 5, 2023 8:11 PM
To: R-help Mailing List 
Subject: [R] Stacking matrix columns

I wish to stack columns of a matrix into one column. The following 
matrix command does it. Any other ways? Thanks.

 > x<-matrix(1:20,5,4)
 > x
     [,1] [,2] [,3] [,4]
[1,]    1    6   11   16
[2,]    2    7   12   17
[3,]    3    8   13   18
[4,]    4    9   14   19
[5,]    5   10   15   20

 > matrix(x,ncol=1)
      [,1]
 [1,]    1
 [2,]    2
 [3,]    3
 [4,]    4
 [5,]    5
 [6,]    6
 [7,]    7
 [8,]    8
 [9,]    9
[10,]   10
[11,]   11
[12,]   12
[13,]   13
[14,]   14
[15,]   15
[16,]   16
[17,]   17
[18,]   18
[19,]   19
[20,]   20
 >

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Multiply

2023-08-04 Thread avi.e.gross
[See the end for an interesting twist on moving a column to row.names.]

Yes, many ways to do things exist, but it may make sense to ask what the 
user/OP really wants. Sometimes the effort to make a brief example obscures 
things.

Was there actually any need to read in a file containing comma-separated 
values? Did it have to include one, or perhaps more, non-numeric columns? Was 
the ID column guaranteed to exist and be named ID or just be the first column? 
Is any error checking needed?

So assume we read two data structures into data.frames called, unoriginally, A 
and B. A wider approach might be to split A into A.text and A.numeric by 
checking which columns test as numeric (meaning is.numeric() returns TRUE) and 
which can be text or logical or anything else. Note that complex is not 
considered numeric, if that matters.

You might then count the number of rows and columns of A.numeric and set A.text 
aside for now.

You then get B, and it seems you can throw away any non-numeric columns. The 
resulting numeric columns can go in B.numeric.
The number of rows and columns of what remains needs to conform to the 
dimensions of A in the sense that if A is M rows by N columns, then B must be N 
by anything, and the result of the multiplication is M by anything. If the 
condition is not met, you need to fail gracefully.
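That conformance rule, sketched with toy matrices (the names A and B here are illustrative only):

```r
A <- matrix(1:6,  nrow = 2)   # 2 x 3
B <- matrix(1:12, nrow = 3)   # 3 x 4

# Fail early (and loudly) if the inner dimensions do not match
stopifnot(ncol(A) == nrow(B))

dim(A %*% B)   # 2 4: an M x N matrix times an N x k matrix gives M x k
```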

You may also want to decide what to do with data that came with things like NA 
content. Or, if your design allows content that can be converted to numeric, 
check and make any conversions.

Then you can convert the data into matrices, perform the matrix multiplication, 
and optionally restore any column names you want along with any of the 
non-numeric columns you held back; note there could possibly be more than one. 
Obviously, getting multiple ones back in the original order is harder.

I am not sure if you are interested in another tweak. For some purposes, 
rownames() and colnames() make sense instead of additional rows or columns.

Code like the following, applied to a data.frame, will copy your ID column into 
the row names and then remove the actual ID column.

> dat1
  ID  x  y  z
1  A 10 34 12
2  B 25 42 18
3  C 14 20  8
> rownames(dat1) <- dat1$ID
> dat1$ID <- NULL
> dat1
x  y  z
 A 10 34 12
 B 25 42 18
 C 14 20  8

> result <- as.matrix(dat1) %*% mat2
> result
   weight weiht2
 A  24.58  30.18
 B  35.59  44.09
 C  17.10  21.30

There are functions in packages, such as column_to_rownames() in the tidyverse 
packages, that can be used, and you can also reverse the process.
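In base R the round trip can be done without any package (column_to_rownames() and its inverse rownames_to_column() are the tibble/tidyverse equivalents):

```r
df <- data.frame(ID = c("A", "B", "C"), x = 1:3)

# column -> row names
rownames(df) <- df$ID
df$ID <- NULL

# row names -> column again (the reverse)
df2 <- data.frame(ID = rownames(df), df, row.names = NULL)
df2$ID   # "A" "B" "C"
```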

Just some thoughts. The point is that it is often wiser not to mix text with 
numeric data, and rownames and colnames provide a way to include the text for 
the purposes where you want it and keep it out of the way otherwise. And here 
is an oddity I found:

> dat2
  ID weight weiht2
1  A   0.25   0.35
2  B   0.42   0.52
3  C   0.65   0.75
> temp <- data.frame(dat2, row.names=1)
> temp
   weight weiht2
 A   0.25   0.35
 B   0.42   0.52
 C   0.65   0.75

As shown, when you create a data.frame you can move any column by NUMBER into 
rownames. So consider your earlier code, and note that read.table supports the 
option row.names=1 and passes it on, so in one step:

> ?read.table
> dat1 <-read.table(text="ID, x, y, z
+ A, 10,  34, 12
+ B, 25,  42, 18
+ C, 14,  20,  8 ",sep=",",header=TRUE,stringsAsFactors=F, row.names=1)
> dat1
   x  y  z
A 10 34 12
B 25 42 18
C 14 20  8

You can make it a matrix immediately:

mat1 <- as.matrix(read.table(
  text = text,
  sep = ",",
  header = TRUE,
  stringsAsFactors = F,
  row.names = 1
))


-Original Message-
From: Val  
Sent: Friday, August 4, 2023 2:03 PM
To: avi.e.gr...@gmail.com
Cc: r-help@r-project.org
Subject: Re: [R] Multiply

Thank you,  Avi and Ivan.  Worked for this particular Example.

Yes, I am looking for something with a more general purpose.
I think Ivan's suggestion works for this.

multiplication=as.matrix(dat1[,-1]) %*% as.matrix(dat2[match(dat1[,1],
dat2[,1]),-1])
Res=data.frame(ID = dat1[,1], Index = multiplication)

On Fri, Aug 4, 2023 at 10:59 AM  wrote:
>
> Val,
>
> A data.frame is not quite the same thing as a matrix.
>
> But as long as everything is numeric, you can convert both data.frames to
> matrices, perform the computations needed and, if you want, convert it back
> into a data.frame.
>
> BUT it must be all numeric and you violate that requirement by having a
> character column for ID. You need to eliminate that temporarily:
>
> dat1 <- read.table(text="ID, x, y, z
>  A, 10,  34, 12
>  B, 25,  42, 18
>  C, 14,  20,  8 ",sep=",",header=TRUE,stringsAsFactors=F)
>
> mat1 <- as.matrix(dat1[,2:4])
>
> The result is:
>
> > mat1
>   x  y  z
> [1,] 10 34 12
> [2,] 25 42 18
> [3,] 14 20  8
>
> Now do the second matrix, perhaps in one step:
>
> mat2 <- as.matrix(read.table(text="ID, weight, weiht2
>  A,  0.25, 0.35
>  B,  0.42, 0.52
>  C,  0.65, 0.75",sep=",",header=TRUE,stringsAsFactors=F)[,2:3])
>
>
> Do note some people use read.csv() instead 

Re: [R] Multiply

2023-08-04 Thread avi.e.gross
Val,

A data.frame is not quite the same thing as a matrix.

But as long as everything is numeric, you can convert both data.frames to
matrices, perform the computations needed and, if you want, convert it back
into a data.frame.

BUT it must be all numeric and you violate that requirement by having a
character column for ID. You need to eliminate that temporarily:

dat1 <- read.table(text="ID, x, y, z
 A, 10,  34, 12
 B, 25,  42, 18
 C, 14,  20,  8 ",sep=",",header=TRUE,stringsAsFactors=F)

mat1 <- as.matrix(dat1[,2:4])

The result is:

> mat1
  x  y  z
[1,] 10 34 12
[2,] 25 42 18
[3,] 14 20  8

Now do the second matrix, perhaps in one step:

mat2 <- as.matrix(read.table(text="ID, weight, weiht2
 A,  0.25, 0.35
 B,  0.42, 0.52
 C,  0.65, 0.75",sep=",",header=TRUE,stringsAsFactors=F)[,2:3])


Do note some people use read.csv() instead of read.table(), though it simply
calls read.table() after setting some parameters such as the comma separator.

The result is what you asked for, including the one misspelling of "weight" as "weiht2":

> mat2
 weight weiht2
[1,]   0.25   0.35
[2,]   0.42   0.52
[3,]   0.65   0.75

Now you wanted to multiply as in matrix multiplication.

> mat1 %*% mat2
 weight weiht2
[1,]  24.58  30.18
[2,]  35.59  44.09
[3,]  17.10  21.30

Of course, you wanted different names for the columns and you can do that
easily enough:

result <- mat1 %*% mat2

colnames(result) <- c("index1", "index2")


But this is missing something:

> result
 index1 index2
[1,]  24.58  30.18
[2,]  35.59  44.09
[3,]  17.10  21.30

Do you want a column of ID numbers on the left? If numeric, you can keep it
in a matrix in one of many ways but if you want to go back to the data.frame
format and re-use the ID numbers, there are again MANY ways. But note mixing
characters and numbers can inadvertently convert everything to characters.
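A tiny illustration of that coercion hazard (toy values, for illustration only):

```r
# cbind() must produce one atomic type, so the numbers become character
m <- cbind(ID = c("A", "B"), value = c(24.58, 35.59))
class(m[, "value"])   # "character"

# a data.frame keeps each column's own type
d <- data.frame(ID = c("A", "B"), value = c(24.58, 35.59))
class(d$value)        # "numeric"
```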

Here is one solution. Not the only one nor the best one but reasonable:

recombined <- data.frame(index=dat1$ID, 
 index1=result[,1], 
 index2=result[,2])


> recombined
  index index1 index2
1 A  24.58  30.18
2 B  35.59  44.09
3 C  17.10  21.30

If for some reason you need a more general purpose way to do this for
arbitrary conformant matrices, you can write a function that does this in a
more general way but perhaps a better idea might be a way to store your
matrices in files in a way that can be read back in directly or to not
include indices as character columns but as row names.
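One hedged sketch of that last idea — keeping the IDs as row names so the file round-trips straight back to a numeric matrix (tempfile() is used only to keep the example self-contained):

```r
# A numeric matrix with the IDs stored as row names, not as a column
m <- matrix(c(10, 25, 14, 34, 42, 20, 12, 18, 8), nrow = 3,
            dimnames = list(c("A", "B", "C"), c("x", "y", "z")))

f <- tempfile(fileext = ".csv")
write.csv(m, f)                              # row names land in column 1
m2 <- as.matrix(read.csv(f, row.names = 1))  # and come straight back out

# same values and same dimnames, ready for %*% with no ID gymnastics
stopifnot(all(m == m2), identical(dimnames(m), dimnames(m2)))
```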






-Original Message-
From: R-help  On Behalf Of Val
Sent: Friday, August 4, 2023 10:54 AM
To: r-help@R-project.org (r-help@r-project.org) 
Subject: [R] Multiply

Hi all,

I want to multiply two  data frames as shown below,

dat1 <-read.table(text="ID, x, y, z
 A, 10,  34, 12
 B, 25,  42, 18
 C, 14,  20,  8 ",sep=",",header=TRUE,stringsAsFactors=F)

dat2 <-read.table(text="ID, weight, weiht2
 A,  0.25, 0.35
 B,  0.42, 0.52
 C,  0.65, 0.75",sep=",",header=TRUE,stringsAsFactors=F)

Desired result

ID  Index1 Index2
1  A 24.58 30.18
2  B 35.59 44.09
3  C 17.10 21.30

Here is my attempt,  but did not work

dat3 <- data.frame(ID = dat1[,1], Index = apply(dat1[,-1], 1, FUN=
function(x) {sum(x*dat2[,2:ncol(dat2)])} ), stringsAsFactors=F)


Any help?

Thank you,

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Off-topic: ChatGPT Code Interpreter

2023-07-18 Thread avi.e.gross
Hadley,

Thanks and I know many such things exist. I simply found it interesting that 
what was mentioned seemed simpler as just being a converter of text to make a 
bitmap type image. Now if I want a simulated image of a cat riding a motorcycle 
while holding an Esperanto Flag, sure, I would not easily do it directly or 
even in a standard programming language. 

Of course I may have misunderstood "text" to mean the actual text, as compared 
to a somewhat natural language description using text. R does not easily do 
that.

Then again, there are ways to connect your R program to the Wolfram Knowledge 
base to pass through natural language queries ...

Avi

-Original Message-
From: Hadley Wickham  
Sent: Tuesday, July 18, 2023 6:10 PM
To: avi.e.gr...@gmail.com
Cc: Jim Lemon ; Ebert,Timothy Aaron ; 
R-help 
Subject: Re: [R] Off-topic: ChatGPT Code Interpreter

> I am not sure what your example means but text to image conversion can be
> done quite easily in many programming environments and does not need an AI
> unless you are using it to hunt for info.  I mean you can open up many Paint
> or Photo programs and look at the menus and often one allows you to write
> using whatever font/size/color/background you want to add a layer on the
> image. There are plenty of free resources on-line that I sometimes use to
> write something in a large fiery font or whatever and when I get the result
> I want, I save it as graphics.

I would recommend you try out one of the many text-to-image AI
services like https://www.midjourney.com/ or
https://openai.com/dall-e-2. These services are much more
sophisticated than you might imagine.

Hadley

-- 
http://hadley.nz

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Off-topic: ChatGPT Code Interpreter

2023-07-18 Thread avi.e.gross
Jim,

I am not sure what your example means but text to image conversion can be
done quite easily in many programming environments and does not need an AI
unless you are using it to hunt for info.  I mean you can open up many Paint
or Photo programs and look at the menus and often one allows you to write
using whatever font/size/color/background you want to add a layer on the
image. There are plenty of free resources on-line that I sometimes use to
write something in a large fiery font or whatever and when I get the result
I want, I save it as graphics.

If you mean that you found something other than a human who would listen to
you and maybe ask a few questions and then do it for you, good for you.
Since most people are not programmers, there is plenty of room for that kind
of thing.

Although R is not particularly designed to do what you are saying, a quick
search indicates plenty of packages using R for this kind of thing.

What gets me is an AI can do one of several things. It may give you a result
and you take it or leave it. Or, it can look around at the internet and
knowledge bases and throw a program at you, perhaps in R, and you would then
need to validate if it makes sense given your knowledge about R. If it gave
you a program in a language you did not know, would you blindly try using
it?

To be fair, many years ago the barrier was higher. To figure out what a
function did, or even find such a function, often meant reading through
copious amounts of reference books, or lots of existing code looking for an
example of such use, or ask someone who might have to do the same. Often you
ended up writing code using other more primitive commands  that did what you
wanted. Obviously internet searches and other tools and the vast number of
people who are sharing this kind of info, make this easier. In some ways, an
AI can do much of the searching for you but with results that may be
surprising.



-Original Message-
From: R-help  On Behalf Of Jim Lemon
Sent: Monday, July 17, 2023 7:24 PM
To: Ebert,Timothy Aaron 
Cc: R-help 
Subject: Re: [R] Off-topic: ChatGPT Code Interpreter

I haven't really focused on the statistical capabilities of AI, that
marriage of massive memory and associative learning. I am impressed by
its ability to perform text-to-image conversion, something I have
recently needed. My artistic ability is that of the average three year
old, yet I can employ AI to translate my mental images into realistic
pictures. Perhaps we really are learning about how we think. As far as
I am aware, it just does what we tell it to do. Like other tools, it
is as good or bad as the user.

Jim



Re: [R] Off-topic: ChatGPT Code Interpreter

2023-07-18 Thread avi.e.gross
Just to bring it back to R, I want to point out that what many R programmers do 
is not that different. If you develop some skills at analyzing some kinds of 
data and have a sort of toolchest based on past work, then a new project along 
similar lines may move very quickly.  After a while, you may put out a package 
that allows even novices to do a decent job as the code handles so many of the 
details or provides default options you can over-ride.

So if an AI has the ability to use the same tools you provided, and applies 
them properly, is there much difference?

I have seen EXPERTS do horribly when yanked just outside their field of 
expertise. They may use a tool outside the bounds it was designed for, as an 
example, or pick an unfamiliar model that is not applicable or optimal.

The problem with AI is compounded as some kinds of peer feedback may be missing 
while other things have not yet been programmed well that allow some 
selectivity and so on.

But if a simple regression often works well enough, why can't an AI use it too? 

I will say a lot of what people do in R is cleaning the data and getting it 
into the right form. Sometimes humans struggle to detect if say two names are 
the same and someone made a spelling mistake. And humans will still get things 
like that wrong unless they can go back and consult external resources to see 
if say the school has two teachers with similar names or whether all the 
students named should be combined under the same teacher for some purposes.

I will point out there is no good reason to think an AI would necessarily use R 
or use base R rather than some of the functions in the tidyverse. I am studying 
Wolfram and an amazing amount of work can be done in one-liners that rely on 
many thousands of built-in functions. So why would you want an AI to write 
assembly code, or do it in a language like C, rather than pick a language well 
suited to the task?


-Original Message-
From: R-help  On Behalf Of Spencer Graves
Sent: Monday, July 17, 2023 3:13 PM
To: Bert Gunter ; R-help 
Subject: Re: [R] Off-topic: ChatGPT Code Interpreter

  I don't know about ChatGPT, but Daniel Kahneman won the 2002 Nobel 
Memorial Prize in Economics,[1] even though he's not an economist, for 
his leadership in creating a new subfield in the intersection of human 
psychology and economics now called "behavioral economics".[2] Then in 
2009 Kahneman and Gary Klein published an article on, "Conditions for 
intuitive expertise: a failure to disagree", which concluded that expert 
intuition is learned from frequent, rapid, high-quality feedback. 
People who do not learn from frequent, rapid, high-quality feedback can 
be beaten by simple heuristics developed by intelligent lay people.[3] 
That includes most professions, whose members Kahneman, Sibony, and Sunstein 
call "respect-experts".


  Kahneman, Sibony, and Sunstein further report that with a little data, 
a regression model can outperform a simple heuristic, and with massive 
amounts of data, artificial intelligence can outperform regression 
models.[4]


  An extreme but real example of current reality was described in an 
article on "Asylum roulette":  With asylum judges in the same 
jurisdiction with cases assigned at random, one judge approved 5 percent 
of cases while another approved 88 percent.[5] However, virtually all 
"respect-experts" are influenced in their judgements by time of day and 
whether their favorite sports team won or lost the previous day.  That 
level of noise can be reduced dramatically by use of appropriate 
artificial intelligence.


  Comments?
  Spencer Graves


[1] https://en.wikipedia.org/wiki/Daniel_Kahneman

[2] https://en.wikipedia.org/wiki/Behavioral_economics

[3] https://www.researchgate.net/publication/26798603_Conditions_for_Intuitive_Expertise_A_Failure_to_Disagree

[4] Daniel Kahneman; Olivier Sibony; Cass Sunstein (2021). Noise: A Flaw in 
Human Judgment (Little, Brown and Company).

[5] https://en.wikipedia.org/wiki/Refugee_roulette


On 7/17/23 1:46 PM, Bert Gunter wrote:
> This is an **off-topic** post about the subject line, that I thought
> might be of interest to the R Community. I hope this does not offend
> anyone.
> 
> The widely known ChatGPT software now offers what  is called a "Code
> Interpreter," that, among other things, purports to do "data
> analysis."  (Search for articles with details.) One quote, from the
> (online) NY Times, is:
> 
> "Arvind Narayanan, a professor of computer science at Princeton
> University, cautioned that people should not become overly reliant on
> code interpreter for data analysis as A.I. still produces inaccurate
> results and misinformation.
> 
> 'Appropriate data analysis requires just a lot of critical thinking
> about the data,' he said."
> 
> Amen. ... Maybe.
> 
> (As this is off-topic, if you wish to reply to me, probably better to
> do so privately).
> 
> Cheers to all,
> 

Re: [R] Variable and value labels

2023-07-13 Thread avi.e.gross
Anupam,
 
Thanks for explaining you are talking about factors.
 
I see my friend Adrian has pointed out reasons you may want to use a package he 
built called “declared” but my answer will be within the regular R domain as 
you asked.
 
You should read up a bit on factors in a book, not just by blindly searching. A 
factor, loosely speaking, stores the distinct original values as character 
"levels" together with an integer vector referencing them. When you want the original 
labels, you can see them, and when you want the integer indices, you can see them too.
 
> greek <- c("Alpha", "Beta", "Gamma", "Alphabeta", "Beta", "Alpha", "Alpha 
> Male")
> greek
[1] "Alpha"  "Beta"   "Gamma"  "Alphabeta"  "Beta"   "Alpha"
 
[7] "Alpha Male"
> 
> facgreek <- factor(greek)
> facgreek
[1] Alpha  Beta   Gamma  Alphabeta  Beta   Alpha  Alpha Male
Levels: Alpha Alpha Male Alphabeta Beta Gamma
> 
> levels(facgreek)
[1] "Alpha"  "Alpha Male" "Alphabeta"  "Beta"   "Gamma" 
> labels(facgreek)
[1] "1" "2" "3" "4" "5" "6" "7"
> str(facgreek)
Factor w/ 5 levels "Alpha","Alpha Male",..: 1 4 5 3 4 1 2
> as.numeric(facgreek)
[1] 1 4 5 3 4 1 2
 
You can play with all kinds of things in base R such as getting the nth item as 
a label, or finding all the items currently mapped to the key of 1 and so on.
 
> as.numeric(facgreek)
[1] 1 4 5 3 4 1 2
> facgreek[5]
[1] Beta
Levels: Alpha Alpha Male Alphabeta Beta Gamma
> facgreek[as.numeric(facgreek) == 1]
[1] Alpha Alpha
 
 
> as.character(facgreek)
[1] "Alpha"  "Beta"   "Gamma"  "Alphabeta"  "Beta"   "Alpha"
 
[7] "Alpha Male"
> as.integer(facgreek)
[1] 1 4 5 3 4 1 2
 
 
Now when plotting, it depends on what you use. Base R comes with the usual plot 
functions as well as others like lattice and there are packages like ggplot2. 
Some of these may even convert a vector into a factor internally. In some 
cases, you may want to tell it to use a factor in a certain way, such as by 
re-ordering its levels so the display is graphed in that order.
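For instance, in base R (the level orderings here are purely illustrative):

```r
sizes <- c("small", "large", "medium", "small", "large")

f1 <- factor(sizes)
levels(f1)   # alphabetical by default: "large" "medium" "small"

# Spell out the order you want axes and legends to use
f2 <- factor(sizes, levels = c("small", "medium", "large"))
levels(f2)   # "small" "medium" "large"

# Or reorder by a statistic, e.g. the mean of another variable
x  <- c(1, 9, 5, 2, 8)
f3 <- reorder(f1, x)   # levels sorted by increasing mean of x
levels(f3)   # "small" "medium" "large"
```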
 
And as I answered another person, some graphing functions allow you to do other 
kinds of labeling on top of the plot that may meet your needs.
 
I may be the opposite of you as I did not use R much before 2003. 
 
But seriously, we often are like new users even when we once knew a bit about 
something. The R from before 2003 (or was it S?) has evolved quite a bit. If, 
like me, you have used lots of other computer languages in between, then they 
often blend in your mind as they have overlapping paradigms and methods and of 
course quirks.
 
And, of course, I sympathize with adjusting from other environments designed 
for somewhat more specific purposes like Stata as R is more of a general 
purpose programming language. Often people start with those others and then 
come to R because they need to be able to do more or just fine-tune things or …
 
 
 
From: Anupam Tyagi  
Sent: Thursday, July 13, 2023 2:51 AM
To: avi.e.gr...@gmail.com
Cc: r-help@r-project.org
Subject: Re: [R] Variable and value labels
 
Thanks, Avi. By labels I mean human readable descriptions of variables and 
values of factor variables. In a plot, I want labels to be used for labelling 
axes on which a factor is plotted, and variable labels for axes 
names/descriptions in a plot. I may have borrowed the terminology of variable 
and value labels from Stata software, which I use.
 
I use a lot of packages. So, I have nothing against packages. But for 
labelling, I sometimes worry that I may get tied to a package for something as 
basic as assigning labels, and some function/packages may not pick up the 
labels correctly/well when plotting or displaying results. Maybe I am worried 
for nothing.
 
I have not used R much after 2003. In the past few months I have begun to use R 
again with R Studio, mostly for plotting and visualization of data. So, you can 
think of me as a new user.
 
On Thu, 13 Jul 2023 at 00:14, avi.e.gr...@gmail.com wrote:
Anupam,

Your question, even after looking at other messages, remains a bit unclear.

What do you mean by "labels"? What you mean by variables and values and how
is that related to factors?

An example or two would be helpful so we can say more than PROBABLY.
Otherwise, you risk having many people here waste lots of time sending
answers to questions you did not ask.

And why an insistence on not using packages? If you are doing something for
a class, sure, you may need to use the basics you were supposedly taught in
class or a textbook. Otherwise, a good set of packages makes code much
easier to write and often more reliable. Realistically, quite a bit of what
some call base R is actually packages deemed useful enough to be included at
startup and that can change.

If you are new to R, note you can attach arbitrary attributes to a variable
and you can have things like named lists where some or all the items have
names as an attribute.

Factors are part 

[R] Just to you

2023-07-12 Thread avi.e.gross
John,

I am a tad puzzled at why your code does not work so I tried replicating it.

Let me say you are not plotting what you think. When you plot points using
characters, it LOOKS like it did something, but not really. It labels four
equally spaced positions (even though your data are not) and you are getting
nonsense. But when you try for lines, using otherwise valid code, it fails.

Based on your earlier post, I sort of understood what you did but find it
roundabout and not necessary. And you made all columns of type character!

You had two perfectly good vectors of type character and floating point. You
eventually wanted a data.frame or the like. I assume your code is an example
of something more complex because normally code like this works fine:

> temp <- data.frame(Time=Time, Median=Medians)
> temp
Time Median
1 Age.25 128.25
2 Age.35 148.75
3 Age.45 158.50
4 Age.55 168.75

Alternatively, these two lines let you make a data.frame with default names
and rename it, skipping the matrix part as that nonsense makes all the
columns character and you need floating point for a graph!

> temp <- data.frame(Time, Medians)
> temp
Time Medians
1 Age.25  128.25
2 Age.35  148.75
3 Age.45  158.50
4 Age.55  168.75
> colnames(temp) <- c("Newname1", "Newname2")
> temp
  Newname1 Newname2
1   Age.25   128.25
2   Age.35   148.75
3   Age.45   158.50
4   Age.55   168.75

Now in your code using ggplot, as stated above it only looks like it works.
Using my temp, the points are where they belong. Yes, it breaks when adding
lines because the code is trying to group things as one axis is categorical.
One solution is to tell it not to group:

This works for me:

ggplot(temp, aes(x=Time, y=Median)) +
  geom_point() +
  geom_line(aes(group=NA))

I cannot attach a graph to messages on this forum but I suggest you modify
your code to include this. Either do not use the matrix intermediate and
keep your numbers as numbers, or convert the darn column back to numeric.

And for now, try my ungrouping.

Alternate suggestion, don't graph as above. Consider three columns and graph
numbers versus numbers adding a label wherever you want:

library(ggplot2)

Time <- c(25, 35, 45, 55)
Timelabels <- paste("age.", Time, sep="")
Medians<-c(128.25,148.75,158.5,168.75)
mydata=data.frame(Time=Time, Median=Medians)
rownames(mydata) <- Timelabels

ggplot(data=mydata, aes(x=Time, y=Medians)) + geom_line() + geom_point() +
  geom_text(label=rownames(mydata))

ggplot(data=mydata, aes(x=Time, y=Medians)) +
  geom_line() + 
  geom_point(size=3, color="red") +
  geom_label(label=rownames(mydata),alpha=0.4)

There are likely better ways but my point is it may be best to plot numbers
against numbers.













-Original Message-
From: R-help  On Behalf Of Sorkin, John
Sent: Wednesday, July 12, 2023 9:17 PM
To: r-help@r-project.org (r-help@r-project.org) 
Subject: [R] ggplot: Can plot graphs with points, can't plot graph with
points and line

I am trying to plot four points, and join the points with lines. I can plot
the points, but I can't plot the points and the line.
I hope someone can help my with my ggplot code.

# load ggplot2
if(!require(ggplot2)){install.packages("ggplot2")}
library(ggplot2)

# Create data
Time   <- c("Age.25","Age.35","Age.45","Age.55")
Medians<-c(128.25,148.75,158.5,168.75)
themedians <- matrix(data=cbind(Time,Medians),nrow=4,ncol=2)
dimnames(themedians) <- list(NULL,c("Time","Median"))
# Convert to dataframe the data format used by ggplot
themedians <- data.frame(themedians)
themedians

# This plot works
ggplot(themedians,aes(x=Time,y=Median))+
  geom_point()
# This plot does not work!
ggplot(themedians,aes(x=Time,y=Median))+
  geom_point()+
  geom_line()

Thank you,
John



Re: [R] Variable and value labels

2023-07-12 Thread avi.e.gross
Anupam,

Your question, even after looking at other messages, remains a bit unclear.

What do you mean by "labels"? What you mean by variables and values and how
is that related to factors?

An example or two would be helpful so we can say more than PROBABLY.
Otherwise, you risk having many people here waste lots of time sending
answers to questions you did not ask.

And why an insistence on not using packages? If you are doing something for
a class, sure, you may need to use the basics you were supposedly taught in
class or a textbook. Otherwise, a good set of packages makes code much
easier to write and often more reliable. Realistically, quite a bit of what
some call base R is actually packages deemed useful enough to be included at
startup and that can change.

If you are new to R, note you can attach arbitrary attributes to a variable
and you can have things like named lists where some or all the items have
names as an attribute.

Factors are part of base R and are a completely different concept. You can
use base R to get or set the base levels of a factor and many other things
and there are advantages sometimes in using a vector in factor mode than
plain but also sometimes disadvantages. 
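A quick base R sketch of getting and setting those levels (toy data, for illustration only):

```r
f <- factor(c("lo", "hi", "lo", "mid"))
levels(f)              # "hi" "lo" "mid" -- alphabetical by default
nlevels(f)             # 3
table(f)               # counts per level

# Assigning to levels() renames them positionally, in place
levels(f) <- c("high", "low", "middle")
f                      # low high low middle
```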

If you ask a more specific and properly explained question, maybe we can
help you.

Specifically, please tell us how you plan on using your labels. As an
example, if I make a named list like this:

mylist <- list(pi=3.14, e=2.7, 666)

then I can access all elements as in mylist[[2]] without a name but
mylist$pi lets me access that item by name and mylist[["e"]] and I can also
change the current values similarly. But without explaining what you want,
my explanation likely is not what you need.

But do note that even if you do not USE a package, you can sometimes use it
indirectly by examining the code for a function you like. If it is primarily
written in R, you may see how it does something and take a part of the code
and use it yourself.
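For instance (a generic sketch; the particular functions are only examples):

```r
# Typing a function's name without parentheses prints its R source,
# provided it is written in R rather than compiled code
var                          # shows the source of stats::var

# Non-exported functions and S3 methods can be located too
getAnywhere("t.test.default")

# Or reach into a package's namespace directly
stats:::t.test.default
```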



-Original Message-
From: R-help  On Behalf Of Anupam Tyagi
Sent: Tuesday, July 11, 2023 11:49 PM
To: r-help mailing list 
Subject: [R] Variable and value labels

Hello,

is there an easy way to do variable and value labels (for factor variables)
in base-R, without using a package. If not, what is an easy and good way to
do labels, using an add-on package.

-- 
Anupam.



Re: [R] Create a variable lenght string that can be used in a dimnames statement

2023-07-04 Thread avi.e.gross
Interesting to read all the answers. Personally, I was a bit irked to see
that using a combination of assignments using rownames() and colnames() did
not work as one canceled what the other had done.

But it turns out if we listed to what John really wanted versus what he said
he wanted, then a fairly simple parameterized answer is to add the row and
column names when creating the matrix as in:

# Create a matrix of M rows by N columns, initialized to NA
# and also add row names that look like row1, row2, ... rowM
# as well as column names that look like col1, col2, ... colN
# Set the parameters and the rest is a one-liner, wrapped a bit
# for legibility:
M <- 2
N <- 4
rowpref <- "row"
colpref <- "col"

myvalues <- matrix(data=NA, 
   nrow=M, 
   ncol=N, 
   dimnames=list(rows=paste(rowpref, seq(M), sep=""), 
 cols=paste(colpref, seq(N), sep="")))

The resulting value is:

> myvalues
  cols
rows   col1 col2 col3 col4
  row1   NA   NA   NA   NA
  row2   NA   NA   NA   NA



-Original Message-
From: R-help  On Behalf Of Sorkin, John
Sent: Tuesday, July 4, 2023 12:17 AM
To: Rolf Turner ; Bert Gunter

Cc: r-help@r-project.org (r-help@r-project.org) ;
Achim Zeileis 
Subject: Re: [R] Create a variable lenght string that can be used in a
dimnames statement

My life is complete.
I have inspired a fortune!
John


From: Rolf Turner 
Sent: Monday, July 3, 2023 6:34 PM
To: Bert Gunter
Cc: Sorkin, John; r-help@r-project.org (r-help@r-project.org); Achim Zeileis
Subject: Re: [R]  Create a variable lenght string that can be used in a
dimnames statement


On Mon, 3 Jul 2023 13:40:41 -0700
Bert Gunter  wrote:

> I am not going to try to sort out your confusion, as others have
> already tried and failed.



Fortune nomination!!!

cheers,

Rolf Turner

--
Honorary Research Fellow
Department of Statistics
University of Auckland
Stats. Dep't. (secretaries) phone:
 +64-9-373-7599 ext. 89622
Home phone: +64-9-480-4619



Re: [R] Help with regex replacements

2023-06-27 Thread avi.e.gross
Chris,

Consider breaking up your task into multiple passes.

And do them in whatever order preserves what you need.

First, are you talking about brackets as in square brackets, or as in your 
example, parentheses?

If you are sure you have no nested brackets, your requirement seems to be that 
anything matching [ stuff ] be replaced with nothing. Or if using parentheses, 
something similar.

Your issue here is both sets of symbols are special so you must escape them so 
they are seen as part of the pattern and not the instructions.

The idea would be to pass through the text once and match all instances on a 
line and then replace with nothing or whatever is needed. But there is no 
guarantee some of your constructs will be on the same line completely so be 
wary.
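As a hedged sketch of that multi-pass idea in base R (the sample strings and patterns are illustrative only, and assume no nested brackets):

```r
x <- c("done (donefem) as wanted (wantedfem)",
       "done/fem what was wanted/fem",
       "done\\fem what was wanted\\fem")   # literal backslashes in the data

x <- gsub("\\([^)]*\\)", "", x)  # pass 1: drop "( ... )" groups (escaped parens)
x <- gsub("/[^ ]*", "", x)       # pass 2: drop a /suffix up to the next space
x <- gsub("\\\\[^ ]*", "", x)    # pass 3: drop a \suffix up to the next space
x <- gsub(" +", " ", trimws(x))  # tidy any doubled spaces left behind
x
```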

 

-Original Message-
From: R-help  On Behalf Of Chris Evans via R-help
Sent: Tuesday, June 27, 2023 1:16 PM
To: r-help@r-project.org
Subject: [R] Help with regex replacements

I am sure this is easy for people who are good at regexps but I'm 
failing with it.  The situation is that I have hundreds of lines of 
Ukrainian translations of some English. They contain things like this:

1 "Я досяг того, чого хотів"
2 "Мені вдалося зробити бажане"
3 "Я досяг (досягла) того, чого хотів (хотіла)"
4 "Я досяг(-ла) речей, яких хотілося досягти"
5 "Я досяг/ла того, чого хотів/ла"
6 "Я досяг\\досягла того, чого прагнув\\прагнула."
7 "Я досягнув(ла) того, чого хотів(ла)"

Using dput():

tmp <- structure(list(Text = c("Я досяг того, чого хотів",
    "Мені вдалося зробити бажане",
    "Я досяг (досягла) того, чого хотів (хотіла)",
    "Я досяг(-ла) речей, яких хотілося досягти",
    "Я досяг/ла того, чого хотів/ла",
    "Я досяг\\досягла того, чого прагнув\\прагнула",
    "Я досягнув(ла) того, чого хотів(ла)")),
    row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))

Those show four different ways translators have handled gendered words:

1) Ignore them and (I'm guessing) only give the masculine
2) Give the feminine form of the word (or just the feminine suffix) in brackets
3) Give the feminine form/suffix prefixed by a forward slash
4) Give the feminine form/suffix prefixed by a backslash (here a double backslash)

I would like just to drop all these feminine gendered options. (Don't worry,
they'll get back in later.) So I would like to replace:

1) anything between brackets with nothing!
2) anything between a forward slash and the next space with nothing
3) anything between a backslash and the next space with nothing

but preserving the rest of the text. I have been trying to achieve this using
str_replace_all() but I am failing utterly. Here's a silly little example of
my failures. This was just trying to get the text I wanted to replace (as I
was trying to simplify the issues for my tired wetware):

> tmp %>%
+   as_tibble() %>%
+   rename(Text = value) %>%
+   mutate(Text = str_replace_all(Text, fixed("."), "")) %>%
+   filter(row_number() < 4) %>%
+   mutate(Text2 = str_replace(Text, "\\(.*\\)", "\\1"))
Error in `mutate()`:
ℹ In argument: `Text2 = str_replace(Text, "\\(.*\\)", "\\1")`.
Caused by error in `stri_replace_first_regex()`:
! Trying to access the index that is out of bounds. (U_INDEX_OUTOFBOUNDS_ERROR)
Run `rlang::last_trace()` to see where the error occurred.

I have tried gurgling around the internet but am striking out, so I am
throwing myself on the list. Apologies if this is trivial, but I'd hate to
have to clean these hundreds of lines by hand, though it's starting to look
as if I'd achieve that faster by hand than I will by banging my ignorance of
R regexp syntax on the problem.

TIA,

Chris
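A hedged sketch of one way to do the stripping, not from the original thread. The U_INDEX_OUTOFBOUNDS_ERROR appears to come from the "\\1" in the replacement: the pattern "\\(.*\\)" contains no capture group, so group 1 does not exist. For the actual cleanup, base R gsub() with patterns along these lines may work (the exact patterns are my assumption about the data):

```r
txt <- c("Я досяг (досягла) того, чого хотів (хотіла)",
         "Я досяг/ла того, чого хотів/ла",
         "Я досяг\\досягла того, чого прагнув\\прагнула")

# Drop optional space plus a "(...)" group, then any run starting with
# "/" or "\" up to the next space (or end of string).
clean <- gsub("\\s*\\([^)]*\\)", "", txt, perl = TRUE)
clean <- gsub("[/\\\\]+\\S*", "", clean, perl = TRUE)
clean
```

Note that "[^)]*" rather than ".*" keeps each match inside one bracket pair; with ".*" a line containing two bracketed alternatives would lose everything from the first "(" to the last ")".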

-- 
Chris Evans (he/him)
Visiting Professor, UDLA, Quito, Ecuador & Honorary Professor, 
University of Roehampton, London, UK.
Work web site: https://www.psyctc.org/psyctc/
CORE site: http://www.coresystemtrust.org.uk/
Personal site: https://www.psyctc.org/pelerinage2016/

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Multiplying two vectors of the same size to give a third vector of the same size

2023-06-20 Thread avi.e.gross
I was rushing out, Phil, so let me amend what I wrote. As others noted, this is 
fairly beginner stuff. If you have more such questions, besides reading up, 
please consider sending them to the Tutor mailing list, where there is more 
patience. 

You wanted to change selected small values to 0.0.

So you do not want the code I supplied as illustration, as it removes small 
elements, making a smaller vector.

Assume this test:

A <- c(NA, 0.3, 0.6, NA, 0.9)
B <- c(NA, 0.2, 0.6, 0.9, 0.9)

C <- A * B
print(C)

The result at this point is:

[1]   NA 0.06 0.36   NA 0.81

As expected, anything with an NA in either vector will generate an NA in the 
componentwise multiplication. Note the second item at 0.06 is under your 
threshold of 0.1.

What you want is not this:

Result <- C[C < 0.1]
> Result
[1]   NA 0.06   NA

That threw away anything above the threshold.

What you want may be this:

C[C < 0.1] <- 0.0
print(C)

This returns 

[1]   NA 0.00 0.36   NA 0.81

Everything that is NA or at or above 0.1 is kept and anything below 0.1 is 
zeroed and kept.

Of course, if you do not want to keep the NA, that can trivially be removed:

C[!is.na(C)]
[1] 0.00 0.36 0.81
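Putting the pieces above together as one runnable sketch (same toy vectors and 0.1 threshold as in the example; the explicit !is.na() guard is my addition, to make the NA handling obvious rather than relying on how logical subscript assignment treats NA):

```r
A <- c(NA, 0.3, 0.6, NA, 0.9)
B <- c(NA, 0.2, 0.6, 0.9, 0.9)

C <- A * B                        # elementwise product; NA propagates
C[!is.na(C) & C < 0.1] <- 0.0     # zero the small products, keep the NAs

# ifelse() gives the same result in one step and also leaves NA alone:
C2 <- ifelse(A * B < 0.1, 0.0, A * B)
```

Both approaches keep the vector the same length, which is what Phil asked for, rather than filtering elements out.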





-Original Message-
From: Philip Rhoades  
Sent: Tuesday, June 20, 2023 1:04 PM
To: avi.e.gr...@gmail.com
Cc: r-help@r-project.org
Subject: Re: [R] Multiplying two vectors of the same size to give a third 
vector of the same size

avi,


On 2023-06-21 01:55, avi.e.gr...@gmail.com wrote:
> Phil,
> 
> What have you tried? This seems straightforward enough.
> 
> Could you clarify what you mean by NULL?


I guess in R it would just be an empty cell? - ie NOT a zero.


> In R, it is common to use NA or a more specific version of it.


Ah yes, that would be it I think.


> So assuming you have two vectors containing floats with some NA, then:
> 
> C <- A*B
> 
> Will give you the products one at a time if the lengths are the same. 
> NA
> times anything is NA.


Right - yes that works! - thanks!


> Your second condition is also simple, as you want anything below a 
> threshold to be set to a fixed value.
> 
> Since you already have C, above, your condition of:
> 
> threshold <- 0.1
> C < threshold
> 
> The last line returns a Boolean vector you can use to index C to get 
> just
> the ones you select as TRUE and thus can change:
> 
> Result <- C[C < threshold]


Ah, I see . .


> And you can of course do all the above as a one-liner.


Yes.


> Is that what you wanted?


Exactly except I meant:

   Result <- C[C > threshold]

Thanks!

Phil.


> -Original Message-
> From: R-help  On Behalf Of Philip Rhoades 
> via
> R-help
> Sent: Tuesday, June 20, 2023 11:38 AM
> To: r-help@r-project.org
> Subject: [R] Multiplying two vectors of the same size to give a third 
> vector
> of the same size
> 
> People,
> 
> I am assuming that what I want to do is easier in R than say Ruby.
> 
> I want to do what the Subject says ie multiply the cells in the same
> position of two vectors (A and B) to give a result in the same position
> in a third vector (C) BUT:
> 
> - The values in the cells of A and B are floats between 0.0 and 1.0 or
> NULL
> 
> - If there is a NULL in the multiplication, then the result in the cell
> for C is also a NULL
> 
> - If there is a value less than (say) 0.01 in the multiplication, then
> the result in the cell for C is 0.0
> 
> Any suggestions appreciated!
> 
> Phil.
> 
> --
> Philip Rhoades
> 
> PO Box 896
> Cowra  NSW  2794
> Australia
> E-mail:  p...@pricom.com.au
> 
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
> 

-- 
Philip Rhoades

PO Box 896
Cowra  NSW  2794
Australia
E-mail:  p...@pricom.com.au

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multiplying two vectors of the same size to give a third vector of the same size

2023-06-20 Thread avi.e.gross
Phil,

What have you tried? This seems straightforward enough.

Could you clarify what you mean by NULL?

In R, it is common to use NA or a more specific version of it.

So assuming you have two vectors containing floats with some NA, then:

C <- A*B

Will give you the products one at a time if the lengths are the same. NA
times anything is NA.

Your second condition is also simple, as you want anything below a threshold
to be set to a fixed value.

Since you already have C, above, your condition of:

threshold <- 0.1
C < threshold

The last line returns a Boolean vector you can use to index C to get just
the ones you select as TRUE and thus can change:

Result <- C[C < threshold]

And you can of course do all the above as a one-liner.

Is that what you wanted?


-Original Message-
From: R-help  On Behalf Of Philip Rhoades via
R-help
Sent: Tuesday, June 20, 2023 11:38 AM
To: r-help@r-project.org
Subject: [R] Multiplying two vectors of the same size to give a third vector
of the same size

People,

I am assuming that what I want to do is easier in R than say Ruby.

I want to do what the Subject says ie multiply the cells in the same 
position of two vectors (A and B) to give a result in the same position 
in a third vector (C) BUT:

- The values in the cells of A and B are floats between 0.0 and 1.0 or 
NULL

- If there is a NULL in the multiplication, then the result in the cell 
for C is also a NULL

- If there is a value less than (say) 0.01 in the multiplication, then 
the result in the cell for C is 0.0

Any suggestions appreciated!

Phil.

-- 
Philip Rhoades

PO Box 896
Cowra  NSW  2794
Australia
E-mail:  p...@pricom.com.au

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Problem with filling dataframe's column

2023-06-14 Thread avi.e.gross
Richard, it is indeed possible for different languages to choose different 
approaches.
 
If your point is that an R named list can simulate a Python dictionary (or, 
for that matter, a set), there is some validity to that. You can also use 
environments similarly.
 
Arguably there are differences, including in what notations are built into 
the language. Looking the other direction, Python chose to make lists a major 
feature: they can hold any combination of things and can even emulate a 
matrix with sub-lists, and there is a tuple version that is similar but 
immutable; yet Python initially neglected something as simple as a vector 
containing just one kind of content. If you look at it now, many people 
simply load numpy (and often pandas) to get functionality that is faster and 
comes by default in R.
 
I think this discussion was about my (amended) offhand remark suggesting R 
factors stored plain text in a vector attached to the variable and the offset 
was the number stored in the main factor vector. If that changed to internally 
use something hashed like a dictionary, fine. I have often made data structures 
such as in your example to store named items but did not call it a dictionary 
but simply a named list. In one sense, the two map into each other but I could 
argue there remain differences. For example, you can use something immutable 
like a tuple as a key in Python. 
 
This is not an argument about which language is better. Each has developed to 
fit different ideas and has been extended, and quite a few things can now be 
done in either one. Still, it can be interesting to combine the two inside 
RStudio so each does some of what it may do better or faster, or in a way you 
find more natural.
 
 
From: Richard O'Keefe  
Sent: Wednesday, June 14, 2023 10:34 PM
To: avi.e.gr...@gmail.com
Cc: Bert Gunter ; R-help@r-project.org
Subject: Re: [R] Problem with filling dataframe's column
 
Consider
 
  m <- list(foo=c(1,2),"B'ar"=as.matrix(1:4,2,2),"!*#"=c(FALSE,TRUE))
 
It is a collection of elements of different types/structures, accessible
via string keys (and also by position).  Entries can be added:
 
  m[["fred"]] <- 47
 
Entries can be removed:
 
  m[["!*#"]] <- NULL
 
How much more like a Python dictionary do you need it to be?
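Richard's sketch, made self-contained and runnable (one small change: matrix() rather than as.matrix(), since as.matrix() ignores the extra dimension arguments):

```r
# A named list behaves much like a string-keyed dictionary
m <- list(foo = c(1, 2),
          "B'ar" = matrix(1:4, 2, 2),
          "!*#" = c(FALSE, TRUE))

m[["fred"]] <- 47    # add an entry
m[["!*#"]] <- NULL   # remove an entry
names(m)             # the remaining "keys"
```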
 
 
 
On Wed, 14 Jun 2023 at 11:25, mailto:avi.e.gr...@gmail.com> > wrote:
Bert,

I stand corrected. What I said may have once been true but apparently the 
implementation seems to have changed at some level.

I did not factor that in.

Nevertheless, whether you use an index as a key or as an offset into an 
attached vector of labels, it seems to work the same, and I think my comment 
applies well enough: changing a few labels instead of scanning lots of 
entries can sometimes be a good thing. As far as I can tell, the external 
interfaces seem the same for now. 

One issue with R for a long time was how they did not do something more like a 
Python dictionary and it looks like …

ABOVE

From: Bert Gunter mailto:bgunter.4...@gmail.com> > 
Sent: Tuesday, June 13, 2023 6:15 PM
To: avi.e.gr...@gmail.com  
Cc: javad bayat mailto:j.bayat...@gmail.com> >; 
R-help@r-project.org  
Subject: Re: [R] Problem with filling dataframe's column

Below.


On Tue, Jun 13, 2023 at 2:18 PM mailto:avi.e.gr...@gmail.com>   > > wrote:
>
>  
> Javad,
>
> There may be nothing wrong with the methods people are showing you and if it 
> satisfied you, great.
>
> But I note you have lots of data in over a quarter million rows. If much of 
> the text data is redundant, and you want to simplify some operations such as 
> changing some of the values to others in multiple ways, have you done any 
> learning about an R feature very useful for dealing with categorical data 
> called "factors"?
>
> If you have a vector or a column in a data.frame that contains text, then it 
> can be replaced by a factor that often takes way less space as it stores a 
> sort of dictionary of all the unique values and just records numbers like 
> 1,2,3 to tell which one each item is.

-- This is false. It used to be true a **long time ago**, but R has for quite a 
while used hashing/global string tables to avoid this problem. See here 

  for details/references.
As a result, I think many would argue that working with strings *as strings,* 
not factors, is often a better default, though of course there are still 
situations where factors are useful (e.g. in ordering results by factor levels 
where the desired level order is not alphabetical).

**I would appreciate correction/ clarification if my claims are wrong or 
misleading! **

In any case, please do check such claims before making them on this list.

Cheers,
Bert



[[alternative HTML version deleted]]


Re: [R] Problem with filling dataframe's column

2023-06-13 Thread avi.e.gross
Bert,
 
I stand corrected. What I said may have once been true but apparently the 
implementation seems to have changed at some level.
 
I did not factor that in.
 
Nevertheless, whether you use an index as a key or as an offset into an 
attached vector of labels, it seems to work the same, and I think my comment 
applies well enough: changing a few labels instead of scanning lots of 
entries can sometimes be a good thing. As far as I can tell, the external 
interfaces seem the same for now. 
 
One issue with R for a long time was how they did not do something more like a 
Python dictionary and it looks like …
 
ABOVE
 
From: Bert Gunter  
Sent: Tuesday, June 13, 2023 6:15 PM
To: avi.e.gr...@gmail.com
Cc: javad bayat ; R-help@r-project.org
Subject: Re: [R] Problem with filling dataframe's column
 
Below.


On Tue, Jun 13, 2023 at 2:18 PM mailto:avi.e.gr...@gmail.com> > wrote:
>
>  
> Javad,
>
> There may be nothing wrong with the methods people are showing you and if it 
> satisfied you, great.
>
> But I note you have lots of data in over a quarter million rows. If much of 
> the text data is redundant, and you want to simplify some operations such as 
> changing some of the values to others in multiple ways, have you done any 
> learning about an R feature very useful for dealing with categorical data 
> called "factors"?
>
> If you have a vector or a column in a data.frame that contains text, then it 
> can be replaced by a factor that often takes way less space as it stores a 
> sort of dictionary of all the unique values and just records numbers like 
> 1,2,3 to tell which one each item is.
 
-- This is false. It used to be true a **long time ago**, but R has for quite a 
while used hashing/global string tables to avoid this problem. See here 

  for details/references.
As a result, I think many would argue that working with strings *as strings,* 
not factors, is often a better default, though of course there are still 
situations where factors are useful (e.g. in ordering results by factor levels 
where the desired level order is not alphabetical).
 
**I would appreciate correction/ clarification if my claims are wrong or 
misleading! **
 
In any case, please do check such claims before making them on this list.
 
Cheers,
Bert
 
 

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with filling dataframe's column

2023-06-13 Thread avi.e.gross
 
Javad,
 
There may be nothing wrong with the methods people are showing you and if it 
satisfied you, great.
 
But I note you have lots of data in over a quarter million rows. If much of the 
text data is redundant, and you want to simplify some operations such as 
changing some of the values to others in multiple ways, have you done any 
learning about an R feature very useful for dealing with categorical data 
called "factors"?
 
If you have a vector or a column in a data.frame that contains text, then it 
can be replaced by a factor that often takes way less space as it stores a sort 
of dictionary of all the unique values and just records numbers like 1,2,3 to 
tell which one each item is. 
 
You can access the values using levels(whatever) and also change them. There 
are packages that make this straightforward such as forcats which is one of the 
tidyverse packages that also includes many other tools some find useful but are 
beyond the usual scope of this mailing list.
 
As an example, if you have a vector in mydata$col1 then code like:
 
mydata$col1 <- factor(mydata$col1)
 
No matter which way you do it, you can now access the levels, make whatever 
changes, and save the changes. One example could be to apply some variant of 
grep to make the substitution. There is a family of functions built in, such 
as sub(), that match a regular expression and replace it with what you want.
 
This has a similar result to changing all entries without doing all the work. I 
mean if item 5 used to be "OLD" and is now "NEW" then any of you quarter 
million entries that have a 5 will now be seen as having a value of "NEW".
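A minimal runnable sketch of that relabeling (toy values assumed; renaming one level updates every row that points at it):

```r
col1 <- c("OLD", "NEW", "OLD", "OLD", "NEW")
f <- factor(col1)
levels(f)                                  # "NEW" "OLD" (sorted unique values)

# Change the label once; every entry that stored that level now reads "RENAMED"
levels(f)[levels(f) == "OLD"] <- "RENAMED"
as.character(f)
```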
 
I will stop here and suggest you may want to read a book that explains R as a 
unified set of features, with some emphasis on the features it is designed to 
have that can make life easier, rather than using just the features it shares 
with most languages. Some of your questions indicate you have less grounding 
and are mainly following recipes you stumble across. 
 
Otherwise, you will end up with a collection of what you call "codes", and 
others like me call programs, that don't necessarily fit well together.
 
 
-Original Message-
From: R-help r-help-boun...@r-project.org  
 On Behalf Of javad bayat
Sent: Tuesday, June 13, 2023 3:47 PM
To: Eric Berger ericjber...@gmail.com  
Cc: R-help@r-project.org  
Subject: Re: [R] Problem with filling dataframe's column
 
Dear all;
I used these codes and I get what I wanted.
Sincerely
 
pat = c("Level 12","Level 22","0")
data3 = data2[-which(data2$Layer == pat),]
dim(data2)
[1] 281549  9
dim(data3)
[1] 244075  9
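A caution on that filter, not raised in the thread: comparing a column to a length-3 pat with == recycles the pattern element by element rather than testing membership, so it can silently miss rows. %in% expresses the intent directly (a sketch with made-up data):

```r
data2 <- data.frame(Layer = c("Level 12", "A", "Level 22", "B", "0"))
pat <- c("Level 12", "Level 22", "0")

# Keep only rows whose Layer is NOT one of the unwanted values
data3 <- data2[!(data2$Layer %in% pat), , drop = FALSE]
data3$Layer
```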
 
On Tue, Jun 13, 2023 at 11:36 AM Eric Berger <  
ericjber...@gmail.com> wrote:
 
> Hi Javed,
> grep returns the positions of the matches. See an example below.
> 
> > v <- c("abc", "bcd", "def")
> > v
> [1] "abc" "bcd" "def"
> > grep("cd",v)
> [1] 2
> > w <- v[-grep("cd",v)]
> > w
> [1] "abc" "def"
> >
> 
> 
> On Tue, Jun 13, 2023 at 8:50 AM javad bayat <  
> j.bayat...@gmail.com> wrote:
> >
> > Dear Rui;
> > Hi. I used your codes, but it seems it didn't work for me.
> >
> > > pat <- c("_esmdes|_Des Section|0")
> > > dim(data2)
> > [1]  281549  9
> > > grep(pat, data2$Layer)
> > > dim(data2)
> > [1]  281549  9
> >
> > What does grep function do? I expected the function to remove 3 rows of
> the
> > dataframe.
> > I do not know the reason.
> >
> >
> >
> >
> >
> >
> > On Mon, Jun 12, 2023 at 5:16 PM Rui Barradas < 
> >  ruipbarra...@sapo.pt>
> wrote:
> >
> > > Às 23:13 de 12/06/2023, javad bayat escreveu:
> > > > Dear Rui;
> > > > Many thanks for the email. I tried your codes and found that the
> length
> > > of
> > > > the "Values" and "Names" vectors must be equal, otherwise the results
> > > will
> > > > not be useful.
> > > > For some of the characters in the Layer column that I do not need to
> be
> > > > filled in the LU column, I used "NA".
> > > > But I need to delete some of the rows from the table as they are
> useless
> > > > for me. I tried this code to delete entire rows of the dataframe
> which
> > > > contained these three value in the Layer column: It gave me the
> following
> > > > error.
> > > >
> > > >> data3 = data2[-grep(c("_esmdes","_Des Section","0"), data2$Layer),]
> > > >   Warning message:
> > > >In grep(c("_esmdes", "_Des Section", "0"), data2$Layer) :
> > > >argument 'pattern' has length > 1 and only the first element
> will
> > > be
> > > > used
> > > >
> > > >> data3 = data2[!grepl(c("_esmdes","_Des Section","0"), data2$Layer),]
> > > >  Warning message:
> > > >  In grepl(c("_esmdes", "_Des Section", "0"), data2$Layer) :
> > > >  argument 'pattern' has length > 1 and only the first element
> will be
> > > > used
> > > >
> > > > How can I do this?
> > > > 

Re: [R] Problem with filling dataframe's column

2023-06-11 Thread avi.e.gross
The problem being discussed is really a common operation that R handles
quite easily in many ways.

The code shown has way too many things that do not fit to make much sense
and is not written the way many R programmers would write it.

Loops like the one used are legal but not needed. 

As has been noted, use of "==" for assignment is the wrong choice. Not using
some method to refer to a specific cell would still result in odd behavior.

One accepted and common method is to do this vectorized, as you are dealing
with two vectors of the same size.

Code like:

Matches <- data2$Layer == "Level 12"

Will result in a Boolean vector containing TRUE where it found a match,
FALSE otherwise.

Now you can use the above as a sort of indexing as in:

data2$LU[Matches] <- "Park"

Only the indexes marked TRUE will be selected and set to the new value.

Of course, the two lines can be combined as:

data2$LU[data2$Layer == "Level 12"] <- "Park"

There are also alternatives where people use functions like ifelse() also in
base R.

And, of course, some people like alternate packages such as in the Tidyverse
where you might have used a mutate() or other methods.
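A runnable version of the vectorized replacement described above (toy data assumed):

```r
data2 <- data.frame(Layer = c("Level 12", "Level 3", "Level 12"),
                    LU = c("", "", ""))

# Only the rows where the logical index is TRUE are changed
data2$LU[data2$Layer == "Level 12"] <- "Park"
data2$LU
```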

-Original Message-
From: R-help  On Behalf Of javad bayat
Sent: Sunday, June 11, 2023 4:05 PM
To: R-help@r-project.org
Subject: [R] Problem with filling dataframe's column

Dear R users;
I am trying to fill a column based on a specific value in another column of
a dataframe, but it seems there is a problem with the codes!
The "Layer" and the "LU" are two different columns of the dataframe.
How can I fix this?
Sincerely


for (i in 1:nrow(data2$Layer)){
  if (data2$Layer == "Level 12") {
  data2$LU == "Park"
  }
  }




-- 
Best Regards
Javad Bayat
M.Sc. Environment Engineering
Alternative Mail: bayat...@yahoo.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Adding a numeric class to a data.frame

2023-06-05 Thread avi.e.gross
Jeff,

I wish I could give you an answer to a very specific question.

You have lots of numbers in a vector representing whatever "probabilities"
mean something to you. There are currently no names associated with them.
And you want to make some kind of graph using ggplot.

So, to be quite clear, ggplot tends to like a data.frame or one of several
other such tabular constructs when making graphs, or have some data coerced
into such a format. BUT I am aghast at the concept of giving it a data.frame
with one row and thousands of un-named columns. First, the columns will have
semi-numerical names by default and second, they cannot be used by ggplot
unless you specify a name.

What you normally need is not lots of columns but lots of rows. One column
suffices for some purposes and multiple columns are often present for many
purposes. 

But what are you graphing as in probability versus what? Is that item
correlated with each result in some way? 

You eventually need to probably make a data.frame with two or more such
columns with names for the columns. You need to tell ggplot something like 

ggplot(mydata, aes(x=whatever, y=whatever, ...)) + geom_line(or whatever)
...

But as you release info this slowly, I think I will now drop out of this
conversation.

Good luck.
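One hedged sketch of what Jeff could do with his named vector: the names in his str() output look like row indices of the original data, so indexing the vector by row number as a string lines the two up (the data here are made up; rows with no prediction come back as NA):

```r
df <- data.frame(score = c(10, 20, 30, 40, 50))   # stand-in for the original data
pred_probability <- c("1" = 0.0012, "2" = 0.0009, "3" = 0.0082, "5" = 0.0015)

# Look each row number up in the vector's names; row 4 has no entry -> NA
df$prob <- unname(pred_probability[as.character(seq_len(nrow(df)))])
df
```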

-Original Message-
From: Jeff Reichman  
Sent: Monday, June 5, 2023 7:29 AM
To: avi.e.gr...@gmail.com; r-help@r-project.org
Subject: RE: [R] Adding a numeric class to a data.frame

Avi

But I don't have a column header to call. Do I simply use column position 

> pred_probability 
             1              2              3              5              8 
0.001156612672 0.000926702837 0.008162332353 0.001544764162 0.000919503109
..
> str(pred_probability )
 Named num [1:6964] 0.001157 0.000927 0.008162 0.001545 0.00092 ...
 - attr(*, "names")= chr [1:6964] "1" "2" "3" "5" ...
>

Jeff

-Original Message-
From: avi.e.gr...@gmail.com  
Sent: Sunday, June 4, 2023 9:58 PM
To: 'Jeff Reichman' ; r-help@r-project.org
Subject: RE: [R] Adding a numeric class to a data.frame

Jeff,

The number of items is not relevant except insofar as your vector of
probabilities is in the same order as the other vector and the same length.

If for example you had a vector of test scores for 10,000 tests and you
calculated the probability in the data of having a 100, then the probability
of a 99 and so on, then you could make another vector of 10,000 giving the
probability of the corresponding entries.

So before calling ggplot, assuming you have two vectors called orig and
prob, you make a data.frame like

Df <- data.frame(orig=orig, prob=prob)

You use that in ggplot.

You can of course add additional columns. Or if your data is in another
format, do things like long to wide conversion and many other things.

If you already have a data.frame with one or more columns including orig,
you can attach the probabilities with something as simple as:

Df$prob = prob

If you are using ggplot, you may as well be using elements of the tidyverse
that provide a different take on how to do some things compared to base R
but that is not something easily discussed here.



-Original Message-
From: Jeff Reichman 
Sent: Sunday, June 4, 2023 10:21 PM
To: avi.e.gr...@gmail.com; r-help@r-project.org
Subject: RE: [R] Adding a numeric class to a data.frame

Yes - I could have done that but I have over 5,000 calculated probabilities.
So yes a little more detail would have helped. I'm needing to add those
probability back into the original data.frame from which the model was
created as I'm going  to be using ggplot2 so I need the probabilities and
original dataframe to be one.

-Original Message-
From: avi.e.gr...@gmail.com 
Sent: Sunday, June 4, 2023 9:00 PM
To: 'Jeff Reichman' ; r-help@r-project.org
Subject: RE: [R] Adding a numeric class to a data.frame

Jeff R, it would be helpful if your intent was understood.

For example, did you want output as a column of labels c("A", "B", "C") and
another adjacent of c(0.0011566127, 0.0009267028, 0.0081623324) then you
could do:

data.frame(labels=c("A", "B", "C"), data=c(0.0011566127, 0.0009267028,
0.0081623324))
  labels data
1  A 0.0011566127
2  B 0.0009267028
3  C 0.0081623324

If you wanted your columns labeled with the data in multiple columns, try
this:

> result <- data.frame(t(c(0.0011566127, 0.0009267028, 0.0081623324))) 
> result
           X1           X2          X3
1 0.001156613 0.0009267028 0.008162332
> names(result) <- c("A", "B", "C")
> result
            A            B           C
1 0.001156613 0.0009267028 0.008162332

But these are not solutions to your specified problem unless you explain
properly what you want to do and the exact expected output.



-Original Message-
From: R-help  On Behalf Of Jeff Reichman
Sent: Sunday, June 4, 2023 7:11 PM
To: r-help@r-project.org
Subject: [R] Adding a numeric class to a data.frame

R-Help Community

 

How do I add a 

Re: [R] Adding a numeric class to a data.frame

2023-06-04 Thread avi.e.gross
Jeff,

The number of items is not relevant except insofar as your vector of
probabilities is in the same order as the other vector and the same length.

If for example you had a vector of test scores for 10,000 tests and you
calculated the probability in the data of having a 100, then the probability
of a 99 and so on, then you could make another vector of 10,000 giving the
probability of the corresponding entries.

So before calling ggplot, assuming you have two vectors called orig and
prob, you make a data.frame like

Df <- data.frame(orig=orig, prob=prob)

You use that in ggplot.

You can of course add additional columns. Or if your data is in another
format, do things like long to wide conversion and many other things.

If you already have a data.frame with one or more columns including orig,
you can attach the probabilities with something as simple as:

Df$prob = prob

If you are using ggplot, you may as well be using elements of the tidyverse
that provide a different take on how to do some things compared to base R
but that is not something easily discussed here.



-Original Message-
From: Jeff Reichman  
Sent: Sunday, June 4, 2023 10:21 PM
To: avi.e.gr...@gmail.com; r-help@r-project.org
Subject: RE: [R] Adding a numeric class to a data.frame

Yes - I could have done that but I have over 5,000 calculated probabilities.
So yes a little more detail would have helped. I'm needing to add those
probability back into the original data.frame from which the model was
created as I'm going  to be using ggplot2 so I need the probabilities and
original dataframe to be one.

-Original Message-
From: avi.e.gr...@gmail.com  
Sent: Sunday, June 4, 2023 9:00 PM
To: 'Jeff Reichman' ; r-help@r-project.org
Subject: RE: [R] Adding a numeric class to a data.frame

Jeff R, it would be helpful if your intent was understood.

For example, did you want output as a column of labels c("A", "B", "C") and
another adjacent of c(0.0011566127, 0.0009267028, 0.0081623324) then you
could do:

data.frame(labels=c("A", "B", "C"), data=c(0.0011566127, 0.0009267028,
0.0081623324))
  labels data
1  A 0.0011566127
2  B 0.0009267028
3  C 0.0081623324

If you wanted your columns labeled with the data in multiple columns, try
this:

> result <- data.frame(t(c(0.0011566127, 0.0009267028, 0.0081623324))) 
> result
           X1           X2          X3
1 0.001156613 0.0009267028 0.008162332
> names(result) <- c("A", "B", "C")
> result
            A            B           C
1 0.001156613 0.0009267028 0.008162332

But these are not solutions to your specified problem unless you explain
properly what you want to do and the exact expected output.



-Original Message-
From: R-help  On Behalf Of Jeff Reichman
Sent: Sunday, June 4, 2023 7:11 PM
To: r-help@r-project.org
Subject: [R] Adding a numeric class to a data.frame

R-Help Community

 

How do I add a numeric class to a data.frame. 

 

For example, I have calculated the following probabilities

 

           1            2            3 
0.0011566127 0.0009267028 0.0081623324

 

How would I add them back into my data.frame for example

 

My_df <- data.frame(col_1 = c('A', 'B', 'C')) such that I end up with

 

col_1   col_2

A  0.0011566127

 

Though I could use a cbind.

 

Jeff


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Adding a numeric class to a data.frame

2023-06-04 Thread avi.e.gross
Jeff R, it would be helpful if your intent was understood.

For example, did you want output as a column of labels c("A", "B", "C") and
another adjacent of c(0.0011566127, 0.0009267028, 0.0081623324) then you
could do:

data.frame(labels=c("A", "B", "C"), data=c(0.0011566127, 0.0009267028,
0.0081623324))
  labels         data
1      A 0.0011566127
2      B 0.0009267028
3      C 0.0081623324

If you wanted your columns labeled with the data in multiple columns, try
this:

> result <- data.frame(t(c(0.0011566127, 0.0009267028, 0.0081623324)))
> result
           X1           X2          X3
1 0.001156613 0.0009267028 0.008162332
> names(result) <- c("A", "B", "C")
> result
            A            B           C
1 0.001156613 0.0009267028 0.008162332

But these are not solutions to your specified problem unless you explain
properly what you want to do and the exact expected output.
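
Assuming the computed probabilities can be captured as a named vector whose
names match col_1 (the probs object below is a hypothetical stand-in for the
real computation), a sketch that matches on names rather than relying on row
order:

```r
# Hypothetical named vector standing in for the computed probabilities
probs <- c(A = 0.0011566127, B = 0.0009267028, C = 0.0081623324)

My_df <- data.frame(col_1 = c("A", "B", "C"))
# Index the named vector by the label column, then drop the names
My_df$col_2 <- unname(probs[My_df$col_1])
My_df
#>   col_1        col_2
#> 1     A 0.0011566127
#> 2     B 0.0009267028
#> 3     C 0.0081623324
```

Matching on names stays correct even if the rows of My_df are later reordered.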



-Original Message-
From: R-help  On Behalf Of Jeff Reichman
Sent: Sunday, June 4, 2023 7:11 PM
To: r-help@r-project.org
Subject: [R] Adding a numeric class to a data.frame

R-Help Community

 

How do I add a numeric class to a data.frame?

 

For example, I have calculated the following probabilities

 

           1            2            3 

0.0011566127 0.0009267028 0.0081623324

 

How would I add them back into my data.frame for example

 

My_df <- data.frame(col_1 = c('A', 'B', 'C')) such that I end up with

 

col_1   col_2

A  0.0011566127

 

Though I could use a cbind.

 

Jeff


[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] extract parts of a list before symbol

2023-05-26 Thread avi.e.gross
Evan,

Yes, once you know a bit about the details, all kinds of functions are
available to solve problems without going the hard way.

But the names() function is taught fairly widely and did you also pick up on
the fact that it can be used on both sides so it also sets the names?

> # Create a list with mixed case names
> named <- list(age=69, Month="October", YEAR=1492)

> # Save the names to be reset later
> old_names <- names(named)

> # Show current names for the items in named
> print(names(named))
[1] "age"   "Month" "YEAR" 

> # Replace the current names with an upper case version starting with "var_"
> names(named) <- paste0("var_", toupper(names(named)))

> # Show current names for the items in named
> print(names(named))
[1] "var_AGE"   "var_MONTH" "var_YEAR" 

> # Reset the names to the original names
> names(named) <- old_names

> # Show current names for the items in named
> print(names(named))
[1] "age"   "Month" "YEAR"

Look out for other such functions that work on both the LHS and RHS.

> evens <- seq(0,20,2)
> evens
 [1]  0  2  4  6  8 10 12 14 16 18 20
> length(evens)
[1] 11
> length(evens) <- 5
> evens
[1] 0 2 4 6 8

If you look just a bit under the hood, R stores all kinds of things about a
variable that you can change with the right function. You can change the
dimensions of a vector for example with dim() as well as get the current
dimensions.
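
For example, dim() likewise works on both sides of an assignment:

```r
v <- 1:6
dim(v)             # NULL: a plain vector carries no dim attribute
dim(v) <- c(2, 3)  # reshape in place into a 2 x 3 matrix
dim(v)             # [1] 2 3
v[2, 3]            # [1] 6 (column-major filling)
```

The same vector is now addressable by row and column without any copy being
made explicitly in your code.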


-Original Message-
From: R-help  On Behalf Of Evan Cooch
Sent: Friday, May 26, 2023 10:38 AM
To: r-help@r-project.org
Subject: Re: [R] extract parts of a list before symbol

Many thanks to all. Wasn't even aware of the names function. That does 
the trick for present purposes.

On 5/26/2023 12:02 AM, avi.e.gr...@gmail.com wrote:
> All true Jeff, but why do things the easy way! LOL!
>
> My point was that various data structures, besides the list we started
with,
> store the names as an attribute. Yes, names(listname) works fine to
extract
> whatever parts they want. My original idea of using a data.frame was
because
> it creates names when they are absent.
>
> And you are correct that if the original list was not as shown with only
all
> items of length 1, converting to a data.frame fails.
>
> From what you say, it is a harder thing to write a function that returns a
> "name" for column N given a list. As you note, you get a null when there
are
> no names.  You get empty strings when one or more (but not all) have no
> names. But it can be done.
>
> The OP initially was looking at a way to get a text version of a variable
> they could use using perhaps regular expressions to parse.  Of course that
> is not as easy as just looking at the names attribute in one of several
> ways. But it may help in a sense to deal with the cases mentioned above.
> The problem is that str() does not return anything except to stdout so it
> must be captured to do silly things.
>
>> test <- list(a=3,b=5,c=11)
>> str(test)
> List of 3
>   $ a: num 3
>   $ b: num 5
>   $ c: num 11
>
>> str(test[1])
> List of 1
>   $ a: num 3
>
>> str(test[2])
> List of 1
>   $ b: num 5
>
>> str(list(a=1, 2, c=3))
> List of 3
>   $ a: num 1
>   $  : num 2
>   $ c: num 3
>
>> str(list(1, 2, 3))
> List of 3
>   $ : num 1
>   $ : num 2
>   $ : num 3
>
>> text <- str(list(a=1, 2, c=3)[1])
> List of 1
>   $ a: num 1
>
>> text <- capture.output(str(list(a=1, 2, c=3)))
>> text
> [1] "List of 3"   " $ a: num 1" " $  : num 2" " $ c: num 3"
> So you could use some imaginative code that extracts what you want. I
> repeat, this is not a suggested way nor the best, just something that
seems
> to work:
>
>> sub("(^[\\$ ]*)(\\w+|)(:.*$)", "\\2", text[2:length(text)])
> [1] "a" ""  "c"
>
> Obviously the first line of output needs to be removed as it does not fit
> the pattern.
>
> Perhaps in this case a way less complex way is to use summary() rather
than
> str as it does return the output as text.
>
>> summary(list(a=1, 2, c=3)) -> text
>> text
>Length Class  Mode
> a 1  -none- numeric
>1  -none- numeric
> c 1  -none- numeric
>
> This puts the variable name, if any, at the start but parsing that is not
> trivial as it is not plain text.
>
> Bottom line, try not to do things the hard way. Just carefully use names()
> ...
>
> -Original Message-
> From: R-help  On Behalf Of Jeff Newmiller
> Sent: Thursday, May 25, 2023 10:32 PM
> To: r-help@r-project.org
> Subject: Re: [R] extract parts of a list before symbol
>
> What a remarkable set of detours, Avi, all deriving apparently from a few
> gaps in your understanding of R.
>
> As Rolf said, "names(test)" is the answer.
>
> a) Lists are vectors. They are not atomic vectors, but they are vectors,
so
> as.vector(test) is a no-op.
>
> test <- list( a = 1, b = 2, c=3 )
> attributes(test)
> attributes(as.vector(test))
>
> (Were you thinking of the unlist function? If so, there is no reason to
> convert the value of the list to an atomic vector in order to look at the
> value of an attribute of that list.)
>
> b) Data frames 

Re: [R] extract parts of a list before symbol

2023-05-25 Thread avi.e.gross
All true Jeff, but why do things the easy way! LOL!

My point was that various data structures, besides the list we started with,
store the names as an attribute. Yes, names(listname) works fine to extract
whatever parts they want. My original idea of using a data.frame was because
it creates names when they are absent.  

And you are correct that if the original list was not as shown with only all
items of length 1, converting to a data.frame fails.

From what you say, it is a harder thing to write a function that returns a
"name" for column N given a list. As you note, you get a null when there are
no names.  You get empty strings when one or more (but not all) have no
names. But it can be done.

The OP initially was looking at a way to get a text version of a variable
they could use using perhaps regular expressions to parse.  Of course that
is not as easy as just looking at the names attribute in one of several
ways. But it may help in a sense to deal with the cases mentioned above.
The problem is that str() does not return anything except to stdout so it
must be captured to do silly things.

> test <- list(a=3,b=5,c=11)

> str(test)
List of 3
 $ a: num 3
 $ b: num 5
 $ c: num 11

> str(test[1])
List of 1
 $ a: num 3

> str(test[2])
List of 1
 $ b: num 5

> str(list(a=1, 2, c=3))
List of 3
 $ a: num 1
 $  : num 2
 $ c: num 3

> str(list(1, 2, 3))
List of 3
 $ : num 1
 $ : num 2
 $ : num 3

> text <- str(list(a=1, 2, c=3)[1])
List of 1
 $ a: num 1

> text <- capture.output(str(list(a=1, 2, c=3)))
> text
[1] "List of 3"   " $ a: num 1" " $  : num 2" " $ c: num 3"
So you could use some imaginative code that extracts what you want. I
repeat, this is not a suggested way nor the best, just something that seems
to work:

> sub("(^[\\$ ]*)(\\w+|)(:.*$)", "\\2", text[2:length(text)])
[1] "a" ""  "c"

Obviously the first line of output needs to be removed as it does not fit
the pattern. 

Perhaps in this case a way less complex way is to use summary() rather than
str as it does return the output as text.

> summary(list(a=1, 2, c=3)) -> text
> text
  Length Class  Mode   
a 1      -none- numeric
  1      -none- numeric
c 1      -none- numeric

This puts the variable name, if any, at the start but parsing that is not
trivial as it is not plain text. 

Bottom line, try not to do things the hard way. Just carefully use names()
...

-Original Message-
From: R-help  On Behalf Of Jeff Newmiller
Sent: Thursday, May 25, 2023 10:32 PM
To: r-help@r-project.org
Subject: Re: [R] extract parts of a list before symbol

What a remarkable set of detours, Avi, all deriving apparently from a few
gaps in your understanding of R.

As Rolf said, "names(test)" is the answer.

a) Lists are vectors. They are not atomic vectors, but they are vectors, so
as.vector(test) is a no-op.

test <- list( a = 1, b = 2, c=3 )
attributes(test)
attributes(as.vector(test))

(Were you thinking of the unlist function? If so, there is no reason to
convert the value of the list to an atomic vector in order to look at the
value of an attribute of that list.)

b) Data frames are lists, with the additional constraint that all elements
have the same length, and that a names attribute and a row.names attribute
are both required. Converting a list to a data frame to get the names is
expensive in CPU cycles and breaks as soon as the list elements have a
variety of lengths.

c) All data in R is stored as vectors. Worrying about whether a data value
is a vector is pointless.

d) All objects can have attributes, including the name attribute. However,
not all objects must have a name attribute... including lists. Omitting a
name for any of the elements of a list in the constructor will lead to
having a zero-length character values in the name attribute where the names
were omitted. Omitting all names in the list constructor will cause no names
attribute to be created for that list.

test2 <- list( 1, 2, 3 )
attributes(test2)

e) The names() function returns the value of the names attribute. If that
attribute is missing, it returns NULL. For dataframes, the colnames function
is equivalent to the names function (I rarely use the colnames function).
For lists, colnames returns NULL... there are no "columns" in a list,
because there is no constraint on the (lengths of the) contents of a list.

names(test2)

f) The names attribute, if it exists, is just a character vector. It is
never necessary to convert the output of names() to a character vector. If
the names attribute doesn't exist, then it is up to the user to write code
that creates it.

names(test2) <- c( "A", "B", "C" )
attributes(test2)
names(test2)
# or use the argument names in the list function

names(test2) <- 1:3 # integer
names(test2) # character
attributes(test2)$names <- 1:3 # integer
attributes(test2) # character
test2[[ "2" ]] == 2  # TRUE
test2$`2`  == 2 # TRUE



On May 25, 2023 6:17:37 PM PDT, avi.e.gr...@gmail.com wrote:
>Evan,
>
>List names are less easy than data.frame column names so 

Re: [R] extract parts of a list before symbol

2023-05-25 Thread avi.e.gross
Evan,

List names are less easy than data.frame column names so try this:

> test <- list(a=3,b=5,c=11)
> colnames(test)
NULL
> colnames(as.data.frame(test))
[1] "a" "b" "c"

But note an entry with no name has one made up for it.


> test2 <- list(a=3,b=5, 666, c=11)
> colnames(data.frame(test2))
[1] "a""b""X666" "c"   

But that may be overkill as simply converting to a vector if ALL parts are
of the same type will work too:

> names(as.vector(test))
[1] "a" "b" "c"

To get one at a time:

> names(as.vector(test))[1]
[1] "a"

You can do it even more simply by looking at the attributes of your list:

> attributes(test)
$names
[1] "a" "b" "c"

> attributes(test)$names
[1] "a" "b" "c"
> attributes(test)$names[3]
[1] "c"


-Original Message-
From: R-help  On Behalf Of Evan Cooch
Sent: Thursday, May 25, 2023 1:30 PM
To: r-help@r-project.org
Subject: [R] extract parts of a list before symbol

Suppose I have the following list:

test <- list(a=3,b=5,c=11)

I'm trying to figure out how to extract the characters to the left of 
the equal sign (i.e., I want to extract a list of the variable names, a, 
b and c).

I've tried the permutations I know of involving sub - things like 
sub("\\=.*", "", test), but no matter what I try, sub keeps returning 
(3, 5, 11). In other words, even though I'm trying to extract the 
'stuff' before the = sign, I seem to be successful only at grabbing the 
stuff after the equal sign.

Pointers to the obvious fix? Thanks...

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Regex Split?

2023-05-05 Thread avi.e.gross
Leonard,

It can be helpful to spell out your intent in English or some of us have to go 
back to the documentation to remember what some of the operators do.

Your text being searched seems to be an example of items between commas, with an 
optional space after some commas and, in one case, nothing between commas.

So what is your goal for the example, and in general? You mention a bit 
unclearly at the end some of what you expect and I think it would be clearer if 
you also showed exactly the output you would want.

I saw some other replies that addressed what you wanted and am going to reply 
in another direction.

Why do things the hard way using things like lookahead or look behind? Would 
several steps get you the result way more clearly?

For the sake of argument, you either want what reading in a CSV file would 
supply, or something else. Since you are not simply splitting on commas, it 
sounds like something else. But what exactly else? Something as simple as this 
on just a comma produces results including empty strings and embedded leading 
or trailing spaces:

strsplit("a bc,def, adef ,,gh", ",")
[[1]]
[1] "a bc"   "def"" adef " ""   "gh"  

That can of course be handled by, for example, trimming the result after 
unlisting the odd way strsplit returns results:

library("stringr") 
str_squish(unlist(strsplit("a bc,def, adef ,,gh", ",")))

[1] "a bc" "def"  "adef" "" "gh"  

Now do you want the empty string to be something else, such as an NA? That can 
be done too with another step.
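
A sketch of that extra step, reusing the str_squish() result from above:

```r
library(stringr)

parts <- str_squish(unlist(strsplit("a bc,def, adef ,,gh", ",")))
parts[parts == ""] <- NA  # turn empty strings into NA
parts
#> [1] "a bc" "def"  "adef" NA     "gh"
```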

And a completely different variant can be used to read in your one-line CSV as 
text using standard overkill tools:

> read.table(text="a bc,def, adef ,,gh", sep=",")
    V1  V2     V3 V4 V5
1 a bc def  adef  NA gh

The above is a one-row data.frame of text fields. But if you simply want to reassemble your 
initial string cleaned up a bit, you can use paste to put back commas, as in a 
variation of the earlier example:

> paste(str_squish(unlist(strsplit("a bc,def, adef ,,gh", ","))), collapse=",")
[1] "a bc,def,adef,,gh"

So my question is whether using advanced methods is really necessary for your 
case, or even particularly efficient. If efficiency matters, it is often 
better to use tools without regular expressions such as paste0() when they meet 
your needs.

Of course, unless I know what you are actually trying to do, my remarks may
not be useful.



-Original Message-
From: R-help  On Behalf Of Leonard Mada via R-help
Sent: Thursday, May 4, 2023 5:00 PM
To: R-help Mailing List 
Subject: [R] Regex Split?

Dear R-Users,

I tried the following 3 Regex expressions in R 4.3:
strsplit("a bc,def, adef ,,gh", " |(?=,)|(?<=,)(?![ ])", perl=T)
# "a""bc"   ",""def"  ",""" "adef" ",""," "gh"

strsplit("a bc,def, adef ,,gh", " |(?
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Split String in regex while Keeping Delimiter

2023-04-12 Thread avi.e.gross
Sometimes you need to NOT use a regular expression and do things simpler. You 
have a fairly simple example that not only does not need great power but may be 
a pain to do using a very powerful technique, especially if you want to play 
with look-ahead and look behind.

Assuming you have a line with repeated runs of non-plus text, each followed by 
one to three contiguous plus signs, you can write a short function that scans 
along until it finds a plus sign, then looks ahead until it sees a space or the 
end of the line. It then wraps up all the text through the last plus and adds 
a copy to a growing list or other structure. It then continues from the 
space(s) it skips over, and repeats.

When done, you have what you want in the format you want.

A variant on this is to start from the end and scan backwards and stop at any 
plus sign. Keep what follows but strip any whitespace to the left of it. The 
result is the list in backwards order unless you used a stack to hold it.

There are quite a few variants that might apply, perhaps using functions from 
add-on packages. A simple example might be to preprocess the string, replacing 
every instance of 1 to 3 plus signs and an optional space with itself and an 
added character such as ":"; a second pass using a regular expression then 
becomes trivial, as the colons disappear in the split.
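
A rough sketch of that preprocessing idea, assuming ":" never occurs in the
real data:

```r
x <- "leucocyten + gramnegatieve staven +++ grampositieve staven ++"

# Pass 1: append a ":" marker after each run of 1-3 plus signs
marked <- gsub("(\\+{1,3}) ?", "\\1:", x)

# Pass 2: a now-trivial split on the marker
strsplit(marked, ":")[[1]]
# c("leucocyten +", "gramnegatieve staven +++", "grampositieve staven ++")
```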

-Original Message-
From: R-help  On Behalf Of David Winsemius
Sent: Wednesday, April 12, 2023 6:03 PM
To: Emily Bakker 
Cc: r-help@r-project.org
Subject: Re: [R] Split String in regex while Keeping Delimiter

I thought replacing the spaces following instances of +++,++,+,- with "\n" and 
then reading with scan should succeed. Like Ivan Krylov I was fairly sure that 
you meant the minus sign to be "-" rather than "–", but perhaps your were using 
MS Word as an editor which is inconsistent with effective use of R. If so, 
learn to use a proper programming editor, and in any case learn to post to 
rhelp in plain text.

-- 
David

scan(text=gsub("([-+]){1}\\s", "\\1\n", dat), what="", sep="\n")



> On Apr 12, 2023, at 2:29 AM, Emily Bakker  wrote:
> 
> Hello List,
>  
> I have a dataset consisting of strings that I want to split while saving the 
> delimiter.
>  
> Some example data:
> “leucocyten + gramnegatieve staven +++ grampositieve staven ++”
> “leucocyten – grampositieve coccen +”
>  
> I want to split the strings such that I get the following result:
> c(“leucocyten +”,  “gramnegatieve staven +++”,  “grampositieve staven ++”)
> c(“leucocyten –“, “grampositieve coccen +”)
>  
> I have tried strsplit with a regular expression with a positive lookahead, 
> but I am not able to achieve the results that I want.
>  
> I have tried:
> as.list(strsplit(x, split = “(?=[\\+-]{1,3}\\s)+, perl=TRUE)
>  
> Which results in:
> c(“leucocyten “, “+”,  “gramnegatieve staven “, “+”, “+”, “+”,  
> “grampositieve staven ++”)
> c(“leucocyten “, “–“, “grampositieve coccen +”)
>  
>  
> Is there a function or regular expression that will make this possible?
>  
> Kind regards,
> Emily 
>  
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Matrix scalar operation that saves memory?

2023-04-11 Thread avi.e.gross
The example given does not leave room for even a single copy of your matrix
so, yes, you need alternatives.

Your example was fairly trivial, as all you wanted to do is subtract each
value from 100 and replace it. Obviously something like squaring a matrix
cannot be done without multiple copies in memory, which won't fit here.

One technique that might work is a nested loop that changes one cell of the
matrix at a time, in place. A variant of this might be a single loop that
changes one row (or column) at a time, in place.

Another odd concept is to save your matrix in a file with some format you
can read back in such as a line or row at a time, and then do the
subtraction from 100 and write it back to disk in another file. If you need
it again, I assume you can read it in but perhaps you should consider how to
increase some aspects of your "memory".

Is your matrix a real matrix type or something like a list of lists or a
data.frame? You may do better with some data structures that are more
efficient than others.

Some OS allow you to use virtual memory that is mapped in and out from the
disk that allows larger things to be done, albeit often much slower. I also
note that you can remove some things you are not using and hope garbage
collection happens soon enough.
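
A hedged sketch of the row-block variant described above; R still makes a
temporary copy of each block, but the block size caps how large that
temporary gets:

```r
mat <- matrix(runif(20), nrow = 10)  # small stand-in for the 300G matrix
block <- 2                           # tune so one block's copy fits in memory

for (start in seq(1, nrow(mat), by = block)) {
  rows <- start:min(start + block - 1, nrow(mat))
  mat[rows, ] <- 100 - mat[rows, ]   # overwrite only a few rows at a time
}
```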

-Original Message-
From: R-help  On Behalf Of Shunran Zhang
Sent: Tuesday, April 11, 2023 10:21 PM
To: r-help@r-project.org
Subject: [R] Matrix scalar operation that saves memory?

Hi all,

I am currently working with a quite large matrix that takes 300G of 
memory. My computer only has 512G of memory. I would need to do a few 
arithmetic on it with a scalar value. My current code looks like this:

mat <- 100 - mat

However such code quickly uses up all of the remaining memory and got 
the R script killed by OOM killer.

Are there any more memory-efficient way of doing such operation?

Thanks,

S. Zhang

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Simple Stacking of Two Columns

2023-04-03 Thread avi.e.gross
I may be missing something but using the plain old c() combine function
seems to work fine:

df <- data.frame(left = 1:5, right = 6:10)
df.combined <- data.frame(comb = c(df$left, df$right))

df
  left right
1    1     6
2    2     7
3    3     8
4    4     9
5    5    10

df.combined
   comb
1     1
2     2
3     3
4     4
5     5
6     6
7     7
8     8
9     9
10   10
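
For completeness, the stack() error in the original post came from passing
individual columns; applied to the whole data.frame it does work, though it
also returns an indicator column alongside the values:

```r
df <- data.frame(left = 1:5, right = 6:10)
stacked <- stack(df)   # returns columns 'values' and 'ind'
stacked$values
#> [1]  1  2  3  4  5  6  7  8  9 10
```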



-Original Message-
From: R-help  On Behalf Of Heinz Tuechler
Sent: Monday, April 3, 2023 4:39 PM
To: r-help@r-project.org
Subject: Re: [R] Simple Stacking of Two Columns

Jeff Newmiller wrote/hat geschrieben on/am 03.04.2023 18:26:
> unname(unlist(NamesWide))
Why not:

NamesWide <- data.frame(Name1=c("Tom","Dick"),Name2=c("Larry","Curly"))
NamesLong <- data.frame(Names=with(NamesWide, c(Name1, Name2)))

>
> On April 3, 2023 8:08:59 AM PDT, "Sparks, John"  wrote:
>> Hi R-Helpers,
>>
>> Sorry to bother you, but I have a simple task that I can't figure out how
to do.
>>
>> For example, I have some names in two columns
>>
>> NamesWide<-data.frame(Name1=c("Tom","Dick"),Name2=c("Larry","Curly"))
>>
>> and I simply want to get a single column
>> NamesLong<-data.frame(Names=c("Tom","Dick","Larry","Curly"))
>>> NamesLong
>>  Names
>> 1   Tom
>> 2  Dick
>> 3 Larry
>> 4 Curly
>>
>>
>> Stack produces an error
>> NamesLong<-stack(NamesWide$Name1,NamesWide$Names2)
>> Error in if (drop) { : argument is of length zero
>>
>> So does bind_rows
>>> NamesLong<-dplyr::bind_rows(NamesWide$Name1,NamesWide$Name2)
>> Error in `dplyr::bind_rows()`:
>> ! Argument 1 must be a data frame or a named atomic vector.
>> Run `rlang::last_error()` to see where the error occurred.
>>
>> I tried making separate dataframes to get around the error in bind_rows
but it puts the data in two different columns
>> Name1<-data.frame(c("Tom","Dick"))
>> Name2<-data.frame(c("Larry","Curly"))
>> NamesLong<-dplyr::bind_rows(Name1,Name2)
>>> NamesLong
>>   c..TomDick.. c..LarryCurly..
>> 1          Tom            <NA>
>> 2         Dick            <NA>
>> 3         <NA>           Larry
>> 4         <NA>           Curly
>>
>> gather makes no change to the data
>> NamesLong<-gather(NamesWide,Name1,Name2)
>>> NamesLong
>>  Name1 Name2
>> 1   Tom Larry
>> 2  Dick Curly
>>
>>
>> Please help me solve what should be a very simple problem.
>>
>> Thanks,
>> John Sparks
>>
>>
>>
>>
>>
>>  [[alternative HTML version deleted]]
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] printing a data.frame without row numbers

2023-03-27 Thread avi.e.gross
Try:

print(data.frame(COL1=1:5, COL2=10:6), row.names=FALSE)



-Original Message-
From: R-help  On Behalf Of Dennis Fisher
Sent: Monday, March 27, 2023 1:05 PM
To: r-help@r-project.org
Subject: [R] printing a data.frame without row numbers

R 4.2.3
OS X

Colleagues,

I am printing a large number of tables using the print command.  A simple
example is:
print(data.frame(COL1=1:5, COL2=10:6))

The result in this case is:
  COL1 COL2
1    1   10
2    2    9
3    3    8
4    4    7
5    5    6

I would like to print the table WITHOUT the row numbers:
 COL1 COL2
    1   10
    2    9
    3    8
    4    7
    5    6

Is there any simple way to accomplish this, short of writing my own print
method or outputting line-by-line using cat?

Dennis

Dennis Fisher MD
P < (The "P Less Than" Company)
Phone / Fax: 1-866-PLessThan (1-866-753-7784)
www.PLessThan.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] DOUBT

2023-03-21 Thread avi.e.gross
Your variable name:

HH size

is two words; a space is not a legal name in an R formula.
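
A minimal illustration of the problem and one fix (the data here is made up):
either rename the column, e.g. with make.names(), or backtick-quote it in the
formula as `HH size`.

```r
d <- data.frame(`HH size` = 1:3, check.names = FALSE)
names(d)
#> [1] "HH size"

names(d) <- make.names(names(d))  # "HH size" -> "HH.size"
names(d)
#> [1] "HH.size"
```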

-Original Message-
From: R-help  On Behalf Of Nandini raj
Sent: Monday, March 20, 2023 1:17 PM
To: r-help@r-project.org
Subject: [R] DOUBT

Respected sir/madam
can you please suggest what is an unexpected symbol in the below code for
running a multinomial logistic regression

model <- multinom(adoption ~ age + education + HH size + landholding +
Farmincome + nonfarmincome + creditaccesibility + LHI, data=newdata)

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



[R] nth kludge

2023-03-08 Thread avi.e.gross
I see many are not thrilled with the concise but unintuitive syntax it is
suggested you use with the new R pipe operator.
 
I am wondering if any has created one of a family of functions that might be
more intuitive if less general.
 
Some existing pipes simply allowed you to specify where in an argument list
to put the results from the earlier pipeline as in:
 
. %>% func(first, . , last)
 
In the above common case, it substituted into the second position.
 
What would perhaps be a useful variant is a function that does not evaluate
its arguments and expects a first argument passed from the pipe, a second
argument that is a number like 2 or 3, and a third argument that is the (name
of) a function, followed by the remaining arguments.
 
The above might look like:
 
. %>% the_nth(2, func, first, last)
 
The above asks to take the new implicitly passed first argument which I will
illustrate with a real argument as it would also work without a pipeline:
 
the_nth(piped, 2, func, first, last)
 
So it would make a list out of the remaining arguments that looks like
list(first, last) and interpolate piped at position 2 to make list(first,
piped, last) and then use something like do.call()
 
do.call(func, list(first, piped, last))
 
I am not sure if this is much more readable, but it seems like a
straightforward function to write, and perhaps a decent, general version could
make it into the standard library some year as something more useful than the
darn anonymous lambda notation.
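
A rough sketch of such a the_nth(), under the semantics described above:

```r
the_nth <- function(piped, n, func, ...) {
  # splice the piped value into position n of the remaining arguments
  args <- append(list(...), list(piped), after = n - 1)
  do.call(func, args)
}

# The base pipe passes its left-hand side as the first argument:
"piped" |> the_nth(2, paste, "first", "last")
#> [1] "first piped last"
```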
 

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Removing variables from data frame with a wile card

2023-02-12 Thread avi.e.gross
Steven,

The default is drop=TRUE.

Use drop=FALSE if you want to retain a data.frame and not have it reduced to a
vector under some circumstances.

https://win-vector.com/2018/02/27/r-tip-use-drop-false-with-data-frames/
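
A minimal illustration of what drop=FALSE preserves:

```r
df <- data.frame(a = 1:3, b = 4:6)
class(df[, "a"])                # "integer"    -- dropped to a plain vector
class(df[, "a", drop = FALSE])  # "data.frame" -- shape preserved
```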

-Original Message-
From: R-help  On Behalf Of Steven T. Yen
Sent: Sunday, February 12, 2023 5:19 PM
To: Andrew Simmons 
Cc: R-help Mailing List 
Subject: Re: [R] Removing variables from data frame with a wile card

In the line suggested by Andrew Simmons,

mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]

what does drop=FALSE do? Thanks.

On 1/14/2023 8:48 PM, Steven Yen wrote:
> Thanks to all. Very helpful.
>
> Steven from iPhone
>
>> On Jan 14, 2023, at 3:08 PM, Andrew Simmons  wrote:
>>
>> You'll want to use grep() or grepl(). By default, grep() uses 
>> extended regular expressions to find matches, but you can also use 
>> perl regular expressions and globbing (after converting to a regular 
>> expression).
>> For example:
>>
>> grepl("^yr", colnames(mydata))
>>
>> will tell you which 'colnames' start with "yr". If you'd rather you 
>> use globbing:
>>
>> grepl(glob2rx("yr*"), colnames(mydata))
>>
>> Then you might write something like this to remove the columns 
>> starting with yr:
>>
>> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
>>
>> On Sat, Jan 14, 2023 at 1:56 AM Steven T. Yen  wrote:
>>>
>>> I have a data frame containing variables "yr3",...,"yr28".
>>>
>>> How do I remove them with a wild card, something similar to "del yr*"
>>> in Windows/DOS? Thank you.
>>>
 colnames(mydata)
>>>   [1] "year"   "weight" "confeduc"   "confothr" "college"
>>>   [6] ...
>>>  [41] "yr3""yr4""yr5""yr6" "yr7"
>>>  [46] "yr8""yr9""yr10"   "yr11" "yr12"
>>>  [51] "yr13"   "yr14"   "yr15"   "yr16" "yr17"
>>>  [56] "yr18"   "yr19"   "yr20"   "yr21" "yr22"
>>>  [61] "yr23"   "yr24"   "yr25"   "yr26" "yr27"
>>>  [66] "yr28"...
>>>



Re: [R] preserve class in apply function

2023-02-08 Thread avi.e.gross
Jorgen is correct that for many purposes, viewing a data.frame as a
collection of vectors of the same length allows you to code fairly complex
logic using whichever vectors you want and result in a vector answer, either
externally or as a new column. Text columns used to make decisions in
the function can also be handled with vectorized functions like ifelse(cond,
when_true, when_false).

And, although much can be done in base R, people often use the
dplyr/tidyverse function mutate() to do such calculations in a slightly
less wordy way.

You may be looking at apply() as a way to operate one row at a time when an
R paradigm is to be able to operate on all rows sort of at once.
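For instance, a generic sketch of the whole-column approach (invented data, not the poster's):

```r
df <- data.frame(x = c(1, 2, 3), y = c(10, 20, 30), unit = c("m", "ft", "m"))

# Vectorized: the text column drives the decision for every row at once,
# with no per-row apply() needed.
df$x_m <- ifelse(df$unit == "ft", df$x * 0.3048, df$x)

# The dplyr equivalent, if that package is preferred:
# library(dplyr)
# df <- mutate(df, x_m = ifelse(unit == "ft", x * 0.3048, x))
```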

-Original Message-
From: R-help  On Behalf Of Jorgen Harmse via
R-help
Sent: Wednesday, February 8, 2023 11:10 AM
To: r-help@r-project.org; naresh_gurbux...@hotmail.com
Subject: Re: [R] preserve class in apply function

What are you trying to do? Why use apply when there is already a vector
addition operation?
df$x+df$y or as.numeric(df$x)+as.numeric(df$y) or
rowSums(as.numeric(df[c('x','y')])).

As noted in other answers, apply will coerce your data frame to a matrix,
and all entries of a matrix must have the same type.

Regards,
Jorgen Harmse.

Message: 1
Date: Tue, 7 Feb 2023 07:51:50 -0500
From: Naresh Gurbuxani 
To: "r-help@r-project.org" 
Subject: [R] preserve class in apply function
Message-ID:
 


Content-Type: text/plain; charset="us-ascii"


> Consider a data.frame whose different columns have numeric, character, 
> and factor data.  In apply function, R seems to pass all elements of a 
> row as character.  Is it possible to preserve numeric class?
>
>> mydf <- data.frame(x = rnorm(10), y = runif(10)) apply(mydf, 1, 
>> function(row) {row["x"] + row["y"]})
> [1]  0.60150197 -0.74201827  0.80476392 -0.59729280 -0.02980335  
> 0.31351909 [7] -0.63575990  0.22670658  0.55696314  0.39587314
>> mydf[, "z"] <- sample(letters[1:3], 10, replace = TRUE) apply(mydf, 
>> 1, function(row) {row["x"] + row["y"]})
> Error in row["x"] + row["y"] (from #1) : non-numeric argument to 
> binary operator
>> apply(mydf, 1, function(row) {as.numeric(row["x"]) + 
>> as.numeric(row["y"])})
> [1]  0.60150194 -0.74201826  0.80476394 -0.59729282 -0.02980338  
> 0.31351912 [7] -0.63575991  0.22670663  0.55696309  0.39587311
>> apply(mydf[,c("x", "y")], 1, function(row) {row["x"] + row["y"]})
> [1]  0.60150197 -0.74201827  0.80476392 -0.59729280 -0.02980335  
> 0.31351909 [7] -0.63575990  0.22670658  0.55696314  0.39587314







Re: [R] preserve class in apply function

2023-02-08 Thread avi.e.gross



Naresh,

This is a common case where the answer to a question is to ask the right 
question.

Your question was how to make apply work. My question is how can you get the 
functionality you want done in some version of R.

Apply is a tool and it is only one of many tools and may be the wrong one for 
your task. 

For a data.frame there can be lots of tools you may investigate, both in base R
and in add-on packages like dplyr in the tidyverse.

As has been pointed out, a side-effect of apply is to make a matrix and R 
automagically figures out what the most specific kind of data type it can 
handle.

So solutions range from excluding any columns that are not numeric, if that
makes sense, to accepting that they will all be of type character and, in the
function you apply, converting them individually back to what you want.

One straightforward solution is to make a loop indexed to the number of rows in 
your data.frame and process the variables in each row using [] notation. Not 
fast, but you see what you have. 

Others are functions like pmap() in the purrr package. Yet another might be the
rowwise() function in the dplyr package.

It depends on what you want to do. Note with multiple columns, sometimes your 
function may need to use a ... to receive them.
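A sketch of the row-indexed loop mentioned above (invented data; not fast, but each column keeps its class):

```r
mydf <- data.frame(x = rnorm(3), y = runif(3),
                   z = sample(letters[1:3], 3, replace = TRUE))

res <- numeric(nrow(mydf))
for (i in seq_len(nrow(mydf))) {
  row <- mydf[i, ]   # a one-row data.frame: x and y stay numeric, z stays character
  res[i] <- if (row$z == "a") row$x + row$y else row$x - row$y
}
res
```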
-Original Message-
From: R-help  On Behalf Of Naresh Gurbuxani
Sent: Tuesday, February 7, 2023 3:29 PM
To: PIKAL Petr 
Cc: r-help@r-project.org
Subject: Re: [R] preserve class in apply function

Thanks for all the responses.  I need to use some text columns to determine 
method applied to numeric columns. 

Split seems to be the way to go.

Sent from my iPhone

> On Feb 7, 2023, at 8:31 AM, PIKAL Petr  wrote:
> 
> Hi Naresh
> 
> If you wanted to automate the function a bit you can use sapply to 
> find numeric columns ind <- sapply(mydf, is.numeric)
> 
> and use it in apply construct
> apply(mydf[,ind], 1, function(row) sum(row)) [1]  2.13002569  
> 0.63305300  1.48420429  0.13523859  1.17515873 -0.98531131 [7]  
> 0.47044467  0.23914494  0.26504430  0.02037657
> 
> Cheers
> Petr
> 
>> -Original Message-
>> From: R-help  On Behalf Of Naresh 
>> Gurbuxani
>> Sent: Tuesday, February 7, 2023 1:52 PM
>> To: r-help@r-project.org
>> Subject: [R] preserve class in apply function
>> 
>> 
>>> Consider a data.frame whose different columns have numeric, 
>>> character, and factor data.  In apply function, R seems to pass all 
>>> elements of a row as character.  Is it possible to preserve numeric class?
>>> 
 mydf <- data.frame(x = rnorm(10), y = runif(10)) apply(mydf, 1, 
 function(row) {row["x"] + row["y"]})
>>> [1]  0.60150197 -0.74201827  0.80476392 -0.59729280 -0.02980335
>> 0.31351909
>>> [7] -0.63575990  0.22670658  0.55696314  0.39587314
 mydf[, "z"] <- sample(letters[1:3], 10, replace = TRUE) apply(mydf, 
 1, function(row) {row["x"] + row["y"]})
>>> Error in row["x"] + row["y"] (from #1) : non-numeric argument to 
>>> binary
>> operator
 apply(mydf, 1, function(row) {as.numeric(row["x"]) +
>> as.numeric(row["y"])})
>>> [1]  0.60150194 -0.74201826  0.80476394 -0.59729282 -0.02980338
>> 0.31351912
>>> [7] -0.63575991  0.22670663  0.55696309  0.39587311
 apply(mydf[,c("x", "y")], 1, function(row) {row["x"] + row["y"]})
>>> [1]  0.60150197 -0.74201827  0.80476392 -0.59729280 -0.02980335
>> 0.31351909
>>> [7] -0.63575990  0.22670658  0.55696314  0.39587314
>> 


Re: [R] Extracting data using subset function

2023-02-05 Thread avi.e.gross
In reading the post again, it sounds like the question is how to create a 
logical condition that translates as 1:N is TRUE. Someone hinted along those 
lines. 

So one way I might suggest is that you construct a logical vector as shown below.
I give an example with a bunch of 9 primes where you want only the first 5.

> vec <- c(1,2,3,5,7,11,13,17, 19)
> length(vec)
[1] 9
> N <- 5
> choose <- rep(TRUE, N)
> choose
[1] TRUE TRUE TRUE TRUE TRUE
> subset(vec, choose)   # not what you want: the 5 TRUE values are
> # recycled across all 9 elements, so nothing is dropped
[1]  1  2  3  5  7 11 13 17 19
> tot = length(vec)
> choose <- c(rep(TRUE, N), rep(FALSE, tot - N))
> choose
[1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE FALSE FALSE FALSE
> subset(vec, choose)
[1] 1 2 3 5 7

The end shows that you need to create N TRUE values and tot - N FALSE values to
make a logical vector the same length as the data, so every element is covered.

Does something like this meet your needs?

Realistically, the above technique generalizes to more complex cases decently.
But sometimes head() and other things I mentioned earlier work quite well.
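The same full-length logical can be built more compactly with seq_along(), avoiding the two rep() calls:

```r
vec <- c(1, 2, 3, 5, 7, 11, 13, 17, 19)
N <- 5

# seq_along(vec) <= N is TRUE for the first N positions and FALSE afterwards
subset(vec, seq_along(vec) <= N)  # 1 2 3 5 7
```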


-Original Message-
From: R-help  On Behalf Of Upananda Pani
Sent: Sunday, February 5, 2023 2:33 PM
To: Andrés González Carmona 
Cc: r-help 
Subject: Re: [R] Extracting data using subset function

Thank you. It means we can not use the subset function here.

Regards

On Mon, 6 Feb, 2023, 00:53 Andrés González Carmona,  wrote:

> From ?subset:
> Warning
>
> This is a convenience function intended for use interactively. For 
> programming it is better to use the standard subsetting functions like 
> [ , and in particular 
> the non-standard evaluation of argument subset can have unanticipated 
> consequences.
>
> El 05/02/2023 a las 15:07, Upananda Pani escribió:
>
> Dear All,
>
> I want to create a vector p and extract first 20 observations using 
> subset function based on logical condition.
>
> My code is below
>
> p <- 0:100
>
> I know i can extract the first 20 observations using the following command.
>
> q <- p[1:20]
>
> But I want to extract the first 20 observations using subset function 
> which requires a logical condition. I am not able to frame the logical 
> condition.
>
> The code should be
>
> q <- subset(p, logical condition)
>
> I am not able to do it. Please let me know what you think.
>
> Best regards,
> Upananda
>
>
> --
> * Andrés González Carmona *
>



Re: [R] Extracting data using subset function

2023-02-05 Thread avi.e.gross
A major question is why you ask how to use the subset function rather than 
asking how to get your job done.

As you note, the simple way to get the first N items is to use indexing. If you 
absolutely positively insist on using subset, place your data into something 
like a data.frame and add a column and use that.

Say you have N items, so that length() or nrow() return N. You can add a column 
called index whose entries are 1:N and do your subset to get all rows where the 
condition is "index <= N" or something like that. 

If you want just the first N (not N at a time in several tries), simply use
head(data, N), and that gets you the first N.

And if you are interested in learning other packages like dplyr in the 
tidyverse, it has oodles of ways to select based on row numbers. One example is 
slice_head(n = ...) and another is slice(first:last), and in verbs such as 
filter() there is a helper function called row_number() that gives the index of 
the entry within any grouping, so filter(mydata, row_number() <= 20) gets what 
you want.

Note that the above requires care as some things work on vectors and others 
assume a data.frame, albeit you can make a data.frame containing a single 
vector.
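For concreteness, the main options above side by side (the dplyr lines are commented out and assume that package is installed):

```r
p <- 0:100

q1 <- p[1:20]                        # plain indexing
q2 <- head(p, 20)                    # first N via head()
q3 <- subset(p, seq_along(p) <= 20)  # subset() with a genuine logical condition

# With dplyr, working on a data.frame instead of a bare vector:
# library(dplyr)
# filter(data.frame(p = p), row_number() <= 20)
```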


-Original Message-
From: R-help  On Behalf Of Upananda Pani
Sent: Sunday, February 5, 2023 2:33 PM
To: Andrés González Carmona 
Cc: r-help 
Subject: Re: [R] Extracting data using subset function

Thank you. It means we can not use the subset function here.

Regards

On Mon, 6 Feb, 2023, 00:53 Andrés González Carmona,  wrote:

> From ?subset:
> Warning
>
> This is a convenience function intended for use interactively. For 
> programming it is better to use the standard subsetting functions like 
> [ , and in particular 
> the non-standard evaluation of argument subset can have unanticipated 
> consequences.
>
> El 05/02/2023 a las 15:07, Upananda Pani escribió:
>
> Dear All,
>
> I want to create a vector p and extract first 20 observations using 
> subset function based on logical condition.
>
> My code is below
>
> p <- 0:100
>
> I know i can extract the first 20 observations using the following command.
>
> q <- p[1:20]
>
> But I want to extract the first 20 observations using subset function 
> which requires a logical condition. I am not able to frame the logical 
> condition.
>
> The code should be
>
> q <- subset(p, logical condition)
>
> I am not able to do it. Please let me know what you think.
>
> Best regards,
> Upananda
>
>
> --
> * Andrés González Carmona *
>



Re: [R] Problem to run python code in r markdown

2023-01-20 Thread avi.e.gross
Kai,

Just FYI, this is mainly an R mailing list, and although there are ways to 
combine Python with R (or run it more or less on its own) within environments like 
RStudio, this may not be the optimal place to discuss it. You are discussing what 
is no longer really "R Markdown" and more just plain "markdown" that supports 
many languages in many environments, and you are using one called flexdashboard.

I had a similar problem for a while because I had an amazing number of copies 
of python installed on my machine in various ways including directly and within 
Anaconda and several other ways. Each may have kept copies of modules like 
numpy in a different place. Unfortunately, when running python from RSTUDIO, it 
was looking in a place that did not have it and when I used "pip" to get the 
latest copy, it insisted I already had it!

So I eventually uninstalled all versions I could find and reinstalled just one 
version and then got the numpy and pandas modules installed under that version. 
Oh, I also replaced/updated RSTUDIO. Things work fine for now.

Some people may advise you on how to determine which version of python you are 
calling, or change it, and how to download numpy to the place it is expected, 
or change some environmental variable to point to it or any number of other 
solutions. Some like virtual environments, for example.

The bottom line is your setup does not have numpy installed as far as the 
software is concerned even if it is installed somewhere on your machine. When 
you get things aligned, you will be fine. 

-Original Message-
From: R-help  On Behalf Of Kai Yang via R-help
Sent: Friday, January 20, 2023 11:20 AM
To: R-help Mailing List 
Subject: [R] Problem to run python code in r markdown

Hi Team,I'm trying to run python in R markdown (flexdashboard). The code is 
below:

try Python
==========

```{r, include=FALSE, echo=TRUE}
library(reticulate)
py_install("numpy")
use_condaenv("base")
```

```{python}
import numpy as np
```

I got error message below:
Error in py_call_impl(callable, dots$args, dots$keywords) :
  ModuleNotFoundError: No module named 'numpy'
Calls:  ... py_capture_output -> force ->  -> py_call_impl
In addition: There were 26 warnings (use warnings() to see them)
Execution halted


Based on the message, Python cannot find the numpy package. But I'm sure I
installed the package. I don't know how to fix the problem. Please help.

Thank you,
Kai




Re: [R] foreign package: unable to read S-Plus objects

2023-01-17 Thread avi.e.gross
Just an idea, if this is a one-time need to copy static data from a non-R
system into R. You are a bit vague about what you mean by "objects."

If you can find someone who uses S or S+ then maybe they can load the data
in and export it in some format usable for you and send you those files. If,
for example, your data is just a data.frame, they can send it as a .CSV file
or any format you want. Once you have it, you can read it into R and save it
any other way you wish.

If the data is something more complex like a dump of all kinds of variables,
this may not be practical but I have to wonder how much it and R have
diverged all these years and whether it would be something you can import
and continue running with.

I downloaded the files and they seem to be in a folder with 8 files that
have no suffixes, with sizes ranging from 1 KB to 469 KB, so nothing gigantic.
The biggest file is simdata, and two smaller files called simmax and simmean may
well be related, but I have to wonder whether the way to view them is at the
folder level as a grouped entity or at the individual file level.

I wonder if there is a trial version of the software you could get from
whoever sells it, like TIBCO.

Good luck with that. 

-Original Message-
From: R-help  On Behalf Of Jan van der Laan
Sent: Tuesday, January 17, 2023 9:09 AM
To: r-help@r-project.org
Subject: Re: [R] foreign package: unable to read S-Plus objects

You could try to see what stattransfer can make of it. They have a free
version that imports only part of the data. You could use that to see if
stattransfer would help and perhaps discover what format it is in.

HTH
Jsn


On 16-01-2023 23:22, Joseph Voelkel wrote:
> Dear foreign maintainers and others,
> 
> I am trying to import a number of S-Plus objects into R. The only way I
see how to do this is by using the foreign package.
> 
> However, when I try to do this I receive an error message. A snippet of
code and the error message follows:
> 
> read.S(file.path(Spath, "nrand"))
> Error in read.S(file.path(Spath, "nrand")) : not an S object
> 
> I no longer know the version of S-Plus in which these objects were
created. I do know that I have printed documentation, dated July 2001, from
S-Plus 6; and that all S-Plus objects were created in the 9/2004 -- 5/2005
range.
> 
> I am afraid that I simply have S-Plus objects that are not the S version 3
files that the foreign package can read, yes? But I am still hoping that it
may be possible to read these in.
> 
> I am not attaching some sample S-Plus objects to this email, because I  
> believe they will be stripped away as binary files. However, a sample 
> of these files may be found at
> 
> https://drive.google.com/drive/folders/1wFVa972ciP44Ob2YVWfqk8SGIodzAX
> Pv?usp=sharing  (simdat is the largest file, at 469 KB)
> 
> Thank you for any assistance you may provide.
> 
> R 4.2.2
> Microsoft Windows [Version 10.0.22000.1455]
> foreign_0.8-83
> 
> 
> Joe Voelkel
> Professor Emeritus
> RIT
> 



Re: [R] return value of {....}

2023-01-15 Thread avi.e.gross
Again, John, we are comparing different designs in languages that are often
decades old and partially retrofitted selectively over the years.

Is it poor form to use global variables? Many think so. Discussions have
been had on how to use variables hidden in various ways that are not global,
such as within a package.

But note R still has global assignment operators like <<- and its partner
->> that explicitly can even create a global variable that did not exist
before the function began and that persists for that session. This is
perhaps a special case of the assign() function which can do the same for
any designated environment.

Although it may sometimes be better form to avoid things like this, it can
also be worse form when you want to do something decentralized with no
control over passing additional arguments to all kinds of functions.  

Some languages try to finesse their way past this by creating concepts like
closures that hold values and can even change the values without them being
globally visible. Some may use singleton objects or variables that are part
of a class rather than a single object (which is again a singleton.)

So is the way R allows really a bad thing, especially if rarely used?

All I know is MANY languages use scoping including functions that declare a
variable then create an inner function or many and return the inner
function(s) to the outside where the one getting it can later use that
function and access the variable and even use it as a way to communicate
with the multiple functions it got that were created in that incubator.
Nifty stuff but arguably not always as easy to comprehend!
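A generic sketch of that closure pattern in R (make_counter is an invented name for illustration):

```r
make_counter <- function() {
  count <- 0              # lives in the enclosing environment, not globally
  function() {
    count <<- count + 1   # <<- updates the enclosed variable
    count
  }
}

counter <- make_counter()
counter()          # 1
counter()          # 2
exists("count")    # FALSE: the state is hidden, not a global variable
```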

This forum is not intended for BASHING any language, certainly not R. There
are many other languages to choose from and every one of them will have some
things others consider flaws. How many opted out of say a ++ operator as in
y = x++ for fairly good reasons and later added something like the Walrus
operator so you can now write y = (x := x + 1) as a way to do the same thing
and other things besides?

But to address your point, about a variable outside a function as defined in
a set of environments to search that includes a global context, I want to
note that it is just a set of maskings and your variable "x" can appear in
EVERY environment above you and you can get in real trouble if the order the
environments are put in place changes in some way. The arguably safer way
to get a specific value of x would be to not ask for it directly
but to call get("x", envir=...) and specify the particular environment that ideally
is in existence at that time. Other arguments to get() let you specify a few
more things, such as whether to search other places or supply a default.

Is it then poor technique to re-use the same name "x" in the same code for
some independent use? Probably, albeit if the new value plays a similar
role, just in another stretch of code, maybe not. I would comment it
carefully, and spell that out.
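For instance (a generic illustration of get() and get0() with explicit environments):

```r
e1 <- new.env()
e2 <- new.env()
assign("x", 1, envir = e1)
assign("x", 2, envir = e2)

get("x", envir = e1)   # 1: no ambiguity about which "x" is meant
get("x", envir = e2)   # 2

# get0() can supply a default instead of an error when the name is absent:
get0("y", envir = e1, inherits = FALSE, ifnotfound = NA)   # NA
```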

S first came out before I even decided to become a Computer Scientist in
the mid-to-late 70s, and it has evolved multiple times. I first noticed it at
Bell Labs in the 80's. To a certain extent, R started as heavily influenced
by S and many programs could run on both. It too has changed over about
three decades. What kind of perfection can anyone expect over more recent
languages carefully produced with little concern about backward
compatibility?

And it remains very useful not necessarily based on the original language or
even the evolving core, but because of changes that maintained compatibility
as well as so many packages that met needs. Making changes, even
"improvements", is likely to break all kinds of code unless it is something
like the native pipe operator "|>", which simply uses syntax that no existing
code would have used. 

The answer to too many questions about R remains BECAUSE that is how it was
done, and whether you like it or not, that may not change much any time soon. That
is why so many people like packages such as those in the tidyverse: they
manage to make some changes, for better and sometimes for worse.



-Original Message-
From: R-help  On Behalf Of Sorkin, John
Sent: Sunday, January 15, 2023 8:08 PM
To: Richard O'Keefe ; Valentin Petzel 
Cc: R help Mailing list 
Subject: Re: [R] return value of {}

Richard,
I sent my prior email too quickly:

A slight addition to your code shows an important aspect of R, local vs.
global variables:

x <- 137
f <- function () {
   a <- x
   x <- 42
   b <- x
   list(a=a, b=b)
   }
 f()
print(x)

When run the program produces the following:

> x <- 137
> f <- function () {
+a <- x
+x <- 42
+b <- x
+list(a=a, b=b)
+}
>  f()
$a
[1] 137

$b
[1] 42

> print(x)
[1] 137

The fist x, a <- x, invokes an x variable that is GLOBAL. It is known both
inside and outside the function.
The second x, x <- 42, defines an x that is LOCAL to the function, it is not

Re: [R] return value of {....}

2023-01-15 Thread avi.e.gross
Richard,

I appreciate your observations. As regularly noted, there are many possible
forks in the road to designing a language and it seems someone is determined
to try every possible fork.

Yes, some languages that are compiled, or like JavaScript, read the entire
function before executing it and promote some parts written further down to
the top so all variable declarations, as one example, are done before
executing any remaining code. If you look at R, if you decide to use a
library, you can ask for it near the top or just before you want to use it.
I think they get executed as needed so if you only load it in one branch of
an IF statement, ...

We can debate good and bad subjective choices in language design BUT for
practical purposes, what matters is that a feature IS a certain way and you
use it consistently with what is documented, not what you wish it was like.
R has an interpreter that arguably may be simple and keep reading small
sections of code and executing them and then getting more. Much of the code
is not even evaluated till much later or never and thus it may not be
trivial to do any kind of look-ahead and adjustment.

Many languages now look a bit silly after some changes are made or things
added that make the earlier way look clumsy. Some features may even be
deprecated and eventually removed, or the language forks again and people
argue that everyone should upgrade to the new version 12.x and so on.

Do note R lets you use rm(x) and also supports multiple environments
including new dynamic ones  so your example might have a region where an x
is used in quite a few ways such as asking some function call to be
evaluated in a specific environment. This flexibility would be harder if you
asked the interpreter to do things like some other languages that may not
support much. However, lots of languages with scoping rules will indeed
allow you to use variables in outer scopes or hidden scopes and so on. To
each their own.

-Original Message-
From: R-help  On Behalf Of Richard O'Keefe
Sent: Sunday, January 15, 2023 6:39 PM
To: Valentin Petzel 
Cc: R help Mailing list 
Subject: Re: [R] return value of {}

I wonder if the real confusino is not R's scope rules?
(begin .) is not Lisp, it's Scheme (a major Lisp dialect), and in Scheme,
(begin (define x ...) (define y ...) ...) declares variables x and y that
are local to the (begin ...) form, just like Algol 68.  That's weirdness 1.
Javascript had a similar weirdness, when the ECMAscript process eventually
addressed.  But the real weirdness in R is not just that the existence of
variables is indifferent to the presence of curly braces, it's that it's
*dynamic*.  In f <- function (...) {
   ... use x ...
   x <- ...
   ... use x ...
}
the two occurrences of "use x" refer to DIFFERENT variables.
The first occurrence refers to the x that exists outside the function.  It
has to: the local variable does not exist yet.
The assignment *creates* the variable, so the second occurrence of "use x"
refers to the inner variable.
Here's an actual example.
> x <- 137
> f <- function () {
+ a <- x
+ x <- 42
+ b <- x
+ list(a=a, b=b)
+ }
> f()
$a
[1] 137
$b
[1] 42

Many years ago I set out to write a compiler for R, and this was the issue
that finally sank my attempt.  It's not whether the occurrence of "use x" is
*lexically* before the creation of x.
It's when the assignment is *executed* that makes the difference.
Different paths of execution through a function may result in it arriving at
its return point with different sets of local variables.
R is the only language I routinely use that does this.

So rule 1: whether an identifier in an R function refers to an outer
variable or a local variable depends on whether an assignment creating that
local variable has been executed yet.
And rule 2: the scope of a local variable is the whole function.

If the following transcript not only makes sense to you, but is exactly what
you expect, congratulations, you understand local variables in R.

> x <- 0
> g <- function () {
+ n <- 10
+ r <- numeric(n)
+ for (i in 1:n) {
+ if (i == 6) x <- 100
+ r[i] <- x + i
+ }
+ r
+ }
> g()
 [1]   1   2   3   4   5 106 107 108 109 110


On Fri, 13 Jan 2023 at 23:28, Valentin Petzel  wrote:

> Hello Akshay,
>
> R is quite inspired by LISP, where this is a common thing. It is not 
> in fact that {...} returned something, rather any expression 
> evaluates to some value, and for a compound statement that is the 
> last evaluated expression.
>
> {...} might be seen as similar to LISPs (begin ...).
>
> Now this is a very different thing compared to {...} in something like 
> C, even if it looks or behaves similarly. But in R {...} is in fact an 
> expression and thus has to evaluate to some value. This also comes with 
> some nice benefits.
>
> You do not need to use {...} for anything that is a single statement. 
> But you can in each possible place use {...} to turn multiple 
> 

Re: [R] Removing variables from data frame with a wile card

2023-01-15 Thread avi.e.gross
John,

As you said, you are new to the discussion so let me catch you up.

The original question was about removing many columns that shared a similar 
feature in the naming convention while leaving other columns in-place. Quite a 
few replies were given on how to do that including how to use a regular 
expression to gather the column names you want to remove.

It was only afterwards that the topic changed a bit to mention that some people 
used additional ways both in base R and also using packages like dplyr in the 
tidyverse.

As a general rule, most packages out there provide functionality that can be 
done in base R if you wish, and some are written purely in R while some augment 
that with parts re-done in C or something. If a package is well built and 
frequently used, it may well make your life as a programmer easier as the code 
need not be re-invented and debugged. Of course some packages are of poorer 
quality.

So we fully agree that unless asked for, the base R answers should be the focus 
HERE. Then again, languages are not static and sometimes we see things like 
pipes moved in a modified version into the main language.

Avi

-Original Message-
From: Sorkin, John  
Sent: Sunday, January 15, 2023 11:55 AM
To: Valentin Petzel ; avi.e.gr...@gmail.com
Cc: 'R-help Mailing List' 
Subject: Re: [R] Removing variables from data frame with a wile card

I am new to this thread. At the risk of presenting something that has been 
shown before, below I demonstrate how a column in a data frame can be dropped 
using a wild card, i.e. a column whose name starts with "th" using nothing more 
than base R functions and base R syntax. While additions to R such as tidyverse 
can be very helpful, many things that they do can be accomplished simply using 
base R.  

# Create data frame with three columns
one <- rep(1,10)
one
two <- rep(2,10)
two
three <- rep(3,10)
three
mydata <- data.frame(one=one, two=two, three=three)
cat("Data frame with three columns\n")
mydata

# Drop the column whose name starts with "th", i.e. column three
# Find the location of the column
ColumnToDelete <- grep("^th", colnames(mydata))
cat("The column to be dropped is the column called three, which is column", ColumnToDelete, "\n")
ColumnToDelete

# Drop the column whose name starts with "th"
newdata2 <- mydata[,-ColumnToDelete]
cat("Data frame after dropping column whose name is three\n")
newdata2

I hope this helps.
John



From: R-help  on behalf of Valentin Petzel 

Sent: Saturday, January 14, 2023 1:21 PM
To: avi.e.gr...@gmail.com
Cc: 'R-help Mailing List'
Subject: Re: [R] Removing variables from data frame with a wile card

Hello Avi,

while something like d$something <- ... may seem like you're directly modifying 
the data it does not actually do so. Most R objects try to be immutable, that 
is, the object may not change after creation. This guarantees that if you have 
a binding for same object the object won't change sneakily.

There is a data structure that is in fact mutable which are environments. For 
example compare

L <- list()
local({L$a <- 3})
L$a

with

E <- new.env()
local({E$a <- 3})
E$a

The latter will in fact work, as the same Environment is modified, while in the 
first one a modified copy of the list is made.

Under the hood we have a parser trick: If R sees something like

f(a) <- ...

it will look for a function f<- and call

a <- f<-(a, ...)

(this also happens for example when you do names(x) <- ...)

So in fact in our case this is equivalent to creating a copy with removed 
columns and rebind the symbol in the current environment to the result.
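A quick way to see that replacement-function rewriting in action (a sketch I am adding, using the built-in `names<-` as the example):

```r
x <- c(a = 1, b = 2)
# names(x) <- ... is rewritten by the parser into a call of the
# replacement function `names<-`, with the result rebound to x:
x <- `names<-`(x, c("A", "B"))
names(x)   # "A" "B"
```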

The data.table package breaks with this convention and uses C based routines 
that allow changing of data without copying the object. Doing

d[, (cols_to_remove) := NULL]

will actually change the data.

Regards,
Valentin

14.01.2023 18:28:33 avi.e.gr...@gmail.com:

> Steven,
>
> Just want to add a few things to what people wrote.
>
> In base R, the methods mentioned will let you make a copy of your original DF 
> that is missing the items you are selecting that match your pattern.
>
> That is fine.
>
> For some purposes, you want to keep the original data.frame and remove a 
> column within it. You can do that in several ways but the simplest is 
> something where you set the column to NULL as in:
>
> mydata$NAME <- NULL
>
> using the mydata["NAME"] notation can do that for you by using a loop or 
> functional programming method that does that with all components of your grep.
>
> R does have optimizations that make this less useful as a partial copy of a 
> data.frame retains common parts till things change.
>
> For those who like to use the tidyverse, it comes with lots of tools that let 
> you select columns that start with or end with or contain some pattern and I 
> find that way easier.
>
>
>
> -Original Message-
> From: R-help  On Behalf Of Steven Yen
> Sent: Saturday, January 14, 2023 7:49 AM
> To: Andrew 

Re: [R] Removing variables from data frame with a wile card

2023-01-14 Thread avi.e.gross
John,

 

I am very familiar with the evolving tidyverse and some messages a while back 
included people who wanted this forum to mainly stick to base R, so I leave out 
examples.

 

Indeed, the tidyverse is designed to make it easy to select columns with all 
kinds of conditions including using regular expressions that allow more 
precision (as does grep) so you want to match “yr” followed by exactly one or 
two digits. Some of the answers suggest starting with “yr” was enough. They 
also allow selecting on arbitrary considerations like whether the column 
contains numeric data. You can do most things in base R, albeit I find the 
tidyverse method easier most of the time and also able to do some extremely 
complicated things with some care, such as creating multiple new columns form a 
set of columns that each implement a different function like mean, and mode and 
standard deviation and make the new columns the same names as the one they are 
derived from but a different suffix reflecting what transformation was done.
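For instance, a sketch of that multi-function, suffix-naming pattern with dplyr's across() (my own illustrative example, run on the built-in mtcars data):

```r
library(dplyr)

# One new column per (input column x function) pair; .names controls the
# suffix appended to each source column's name.
mtcars %>%
  summarise(across(c(mpg, hp),
                   list(mean = mean, sd = sd),
                   .names = "{.col}_{.fn}"))
# yields columns mpg_mean, mpg_sd, hp_mean, hp_sd
```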

 

One nice feature is the ideas behind how data streams through multiple steps 
with one or a few transformations in each step, and the intermediate parts you 
do not want, simply melt away. The part about selecting or deselecting columns 
can often be used in many of the verbs.

 

From: John Kane  
Sent: Saturday, January 14, 2023 4:07 PM
To: avi.e.gr...@gmail.com
Cc: R-help Mailing List 
Subject: Re: [R] Removing variables from data frame with a wile card

 

You rang sir?

 

library(tidyverse)
xx = 1:10 
yr1 = yr2 = yr3 = rnorm(10)
dat1 <- data.frame(xx , yr1, yr2, yr3)

 

dat1  %>%  select(!starts_with("yr"))

 

or for something a bit more exotic as I have been trying to learn a bit about 
the "data.table package

 

library(data.table)

xx = 1:10 
yr1 = yr2 = yr3 = rnorm(10)

dat2 <- data.table(xx , yr1, yr2, yr3)

dat2[, !names(dat2) %like% "yr", with=FALSE ]
 

 

 

On Sat, 14 Jan 2023 at 12:28, avi.e.gr...@gmail.com wrote:

Steven,

Just want to add a few things to what people wrote.

In base R, the methods mentioned will let you make a copy of your original DF 
that is missing the items you are selecting that match your pattern.

That is fine.

For some purposes, you want to keep the original data.frame and remove a column 
within it. You can do that in several ways but the simplest is something where 
you set the column to NULL as in:

mydata$NAME <- NULL

using the mydata["NAME"] notation can do that for you by using a loop or 
functional programming method that does that with all components of your grep.

R does have optimizations that make this less useful as a partial copy of a 
data.frame retains common parts till things change.

For those who like to use the tidyverse, it comes with lots of tools that let 
you select columns that start with or end with or contain some pattern and I 
find that way easier.



-Original Message-
From: R-help  On Behalf Of Steven Yen
Sent: Saturday, January 14, 2023 7:49 AM
To: Andrew Simmons 
Cc: R-help Mailing List 
Subject: Re: [R] Removing variables from data frame with a wile card

Thanks to all. Very helpful.

Steven from iPhone

> On Jan 14, 2023, at 3:08 PM, Andrew Simmons  wrote:
> 
> You'll want to use grep() or grepl(). By default, grep() uses 
> extended regular expressions to find matches, but you can also use 
> perl regular expressions and globbing (after converting to a regular 
> expression).
> For example:
> 
> grepl("^yr", colnames(mydata))
> 
> will tell you which 'colnames' start with "yr". If you'd rather you 
> use globbing:
> 
> grepl(glob2rx("yr*"), colnames(mydata))
> 
> Then you might write something like this to remove the columns starting with 
> yr:
> 
> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
> 
>> On Sat, Jan 14, 2023 at 1:56 AM Steven T. Yen  wrote:
>> 
>> I have a data frame containing variables "yr3",...,"yr28".
>> 
>> How do I remove them with a wild card ... something similar to "del yr*"
>> in Windows/doc? Thank you.
>> 
>>> colnames(mydata)
>>   [1] "year"   "weight" "confeduc"   "confothr" "college"
>>   [6] ...
>>  [41] "yr3""yr4""yr5""yr6" "yr7"
>>  [46] "yr8""yr9""yr10"   "yr11" "yr12"
>>  [51] "yr13"   "yr14"   "yr15"   "yr16" "yr17"
>>  [56] "yr18"   "yr19"   "yr20"   "yr21" "yr22"
>>  [61] "yr23"   "yr24"   "yr25"   "yr26" "yr27"
>>  [66] "yr28"...
>> 
>> __
>> R-help@r-project.org   mailing list -- To 
>> UNSUBSCRIBE and more, see 
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.


Re: [R] Removing variables from data frame with a wile card

2023-01-14 Thread avi.e.gross
Valentin,

You are correct that R does many things largely behind the scenes that make 
some operations fairly efficient.

From a programming point of view, though, many people might make a data.frame 
and not think of it as a list of vectors of the same length that are kept that 
way.

So if they made a copy of the original data with fewer columns, they might be 
tempted to think the original item was completely copied and the original is 
either around or if the identifier was re-used, will be garbage collected. As 
you note, the only things collected are the columns you chose not to include.

For some it seems cleaner to set a list item to NULL, which seems to remove it 
immediately. 

The real point I hoped to make is that using base R, you can indeed approach 
removing (multiple) columns in two logical ways. One is to seemingly remove 
them in the original object, even if your point is valid. The other is to make 
a copy of just what you want and ignore the rest and it may be kept around or 
not.

If someone really wanted to get down to the basics, they could get a reference 
to all the columns they want to keep, as in col1 <- mydata[["col1"]] and use 
those to make a new data.frame, or many other variants on these methods.  

Many programming languages have some qualms (I mean designers and programmers, 
and just plain purists) about when "pointers" of sorts are used and whether 
things should be mutable and so on so I prefer to avoid religious wars.

-Original Message-
From: Valentin Petzel  
Sent: Saturday, January 14, 2023 1:21 PM
To: avi.e.gr...@gmail.com
Cc: 'R-help Mailing List' 
Subject: Re: [R] Removing variables from data frame with a wile card

Hello Avi,

while something like d$something <- ... may seem like you're directly modifying 
the data it does not actually do so. Most R objects try to be immutable, that 
is, the object may not change after creation. This guarantees that if you have 
a binding for same object the object won't change sneakily.

There is a data structure that is in fact mutable which are environments. For 
example compare

L <- list()
local({L$a <- 3})
L$a

with

E <- new.env()
local({E$a <- 3})
E$a

The latter will in fact work, as the same Environment is modified, while in the 
first one a modified copy of the list is made.

Under the hood we have a parser trick: If R sees something like

f(a) <- ...

it will look for a function f<- and call

a <- f<-(a, ...)

(this also happens for example when you do names(x) <- ...)

So in fact in our case this is equivalent to creating a copy with removed 
columns and rebind the symbol in the current environment to the result.

The data.table package breaks with this convention and uses C based routines 
that allow changing of data without copying the object. Doing

d[, (cols_to_remove) := NULL]

will actually change the data.

Regards,
Valentin

14.01.2023 18:28:33 avi.e.gr...@gmail.com:

> Steven,
> 
> Just want to add a few things to what people wrote.
> 
> In base R, the methods mentioned will let you make a copy of your original DF 
> that is missing the items you are selecting that match your pattern.
> 
> That is fine.
> 
> For some purposes, you want to keep the original data.frame and remove a 
> column within it. You can do that in several ways but the simplest is 
> something where you set the column to NULL as in:
> 
> mydata$NAME <- NULL
> 
> using the mydata["NAME"] notation can do that for you by using a loop or 
> functional programming method that does that with all components of your grep.
> 
> R does have optimizations that make this less useful as a partial copy of a 
> data.frame retains common parts till things change.
> 
> For those who like to use the tidyverse, it comes with lots of tools that let 
> you select columns that start with or end with or contain some pattern and I 
> find that way easier.
> 
> 
> 
> -Original Message-
> From: R-help  On Behalf Of Steven Yen
> Sent: Saturday, January 14, 2023 7:49 AM
> To: Andrew Simmons 
> Cc: R-help Mailing List 
> Subject: Re: [R] Removing variables from data frame with a wile card
> 
> Thanks to all. Very helpful.
> 
> Steven from iPhone
> 
>> On Jan 14, 2023, at 3:08 PM, Andrew Simmons  wrote:
>> 
>> You'll want to use grep() or grepl(). By default, grep() uses 
>> extended regular expressions to find matches, but you can also use 
>> perl regular expressions and globbing (after converting to a regular 
>> expression).
>> For example:
>> 
>> grepl("^yr", colnames(mydata))
>> 
>> will tell you which 'colnames' start with "yr". If you'd rather you 
>> use globbing:
>> 
>> grepl(glob2rx("yr*"), colnames(mydata))
>> 
>> Then you might write something like this to remove the columns starting with 
>> yr:
>> 
>> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
>> 
>>> On Sat, Jan 14, 2023 at 1:56 AM Steven T. Yen  wrote:
>>> 
>>> I have a data frame containing variables "yr3",...,"yr28".
>>> 
>>> How do I remove them with a 

Re: [R] Removing variables from data frame with a wile card

2023-01-14 Thread avi.e.gross
Steven,

Just want to add a few things to what people wrote.

In base R, the methods mentioned will let you make a copy of your original DF 
that is missing the items you are selecting that match your pattern.

That is fine.

For some purposes, you want to keep the original data.frame and remove a column 
within it. You can do that in several ways but the simplest is something where 
you set the column to NULL as in:

mydata$NAME <- NULL

using the mydata["NAME"] notation can do that for you by using a loop or 
functional programming method that does that with all components of your grep.
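For example, a minimal sketch of that loop in base R (my own illustration, assuming the "yr"-prefixed columns discussed in this thread):

```r
# Remove, in place, every column whose name starts with "yr"
for (nm in grep("^yr", names(mydata), value = TRUE)) {
  mydata[[nm]] <- NULL
}
```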

R does have optimizations that make this less useful as a partial copy of a 
data.frame retains common parts till things change.

For those who like to use the tidyverse, it comes with lots of tools that let 
you select columns that start with or end with or contain some pattern and I 
find that way easier.



-Original Message-
From: R-help  On Behalf Of Steven Yen
Sent: Saturday, January 14, 2023 7:49 AM
To: Andrew Simmons 
Cc: R-help Mailing List 
Subject: Re: [R] Removing variables from data frame with a wile card

Thanks to all. Very helpful.

Steven from iPhone

> On Jan 14, 2023, at 3:08 PM, Andrew Simmons  wrote:
> 
> You'll want to use grep() or grepl(). By default, grep() uses 
> extended regular expressions to find matches, but you can also use 
> perl regular expressions and globbing (after converting to a regular 
> expression).
> For example:
> 
> grepl("^yr", colnames(mydata))
> 
> will tell you which 'colnames' start with "yr". If you'd rather you 
> use globbing:
> 
> grepl(glob2rx("yr*"), colnames(mydata))
> 
> Then you might write something like this to remove the columns starting with 
> yr:
> 
> mydata <- mydata[, !grepl("^yr", colnames(mydata)), drop = FALSE]
> 
>> On Sat, Jan 14, 2023 at 1:56 AM Steven T. Yen  wrote:
>> 
>> I have a data frame containing variables "yr3",...,"yr28".
>> 
>> How do I remove them with a wild card ... something similar to "del yr*"
>> in Windows/doc? Thank you.
>> 
>>> colnames(mydata)
>>   [1] "year"   "weight" "confeduc"   "confothr" "college"
>>   [6] ...
>>  [41] "yr3""yr4""yr5""yr6" "yr7"
>>  [46] "yr8""yr9""yr10"   "yr11" "yr12"
>>  [51] "yr13"   "yr14"   "yr15"   "yr16" "yr17"
>>  [56] "yr18"   "yr19"   "yr20"   "yr21" "yr22"
>>  [61] "yr23"   "yr24"   "yr25"   "yr26" "yr27"
>>  [66] "yr28"...
>> 
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide 
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] return value of {....}

2023-01-10 Thread avi.e.gross
Fair enough, Akshay. Wondering why a design was chosen is reasonable.

 

There are languages like python that allow unpacking multiple values and it
is not uncommon to return multiple things from some constructs as in this:

 

>>> a,b,c = { 4, 5, 6 }

>>> a

4

>>> b

5

>>> c

6

 

But that is a bit of an illusion as the thing in curly braces is a very
different creature called a set with three items.  Python does not need or
use curly braces for the purposes used in R as indentation levels group
things.

 

It is different in design. I do not know what other computer languages you
have used, but R is NOT them and must be approached for what it is, and it
is a fairly simple paradigm for the last item in a sequence to be the value
of the sequence. You can return multiple things, if you wish, in other ways.
Such as making a list:

 

vals <- list(1+2, sin(.7), cos(.7))

 

> vals[2:3]

[[1]]

[1] 0.6442177

 

[[2]]

[1] 0.7648422

 

You get the idea. R returns ONE thing and it can have parts, named or not.

 

 

From: akshay kulkarni  
Sent: Tuesday, January 10, 2023 9:33 AM
To: avi.e.gr...@gmail.com
Cc: 'R help Mailing list' 
Subject: Re: [R] return value of {}

 

Dear Avi,

 Thanks for your reply...your exhortations are indeed
justified...! But one caveat: I was not complaining about anything... just
was curious about the rationale of a particular design... Thanks again...

 

Thanking you,

Yours sincerely,

AKSHAY M KULKARNI   

  _  

From: R-help  on behalf of avi.e.gr...@gmail.com
Sent: Tuesday, January 10, 2023 4:39 AM
Cc: 'R help Mailing list' 
Subject: Re: [R] return value of {} 

 

Akshay,

Your question seems a tad mysterious to me as you are complaining about
NOTHING.

R was designed to return single values. The last statement executed in a
function body, for example, is the value returned even when not at the end.

Scoping is another issue entirely. What is visible is another discussion.

So, yes, if you can see ALL the variables, you might see the last one BUT
there often is no variable at the end. There is an expression that evaluates
to a value with no NAME attached. You cannot reference that unless the block
in curly braces returns that value.

You can design your own language any way you want. The people who designed R
did it this way. Mind you, the most common use of curly braces is probably
in function bodies, or if/else blocks and loops, not quite what you are
looking at and complaining about. The design is what it is.

Others require things like an explicit return() statement. R chose not to.

And if the value is redundant for you, who cares?

Did you know that when running a program in the interpreter the last value
is stored in a variable like this:

> x <- 6
> .Last.value
[1] 6

Why would that duplicate be needed or useful?

Consider a partial calculation you want to reuse in another context:

> y = x*2 + 2*x -3
> z <- .Last.value/2
> z
[1] 10.5

Yes, you could have used "y" ...




-Original Message-
From: R-help  On Behalf Of akshay kulkarni
Sent: Monday, January 9, 2023 12:06 PM
To: Valentin Petzel 
Cc: R help Mailing list 
Subject: Re: [R] return value of {}

Dear Valentin,
  But why should {} "return" a value? It could
just as well evaluate all the expressions and store the resulting objects in
whatever environment the interpreter chooses, and then it would be left to
the user to manipulate any object he chooses. Don't you think returning the
last, or any value, is redundant? We are living in the 21st century world,
and the R-core team might,I suppose, have a definite reason for"returning"
the last value. Any comments?

Thanking you,
Yours sincerely,
AKSHAY M KULKARNI


From: Valentin Petzel 
Sent: Monday, January 9, 2023 9:18 PM
To: akshay kulkarni 
Cc: R help Mailing list 
Subject: Re: [R] return value of {}

Hello Akshai,

I think you are confusing {...} with local({...}). This one will evaluate
the expression in a separate environment, returning the last expression.

{...} simply evaluates multiple expressions as one and returns the result of
the last line, but it still evaluates each expression.

Assignment returns the assigned value, so we can chain assignments like this

a <- 1 + (b <- 2)

conveniently.

So when is {...} useful? Well, anyplace where you want to execute complex
stuff in a function argument. E.g. you might do:

data %>% group_by(x) %>% summarise(y = {if(x[1] > 10) sum(y) else mean(y)})

Regards,
Valentin Petzel

09.01.2023 15:47:53 akshay kulkarni :

> Dear members,
>  I have the 

Re: [R] return value of {....}

2023-01-09 Thread avi.e.gross
Akshay,

Your question seems a tad mysterious to me as you are complaining about
NOTHING.

R was designed to return single values. The last statement executed in a
function body, for example, is the value returned even when not at the end.

Scoping is another issue entirely. What is visible is another discussion.

So, yes, if you can see ALL the variables, you might see the last one BUT
there often is no variable at the end. There is an expression that evaluates
to a value with no NAME attached. You cannot reference that unless the block
in curly braces returns that value.

You can design your own language any way you want. The people who designed R
did it this way. Mind you, the most common use of curly braces is probably
in function bodies, or if/else blocks and loops, not quite what you are
looking at and complaining about. The design is what it is.

Others require things like an explicit return() statement. R chose not to.

And if the value is redundant for you, who cares?

Did you know that when running a program in the interpreter the last value
is stored in a variable like this:

> x <- 6
> .Last.value
[1] 6

Why would that duplicate be needed or useful?

Consider a partial calculation you want to reuse in another context:

> y = x*2 + 2*x -3
> z <- .Last.value/2
> z
[1] 10.5

Yes, you could have used "y" ...




-Original Message-
From: R-help  On Behalf Of akshay kulkarni
Sent: Monday, January 9, 2023 12:06 PM
To: Valentin Petzel 
Cc: R help Mailing list 
Subject: Re: [R] return value of {}

Dear Valentin,
  But why should {} "return" a value? It could
just as well evaluate all the expressions and store the resulting objects in
whatever environment the interpreter chooses, and then it would be left to
the user to manipulate any object he chooses. Don't you think returning the
last, or any value, is redundant? We are living in the 21st century world,
and the R-core team might,I suppose, have a definite reason for"returning"
the last value. Any comments?

Thanking you,
Yours sincerely,
AKSHAY M KULKARNI


From: Valentin Petzel 
Sent: Monday, January 9, 2023 9:18 PM
To: akshay kulkarni 
Cc: R help Mailing list 
Subject: Re: [R] return value of {}

Hello Akshay,

I think you are confusing {...} with local({...}). This one will evaluate
the expression in a separate environment, returning the last expression.

{...} simply evaluates multiple expressions as one and returns the result of
the last line, but it still evaluates each expression.

Assignment returns the assigned value, so we can chain assignments like this

a <- 1 + (b <- 2)

conveniently.

So when is {...} useful? Well, anyplace where you want to execute complex
stuff in a function argument. E.g. you might do:

data %>% group_by(x) %>% summarise(y = {if(x[1] > 10) sum(y) else mean(y)})

Regards,
Valentin Petzel

09.01.2023 15:47:53 akshay kulkarni :

> Dear members,
>  I have the following code:
>
>> TB <- {x <- 3;y <- 5}
>> TB
> [1] 5
>
> It is consistent with the documentation: For {, the result of the last
expression evaluated. This has the visibility of the last evaluation.
>
> But both x AND y are created, but the "return value" is y. How can this be
advantageous for solving practical problems? Specifically, consider the
following code:
>
> F <- function(X) {  expr; expr2; { expr5; expr7}; expr8;expr10}
>
> Both expr5 and expr7 are created, and are accessible by the code 
> outside of the nested braces right? But the "return value" of the 
> nested braces is expr7. So doesn't this mean that only expr7 should be 
> accessible? Please help me entangle this (of course the return value 
> of F is expr10, and all the other objects created by the preceding 
> expressions are deleted. But expr5 is not, after the control passes 
> outside of the nested braces!)
>
> Thanking you,
> Yours sincerely,
> AKSHAY M KULKARNI
>
> [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Pipe operator

2023-01-04 Thread avi.e.gross
Yes, not every use of a word has the same meaning. The UNIX pipe was in many
ways a very different animal where the PIPE was a very real thing and looked
like a sort of temporary file in the file system with special properties.
Basically it was a fixed-size buffer that effectively was written into by a
process that was paused when the buffer was getting full and allowed to
continue when it was drained by a second process reading from it that also
was similarly managed. It assured many things that a temporary file would
not have supplied including uniqueness and privacy. Later they created a
related animal with persistence called a NAMED PIPE.

So the pipelines we are discussing in R do indeed run very synchronously in
whatever order they need to be run so one finishes producing an output into
an anonymous variable that can then be handed to the next function in the
pipeline. 

If you look at a language like Python or perhaps JavaScript, there are ways
to simulate a relatively asynchronous way to run functions on whatever data
is available from other functions, using ideas like generators and iterators
and more. You can create functions that call other functions just to get one
item such as the next prime number, and do some work and call it again so it
yields just one more value and so on. You can build these in chains so that
lots of functions stay resident in memory and only keep producing data just
in time as needed and perhaps even running on multiple processors in even
more parallelism.

R can possibly add such things and it has elements with things not being
evaluated till needed that can have interesting results and of course it is
possible to spawn additional processes, as with many languages, that are
linked together to run at once, but all such speculation is beyond the
bounds of what operators we call PIPES, such as %>% and |> are doing. It
remains very much syntactic sugar that makes life easier for some and annoys
others.

I note some code I see has people hedging their bets a bit about the missing
first argument. They harmlessly keep the first argument and call it a period
as in:
mutate(mydata, ...) %>%
filter( ., ...) %>%
group_by( ., ...) %>%
summarize( ., ...)


In the above, "..." means fill it in and not an alternate meaning, and the
point is the first argument is a period which is replaced by the
passed-along object but that would have been done without it by default. It
remains a reminder that there still is that first argument and I guess it
could be helpful in some ways too and avoids some potential confusion if
others read your code and look up a man page and understand what the second
and subsequent arguments match up to.


-Original Message-
From: R-help  On Behalf Of Richard O'Keefe
Sent: Wednesday, January 4, 2023 1:56 AM
To: Milan Glacier 
Cc: R-help Mailing List 
Subject: Re: [R] Pipe operator

This is both true and misleading.
The shell pipe operation came from functional programming.  In fact the
shell pipe operation is NOT "flip apply", which is what |> is, but it is
functional composition.  That is,
cmd1 | cmd2 = \x.cmd2(cmd1(x)).

Pragmatically, the Unix shell pipe operator does something very important,
which |> (and even functional composition in F#) doesn't:
it *interleaves* the computation of cmd1 and cmd2,
streaming the data.  But in R, x |> f() |> g() is by definition g(f(x)), and
if g needs the value of its argument, the *whole* of f(x) is evaluated
before g resumes.  This is much closer to what the pipe syntax in the MS-DOS
shell did, if I recall correctly.



On Wed, 4 Jan 2023 at 17:46, Milan Glacier  wrote:

> With 50 years of programming experience, just think about how useful 
> pipe operator is in shell scripting. The output of previous call 
> becomes the input of next call... Genius idea from our beloved unix 
> convention...
>
>
> On 01/03/23 16:48, Sorkin, John wrote:
> >I am trying to understand the reason for existence of the pipe 
> >operator,
> %>%, and when one should use it. It is my understanding that the 
> operator sends the file to the left of the operator to the function 
> immediately to the right of the operator:
> >
> >c(1:10) %>% mean results in a value of 5.5 which is exactly the same 
> >as
> the result one obtains using the mean function directly, viz.
> mean(c(1:10)). What is the reason for having two syntactically 
> different but semantically identical ways to call a function? Is one 
> more efficient than the other? Does one use less memory than the other?
> >
> >P.S. Please forgive what might seem to be a question with an obvious
> answer. I am a programmer dinosaur. I have been programming for more 
> than
> 50 years. When I started programming in the 1960s the only pipe one 
> spoke about was a bong.
> >
> >John
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
> https://stat.ethz.ch/mailman/listinfo/r-help
> 

Re: [R] Pipe operator

2023-01-03 Thread avi.e.gross
Boris,

There are MANY variations possible and yours does not seem that common,
albeit perfectly valid.

I am not talking about making it a one-liner, albeit I find the multi-line
version more useful.

The pipeline concept seems sort of atomic in the following sense. R allows
several in-line variants of assignment besides something like:

assign("string", value)

And, variations on the above that are more useful when making multiple
assignments in a loop or using other environments.

What is more common is:

Name <- Expression

And of course occasionally:

Expression -> Name

So back to pipelines, you have two perfectly valid ways to do a pipeline and
assign the result. I showed a version like:

Name <-
Variable |>
Pipeline.item(...) |>
... |>
Pipeline.item(...)


But you can equally well assign it at the end:

Variable |>
Pipeline.item(...) |>
... |>
Pipeline.item(...) -> Name


I think a more valid use of assign is in mid-pipeline as one way to save an
intermediate result in a variable or perhaps in another environment, such as
may be useful when debugging:

Name <-
Variable |>
Pipeline.item(...) |>
assign("temp1", value = _) |>
... |>
Pipeline.item(...)

This works because assign() returns the assigned value (invisibly), much as
print() returns its argument, so the object keeps flowing down the pipeline
while being captured as a side effect. Note the native pipe requires the
placeholder _ to appear as a named argument, hence value = _. When done
debugging, removing that line lets the rest continue working seamlessly.

BTW, your example does something I am not sure you intended:

  x |> cos() |> max(pi/4) |> round(3) |> assign("x", value = _)

I prefer showing it like this:

 x |> 
cos() |> 
max(pi/4) |> 
round(3) |> 
assign("x", value = _)

Did you notice you changed "x" by assigning a new value to the one you
started with? That is perfectly legal but may not have been intended.

And, yes, for completeness, there are two more assignment operators I
generally have no use for of <<- and ->> that work in a global sense.

And for even more completeness you can also use the operators above like
this:

> z = `<-`("x", 7)
> z
[1] 7
> x
[1] 7

For even more completeness, the example we are using can use the above
notation with a silly twist. Placing the results in z instead, I find the
new pipe INSISTS _ can only be used with a named argument. Duh, `<-` does
not have named arguments, just positional. So I see any valid name is just
ignored and the following works!

x |> cos() |> max(pi/4) |> round(3) |> `<-`("z", any.identifier = _)

And, frankly, many functions that need the pipe to feed a second or later
position can easily be wrapped so the piped value is the first argument. If
you feel the need to use "assign", make this function before using the
pipeline (the envir argument matters: without it, the assignment happens in
the wrapper's own environment and vanishes when the wrapper returns):

assignyx <- function(x, y) assign(y, x, envir = parent.frame())

Then your code can save a variable without an underscore and keyword:

x |> cos() |> max(pi/4) |> round(3) |> assignyx("x")

Or use the new lambda function somewhat designed for this case use which I
find a bit ugly but it is a matter of taste.

But to end this, there is no reason to make things complex in situations
like this. Just use a simple assignment pre or post as meets your needs.





-Original Message-
From: Boris Steipe  
Sent: Tuesday, January 3, 2023 2:01 PM
To: R-help Mailing List 
Cc: avi.e.gr...@gmail.com
Subject: Re: [R] Pipe operator

Working off Avi's example - would:

  x |> cos() |> max(pi/4) |> round(3) |> assign("x", value = _)

...be even more intuitive to read? Or are there hidden problems with that?



Cheers,
Boris


> On 2023-01-03, at 12:40, avi.e.gr...@gmail.com wrote:
> 
> John,
> 
> The topic has indeed been discussed here endlessly but new people 
> still stumble upon it.
> 
> Until recently, the formal R language did not have a built-in pipe 
> functionality. It was widely used through an assortment of packages 
> and there are quite a few variations on the theme including different 
> implementations.
> 
> Most existing code does use the operator %>% but there is now a 
> built-in |> operator that is generally faster but is not as easy to use in
a few cases.
> 
> Please forget the use of the word FILE here. Pipes are a form of 
> syntactic sugar that generally is about the FIRST argument to a 
> function. They are NOT meant to be used just for the trivial case you 
> mention where indeed there is an easy way to do things. Yes, they work 
> in such situations. But consider a deeply nested expression like this:
> 
> Result <- round(max(cos(x), 3.14159/4), 3)
> 
> There are MANY deeper nested expressions like this commonly used. The 
> above can be written linearly as in
> 
> Temp1 <- cos(x)
> Temp2 <- max(Temp1, 3.14159/4)
> Result <- round(Temp2, 3)
> 
> Translation, for some variable x, calculate the cosine and take the 
> maximum value of it as compared to pi/4 and round the result to three 
> decimal places. Not an uncommon kind of thing to do and sometimes you 
> can nest such things many layers deep and get hopelessly confused if 
> 

Re: [R] Pipe operator

2023-01-03 Thread avi.e.gross
Tim,

There are differences and this one can be huge.

The other pipe operators let you pass the current object to a later argument
instead of the first by using a period to represent where to put it. The new
one has a harder albeit flexible method by creating an anonymous function.

-Original Message-
From: R-help  On Behalf Of Ebert,Timothy Aaron
Sent: Tuesday, January 3, 2023 12:08 PM
To: Sorkin, John ; 'R-help Mailing List'

Subject: Re: [R] Pipe operator

The pipe shortens code and results in fewer variables because you do not
have to save intermediate steps. Once you get used to the idea it is useful.
Note that there is also the |> pipe that is part of base R. As far as I know
it does the same thing as %>%, or at my level of programming I have not
encountered a difference.

Tim

-Original Message-
From: R-help  On Behalf Of Sorkin, John
Sent: Tuesday, January 3, 2023 11:49 AM
To: 'R-help Mailing List' 
Subject: [R] Pipe operator

[External Email]

I am trying to understand the reason for existence of the pipe operator,
%>%, and when one should use it. It is my understanding that the operator
sends the file to the left of the operator to the function immediately to
the right of the operator:

c(1:10) %>% mean results in a value of 5.5 which is exactly the same as the
result one obtains using the mean function directly, viz. mean(c(1:10)).
What is the reason for having two syntactically different but semantically
identical ways to call a function? Is one more efficient than the other?
Does one use less memory than the other?

P.S. Please forgive what might seem to be a question with an obvious answer.
I am a programmer dinosaur. I have been programming for more than 50 years.
When I started programming in the 1960s the only pipe one spoke about was a
bong.

John

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Pipe operator

2023-01-03 Thread avi.e.gross
John,

The topic has indeed been discussed here endlessly but new people still
stumble upon it.

Until recently, the formal R language did not have a built-in pipe
functionality. It was widely used through an assortment of packages and
there are quite a few variations on the theme including different
implementations.

Most existing code does use the operator %>% but there is now a built-in |>
operator that is generally faster but is not as easy to use in a few cases.

Please forget the use of the word FILE here. Pipes are a form of syntactic
sugar that generally is about the FIRST argument to a function. They are NOT
meant to be used just for the trivial case you mention where indeed there is
an easy way to do things. Yes, they work in such situations. But consider a
deeply nested expression like this:

Result <- round(max(cos(x), 3.14159/4), 3)

There are MANY deeper nested expressions like this commonly used. The above
can be written linearly as in

Temp1 <- cos(x)
Temp2 <- max(Temp1, 3.14159/4)
Result <- round(Temp2, 3)

Translation, for some variable x, calculate the cosine and take the maximum
value of it as compared to pi/4 and round the result to three decimal
places. Not an uncommon kind of thing to do and sometimes you can nest such
things many layers deep and get hopelessly confused if not done somewhat
linearly.

What pipes allow is to write this closer to the second way while not seeing
or keeping any temporary variables around. The goal is to replace the FIRST
argument to a function with whatever resulted as the value of the previous
expression. That is often a vector or data.frame or list or any kind of
object but can also be fairly complex as in a list of lists of matrices.

So you can still start with cos(x) OR you can write this where the x is
removed from within and leaves cos() empty:

x %>% cos
or
x |> cos()

With the magrittr pipe the parentheses after cos are optional if there are
no additional arguments, but the new base pipe requires them.
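For example (assuming the magrittr package is available for %>%):

```r
# The two pipes differ in whether a bare function name is accepted on the
# right-hand side.
library(magrittr)

x <- c(0, pi)
x %>% cos     # magrittr accepts the bare function name
x %>% cos()   # and the call form
x |> cos()    # the base pipe requires the call form, with parentheses
# x |> cos    # this line would be a syntax error with the base pipe
```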

So continuing the above, using multiple lines, the pipe looks like:

Result <-
  x %>%
  cos() %>%
  max(3.14159/4) %>%
  round(3)

This gives the same result but is arguably easier for some to read and
follow. Nobody forces you to use it and for simple cases, most people don't.

There is a grouping of packages called the tidyverse that makes heavy use of
pipes routine as they made most or all their functions such that the first
argument is the one normally piped to and it can be very handy to write code
that says, read in your data into a variable (a data.frame or tibble often)
and PIPE IT to a function that renames some columns and PIPE the resulting
modified object to a function that retains only selected rows and pipe that
to a function that drops some of the columns and pipe that to a function
that groups the items or sorts them and pipe that to a function that does a
join with another object or generates a report or so many other things.
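That sequence can be sketched with base R alone, using the built-in pipe (R
>= 4.2 for the _ placeholder; the data frame and column names here are made
up for illustration):

```r
# Hypothetical data, standing in for something read from a file.
df <- data.frame(site   = c("A", "A", "B", "B"),
                 temp_f = c(50, 68, 32, 212))

result <- df |>
  transform(temp_c = (temp_f - 32) * 5 / 9) |>    # add a converted column
  subset(temp_c >= 0) |>                          # keep selected rows
  subset(select = c(site, temp_c)) |>             # drop the old column
  aggregate(temp_c ~ site, data = _, FUN = mean)  # group and summarize

result  # site A averages 15, site B averages 50
```

Each step receives the previous step's result as its first argument, except
aggregate(), where the _ placeholder routes it to the named data argument.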

So the real answer is that piping is another WAY of doing things from a
programmer's perspective. Underneath it all, it is mostly syntactic sugar:
the interpreter rearranges your code and performs the steps in what seems
like a different order at times. Generally, you do not need to care.



-Original Message-
From: R-help  On Behalf Of Sorkin, John
Sent: Tuesday, January 3, 2023 11:49 AM
To: 'R-help Mailing List' 
Subject: [R] Pipe operator

I am trying to understand the reason for existence of the pipe operator,
%>%, and when one should use it. It is my understanding that the operator
sends the file to the left of the operator to the function immediately to
the right of the operator:

c(1:10) %>% mean results in a value of 5.5 which is exactly the same as the
result one obtains using the mean function directly, viz. mean(c(1:10)).
What is the reason for having two syntactically different but semantically
identical ways to call a function? Is one more efficient than the other?
Does one use less memory than the other? 

P.S. Please forgive what might seem to be a question with an obvious answer.
I am a programmer dinosaur. I have been programming for more than 50 years.
When I started programming in the 1960s the only pipe one spoke about was a
bong.  

John

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Integer division

2022-12-20 Thread avi.e.gross
Documentation specifics aside (and I am not convinced that is an issue
here), programmers have a responsibility to test routines like this on small
samples and see whether the results match expectations.

Since negative numbers were possible, that would have been part of such tests.

And there are many ways to do things and the method chosen does not strike me 
as a particularly great method of finding out about the first digit unless you 
are guaranteed to have exactly five digits. It may be efficient but it can 
likely fail in many cases where the data is not as expected such as more than 
five digits or not containing a number.

So many programmers would first filter the data to check for various 
conditions. Some might simply try to convert the numbers into character strings 
(perhaps the absolute value of the number instead) and look at the first 
character instead, or handle it differently if it is a minus sign. 

Many programming languages contain families of functions for some tasks when 
there are many possible ways to do something that can get somewhat different 
results. There is absolutely NO reason to assume that any one member of a 
family of functions will do what you expect and you may need to either explore 
others that are similar or make your own.

Something as simple as this might give you what you want:

first_of_five <- function(numb) abs(numb) %/% 10000
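A character-based variant works for any number of digits, not just five (a
sketch, not the original poster's code):

```r
# First digit of the absolute value: convert to text and take the first
# character, so the digit count does not matter.
first_digit <- function(x) as.integer(substr(as.character(abs(x)), 1, 1))

first_digit(c(12345, -98765, 7))  # 1 9 7
```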





-Original Message-
From: R-help  On Behalf Of Göran Broström
Sent: Tuesday, December 20, 2022 1:53 AM
To: Richard O'Keefe 
Cc: r-help@r-project.org
Subject: Re: [R] Integer division

Thanks Richard,

the "rounding claim" was my mistake (as I replied to Martin); I should have 
said "truncates toward zero", as you explain.

However, my point was that these two mathematical functions should be defined 
in the documentation, as you also say. And I was surprised that there is no 
consensus regarding the definition of such elementary functions.

Göran

On 2022-12-20 03:01, Richard O'Keefe wrote:
> The Fortran '08 standard says <<
> One operand of type integer may be divided by another operand of type 
> integer. Although the mathematical quotient of two integers is not 
> necessarily an integer, Table 7.2 specifies that an expression 
> involving the division operator with two operands of type integer is 
> interpreted as an expression of type integer. The result of such an 
> operation is the integer closest to the mathematical quotient and 
> between zero and the mathematical quotient inclusively. >> Another way 
> to say this is that integer division in Fortran TRUNCATES towards 
> zero.  It does not round and never has.
> 
> C carefully left the behaviour of integer division (/) unspecified, 
> but introduced the div(,) function with the same effect as Fortran 
> (/).  Later versions of the C standard tightened this up, and the C17 
> standard reads << The result of the / operator is the quotient from 
> the division of the first operand by the second; the result of the % 
> operator is the remainder. In both operations, if the value of the 
> second operand is zero, the behavior is undefined.
> When integers are divided, the result of the / operator is the 
> algebraic quotient with any fractional part discarded. 107) If the 
> quotient a/b is representable, the expression (a/b)*b + a%b shall 
> equal a ; otherwise, the behavior of both a/b and a%b is undefined.>>
> 
> That is, C17 TRUNCATES the result of division towards zero.  I don't 
> know of any C compiler that rounds, certainly gcc does not.
> 
> 
> The Java 15 Language Specification says << Integer division rounds 
> toward 0. >> which also specified truncating division.
> 
> 
> The help for ?"%/%" does not say what the result is.
> Or if it does, I can't find it.  Either way, this is a defect in the 
> documentation.  It needs to be spelled out very clearly.
> R version 4.2.2 Patched (2022-11-10 r83330) -- "Innocent and Trusting"
>  > c(-8,8) %/% 3
> [1] -3  2
> so we deduce that R *neither* rounds *nor* truncates, but returns the 
> floor of the quotient.
> It is widely argued that flooring division is more generally useful 
> than rounding or truncating division, but it is admittedly surprising.
> 
> On Tue, 20 Dec 2022 at 02:51, Göran Broström  > wrote:
> 
> I have a long vector x with five-digit codes where the first digit of
> each is of special interest, so I extracted them through
> 
>   > y <- x %/% 10000
> 
> but to my surprise y contained the value -1 in some places. It turned
> out that x contains -1 as a symbol for 'missing value' so in effect I
> found that
> 
>   > -1 %/% 10000 == -1
> 
> Had to check the help page for "%/%", and the first relevant comment I
> found was:
> 
> "Users are sometimes surprised by the value returned".
> 
> No surprise there. Further down:
> 
> ‘%%’ indicates ‘x mod y’ (“x modulo y”) and ‘%/%’ indicates
>integer division.  It is 

Re: [R] Amazing AI

2022-12-19 Thread avi.e.gross
Boris,

What you are telling us is not particularly new or spectacular in a sense.
It has often been hard to grade assignments students do when they choose an
unexpected path. I had one instructor who always graded my exams (in the
multiple courses I took with him) because, unlike most of the sheep, I did
not bother memorizing the way something was done or proven; I created my own
solutions on the fly during the test, often in ways he had to work hard to
follow. It almost amused him that mine tended to be correct, albeit not what
he would have thought of.

Your issue is not particularly about R as similar scenarios can be found in
many languages and environments.

In programming, it is arguably worse as it is common to be able to do things
so many ways. But plain old R has so many packages available, often with
source code, that any student who finds one that does something they want,
may be able to copy and modify some of the functions involved into their own
code already and fool an evaluator into thinking they did it on their own.
That is a tad harder, as many packages improve the code efficiency by
(re)writing many parts in C/C++.

I have seen things like a GUI that lets you click on various check boxes and
other controls and then use those instructions to read in data files, do
various operations on them, and provide output. Some allow quite a bit of
functionality and also offer you the opportunity to see the R code it
generates, and let you adjust that if it does not quite meet your needs.
Much of it involves including various packages and calling functions in
them, but if your students are allowed to use such things, how would you
know how little actual work they did?

I echo what someone else wrote. Training students for their future jobs in
an uncertain and changing future, may be more effective in teaching them how
to find ever better or different ways to get things done, or even switch to
growing areas near their field.  All kinds of automation of jobs are
happening and will continue to happen in the knowledge professions. People
who read manuals cover to cover and keep consulting them constantly are a
rarity. Many people often first do web searches or consult experts including
online versions of the documentation. Many will happily use software that
lets them do more and more with fewer lines of code written by them and
especially when that software has been used and tested long enough to be
relatively free of bugs when used as directed. Why would anyone these days
want to constantly re-invent the wheel and write routines to read in data
from files using known formats when you can use ones that exist and, if
needed in special cases, make some tweaks such as converting a column it
made into integer, back into the floating point you want for some later
reason?

But if your students are using something that is error-prone when used the
way they are using it, that is a problem as they are not only not learning
some basics or techniques you want them to know, but relying on what may be
bad tools without taking the time and effort to check the result or make
their own tweaks. Such software may not provide a way to do something like
treat multiple entries of various kinds as being NA, as an example. So you
would need your own code to check the result between some steps and do your
own further conversions so that "." and "" and "NA" and "-" all become NA,
again, just a made up example.

Yes, some students will easily fool you when grading but that already
happens when someone hires out getting some work done and claims it as their
own.

-Original Message-
From: R-help  On Behalf Of Boris Steipe
Sent: Monday, December 19, 2022 3:16 PM
To: Milan Glacier 
Cc: r-help@r-project.org
Subject: Re: [R] Amazing AI

Exactly. But not just "error prone", rather: eloquently and confidently
incorrect. And that in itself is a problem. When I evaluate students' work,
I implicitly do so from a mental model of the student - aptitude, ability,
experience, language skills etc. That's useful for summative assessment,
since it helps efficiency - but that won't work anymore. I see a need to
assess much more carefully, require fine-grained referencing, check every
single fact ... and that won't scale. And then there is also the spectre of
having to decide when this crosses the line to "concoction" - i.e. an actual
academic offence ...

Best,
Boris



> On 2022-12-19, at 03:58, Milan Glacier  wrote:
> 
> [You don't often get email from n...@milanglacier.com. Learn why this 
> is important at https://aka.ms/LearnAboutSenderIdentification ]
> 
> On 12/18/22 19:01, Boris Steipe wrote:
>> Technically not a help question. But crucial to be aware of, especially
for those of us in academia, or otherwise teaching R. I am not aware of a
suitable alternate forum. If this does not interest you, please simply
ignore - I already know that this may be somewhat OT.
>> 
>> Thanks.
>> 

Re: [R] interval between specific characters in a string...

2022-12-03 Thread avi.e.gross
This may be a fairly dumb and often asked question about some functions like
strsplit() that return a list of things, often a list of ONE thing that may
itself be another list or a vector and needs to be made into something
simpler.

The examples shown below have used various methods to convert the result to
a vector, but why is this not a built-in option for such a function, to
simplify the result either when possible or always?

Sure, you can subset it with "[[1]]" or use unlist() to coerce it back to a
vector. But when there is a very common idiom, and many people waste lots of
time debugging before figuring out they had a LIST containing a single
vector, maybe it would have made sense to have either a sister function like
strsplit_v() that returns what is actually wanted, or to allow
strsplit(whatever, output="vector") or something giving the same result.

Yes, I understand that when there is a workaround, it just complicates the 
base, but there could be a package that consistently does things like this to 
make the use of such functions easier.
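Such a wrapper is a one-liner; here is a sketch of the hypothetical
strsplit_v() named above:

```r
# Hypothetical sister function: strsplit() flattened to a plain character
# vector instead of a length-one list.
strsplit_v <- function(x, split, ...) unlist(strsplit(x, split, ...))

strsplit_v("a,b,c", ",")  # "a" "b" "c"
```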



-Original Message-
From: R-help  On Behalf Of Hervé Pagès
Sent: Saturday, December 3, 2022 6:50 PM
To: Bert Gunter ; Rui Barradas 
Cc: r-help@r-project.org; Evan Cooch 
Subject: Re: [R] interval between specific characters in a string...

On 03/12/2022 07:21, Bert Gunter wrote:
> Perhaps it is worth pointing out that looping constructs like lapply() 
> can be avoided and the procedure vectorized by mimicking Martin 
> Morgan's
> solution:
>
> ## s is the string to be searched.
> diff(c(0,grep('b',strsplit(s,'')[[1]])))
>
> However, Martin's solution is simpler and likely even faster as the 
> regex engine is unneeded:
>
> diff(c(0, which(strsplit(s, "")[[1]] == "b"))) ## completely 
> vectorized
>
> This seems much preferable to me.

Of all the proposed solutions, Andrew Hart's solution seems the most
efficient:

   big_string <- strrep("abaaabbaaaaabaaab", 50)

   system.time(nchar(strsplit(big_string, split="b", fixed=TRUE)[[1]]) + 1)
   #user  system elapsed
   #   0.736   0.028   0.764

   system.time(diff(c(0, which(strsplit(big_string, "", fixed=TRUE)[[1]] == "b"))))
   #user  system elapsed
   #  2.100   0.356   2.455

The bigger the string, the bigger the gap in performance.

Also, the bigger the average gap between 2 successive b's, the bigger the gap 
in performance.

Finally: always use fixed=TRUE in strsplit() if you don't need to use the regex 
engine.
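The fixed = TRUE point matters for correctness as well as speed, since some
split characters are regex metacharacters (a small illustration):

```r
# "." taken literally vs. "." as a regex that matches every character.
strsplit("a.b.c", ".", fixed = TRUE)[[1]]  # "a" "b" "c"
strsplit("a.b.c", ".")[[1]]                # not what you want: regex "."
```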

Cheers,

H.


> -- Bert
>
>
>
>
>
> On Sat, Dec 3, 2022 at 12:49 AM Rui Barradas  wrote:
>
>> Às 17:18 de 02/12/2022, Evan Cooch escreveu:
>>> Was wondering if there is an 'efficient/elegant' way to do the 
>>> following (without tidyverse). Take a string
>>>
>>> abaaabbaaaaabaaab
>>>
>>> Its easy enough to count the number of times the character 'b' shows 
>>> up in the string, but...what I'm looking for is outputing the 'intervals'
>>> between occurrences of 'b' (starting the counter at the beginning of 
>>> the string). So, for the preceding example, 'b' shows up in 
>>> positions
>>>
>>> 2, 6, 7, 13, 17
>>>
>>> So, the interval data would be: 2, 4, 1, 6, 4
>>>
>>> My main approach has been to simply output positions (say, something 
>>> like unlist(gregexpr('b', target_string))), and 'do the math' 
>>> between successive positions. Can anyone suggest a more elegant approach?
>>>
>>> Thanks in advance...
>>>
>>> __
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide
>>> http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>> Hello,
>>
>> I don't find your solution inelegant, it's even easy to write it as a 
>> one-line function.
>>
>>
>> char_interval <- function(x, s) {
>>   lapply(gregexpr(x, s), \(y) c(head(y, 1), diff(y)))
>> }
>>
>> target_string <-"abaaabbabaaab"
>> char_interval('b', target_string)
>> #> [[1]]
>> #> [1] 2 4 1 6 4
>>
>>
>> Hope this helps,
>>
>> Rui Barradas
>>
>> __
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>   [[alternative HTML version deleted]]
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide 
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Hervé Pagès

Bioconductor Core Team
hpages.on.git...@gmail.com

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and 

Re: [R] interval between specific characters in a string...

2022-12-02 Thread avi.e.gross
Evan, there are oodles of ways to do many things in R, and much of what the
tidyverse supplies can often be done as easily, or more easily, outside it.

Before presenting a solution, I need to make sure I am answering the same
question or problem you intend.

Here is the string you have as an example:

st <- "abaaabbaaaaabaaab"

Is the string being tested for the single character "b", with every other
character being just "a" (or at least non-"b"), and of any length beyond a
few characters?

If so, ONE METHOD is to convert the string to a vector, for reasons that
will become clear. For oddball reasons, this is a way to do it:

> unlist(strsplit(st,""))
[1] "a" "b" "a" "a" "a" "b" "b" "a" "a" "a" "a" "a" "b" "a" "a" "a" "b"

The result is a vector you can examine to see if they are equal to "b" or
not as a TRUE/FALSE vector:

> unlist(strsplit(st,"")) == "b"
[1] FALSE  TRUE FALSE FALSE FALSE  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE
TRUE FALSE FALSE FALSE
[17]  TRUE

Now you can ask for the indices which are TRUE, meaning at what offset from
the beginning are there instances of the letter "b":

> which(unlist(strsplit(st,"")) == "b")
[1]  2  6  7 13 17

This shows that the integer offsets of the letter "b" are the second, sixth,
and so on up to the seventeenth. Again, if I understood you, you want a
measure of how far apart instances of "b" are, with adjacent ones being 1
apart. Again, many methods, but I chose one where I slide over the above
values: push a zero in at the front and remove the last entry.

So save that in a variable  first:

indices <- which(unlist(strsplit(st,"")) == "b")
indices_shifted <- c(0, head(indices, -1))

The two contain:

> indices
[1]  2  6  7 13 17
> indices_shifted
[1]  0  2  6  7 13
> indices - indices_shifted 
[1] 2 4 1 6 4

The above is the same as your intended result.

If you want to be cautious, handle edge cases like not having any "b" or an
empty string.

Here is the consolidated code:

st <- "abaaabbaaaaabaaab"
indices <- which(unlist(strsplit(st,"")) == "b")
indices_shifted <- c(0, head(indices, -1))
result <- indices - indices_shifted

There are many other ways to do this and of course some are more
straightforward and some more complex.
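One tidier variant is to let diff() compute the gaps directly. A minimal, hedged sketch (the string here is the 17-character one implied by the strsplit() output shown above):

```r
# diff() on the match positions gives successive gaps in one step
st <- "abaaabbaaaaabaaab"
indices <- which(strsplit(st, "")[[1]] == "b")
gaps <- diff(c(0L, indices))  # prepend 0 so the first gap counts from the start
gaps
```

This avoids the explicit shifted copy; prepending 0L keeps the vector integer throughout.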

Consider a loop over a vector version of the string where each time you see
a "b", you remember the last index at which you saw one and put out the number
representing the gap.

Fairly low tech.


-Original Message-
From: R-help  On Behalf Of Evan Cooch
Sent: Friday, December 2, 2022 12:19 PM
To: r-help@r-project.org
Subject: [R] interval between specific characters in a string...

Was wondering if there is an 'efficient/elegant' way to do the following
(without tidyverse). Take a string

abaaabbaaaaabaaab

It's easy enough to count the number of times the character 'b' shows up in
the string, but...what I'm looking for is outputting the 'intervals' 
between occurrences of 'b' (starting the counter at the beginning of the
string). So, for the preceding example, 'b' shows up in positions

2, 6, 7, 13, 17

So, the interval data would be: 2, 4, 1, 6, 4

My main approach has been to simply output positions (say, something like
unlist(gregexpr('b', target_string))), and 'do the math' between successive
positions. Can anyone suggest a more elegant approach?

Thanks in advance...

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] add specific fields in for loop

2022-11-15 Thread avi.e.gross
Kai,

 

I have read all the messages exchanged so far and what I have not yet seen is a 
clear explanation of what you want to do. I mean not as R code that may have 
mistakes, but as what your goal is.

 

Your code below was a gigantic set of nested if statements that is not trivial 
to parse. 

 

So help explain a bit or you may keep getting great solutions to problems you 
are not trying to solve.

 

You have a data.frame you called “df” that seems to currently have no relation 
to the rest of the code. You do seem to have a data.frame called “try2.un” 
instead so I assume you want an answer using that.

 

Your code seems to want to make a new column called “ab2” by using info 
currently held in columns “data1” through “data5” but you want a solution that 
is more general. First I want to see what your code does do and make sure that 
is what you want.

 

Your code starts like this (see below for the complete code):

 

  ifelse(grepl("ab2",try2.un$data1), try2.un$data1, # else clauses below

 

The above uses grepl(), the logical version of grep(), and it seems that you are 
asking for all of the items in the column vector data1 to be searched for the 
unanchored presence of the string “ab2” and the first result is a vector of 
TRUE/FALSE. For those that are TRUE, meaning “ab2” was found, you want the 
actual result copied into the new column named “ab2” and for those marked as 
FALSE, continue with the next code line. I note you do not show any 
initialization for the new column to something like NA and depend on the final 
nested ifelse to set that as a default.

 

If what I wrote above is correct, then for any rows where data1 did not contain 
the specified text, you now search in data2:



 ifelse(grepl("ab2",try2.un$data2), try2.un$data2,

 

In this design, anything found in multiple places will only match the first 
place found. Anything not found anywhere ends up with an NA.

 

So in English, IFF the above is what you want, you want a search across all 
columns for the designated search string of “ab2” but only keep the first.

 

To make a loop I suggest something like this:

 

  try2.un$ab2 <- NA

 

Then choose what columns you want but do NOT choose “ab2”. If you want ALL 
other columns, then BEFORE the above line, save the current names as in:

 

  loop.cols <- names(try2.un)

 

If you only want a subset, use some code that narrows down what you want. You 
have not told us enough to make a suggestion. The point remains to have a 
variable (vector) that can be used in a loop that holds exactly the columns you 
want, in the right order. Unless I read you wrong, the order MATTERS as the 
first match wins, and if the columns have different matches like "I am ab2" and 
"ab2 was my mother", you are keeping the exact text of the first match.

 

If my guess of your need was wrong, the rest is not going to make much sense.

 

So here is a loop:

 

  for (i in loop.cols) { print(i)}

 

I used “i” because you seem to like it. I prefer a more useful name. All the 
above does is print the names so you see if what you are doing makes sense.

 

Now rewrite that to do what you want and find a way to only update an NA value. 
You may want to think about what that means.

 

One idea is 

  try2.un$ab2 <-

    ifelse(is.na(try2.un$ab2) & grepl("ab2", try2.un[[i]]),

           try2.un[[i]],

           try2.un$ab2)

 

The above, which I have not tried, would be run in a loop and checks both 
whether an entry is still NA, and whether the current ith column has what you 
want. If both are true, it selects the value for those entries/rows from the 
column being looped on. If not, it retains the current non-NA setting from an 
earlier iteration of the loop.
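Pulling the pieces together, here is a minimal sketch of the whole loop. It is untested against the real data: the data.frame and its column names (data1 through data3) are invented for illustration, and note the vectorized & rather than &&:

```r
# Invented stand-in for the poster's data; "first match wins" across columns
try2.un <- data.frame(data1 = c("I am ab2", "none", "none"),
                      data2 = c("x", "ab2 was my mother", "y"),
                      data3 = c("z", "z", "z"),
                      stringsAsFactors = FALSE)
loop.cols <- setdiff(names(try2.un), "ab2")  # never loop over the result column
try2.un$ab2 <- NA_character_                 # initialize the result to NA
for (i in loop.cols) {
  hit <- is.na(try2.un$ab2) & grepl("ab2", try2.un[[i]])  # & not && (vectorized)
  try2.un$ab2[hit] <- try2.un[[i]][hit]                   # only fill still-NA rows
}
try2.un$ab2
```

Because only still-NA rows are updated, earlier columns take precedence, which matches the nested-ifelse semantics of the original code.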

 

You need to flesh this out for yourself as I am not supplying complete and 
tested code.

 

But note this is a very different meaning than some of us guessed and may still 
not be what you want. There are many such questions about doing something the 
same to each of the selected columns in a data.frame as in replacing all values 
of 999 with NA. In many such cases the order does not matter. Other such 
questions may want to check if any of the columns matches and simply return 
TRUE/FALSE in a new column or externally. Some of such requests are potentially 
simpler and easier. 

 

So you need to be very clear on what you want. I am going by what I think your 
sample code DOES and am not too sure it is exactly what you want.

 

 

From: Kai Yang  
Sent: Tuesday, November 15, 2022 1:53 PM
To: 'R-help Mailing List' ; avi.e.gr...@gmail.com
Subject: Re: [R] add specific fields in for loop

 

Hello Bert and Avi,

Sorry, it is typo. it should be:

 

for (i in colnames(df)){
  ..
}

 

below is the code I'm currently using

 

try2.un$ab2 <-

 

  ifelse(grepl("ab2",try2.un$data1), try2.un$data1,

 

 ifelse(grepl("ab2",try2.un$data2), try2.un$data2,

 


Re: [R] add specific fields in for loop

2022-11-15 Thread avi.e.gross
Kai,

As Bert pointed out, it may not be clear what you want.

As a GUESS, you have some arbitrary data.frame object with multiple columns and 
you want to do something on selected columns. Consider changing your idea to be 
in several stages for simplicity and then optionally later rewriting it.

So step 1 is to get a vector of column names. The normal way to do this in base 
R is not with a function called columns(df) but colnames(df) ...

Step 2 is to use one of many techniques that take that vector of names and 
select the ones you want to keep. In base R there are many ways to do that 
including using regular expressions as in the "grep" family of functions. You 
may end up with a new vector of names perhaps shorter or in a different order.

Step 3 is to use those names in your loop. If you want say to convert a column 
from character to numeric, and your loop index is "current" you might write 
something like:
df[[current]] <- as.numeric(df[[current]])
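A hedged sketch of those three steps together; the data.frame and the "date" name pattern are invented for illustration:

```r
# Step 1: get the names; step 2: narrow them with grep; step 3: loop and convert
df <- data.frame(start_date = c("1", "2"),
                 end_date   = c("3", "4"),
                 site       = c("A", "B"),
                 stringsAsFactors = FALSE)
date.cols <- grep("date", colnames(df), value = TRUE)  # steps 1 and 2
for (current in date.cols) {                           # step 3
  df[[current]] <- as.numeric(df[[current]])
}
sapply(df, class)
```

Columns whose names do not contain "date" (here, site) are left untouched.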

There are many ways and it depends on what exactly you want to do. There are 
packages designed to make some of these things fairly simple, such as dplyr 
where you can ask to match names that start or end a certain way or that are of 
certain types.

Avi

-Original Message-
From: R-help  On Behalf Of Kai Yang via R-help
Sent: Tuesday, November 15, 2022 11:18 AM
To: R-help Mailing List 
Subject: [R] add specific fields in for loop

Hi Team,
I can write a for loop like this:
for (i in columns(df)){
  ..
}

But it will working on all column in dataframe df. If I want to work on some of 
specific fields (say: the fields' name content 'date'), how should I modify the 
for loop? I changed the code below, but it doesn't work.
for (i in columns(df) %in% 'date' ){
  .
}


Thank you,
Kai


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.



Re: [R] Reading very large text files into R

2022-09-30 Thread avi.e.gross
Tim and others,

A point to consider is that there are various algorithms in the functions
used to read in formatted data into data.frame form and they vary. Some do a
look-ahead of some size to determine things and if they find a column that
LOOKS LIKE all integers for say the first thousand lines, they go and read
in that column as integer. If the first floating point value is thousands of
lines further along, things may go wrong.

So asking for line/row 16 to have an extra 16th entry/column may work fine
for an algorithm that looks ahead and concludes there are 16 columns
throughout. Yet a file where the first time a sixteenth entry is seen is at
line/row 31,459 may well just set the algorithm to expect exactly 15 columns
and then be surprised as noted above.

I have stayed out of this discussion and others have supplied pretty much
what I would have said. I also see the data as flawed and ask which rows are
the valid ones. If a sixteenth column is allowed, it would be better if all
other rows had an empty sixteenth column. If not allowed, none should have
it.

The approach I might take, again as others have noted, is to preprocess the
data file using some form of stream editor such as AWK that automagically
reads in a line at a time and parses lines into a collection of tokens based
on what separates them such as a comma. You can then either write out just
the first 15 to the output stream if your choice is to ignore a spurious
sixteenth, or write out all sixteen for every line, with the last being some
form of null most of the time. And, of course, to be more general, you could
make two passes through the file with the first one determining the maximum
number of entries as well as what the most common number of entries is, and
a second pass using that info to normalize the file the way you want. And
note some of what was mentioned could often be done in this preprocessing
such as removing any columns you do not want to read into R later. Do note
such filters may need to handle edge cases like skipping comment lines or
treating the row of headers differently.
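The same pre-processing can be sketched inside R rather than AWK. This is an illustrative assumption, not the poster's actual workflow: the sample lines stand in for readLines() on the real file, and short rows are padded to the widest row before read.csv() ever sees them:

```r
# Pass 1: find the widest row; pass 2: pad every row to that width
raw <- c("a,1,x", "b,2,y,extra", "c,3,z")   # stand-in for readLines("big.csv")
fields <- strsplit(raw, ",", fixed = TRUE)
width  <- max(lengths(fields))              # maximum number of entries seen
padded <- vapply(fields,
                 function(f) paste(c(f, rep("", width - length(f))), collapse = ","),
                 character(1))
df <- read.csv(text = paste(padded, collapse = "\n"), header = FALSE)
dim(df)
```

Every row now has the same number of fields, so the reader's look-ahead can no longer be surprised by a late extra column.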

As some have shown, you can create your own filters within a language like R
too and either read in lines and pre-process them as discussed or continue
on to making your own data.frame and skip the read.table() type of
functionality. For very large files, though, having multiple variations in
memory at once may be an issue, especially if they are not removed and
further processing and analysis continues.

Perhaps it might be sensible to contact those maintaining the data and point
out the anomaly and ask if their files might be saved alternately in a
format that can be used without anomalies.

Avi

-Original Message-
From: R-help  On Behalf Of Ebert,Timothy Aaron
Sent: Friday, September 30, 2022 7:27 AM
To: Richard O'Keefe ; Nick Wray 
Cc: r-help@r-project.org
Subject: Re: [R] Reading very large text files into R

Hi Nick,
   Can you post one line of data with 15 entries followed by the next line
of data with 16 entries? 

Tim



Re: [R] Converting a Date variable from character to Date

2022-09-29 Thread avi.e.gross
I am not replying to the earlier request, just to the part right below my
message.

A simple suggestion when sending people code is to add NOTHING except proper
comments.

Can we assume the extra asterisks are superfluous and not in your code?

I mean your column is named "Period" and not "*Period" and your meaningless
call to format(...) was not to *format(...)* ...

And I note in R upper and lower case are not interchangeable. CPI as a
column name does not match:

class(inflation.2$cpi)

I am not clear what the above is supposed to do. What do you want to set the
class of a column in a data.frame to? If I am guessing correctly, the normal
way people do a change from one TYPE to another looks more like:

after <- as.character(before)

In your case, your data is of type character and you want to make it a date
of one kind or another. If you do a little searching, you may find a bunch
of ways to convert properly formatted strings to dates or date/time types.
Your data is NOT a standard date format so none of the standard ones will
work.

I am guessing  "2022m1" may mean first month in 2022 and goes as high as
2022m12 before shifting to 2023. Good luck with that. It is far easier if
your data looked like "2022-01-01" or some such format that might be read
easily. You need to do one of many things I will not show here to break that
date into parts or have it parsed properly as with a function like
strptime() using a package. 
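One hedged way to parse such strings without any package, assuming "2022m1" really does mean year 2022, month 1: rewrite them into a standard "YYYY-M-01" form first and let as.Date() handle the rest.

```r
# Rewrite "2022m1" -> "2022-1-01", then parse as a Date
period <- c("2022m1", "2022m9", "2022m12")
dates  <- as.Date(sub("^(\\d{4})m(\\d{1,2})$", "\\1-\\2-01", period))
format(dates, "%Y-%m")
```

The day is pinned to the first of the month since the source data carries no day information.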

As a general comment, I hope your meaning of command line is within the R in
interpreter rather than other meanings like for some shell utility.

And note that generally the R method of handling a data.frame using base R
or a package like dplyr requires most changes to be saved into the same or a
new variable. Your sample code makes no sense to me. 

So assuming at some point your code got the data you want into a data.frame
with a character column called inflation.1$Period, then base R would allow
you to call some function that does the conversion, which I am calling
doit() here) this way:

inflation.1$Period <- doit(inflation.1$Period)

Good Luck. You need to show a bit more knowledge of R before people can help
you with more advanced tasks.


-Original Message-
From: R-help  On Behalf Of Admire Tarisirayi
Chirume
Sent: Thursday, September 29, 2022 12:36 PM
To: Jeff Newmiller 
Cc: r-help mailing list 
Subject: [R] Converting a Date variable from character to Date

Kindly request assistance to *convert a Date variable from a character to be
recognized as a date*.
NB: kindly take note that the data is in a csv file called *inflation*. I
have included part of the file content herewith with the header for
assistance.


My data looks like this:
*Period    CPI*
2022m1    4994
2022m2    5336
2022m3    5671
2022m4    6532
2022m5    7973
2022m6   10365
2022m7   12673
2022m8   14356
2022m9   14708

 I used the following command lines.


class(inflation.2$cpi)
inflation.2$cpi <- as.numeric(as.character(inflation.2$cpi))
*format(as.Date(inflation.2$period), "%Y-%m")*

Having run the command lines above, the variable *period* in the attached
CSV file remains being read as a character variable. Kindly assist.

Thank you.


Alternative email: addtar...@icloud.com/tchir...@rbz.co.zw
Skype: admirechirume
Call: +263773369884
whatsapp: +818099861504



Re: [R] How long does it take to learn the R programming language?

2022-09-29 Thread avi.e.gross
Has anyone noticed something a tad unusual?

 

Someone shows up and seemingly politely asks a totally open-ended question and 
supplies NO DETAILS about their personal status and experience that would be 
needed to tell him how long it would take to learn enough R for whatever 
purposes.

 

Lots of people jump in and discuss it, and I choose this time to sit and wait 
and not point out the endless considerations others have nicely contributed.

 

What is missing is not a single polite reply from the original person 
acknowledging these efforts on his behalf, let alone ANSWERING some of the 
questions like whether he already has some experience with programming or what 
he wants to use R for.

 

As such, I am suspicious and won’t get involved with this and suggest others 
reconsider the need to keep discussing the topic unless it is for their own 
interest.

 

I have seen this many times now on multiple such boards. Either some people do 
not understand what is expected, or someone is trolling and just looking to get 
a reaction.

 

I prefer to deal with more focused questions if someone is asking for help such 
as what package might help them do a somewhat specific task or why they are 
getting an error message. A general question like whether R or Python or 
something else is better for a particular task might also be reasonable. But 
how long it takes to learn ANYTHING seems to be a very subjective question, let 
alone something as multi-faceted as a programming language that can be used for 
so many different things.

 

Just my two cents.

 

I will say it did not take me long to learn a decent amount of R and yet I keep 
learning and am very far from knowing a fraction of all there is to know and 
especially not things I have had no reason to know yet.

 

From: jim holtman  
Sent: Thursday, September 29, 2022 12:28 PM
To: Ebert,Timothy Aaron 
Cc: Avi Gross ; John Kane ; R. 
Mailing List 
Subject: Re: [R] How long does it take to learn the R programming language?

 

Still at it after 38 years.  First came across S at Bell Labs in 1984.

 

Thanks


Jim Holtman
Data Munger Guru
 
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

 

 

On Thu, Sep 29, 2022 at 7:09 AM Ebert,Timothy Aaron  wrote:

Learning R takes an hour. Find an hourglass, flip it over. Meanwhile we will 
start increasing the size of the upper chamber and adding more sand. 

Mastery of R is an asymptotic function of time. 

While such answers might indicate trying for mastery is futile, you can learn 
enough R to be very useful long before "mastery."

Tim
-Original Message-
From: R-help  On Behalf Of Avi Gross
Sent: Wednesday, September 28, 2022 5:51 PM
To: John Kane 
Cc: R. Mailing List 
Subject: Re: [R] How long does it take to learn the R programming language?

[External Email]

So is the proper R answer simply Inf?

On Wed, Sep 28, 2022, 5:39 PM John Kane  wrote:

> + 1
>
> > On Wed, 28 Sept 2022 at 17:36, Jim Lemon  wrote:
>
> > Given some of the questions that are posted to this list, I am not 
> > sure that there is an upper bound to the estimate.
> >
> > Jim
> >
> > __
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
> --
> John Kane
> Kingston ON Canada
>
>
> __
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> 

Re: [R] Write text file in Fortran format

2022-09-22 Thread avi.e.gross
Javad,

After reading the exchanges, I conclude you are asking a somewhat different 
question than some of us expected and I see some have zoomed in on what you 
seem to want.

You seem to want to make a very focused change and save the results to be as 
identical as what you started with. You also note you really have no idea about 
the process that created the file or uses it so it is hard to know if just 
changing parts results in a valid file.

So maybe this looks like not one file but three files. You seem to have a 
header region of N lines you want kept intact but also placed in the output 
untouched. You seem to have a few lines at the end you also seem to want to 
leave untouched. You have lots of lines in the middle you want treated as if it 
contains multiple columns of data in some format that looks like it is 
whitespace-separated, or maybe tabs.

And I think you want to do this transformation perhaps only once so a general 
purpose solution is not required.

So how about you get your favorite text editor or tool and extract the first N 
lines from the file and place it in a file called HEADER. You may also then 
delete it from a copy of the original file if you wish.

Similarly, extract the last lines you want to keep from the file and place it 
in a file called FOOTER.

What is left behind in a copy of the file should then be something people here 
might easily work with. You can use one of many methods as you wish to read the 
data into some kind of data.frame and supply names for the columns and 
whatever. You can make your changes to what seems like one column. You can save 
the output conceptually to use the same format as the input and place it in a 
file called NEW_DATA.

Note no actual files are necessarily needed but conceptually you now have some 
text in places called HEADER, NEW_DATA and FOOTER and you can combine them 
carefully (as in no blank lines or two lines concatenated without a newline) 
and get a new output file that should look like the input.
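A minimal sketch of that three-part flow done entirely in memory. Everything here is invented for illustration: the sample lines, the header and footer counts, and the "focused change" (doubling the second column) stand in for the real file and edit:

```r
# Split into header / data body / footer, edit only the body, reassemble
all_lines <- c("** header line 1", "** header line 2",
               "1 2.5", "2 3.5",
               "** CONCUNIT ug/m^3")
n_head <- 2; n_foot <- 1
header <- all_lines[seq_len(n_head)]
footer <- tail(all_lines, n_foot)
body   <- all_lines[(n_head + 1):(length(all_lines) - n_foot)]
df <- read.table(text = paste(body, collapse = "\n"))  # parse the data region
df$V2 <- df$V2 * 2                                     # the focused change
out <- c(header, do.call(paste, df), footer)           # reassemble in order
# writeLines(out, "newfile.txt")
```

The header and footer pass through untouched, which sidesteps the look-ahead problems described below.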

If your data format in the middle section uses something like a tab, this 
should work. If it uses whitespace and perhaps a fixed-width per line, you have 
an issue to deal with if your changes need more room than is available or need 
adjustment to calculate how many spaces to remove or add to center or line up 
the columns visible.

I will end by saying some others have offered variations that are probably just 
as reasonable. The problem is we may suggest you skip the first lines and it 
turns out you want to preserve them. Most R functions and packages I know about 
have no interest in preserving what effectively are often just comments but it 
seems easy enough to do. Your lines at the end are a bigger problem if you use 
standard commands to read in the data BECAUSE many such programs do a 
look-ahead at perhaps the first thousand lines to try to determine the number 
of columns and the type of each. Your extra lines at the end may perturb that 
and there may be weird results in your data at the end or even things like an 
integer column being considered to be character.



-Original Message-
From: R-help  On Behalf Of javad bayat
Sent: Thursday, September 22, 2022 6:35 AM
To: Rui Barradas 
Cc: R-help@r-project.org
Subject: Re: [R] Write text file in Fortran format

These 2 lines were at the end of the text file, which I have attached but I had 
removed them to read the text file in R.
Just like the first 8 line that start with asterisk (*).






On Thu, 22 Sep 2022, 12:21 Rui Barradas,  wrote:

> Hello,
>
> Are those lines at the begining of the file?
>
> Rui Barradas
>
> Às 06:44 de 22/09/2022, javad bayat escreveu:
> > Dear all; Many thanks for your useful comments and codes.
> > I tried both Rui's and Jim's codes.
> > Jim's codes gave an error as below:
> > "Error in substr(inputline, 1, begincol3 - 1) :
> >object 'inputline' not found"
> > I don't know what's wrong.
> > The Rui's codes worked correctly for the attached file. But I have 
> > edited that file and removed 2 last lines from the file because I 
> > could not read it into R.
> > These 2 lines were:
> > "
> > ** CONCUNIT ug/m^3
> > ** DEPUNIT g/m^2
> > "
> > When I tried to run the code for my file that has these 2 lines at 
> > the
> end,
> > it gave me this error:
> > "
> >> df1 <- read.table(text = txt_table)
> > Error in read.table(text = txt_table) : no lines available in input 
> > "
> > The codes before the "df1 <- read.table(text = txt_table)" were run 
> > correctly.
> > Sincerely
> >
> >
> >
> > On Thu, Sep 22, 2022 at 6:58 AM javad bayat 
> wrote:
> >
> >> Dear all;
> >> I apologise, I didn't know that I have to cc the list.
> >> Thank you Mr Rui for reminding me.
> >> Let me clarify more.
> >> I have no knowledge of the FORTRAN language. The text file that has 
> >> been attached is a model's output file and I know that the format 
> >> is in
> FORTRAN.
> >> I want to write a text file exactly similar to the attached text 
> >> file

Re: [R] Remove line from data file

2022-09-19 Thread avi.e.gross
David,

As others have said, there are many possible answers for a vague enough 
question.

For one-time data it is often easiest to simply change the data source as you 
say you did in EXCEL.

Deleting the 18th row can easily be done in R and might make sense if you get 
daily data and decided the 18th reporting station is not reliable and should 
always be excluded. As has been shown, the usual paradigm in R is to filter the 
data through a set of conditions and a very simple one is to specify which 
indices in rows and/or columns to exclude.

If you already have your data in mydata.old, then you can make a mydata.new 
that excludes that 18th row with something as simple as:

mydata.new <- mydata.old[ -18, ]

Since your question was not focused, the answer mentioned that it is common to 
delete based on all kinds of conditions. An example would be if you did not 
want to remove row 18 specifically but any row where a column says the 
collector/reporter of the info was "Smith" which may remove many rows in the 
data, or you wanted only data with a column giving a date in 2021 and not 
before or after.

This filter method is not a deletion per se, but a selective retention, and 
often has the same result. If your goal includes making the deletion of 
selected data permanent, of course, it is wise then to save the altered data in 
another file so later use starts with what you want. 
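A small sketch of that selective-retention idiom; the column name and values are invented:

```r
# "Delete" rows by keeping only the rows you want
mydata.old <- data.frame(collector = c("Smith", "Jones", "Smith"),
                         value     = c(10, 20, 30))
mydata.new <- mydata.old[mydata.old$collector != "Smith", ]  # drop Smith rows
mydata.new
```

Nothing is removed in place; the filtered copy simply replaces, or sits beside, the original.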

Actually removing a row from an original data.frame is not something people 
normally do. A data.frame is a list of vectors of some length and generally 
does not allow operations that might produce vectors of unequal length. You can 
set the entire row to be filled with NA if you want but removing the actual row 
in-place is the kind of thing  I do not normally see. In a sense, R is wasteful 
that way as you often end up making near-copies of your data and sometimes 
simply re-assigning the result to the same variable and expecting the old 
version to be garbage collected. As I said, the paradigm is selection more than 
alteration/deletion.

It is, of course, possible to create your own data structures where you could 
do something closer to a deletion of a row while leaving the rest in place but 
there likely is no need for your small amount of data. 

Note columns in R can be deleted easily because they are a top level entry in a 
LIST. mydata$colname <- NULL or similar variants will remove a column cleanly 
in the original data.frame. But as noted, rows in R do not really exist other 
than as a construct that tries to bind the nth elements of each underlying 
vector representing the columns. 

Of course we now can have list-columns in things like tibbles which makes me 
wonder ... 




-Original Message-
From: R-help  On Behalf Of Parkhurst, David
Sent: Sunday, September 18, 2022 8:49 AM
To: CALUM POLWART 
Cc: R-help@r-project.org
Subject: Re: [R] Remove line from data file

Thank you for your reply. I meant from the dataframe, but that's one of the 
terms I had forgotten. I created that from read.csv, the csv file coming from 
Excel. Last night I went ahead and made the change(s) using Excel.

For future reference, when I look at your solutions below, what do you mean by 
"value to delete"? Could that just be a row number? I was wanting to delete 
something like the 18th row in the dataframe?

From: CALUM POLWART 
Date: Sunday, September 18, 2022 at 7:25 AM
To: Parkhurst, David 
Cc: R-help@r-project.org 
Subject: Re: [R] Remove line from data file

From the file? Or the data frame once it's loaded?

What format is the file? CSV?

Do you know the line that needs deleted?

mydf <- read.csv("myfile.csv")

mydf2 <- mydf[mydf$columnName != "valuetodelete", ]  # keep every row where columnName is not the value to delete

write.csv(mydf2, "mydeletedfile.csv")




On Sun, 18 Sep 2022, 10:33 Parkhurst, David  wrote:
I've been retired since '06 and have forgotten most of R. Now I have a use for 
it. I've created a data file and need to delete one row from it. How do I do 
that?

DFP (iPad)
__



Re: [R] removing non-table lines

2022-09-18 Thread avi.e.gross
Adding to what Nick said, extra lines like those described often are in some 
comment format like beginning with "#" or some consistent characters that can 
be filtered out using comment.char='#' for example in read.csv() or 
comment="string" in the tidyverse function read_csv().

And, of course you can skip lines if that makes sense albeit it can be tricky 
with header lines.

-Original Message-
From: R-help  On Behalf Of Rui Barradas
Sent: Sunday, September 18, 2022 6:19 PM
To: Nick Wray ; r-help@r-project.org
Subject: Re: [R] removing non-table lines

Hello,

Unfortunately there are many files with a non-tabular data section followed by 
the data. R's read.table() has a skip argument:

skip
integer: the number of lines of the data file to skip before beginning to read 
data.

If you do not know how many lines to skip because it's not always the same 
number, here are some ideas.

Is there a pattern in the initial section? Maybe an end-of-section line, or maybe
the text lines come in a fixed order and the last line in that order can be
detected with a regex.

Is there a pattern in the tables' column headers? Once again a regex might be 
the solution.

Is the number of initial lines variable because there are file versions? 
If there are, did the versions evolve over time, a frequent case?

What you describe is not infrequent; it's always a nuisance and error-prone, but
it should be solvable once patterns are found. Inspect a small number of files
with a text editor and try to find both common points and differences. That's
halfway to a solution.
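One way to put the regex idea into practice, assuming the header line can be recognised by its column names (here "date,rain" and the preamble lines are invented stand-ins for the real Met Office layout):

```r
# Hypothetical file: a variable-length preamble followed by tabular data
tmp <- tempfile(fileext = ".csv")
writeLines(c("UK Met Office", "Station: somewhere", "Units: mm",
             "date,rain", "2022-09-01,4.2", "2022-09-02,0.0"), tmp)

# Find the header line with a regex, then skip everything above it
all_lines <- readLines(tmp)
header_at <- grep("^date,rain", all_lines)[1]
dat <- read.csv(tmp, skip = header_at - 1)
```

Because the skip count is computed per file, the same loop works even when each CSV has a different number of preamble lines.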

Hope this helps,

Rui Barradas

On 18/09/2022 at 20:39, Nick Wray wrote:
> Hello - I am having to download lots of rainfall and temperature data 
> in csv form from the UK Met Office.  The data isn't a problem - it's 
> in nice columns and can be read into R easily - the problem is that in 
> each csv there are 60 or so lines of information first which are not 
> part of the columnar data.  If I read the whole csv into R the column 
> data is no longer in columns but in some disorganised form - if I 
> manually delete all the text lines above and download I get a nice 
> neat data table.  As the text lines can't be identified in R by line 
> numbers etc I can't find a way of deleting them in R and atm have to 
> do it by hand which is slow.  It might be possible to write a 
> complicated and dirty algorithm to rearrange the meteorological data 
> back into columns but I suspect that it might be hard to get right and 
> consistent across every csv sheet and any errors
> might be hard to spot.   I can't find anything on the net about this - has
> anyone else had to deal with this problem and if so do they have any 
> solutions using R?
> Thanks Nick Wray



