Re: [R] any and all

2024-04-12 Thread Dénes Tóth

Hi Avi,

As Duncan already mentioned, a reproducible example would be helpful to 
assist you better. Having said that, I think you misunderstand how 
`dplyr::filter` works: it performs row-wise filtering, so the filtering 
expression must return a logical vector with one element per row of the 
data frame, or a single logical value meaning "keep all rows" (TRUE) or 
"drop all rows" (FALSE). Because `any()` and `all()` return a single 
logical value, you end up with an all-or-nothing filter, which is 
probably not what you want.
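To see the collapse concretely, here is a minimal sketch (the data and column names are made up for illustration):

```r
library(dplyr)

# hypothetical data: row 1 has a value in its pair, row 2 is all NA
mydata <- data.frame(first.a = c(1, NA), first.b = c(NA, NA))

# any() collapses both columns into a single TRUE, so *every* row passes
collapsed <- filter(mydata, any(!is.na(first.a), !is.na(first.b)))
nrow(collapsed)   # 2: the all-NA row is kept too

# the row-wise condition does what was intended
rowwise <- filter(mydata, !is.na(first.a) | !is.na(first.b))
nrow(rowwise)     # 1
```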


Note also that you do not need to use `mutate` to use `filter` (read 
?dplyr::filter carefully):

```
filter(
  .data = mydata,
  !is.na(first.a) | !is.na(first.b),
  !is.na(second.a) | !is.na(second.b),
  !is.na(third.a) | !is.na(third.b)
)
```

Or you can use `base::subset()`:
```
subset(
  mydata,
  (!is.na(first.a) | !is.na(first.b))
  & (!is.na(second.a) | !is.na(second.b))
  & (!is.na(third.a) | !is.na(third.b))
)
```
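If there are many such pairs, a more scalable sketch uses `if_any()` (assuming a dplyr version that has it, i.e. >= 1.0.4); the data below is made up to mirror the column layout in the thread:

```r
library(dplyr)

# hypothetical data mirroring the thread's pairs of columns
mydata <- data.frame(
  first.a  = c(1, NA),  first.b  = c(NA, NA),
  second.a = c(2, 2),   second.b = c(NA, NA),
  third.a  = c(3, NA),  third.b  = c(NA, 3)
)

# one if_any() per pair; the conditions combine with AND inside filter()
kept <- filter(
  mydata,
  if_any(c(first.a,  first.b),  ~ !is.na(.x)),
  if_any(c(second.a, second.b), ~ !is.na(.x)),
  if_any(c(third.a,  third.b),  ~ !is.na(.x))
)
kept  # row 2 is dropped: its first pair is all NA
```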

Regards,
Denes

On 4/12/24 23:59, Duncan Murdoch wrote:

On 12/04/2024 3:52 p.m., avi.e.gr...@gmail.com wrote:
Base R has generic functions called any() and all() that I am having 
trouble using.
It works fine when I play with it in a base R context as in:

all(any(TRUE, TRUE), any(TRUE, FALSE))

[1] TRUE

all(any(TRUE, TRUE), any(FALSE, FALSE))

[1] FALSE
But in a tidyverse/dplyr environment, it returns wrong answers.
Consider this example. I have data I have joined together with pairs of 
columns representing a first generation and several other pairs 
representing additional generations. I want to consider any pair where 
at least one of the pair is not NA as a success. But in order to keep 
the entire row, I want all three pairs to have some valid data. This 
seems like a fairly common reasonable thing often needed when evaluating 
data.
So to make it very general, I chose to do something a bit like this:


We can't really help you without a reproducible example.  It's not 
enough to show us something that doesn't run but is a bit like the real 
code.


Duncan Murdoch


result <- filter(mydata,
  all(
    any(!is.na(first.a), !is.na(first.b)),
    any(!is.na(second.a), !is.na(second.b)),
    any(!is.na(third.a), !is.na(third.b))
  )
)
I apologize if the formatting is not seen properly. The above logically 
should work. And it should be extendable to scenarios where you want at 
least one of M columns to contain data as a group, with N such groups 
of any size.
But since it did not work, I tried a plan that did work and feels silly. 
I used mutate() to make new columns such as:
result <-
   mydata |>
   mutate(
 usable.1 = (!is.na(first.a) | !is.na(first.b)),
 usable.2 = (!is.na(second.a) | !is.na(second.b)),
 usable.3 = (!is.na(third.a) | !is.na(third.b)),
 usable = (usable.1 & usable.2 & usable.3)
   ) |>
   filter(usable == TRUE)
The above wastes time and effort making new columns so I can check the 
calculations, then uses the combined columns to make a Boolean that can 
be used to filter the result.
I know this is not the place to discuss dplyr. I want to check first if 
I am doing anything wrong in how I use any/all. One guess is that the 
generic is messed with by dplyr or other packages I libraried.
And, of course, some aspects of delayed evaluation can interfere in 
subtle ways.
I note I have had other problems with these base R functions before and 
generally solved them by not using them, as shown above. I would much 
rather use them, or something similar.
Avi
Avi


__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide 
http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.




Re: [R] Functional Programming Problem Using purr and R's data.table shift function

2023-01-03 Thread Dénes Tóth

Hi Michael,

R returns the result of the last evaluated expression by default:
```
add_2 <- function(x) {
  x + 2L
}
```

is the same as and preferred over
```
add_2_return <- function(x) {
  out <- x + 2L
  return(out)
}
```

In the idiomatic use of R, one uses explicit `return` when one wants to 
break the control flow, e.g.:

```
add_2_if_number <- function(x) {
  ## early return if x is not numeric
  if (!is.numeric(x)) {
return(x)
  }
  ## process otherwise (usually more complicated steps)
  ## note: this part will not be reached for non-numeric x
  x + 2L
}
```

So yes, you should drop the last "%>% `[`" altogether as `[.data.table` 
already returns the whole (modified) data.table when `:=` is used.
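A minimal sketch of this point, with toy data:

```r
library(data.table)

DT <- data.table(A = 1:3)
DT[, A := shift(A)]   # modifies DT by reference; returns DT invisibly
DT[]                  # a trailing [] forces auto-printing at the prompt
# DT$A is now c(NA, 1L, 2L)
```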


Side note: If you use R >= 4.1.0 and you do not use special features of 
`%>%`, try the native `|>` operator first (see `?pipeOp`): 1) you do not 
depend on a user-contributed package, and 2) it works at the parser level.
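A quick way to see the "works at the parser level" point (requires R >= 4.1.0): the native pipe is rewritten into a plain call before evaluation, so it leaves no trace in the parsed expression.

```r
# quote() returns the parsed expression: the |> is already gone
quote(x |> f() |> g())
#> g(f(x))
```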


Cheers,
Denes

On 1/2/23 18:59, Michael Lachanski wrote:

Dénes, thank you for the guidance - which is well-taken.

Your side note raises an interesting question: I find the piping %>% 
operator readable. Is there any downside to it? Or is the side note 
meant to tell me to drop the last: "%>% `[`"?


Thank you,


==
Michael Lachanski
PhD Student in Demography and Sociology
MA Candidate in Statistics
University of Pennsylvania
mikel...@sas.upenn.edu <mailto:mikel...@sas.upenn.edu>


On Sat, Dec 31, 2022 at 9:22 AM Dénes Tóth <toth.de...@kogentum.hu> wrote:


Hi Michael,

Note that you have to be very careful when using by-reference
operations
in data.table (see `?data.table::set`), especially in a functional
programming approach. In your function, you avoid this problem by
calling `data.table(A)` which makes a copy of A even if it is already a
data.table. However, for large data.table-s, copying can be a very
expensive operation (esp. in terms of RAM usage), which can be totally
eliminated by using data.tables in the data.table-way (e.g., joining,
grouping, and aggregating in the same step by performing these
operations within `[`, see `?data.table`).

So instead of blindly functionalizing all your code, try to be
pragmatic. Functional programming is not about using pure functions in
*every* part of your code base; that is unfeasible in 99.9% of
real-world problems. Even Haskell has `IO` and `do`; the point is that
the imperative and functional parts of the code are clearly separated,
and the imperative components are kept as close to the top level as
possible.

So when using data.table, a good strategy is to use pure functions for
performing within-data.table operations, e.g., `DT[, lapply(.SD, mean),
.SDcols = is.numeric]`, and when these operations alter `DT` by
reference, invoke the chains of these operations in "pure" wrappers -
e.g., calling `A <- copy(A)` on the top and then modifying `A` directly.

Cheers,
Denes

Side note: You do not need to use `DT[, A := shift(A, fill = NA, type =
"lag", n = 1)] %>% `[`(return(DT))`. `[.data.table` returns the result
(the modified DT) invisibly. If you want to let auto-print work, you
can just use `DT[, A := shift(A, fill = NA, type = "lag", n = 1)][]`.

Note that this also means you usually do not need magrittr's
or the base-R pipe when transforming data.table-s. You can do this instead:
```
DT[
    ## filter rows where 'x' column equals "a"
    x == "a"
][
    ## calculate the mean of `z` for each gender and assign it to `y`
    , y := mean(z), by = "gender"
][
    ## do whatever you want
    ...
]
```


On 12/31/22 13:39, Rui Barradas wrote:
 > At 06:50 on 31/12/2022, Michael Lachanski wrote:
 >> Hello,
 >>
 >> I am trying to make a habit of "functionalizing" all of my code as
 >> recommended by Hadley Wickham. I have found it surprisingly
 >> difficult to do so because several intermediate features from
 >> data.table break or give unexpected results using purrr and its
 >> data.table adaptation, tidytable. Here is a minimal working example
 >> of what has stumped me most recently:
 >>
 >> ===
 >>
 >> library(data.table); library(tidytable)
 >>
 >> minimal_failing_function <- function(A){
 >>    DT <- data.table(A)
 >>    DT[ , A:= shift(A, fill = NA, type = "lag", n = 1)] %>% `[`
 >>    return(DT)}
 >> # works
 >> minimal_failing_function(c(1,2))
 >> # fails
 >> tidytable::pmap_dfr(.l = list(c(1,2)),
 >>  .f =

Re: [R] Functional Programming Problem Using purr and R's data.table shift function

2022-12-31 Thread Dénes Tóth

Hi Michael,

Note that you have to be very careful when using by-reference operations 
in data.table (see `?data.table::set`), especially in a functional 
programming approach. In your function, you avoid this problem by 
calling `data.table(A)` which makes a copy of A even if it is already a 
data.table. However, for large data.table-s, copying can be a very 
expensive operation (esp. in terms of RAM usage), which can be totally 
eliminated by using data.tables in the data.table-way (e.g., joining, 
grouping, and aggregating in the same step by performing these 
operations within `[`, see `?data.table`).


So instead of blindly functionalizing all your code, try to be 
pragmatic. Functional programming is not about using pure functions in 
*every* part of your code base; that is unfeasible in 99.9% of 
real-world problems. Even Haskell has `IO` and `do`; the point is that 
the imperative and functional parts of the code are clearly separated, 
and the imperative components are kept as close to the top level as 
possible.


So when using data.table, a good strategy is to use pure functions for 
performing within-data.table operations, e.g., `DT[, lapply(.SD, mean), 
.SDcols = is.numeric]`, and when these operations alter `DT` by 
reference, invoke the chains of these operations in "pure" wrappers - 
e.g., calling `A <- copy(A)` on the top and then modifying `A` directly.
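The "pure wrapper" strategy can be sketched like this (the function and column names are hypothetical):

```r
library(data.table)

# a pure wrapper: copy once at the top, then mutate by reference inside
lag_A <- function(DT) {
  DT <- copy(DT)                            # protects the caller's object
  DT[, A := shift(A, type = "lag", n = 1L)] # by-reference on the copy
  DT[]                                      # return the modified copy
}

DT0 <- data.table(A = 1:3)
res <- lag_A(DT0)
# res$A is c(NA, 1L, 2L); DT0 is left untouched
```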


Cheers,
Denes

Side note: You do not need to use `DT[, A := shift(A, fill = NA, type = 
"lag", n = 1)] %>% `[`(return(DT))`. `[.data.table` returns the result 
(the modified DT) invisibly. If you want to let auto-print work, you 
can just use `DT[, A := shift(A, fill = NA, type = "lag", n = 1)][]`.


Note that this also means you usually do not need magrittr's 
or the base-R pipe when transforming data.table-s. You can do this instead:

```
DT[
  ## filter rows where 'x' column equals "a"
  x == "a"
][
  ## calculate the mean of `z` for each gender and assign it to `y`
  , y := mean(z), by = "gender"
][
  ## do whatever you want
  ...
]
```


On 12/31/22 13:39, Rui Barradas wrote:

At 06:50 on 31/12/2022, Michael Lachanski wrote:

Hello,

I am trying to make a habit of "functionalizing" all of my code as 
recommended by Hadley Wickham. I have found it surprisingly difficult 
to do so because several intermediate features from data.table break or 
give unexpected results using purrr and its data.table adaptation, 
tidytable. Here is a minimal working example of what has stumped me 
most recently:


===

library(data.table); library(tidytable)

minimal_failing_function <- function(A){
   DT <- data.table(A)
   DT[ , A:= shift(A, fill = NA, type = "lag", n = 1)] %>% `[`
   return(DT)}
# works
minimal_failing_function(c(1,2))
# fails
tidytable::pmap_dfr(.l = list(c(1,2)),
 .f = minimal_failing_function)


===
These should ideally give the same output, but do not. This also fails 
using purrr::pmap_dfr rather than tidytable. I am using R 4.2.2 and I 
am on Mac OS Ventura 13.1.

Thank you for any help you can provide or general guidance.


==
Michael Lachanski
PhD Student in Demography and Sociology
MA Candidate in Statistics
University of Pennsylvania
mikel...@sas.upenn.edu


Hello,

Use map_dfr instead of pmap_dfr.


library(data.table)
library(tidytable)

minimal_failing_function <- function(A) {
   DT <- data.table(A)
   DT[ , A:= shift(A, fill = NA, type = "lag", n = 1)] %>% `[`
   return(DT)
}

# works
tidytable::map_dfr(.x = list(c(1,2)),
    .f = minimal_failing_function)
#> # A tidytable: 2 × 1
#>       A
#>   <dbl>
#> 1    NA
#> 2     1


Hope this helps,

Rui Barradas



Re: [R] intersection in data frame

2022-10-13 Thread Dénes Tóth

Or if your data is really large, you can try data.table::dcast().

> library(data.table)
> dcast(ID ~ station, data = as.data.table(df1))
   ID xy xz
1: 12 15 20
2: 13 16 19

(Note: instead of `as.data.table()`, you can use `setDT` or create your 
object as a data.table in the first place.)
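A self-contained version of the same reshape, using `setDT()` as suggested (data reconstructed from the example below):

```r
library(data.table)

df1 <- data.frame(
  ID      = c(12L, 12L, 13L, 13L),
  station = c("xy", "xz", "xy", "xz"),
  value   = c(15L, 20L, 16L, 19L)
)
setDT(df1)                                    # convert in place, no copy
wide <- dcast(df1, ID ~ station, value.var = "value")
# wide has one row per ID, with columns xy and xz
```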



On 10/13/22 11:22 PM, Rui Barradas wrote:

Hello,

To reshape from long to wide format, here are two options:


df1 <- 'ID    station  value
12  xy    15
12  xz    20
13   xy   16
13   xz   19'
df1 <- read.table(textConnection(df1), header = TRUE)


# base R
reshape(df1, direction = "wide", idvar = "ID", timevar = "station")
#>   ID value.xy value.xz
#> 1 12       15       20
#> 3 13       16       19

# tidyverse
tidyr::pivot_wider(df1, ID, names_from = station)
#> # A tibble: 2 × 3
#>      ID    xy    xz
#>   <int> <int> <int>
#> 1    12    15    20
#> 2    13    16    19


This question is essentially StackOverflow question [1].

[1] 
https://stackoverflow.com/questions/5890584/how-to-reshape-data-from-long-to-wide-format



Hope this helps,

Rui Barradas


At 19:08 on 13/10/2022, Gábor Malomsoki wrote:

Dears,

I need to create new variables in a data frame from a column of 
observations, in the following way:
example:
original:
original:
ID    station  value
12  xy    15
12  xz    20
13   xy   16
13   xz   19

new df:

  ID  xy xz
12  15 20
13  16 19

I have been looking around for examples, but I could not find any on 
how to change my df. I would like to run regression analyses on the 
values from different production stations, so my df is very large.

Please help me find the package, description or anything else that 
could help.

Thank you in advance!

Best regards
Malo



Re: [R] understanding as.list(substitute(...()))

2020-10-06 Thread Dénes Tóth

Hi Tim,

I have also asked a similar question a couple of months ago, and someone 
else did the same recently, maybe on r-devel.


We received no "official" response, but Deepayan Sarkar (R Core Team 
member) claimed that:


"
There is no documented reason for this to work (AFAIK), so again, I
would guess this is a side-effect of the implementation, and not a API
feature you should rely on. This is somewhat borne out by the
following:

> foo <- function(...) substitute({...()})
> foo(abc$de, fg[h], i)
{
   pairlist(abc$de, fg[h], i)
}
> foo(abc$de, fg[h], , i) # add a missing argument for extra fun
{
   as.pairlist(alist(abc$de, fg[h], , i))
}

which is not something you would expect to see at the user level. So
my recommendation: don't use ...() and pretend that you never
discovered it in the first place. Use match.call() instead, as
suggested by Serguei.

[Disclaimer: I have no idea what is actually going on, so these are
just guesses. There are some hints at
https://cran.r-project.org/doc/manuals/r-devel/R-ints.html#Dot_002ddot_002ddot-arguments
if you want to follow up.]
"

Cheers,
Denes




On 10/6/20 8:38 AM, Tim Taylor wrote:

I probably need to be more specific.  What confuses me is not the use
of substitute, but the parenthesis after the dots.  It clearly works
and I can make guesses as to why but it is definitely not obvious.
The following function gives the same final result but I can
understand what is happening.

dots <- function (...) {
exprs <- substitute(list(...))
as.list(exprs[-1])
}

In the original, dots <- function(...) as.list(substitute(...())),
Does ...() get parsed in a special way?

Tim

On Tue, 6 Oct 2020 at 05:30, Bert Gunter  wrote:


You need to understand what substitute() does -- see ?substitute and/or a tutorial on 
"R computing on the language" or similar.

Here is a simple example that may clarify:


dots <- function(...) as.list(substitute(...()))
dots(log(foo))

[[1]]
log(foo)  ## a call, a language object


dots2 <- function(...) as.list(...)
dots2(log(foo))

Error in as.list(...) : object 'foo' not found
## substitute() does not evaluate its argument; as.list() does

Cheers,
Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking 
things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Oct 5, 2020 at 1:37 PM Tim Taylor  
wrote:


Could someone explain what is happening with the ...() of the
following function:

dots <- function(...) as.list(substitute(...()))

I understand what I'm getting as a result but not why.   ?dots and
?substitute leave me none the wiser.

regards
Tim



Re: [R] row combining 2972 files

2020-03-18 Thread Dénes Tóth




On 3/18/20 9:02 PM, Bert Gunter wrote:

Untested in the absence of example data, but I think

combined <- do.call(rbind, lapply(ls2972, function(x)get(x)[[2]]))


Or if you have largish data, use rbindlist() from the data.table package:

combined <- data.table::rbindlist(
  lapply(ls2972, function(x) get(x)[[2]])
)

However, it seems you are on the wrong track when you create 2972 lists 
in your workspace. (Note: there are no "list file" objects in R; lists 
are objects, not files.) You should have one list of 2972 lists, each 
containing 4 data.frames.


E.g.:

x <- list(
  list(
data.frame(),
data.frame(x = 1),
data.frame(),
data.frame()
  ),
  list(
data.frame(),
data.frame(x = 2),
data.frame(),
data.frame()
  ),
  list(
data.frame(),
data.frame(x = 3),
data.frame(),
data.frame()
  )
)
keep <- lapply(x, "[[", 2L)
combined <- data.table::rbindlist(keep)


HTH,
Denes




should do it.


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Wed, Mar 18, 2020 at 12:16 PM Yuan Chun Ding  wrote:


Hi R users,

I generated 2972 list files in R, each list includes four data frame files
, file names for those list file are VNTR13576, VNTR14689, etc.  the second
data frame in each list has the same 11 column names, but different number
of rows.

I can combine two dataframes by
list2972 <-ls(pat="VNTR.*.")
test <-rbind(get(list2972[16])[[2]],get(list2972[166])[[2]] )

I tried to combine all 2972 data frames from those 2972 list files using
do.call or lapply function, but not successful.

Can you help me?

Thank you very much!

Ding




Re: [R] "chi-square" | "chi-squared" | "chi squared" | "chi square" ?

2019-10-18 Thread Dénes Tóth

Dear Martin,

Others struggle with this inconsistency as well; I found this discussion 
useful: 
https://math.stackexchange.com/questions/1098138/chi-square-or-chi-squared


Denes


On 10/18/19 2:51 PM, Martin Maechler wrote:

As it's Friday ..

and I also really want to clean up help files and similar R documents,
both in R's own sources and in my new 'DPQ' CRAN package :

As a trained mathematician, I'm uneasy if a thing has
several easily confusable names, .. but as somewhat
humanistically educated person, I know that natural languages,
English in this case, are much more flexible than computer
languages or math...

Anyway, back to the question(s) .. which I had asked myself a
couple of months ago, and already remained slightly undecided:

The 0-th (meta-)question of course is

   0. Is it worth using only one written form for the
  χ² - distribution, e.g. "everywhere" in R?

The answer is not obvious, as already the first few words of the
(English) Wikipedia clearly convey:

The URL is  https://en.wikipedia.org/wiki/Chi-squared_distribution
and the main title therefore also
 "Chi-squared distribution"

Then it reads


This article is about the mathematics of the chi-squared
distribution. For its uses in statistics, see chi-squared
test. For the music [...]



In probability theory and statistics, the chi-square
distribution (also chi-squared or χ2-distribution) with k
degrees of freedom is the distribution of a sum of the squares
of k independent standard normal random variables.



The chi-square distribution is a special case of the gamma
distribution and is one of the most widely used probability
distributions in inferential statistics, notably in hypothesis
testing []
[]


So, in title and 1st paragraph its "chi-squared", but then
everywhere(?) the text used "chi-square".

Undoubtedly, Wilson & Hilferty (1931) has been an important
paper and they use "Chi-square" in the title;
also  Johnson, Kotz & Balakrishnan (1995)
see R's help page ?pchisq use  "Chi-square" in the title of
chapter 18 and then, diplomatically for chapter 29,
  "Noncentral χ²-Distributions" as title.

So it seems, that historically and using prestigious sources,
"chi-square" to dominate (notably if we do not count "χ²" as an
alternative).

Things look a bit different when I study R's sources; on one
hand, I find all 4 forms (s.Subject); then in the "R source
history", I see

   $ svn log -c11342
   
   r11342 | <> | 2000-11-14 ...

   Use `chi-squared'.
   

which changed 16 (if I counted correctly) cases of 'chi-square' to 
'chi-squared'.

I have not found any R-core internal (or public) reasoning about
that change, but had kept it in mind and often worked along that "goal".

As a consequence, "statistically" speaking, much of R's own use has been
standardized to "chi-squared"; but as I mentioned, I still
find all 4 variants even in "R base" package help files
(which of course I could now quite quickly change, using Emacs M-x grep 
plus a script);
but

... "as it is Friday" ... I'm interested to hear what others
think, notably if you are native English (or "American" ;-)
speaking and/or have some extra good knowledge on such
matters...

Martin Maechler
ETH Zurich



Re: [R] How to create a new column based on the values from multiple columns which are matching a particular string?

2019-07-29 Thread Dénes Tóth

Hi Bert,

see inline.

On 7/30/19 1:12 AM, Bert Gunter wrote:

While Eric's solution is correct (modulo "corner" cases like all NA's in
a row), it can be made considerably more efficient.

One minor improvement can be made by using the idiom
any(x == "A")
instead of matching via %in% for the simple case of matching just a
single value.

However, a considerable improvement can be made by getting fancy,
taking advantage of do.call() and the pmax() function to mostly
vectorize the calculation. Here are the details and timing on a large
data frame.

(Note: I removed the names in the %in% approach for simplicity. It has
almost no effect on timings.
I also moved the as.integer() call out of the function so that it is
called only once at the end, which improves efficiency a bit)

1. Eric's original:
fun1 <-function(df,what)
{
   as.integer(unname(apply(df,MARGIN = 1,function(v) { what %in% v })))
}

2. Using any( x == "A") instead:
fun2 <- function(df,what)
{
as.integer(unname(apply(df, MARGIN = 1, function(x) any(x == what,
na.rm = TRUE))))
}

3. Getting fancy to use pmax()
fun3 <- function(df,what)
{
z <- lapply(df,function(x)as.integer((x==what)))
do.call(pmax,c(z,na.rm=TRUE))
}

Here are the timings:


bigdf <- df[rep(1:10,1e4), rep(1:5, 50)]
dim(bigdf)

[1] 100000    250


system.time(res1 <- fun1(bigdf, "A"))

user  system elapsed
   2.204   0.432   2.637


system.time(res2 <- fun2(bigdf, "A"))

user  system elapsed
   1.898   0.403   2.302


system.time(res3 <- fun3(bigdf, "A"))

user  system elapsed
   0.187   0.048   0.235

## 10 times faster!



all.equal(res1,res2)

[1] TRUE

all.equal(res1,res3)

[1] TRUE


NB: I freely admit that Eric's original solution may well be perfectly
adequate, and the speed improvement is pointless. In that case, maybe
this is at least somewhat instructive for someone.

Nevertheless, I would welcome further suggestions for improvement, as
I suspect my "fancy" approach is still a ways from what one can do (in
R code, without resorting to C++).


fun4 <- function(df, what)
{
  as.integer(rowSums(df == what, na.rm = TRUE) > 0)
}

The function above works for data.frame and matrix inputs as well. It is 
slower than fun3() if 'df' is a data.frame, but is faster if 'df' is a 
matrix (which is a more efficient representation of the data if it 
contains only character columns).
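A quick check of the claim above on made-up data:

```r
fun4 <- function(df, what) {
  # TRUE/NA comparisons per cell; rowSums with na.rm counts the matches
  as.integer(rowSums(df == what, na.rm = TRUE) > 0)
}

toy <- data.frame(a = c("A", NA, "B"), b = c(NA, "C", "A"))
fun4(toy, "A")             # rows 1 and 3 contain "A"
fun4(as.matrix(toy), "A")  # same result on the matrix representation
```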


A note to Ana: 'df' is the name of a function in R (see ?stats::df); not 
a perfect choice for a variable name.


Cheers,
Denes




Cheers,
Bert


Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )



On Mon, Jul 29, 2019 at 12:38 PM Eric Berger  wrote:


Read the help for apply and %in%

?apply
?`%in%`


Sent from my iPhone


On 29 Jul 2019, at 22:23, Ana Marija  wrote:

Thank you so much! Just to confirm: here MARGIN = 1 indicates that "A" 
should appear at least once per row?


On Mon, Jul 29, 2019 at 1:53 PM Eric Berger  wrote:
df$case <- apply(df,MARGIN = 1,function(v) { as.integer("A" %in% v) })



On Mon, Jul 29, 2019 at 9:02 PM Ana Marija  wrote:
sorry my bad, here is the edited version:

so the data frame is this:

df=data.frame(
   eye_problemsdisorders_f6148_0_1=c("A","C","D",NA,"D","A","C",NA,"B","A"),
   eye_problemsdisorders_f6148_0_2=c("B","C",NA,"A","C","B",NA,NA,"A","D"),
   eye_problemsdisorders_f6148_0_3=c("C","A","D","D","B","A",NA,NA,"A","B"),
   eye_problemsdisorders_f6148_0_4=c("D","D",NA,"B","A","C",NA,"C","A","B"),
   eye_problemsdisorders_f6148_0_5=c("C","C",NA,"D","B","C",NA,"D","D","B")
)

and I would need to add a column, named "case", whose values would be: 
1,1,0,1,1,1,0,0,1,1

so the "case" column marks rows where value "A" can be found in any column.


On Mon, Jul 29, 2019 at 12:53 PM Eric Berger  wrote:
You may have a typo/misstatement in your question.
You define a data frame with 5 columns, each of which has 10 elements, so your 
data frame has dimensions 10 x 5.
Then you request a new COLUMN which will have only 5 elements, which is not 
allowed. All columns of a data frame
must have the same length.


On Mon, Jul 29, 2019 at 8:42 PM Ana Marija  wrote:
I have data frame which looks like this:

df=data.frame(
   eye_problemsdisorders_f6148_0_1=c(A,C,D,NA,D,A,C,NA,B,A),
   eye_problemsdisorders_f6148_0_2=c(B,C,NA,A,C,B,NA,NA,A,D),
   eye_problemsdisorders_f6148_0_3=c(C,A,D,D,B,A,NA,NA,A,B),
   eye_problemsdisorders_f6148_0_4=c(D,D,NA,B,A,C,NA,C,A,B),
   eye_problemsdisorders_f6148_0_5=c(C,C,NA,D,B,C,NA,D,D,B))

In reality I have many more columns and they don't always match the 
string "eye_problemsdisorders_f6148", and there are many more rows.

What I would like to do is create a new column, say named "case", where 
I would have value "1" for 

Re: [R] need help in if else condition

2019-07-10 Thread Dénes Tóth




On 7/10/19 5:54 PM, Richard O'Keefe wrote:

Expectation: ifelse will use the same "repeat vectors to match the longest"
rule that other vectorised functions do.  So
a <- 1:5
b <- c(2,3)
ifelse(a < 3, 1, b)
=> ifelse(T T F F F <<5>>, 1 <<1>>, 2 3 <<2>>)
=> ifelse(T T F F F <<5>>, 1 1 1 1 1 <<5>>, 2 3 2 3 2 <<5>>)
=> 1 1 2 3 2
and that is indeed the answer you get.  Entirely predictable and consistent
with
other basic operations in R.

The only tricky thing I see is that R has
a strict vectorised  ifelse(logical.vector, some.vector, another.vector)
AND
a non-strict non-vectorised if (logical.scalar) some.value else
another.value
AND
a statement form if (logical.scalar) stmt.1; else stmt.2;


Just for the record, there is a further form:
`if`(logical.scalar, stmt.1, stmt.2)

The main problem with ifelse is that 1) it is very slow, and 2) the mode 
of its return value can be unintuitive or not always predictable (see also 
the Value and Warning sections of ?ifelse). One has to be very careful 
and ensure that 'yes' and 'no' vectors have the same class, because 
ifelse will not warn you at all:

> ifelse(c(TRUE, TRUE), 1:2, LETTERS[1:2])
[1] 1 2
> ifelse(c(TRUE, FALSE), 1:2, LETTERS[1:2])
[1] "1" "B"

For options instead of base::ifelse, you might find this discussion helpful:
https://github.com/Rdatatable/data.table/issues/3657
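A runnable illustration of that class instability; as one alternative from the linked discussion, `data.table::fifelse` refuses mixed types (shown commented out since it needs the data.table package installed):

```r
# ifelse() decides the result mode element-wise, so mixed 'yes'/'no'
# classes are silently coerced rather than flagged:
ifelse(c(TRUE, TRUE),  1:2, LETTERS[1:2])  # 1 2      (integer)
ifelse(c(TRUE, FALSE), 1:2, LETTERS[1:2])  # "1" "B"  (character!)

# data.table::fifelse(c(TRUE, FALSE), 1:2, LETTERS[1:2])
# -> error, because 'yes' and 'no' are not of the same type
```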


Cheers,
Denes





On Thu, 11 Jul 2019 at 01:47, Eric Berger  wrote:


For example, can you predict what the following code will do?

a <- 1:5
b <- c(2,3)
ifelse( a < 3, 1, b)



On Wed, Jul 10, 2019 at 4:34 PM José María Mateos 
wrote:


On Wed, Jul 10, 2019, at 04:39, Eric Berger wrote:

1. The ifelse() command is a bit tricky in R. Avoiding it is often a

good

policy.


You piqued my curiosity, can you elaborate a bit more on this?

--
José María (Chema) Mateos || https://rinzewind.org

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.













Re: [R] missRanger package

2018-11-13 Thread Dénes Tóth

Hi Rebecca,

I think it was me who suggested the missRanger package to you, so this is 
actually a follow-up of your previous question about censored imputation 
of missing values (as far as I can remember).


The missRanger package uses predictive mean matching, so take a look at 
?missRanger::pmm and in general read a bit about what 'predictive mean 
matching' means.


In a nutshell: If your data is appropriate for this technique, you do 
not need to take care of explicit censoring - it will be done implicitly 
by the package.
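Predictive mean matching is easy to sketch in a few lines of base R (a simplified toy version, not missRanger's actual implementation): fit a model on the complete cases, then for each missing case impute the *observed* value of the donor whose prediction is closest. Because donors are observed values, the imputations inherit their range, and hence their censoring or positivity:

```r
set.seed(42)
x <- runif(20)
y <- 2 * x + abs(rnorm(20, sd = 0.1))   # observed outcomes, all positive
miss   <- c(3, 7)                       # indices we pretend are missing
donors <- setdiff(seq_along(y), miss)

fit  <- lm(y ~ x, subset = donors)            # model on complete cases only
pred <- predict(fit, newdata = data.frame(x = x))

# impute the observed value of the donor with the closest prediction
imputed <- vapply(miss, function(i) {
  y[donors[which.min(abs(pred[donors] - pred[i]))]]
}, numeric(1))

all(imputed > 0)   # TRUE: imputations are drawn from observed values
```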


Cheers,
Denes


On 11/12/2018 09:12 PM, Bert Gunter wrote:

You have asked what I believe is an incoherent question, and thus are
unlikely to receive any useful replies (of course, I may be wrong about
this...).

Please read and follow the posting guide linked below to to ask a question
that can be answered.


Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Nov 12, 2018 at 12:03 PM Rebecca Bingert 
wrote:


Hi,
does anybody know where I need to insert the censoring in the missRanger
package?
Regards,
Rebecca










Re: [R] randomForrest-imputation

2018-11-09 Thread Dénes Tóth

Hi,

The missRanger package performs predictive mean matching which should 
generate positive values if the non-missing values are positive.


Regards,
Denes


On 11/09/2018 01:11 PM, Rebecca Bingert wrote:

Hi!

How can I generate only positive data with randomForest imputation? I'm
working with laboratory values, which are always positive.

Can anybody help out?

Thanks!

(P.S.: I needed to send this request again because there were problems 
delivering it)







Re: [R] Finding unique terms

2018-10-12 Thread Dénes Tóth




On 10/12/2018 04:36 AM, Jeff Newmiller wrote:

You said "add up"... so you did not mean to say that? Denes computed the mean...


Nice catch, Jeff. Of course I wanted to use 'sum' instead of 'mean'.
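With that correction applied, the pattern from the answer quoted below becomes (toy data for brevity; `sumFn` is an illustrative name):

```r
# sum per group, ignoring NAs -- the fix for the mean/sum mix-up
sumFn <- function(x) sum(x, na.rm = TRUE)

dat <- data.frame(id = c("A", "A", "B", "B"), v = c(1, NA, 2, 3))
aggregate(dat["v"], by = dat["id"], FUN = sumFn)
#   id v
# 1  A 1
# 2  B 5
```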




On October 11, 2018 3:56:23 PM PDT, roslinazairimah zakaria 
 wrote:

Hi Denes,

It works perfectly as I want!

Thanks a lot.

On Fri, Oct 12, 2018 at 6:29 AM Dénes Tóth 
wrote:




On 10/12/2018 12:12 AM, roslinazairimah zakaria wrote:

Dear r-users,

I have this data:

structure(list(STUDENT_ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L), .Label = c("AA15285", "AA15286"), class = "factor"),
  COURSE_CODE = structure(c(1L, 2L, 5L, 6L, 7L, 8L, 2L, 3L,
  4L, 5L, 6L), .Label = c("BAA1113", "BAA1322", "BAA2113",
  "BAA2513", "BAA2713", "BAA2921", "BAA4273", "BAA4513"), class = "factor"),
  PO1M = c(155.7, 48.9, 83.2, NA, NA, NA, 48.05, 68.4, 41.65,
  82.35, NA), PO1T = c(180, 70, 100, NA, NA, NA, 70, 100, 60,
  100, NA), PO2M = c(NA, NA, NA, 37, NA, NA, NA, NA, NA, NA,
  41), PO2T = c(NA, NA, NA, 50, NA, NA, NA, NA, NA, NA, 50),
  X = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), X.1 = c(NA,
  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("STUDENT_ID",
"COURSE_CODE", "PO1M", "PO1T", "PO2M", "PO2T", "X", "X.1"), class =
"data.frame", row.names = c(NA, -11L))

I want to combine the same Student ID and add up all the values for
PO1M, PO1T, ..., PO2T obtained by the same ID.


dat <- structure(list(STUDENT_ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L), .Label = c("AA15285", "AA15286"), class = "factor"),
  COURSE_CODE = structure(c(1L, 2L, 5L, 6L, 7L, 8L, 2L, 3L,
  4L, 5L, 6L), .Label = c("BAA1113", "BAA1322", "BAA2113",
  "BAA2513", "BAA2713", "BAA2921", "BAA4273", "BAA4513"), class = "factor"),
  PO1M = c(155.7, 48.9, 83.2, NA, NA, NA, 48.05, 68.4, 41.65,
  82.35, NA), PO1T = c(180, 70, 100, NA, NA, NA, 70, 100, 60,
  100, NA), PO2M = c(NA, NA, NA, 37, NA, NA, NA, NA, NA, NA,
  41), PO2T = c(NA, NA, NA, 50, NA, NA, NA, NA, NA, NA, 50),
  X = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), X.1 = c(NA,
  NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("STUDENT_ID",
"COURSE_CODE", "PO1M", "PO1T", "PO2M", "PO2T", "X", "X.1"), class =
"data.frame", row.names = c(NA, -11L))

# I assume you would like to add up the values with na.rm = TRUE
meanFn <- function(x) mean(x, na.rm = TRUE)

# see ?aggregate
aggregate(dat[, c("PO1M", "PO1T", "PO2M")],
by = dat["STUDENT_ID"],
FUN = meanFn)

# if you have largish or large data
library(data.table)
dat2 <- as.data.table(dat)
dat2[, lapply(.SD, meanFn),
   by = STUDENT_ID,
   .SDcols = c("PO1M", "PO1T", "PO2M")]


Regards,
Denes




How do I do that?
Thank you for any help given.









Re: [R] Finding unique terms

2018-10-11 Thread Dénes Tóth




On 10/12/2018 12:12 AM, roslinazairimah zakaria wrote:

Dear r-users,

I have this data:

structure(list(STUDENT_ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L), .Label = c("AA15285", "AA15286"), class = "factor"),
 COURSE_CODE = structure(c(1L, 2L, 5L, 6L, 7L, 8L, 2L, 3L,
 4L, 5L, 6L), .Label = c("BAA1113", "BAA1322", "BAA2113",
 "BAA2513", "BAA2713", "BAA2921", "BAA4273", "BAA4513"), class =
"factor"),
 PO1M = c(155.7, 48.9, 83.2, NA, NA, NA, 48.05, 68.4, 41.65,
 82.35, NA), PO1T = c(180, 70, 100, NA, NA, NA, 70, 100, 60,
 100, NA), PO2M = c(NA, NA, NA, 37, NA, NA, NA, NA, NA, NA,
 41), PO2T = c(NA, NA, NA, 50, NA, NA, NA, NA, NA, NA, 50),
 X = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), X.1 = c(NA,
 NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("STUDENT_ID",
"COURSE_CODE", "PO1M", "PO1T", "PO2M", "PO2T", "X", "X.1"), class =
"data.frame", row.names = c(NA,
-11L))

I want to combine the same Student ID and add up all the values for PO1M,
PO1T,...,PO2T obtained by the same ID.


dat <- structure(list(STUDENT_ID = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L), .Label = c("AA15285", "AA15286"), class = "factor"),
COURSE_CODE = structure(c(1L, 2L, 5L, 6L, 7L, 8L, 2L, 3L,
4L, 5L, 6L), .Label = c("BAA1113", "BAA1322", "BAA2113",
"BAA2513", "BAA2713", "BAA2921", "BAA4273", "BAA4513"), class =
"factor"),
PO1M = c(155.7, 48.9, 83.2, NA, NA, NA, 48.05, 68.4, 41.65,
82.35, NA), PO1T = c(180, 70, 100, NA, NA, NA, 70, 100, 60,
100, NA), PO2M = c(NA, NA, NA, 37, NA, NA, NA, NA, NA, NA,
41), PO2T = c(NA, NA, NA, 50, NA, NA, NA, NA, NA, NA, 50),
X = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA), X.1 = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("STUDENT_ID",
"COURSE_CODE", "PO1M", "PO1T", "PO2M", "PO2T", "X", "X.1"), class =
"data.frame", row.names = c(NA,
-11L))

# I assume you would like to add up the values with na.rm = TRUE
meanFn <- function(x) mean(x, na.rm = TRUE)

# see ?aggregate
aggregate(dat[, c("PO1M", "PO1T", "PO2M")],
  by = dat["STUDENT_ID"],
  FUN = meanFn)

# if you have largish or large data
library(data.table)
dat2 <- as.data.table(dat)
dat2[, lapply(.SD, meanFn),
 by = STUDENT_ID,
 .SDcols = c("PO1M", "PO1T", "PO2M")]


Regards,
Denes




How do I do that?
Thank you for any help given.





Re: [R] Erase content of dataframe in a single stroke

2018-09-27 Thread Dénes Tóth

Hi Luigi,

Actually I doubt that the original problem you are trying to solve 
requires the initialization of an empty data.frame with a particular 
structure. However, if you think you really need this step, I would write 
a function for it and also consider edge cases.


getSkeleton <- function(x, drop_levels = FALSE) {
  out <- x[numeric(0L), , drop = FALSE]
  if (isTRUE(drop_levels)) out <- droplevels(out)
  out
}

Note that it retains or drops factor levels depending on 'drop_levels'. 
It only matters if you have factors in your data.frame.
'drop = FALSE' is required to guard against silent conversion to a 
vector if 'x' has only one column.
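A quick self-contained check of the helper above and its two edge cases (factor levels and single-column input):

```r
getSkeleton <- function(x, drop_levels = FALSE) {
  out <- x[numeric(0L), , drop = FALSE]
  if (isTRUE(drop_levels)) out <- droplevels(out)
  out
}

df <- data.frame(A = c(1, 2), B = factor(c("x", "y")))
sk <- getSkeleton(df)

nrow(sk)                             # 0: content erased
names(sk)                            # "A" "B": structure kept
nlevels(sk$B)                        # 2: factor levels retained by default
nlevels(getSkeleton(df, TRUE)$B)     # 0: levels dropped on request
is.data.frame(getSkeleton(df["A"]))  # TRUE: 'drop = FALSE' at work
```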


Regards,
Denes



On 09/27/2018 11:11 AM, Jan van der Laan wrote:

Or

testdf <- testdf[FALSE, ]

or

testdf <- testdf[numeric(0), ]

which seems to be slightly faster.

Best,
Jan


Op 27-9-2018 om 10:32 schreef PIKAL Petr:

Hm

I would use


testdf<-data.frame(A=c(1,2),B=c(2,3),C=c(3,4))
str(testdf)

'data.frame':   2 obs. of  3 variables:
  $ A: num  1 2
  $ B: num  2 3
  $ C: num  3 4

testdf<-testdf[-(1:nrow(testdf)),]
str(testdf)

'data.frame':   0 obs. of  3 variables:
  $ A: num
  $ B: num
  $ C: num

Cheers
Petr


-Original Message-
From: R-help  On Behalf Of Jim Lemon
Sent: Thursday, September 27, 2018 10:12 AM
To: Luigi Marongiu ; r-help mailing list 

project.org>
Subject: Re: [R] Erase content of dataframe in a single stroke

Ah, yes, try 'as.data.frame" on it.

Jim

On Thu, Sep 27, 2018 at 6:00 PM Luigi Marongiu 


wrote:

Thank you Jim,
this requires the definition of an ad hoc function; strange that R
does not have a function for this purpose...
Anyway, it works but it changes the structure of the data. By
redefining the dataframe as I did, I obtain:


df

[1] A B C
<0 rows> (or 0-length row.names)

str(df)

'data.frame': 0 obs. of  3 variables:
  $ A: num
  $ B: num
  $ C: num

When applying your function, I get:


df

$A
NULL

$B
NULL

$C
NULL


str(df)

List of 3
  $ A: NULL
  $ B: NULL
  $ C: NULL

The dataframe has become a list. Would that affect downstream

applications?

Thank you,
Luigi
On Thu, Sep 27, 2018 at 9:45 AM Jim Lemon 

wrote:

Hi Luigi,
Maybe this:

testdf<-data.frame(A=1,B=2,C=3)

testdf

  A B C
1 1 2 3
toNull<-function(x) return(NULL)
testdf<-sapply(testdf,toNull)

Jim
On Thu, Sep 27, 2018 at 5:29 PM Luigi Marongiu

 wrote:

Dear all,
I would like to erase the content of a dataframe -- but not the
dataframe itself -- in a simple and fast way.
At the moment I do that by re-defining the dataframe itself in 
this way:



df <- data.frame(A = numeric(),

+   B = numeric(),
+   C = character())

# assign
A <- 5
B <- 0.6
C <- 103
# load
R <- cbind(A, B, C)
df <- rbind(df, R)
df

   A   B   C
1 5 0.6 103

# erase
df <- data.frame(A = numeric(),

+  B = numeric(),
+  C = character())

df

[1] A B C
<0 rows> (or 0-length row.names)
Is there a way to erase the content of the dataframe in a simpler
(acting on all the dataframe at once instead of naming each column
individually) and nicer (with a specific erasure command instead
of re-defining the object itself) way?

Thank you.
--
Best regards,
Luigi




--
Best regards,
Luigi






Re: [R] Creatng new variable based upon conditions

2018-07-26 Thread Dénes Tóth




On 07/26/2018 08:58 PM, JEFFERY REICHMAN wrote:

Given something like ...

x <- c(3,2,4,3,5,4,3,2,4,5)
y <- c("A","B","B","A","A","A","A","B","A","B")
xy <- data.frame(x,y)
xy$w <- ifelse(xy$y=="A",xy$w[,x]*10,xy$w[,x]*15 )


You should learn the basics about how to extract or replace part of an 
object, in particular data.frames. You can start by reading the help 
page of ?"Extract".


xy$w <- ifelse(xy$y=="A",xy$x*10,xy$x*15 )
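Run end-to-end, the corrected line reproduces the table Jeff wanted:

```r
x <- c(3, 2, 4, 3, 5, 4, 3, 2, 4, 5)
y <- c("A", "B", "B", "A", "A", "A", "A", "B", "A", "B")
xy <- data.frame(x, y)
# index the column (xy$x), not xy$w[,x], which does not exist yet
xy$w <- ifelse(xy$y == "A", xy$x * 10, xy$x * 15)
xy$w
# [1] 30 30 60 30 50 40 30 30 40 75
```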

HTH,
Denes




want to see

x y  w
1  3 A 30
2  2 B  30
3  4 B  60
4  3 A  30
5  5 A  50
6  4 A  40
7  3 A  30
8  2 B  30
9  4 A  40
10 5 B  75

but I get NA's

Jeff






Re: [R] initiate elements in a dataframe with lists

2018-07-25 Thread Dénes Tóth




On 07/25/2018 10:23 AM, Ivan Calandra wrote:

Just for my understanding:
Is a data.frame with list columns still a data.frame? Isn't it then a list?


A data.frame is a list of equally sized vectors - that is, each vector 
must be of the same length. It is not required that the vector is an 
atomic vector; it can be a list, too. By having equally sized vectors in 
a list you can arrange the list in a two-dimensional matrix-like format, 
append row names to them, and you get a data.frame.


In principle, data.frame(x = 1:3, y = list(1:2, 1:3, 1:4)) should work, 
but it doesn't, as others have recognized, too:

https://stackoverflow.com/questions/9547518/create-a-data-frame-where-a-column-is-a-list
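One workaround noted in that discussion is to protect the list column with I(), so that data.frame() treats it "as is" (a short sketch):

```r
# I() stops data.frame() from trying to unpack the list into columns
df <- data.frame(x = 1:3, y = I(list(1:2, 1:3, 1:4)))

is.data.frame(df)   # TRUE -- still a data.frame
lengths(df$y)       # 2 3 4 -- each cell holds a vector of its own length
```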

Cheers,
Denes




Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

On 25/07/2018 09:56, Juan Telleria Ruiz de Aguirre wrote:

Check tidyverse's purrr package:

https://github.com/rstudio/cheatsheets/raw/master/purrr.pdf

In the second page of the cheatsheet there is info on how to create list
columns within a data.frame :)










Re: [R] Regexp bug or misunderstanding

2018-07-02 Thread Dénes Tóth

Hi Martin,

I assume you want to check whether a particular character string 
contains a digit. In this case you should use the following pattern: 
"[[:digit:]]" instead of "[:digit:]".


From ?regex:
"A character class is a list of characters enclosed between [ and ] 
which matches any single character in that list; unless the first 
character of the list is the caret ^, when it matches any character not 
in the list. ... Certain named classes of characters are predefined... 
For example, [[:alnum:]] means [0-9A-Za-z]"


So if you use simply "[:digit:]" as a pattern, it means: a character 
string which contains any of the following characters: ':', 'd', 'i', 
'g', 't'. Your second test case contains 'd', whereas the first case 
contains neither of the above characters.
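The difference is easy to see on a small vector: only 'aband' contains one of ':', 'd', 'i', 'g', 't', and only 'no9' contains an actual digit:

```r
x <- c("**ABAA", "aband", "no9")
grepl("[:digit:]", x)    # FALSE  TRUE FALSE -- class is {':','d','i','g','t'}
grepl("[[:digit:]]", x)  # FALSE FALSE  TRUE -- the real POSIX digit class
```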


HTH,
Denes



On 07/02/2018 02:52 PM, Martin Møller Skarbiniks Pedersen wrote:

Hi,

Have I found a bug in R? Or misunderstood something about grep() ?

Case 1 gives the expected output
Case 2 does not gives the expected output.
I expected integer(0) also for this case.

case 1:
grep("[:digit:]", "**ABAAbabaabackabaloneaban")
integer(0)

case 2:
grep("[:digit:]", "**ABAAbabaabackabaloneaband")
[1] 1

Regards
Martin






Re: [R] Package 'data.table' in version R-3.5.0 not successfully being installed

2018-04-26 Thread Dénes Tóth

You might find this discussion useful, too:
https://github.com/Rdatatable/data.table/issues/2797


On 04/26/2018 11:01 PM, Henrik Bengtsson wrote:

If you're installing packages to the default location in your home
account and you didn't remove those library folders, you still have
you R 3.4 package installs there, e.g.


dir(dirname(.libPaths()[1]), full.names = TRUE)

[1] "/home/hb/R/x86_64-pc-linux-gnu-library/3.4"
[2] "/home/hb/R/x86_64-pc-linux-gnu-library/3.5"
[3] "/home/hb/R/x86_64-pc-linux-gnu-library/3.6"

/Henrik

On Thu, Apr 26, 2018 at 11:41 AM, Akhilesh Singh
 wrote:

You are right. I do take backups. But, this time I was too sure that
nothing will go wrong. But, this was over-confidence. I need to take more
care in future. Thanks anyway.

With regards,

Dr. A.K. Singh

On Thu 26 Apr, 2018, 11:49 PM Duncan Murdoch, 
wrote:


On 26/04/2018 1:54 PM, Akhilesh Singh wrote:

My thanks to Dr. John Fox and Dr. Duncan Murdoch. But, I have upgraded
all my R-3.4.3 libraries to R-3.5.0, and I have not backed-up copies of
old version. So, I would give a try each to the solutions suggested by
John Fox and Dengan Murdoch.


Here is some unsolicited advice:  I would strongly recommend that you
make it a higher priority to have backups available.  In my experience
computer hardware is becoming quite reliable, but software isn't, and
the person next to the keyboard isn't either.  (My last desperate need
for a backup was due to a hardware failure 2 years ago, but it wasn't
the manufacturer's fault:  my laptop accidentally drowned.)

Backups can save you a lot of grief in the event of a mistake, or a
software or hardware failure.  But even in the case of routine events
like software updates that don't go as planned, they can save time.

Duncan Murdoch




With regards,

Dr. A.K. Singh

On Thu 26 Apr, 2018, 9:44 PM Duncan Murdoch, > wrote:

 On 26/04/2018 10:33 AM, Fox, John wrote:
  > Dear A.K. Singh,
  >
  > As you discovered, the data.table package has an error under R
 3.5.0 that prevents CRAN from distributing a Windows binary for the
 package. The reason that you weren't able to install the package
 from source is apparently that you haven't installed the R
 package-building tools for Windows. See
 .
  >
  > Because a number of users of my Rcmdr and car packages have
 contacted me with a similar issue, as a temporary work-around I've
 placed a Windows binary for the data.table package on my website at
 <

https://socialsciences.mcmaster.ca/jfox/.Pickup/data.table_1.10.4-3.zip>.

 You should be able to install the package from there via the command
  >
  >
   install.packages("https://socialsciences.mcmaster.ca/jfox/.Pickup/data.table_1.10.4-3.zip",
repos=NULL, type="win.binary")

  >
  > I expect that this problem will go away when the maintainer of
 the data.table package fixes the error.

 You can see the errors in the package on this web page:

 https://cloud.r-project.org/web/checks/check_results_data.table.html

 Currently it is failing self-tests on all platforms except r-oldrel,
 which is the previous release of R.  I'd recommend backing out of R
 3.5.0 and going to R 3.4.4 if that's a possibility for you.

 Yet another possibility is to use a version of data.table from

Github,

 which is newer than the version on CRAN and may have fixed the

errors,

 but that would require an installation from source, which not every
 Windows user is comfortable with.

 Duncan Murdoch


On 26-Apr-2018 9:44 PM, "Duncan Murdoch" > wrote:

 On 26/04/2018 10:33 AM, Fox, John wrote:
  > Dear A.K. Singh,
  >
  > As you discovered, the data.table package has an error under R
 3.5.0 that prevents CRAN from distributing a Windows binary for the
 package. The reason that you weren't able to install the package
 from source is apparently that you haven't installed the R
 package-building tools for Windows. See
 .
  >
  > Because a number of users of my Rcmdr and car packages have
 contacted me with a similar issue, as a temporary work-around I've
 placed a Windows binary for the data.table package on my website at
 <

https://socialsciences.mcmaster.ca/jfox/.Pickup/data.table_1.10.4-3.zip>.

 You should be able to install the package from there via the command
  >
  >
   install.packages("https://socialsciences.mcmaster.ca/jfox/.Pickup/data.table_1.10.4-3.zip",
repos=NULL, type="win.binary")

  >
  > I expect that this problem will go away when the maintainer of
 the data.table package fixes the error.

 You can see the errors in the package on 

Re: [R] Portable R in zip file for Windows

2018-01-26 Thread Dénes Tóth

Hi Juan,

you might find this useful: https://sourceforge.net/projects/rportable/

Cheers,
Denes

On 01/26/2018 11:57 AM, Juan Manuel Truppia wrote:

Pretty good question Gabor. I can execute R once it is installed (if
someone with rights installs it before) but not the installer. I can
download the installer (with some pain). I know that some installers are
actually compressed files in disguise, but I think this is not the case
with R, right?
I will study the exact nature of the restriction, and get back to you.
Nevertheless, having an installer and a "portable" version is something
pretty common (R Studio, Notepad++ and 7Zip pop to my mind now) and pretty
helpful to deal with security restrictions, so I thought R had one,
somewhere.

On Fri, Jan 26, 2018, 00:49 Gabor Grothendieck 
wrote:


Can you clarify what the nature of the security restriction is?

If you can't run the R installer then how is it that you could run R?
That would still involve running an external exe even if it came
in a zip file.

Could it be that the restriction is not on running exe files but on
downloading them?

If that is it then there are obvious workarounds (rename it not
to have an exe externsion or zip it using another machine,
upload to the cloud and download onto the restricted machine)
but it might be safer to just ask the powers that be to download it
for you.  You probably don't need a new version of R more than
once a year.


On Thu, Jan 25, 2018 at 3:04 PM, Juan Manuel Truppia
 wrote:

What is wrong with you guys? I asked for a zip, like R Studio has for
example. Totally clear.
I cant execute exes. But I can unzip files.
Thanks Gabor, I had that in mind, but can't execute the exe due to

security

restrictions.
Geez, really, treating people who ask questions this way just makes you
not want to ask a single one.


On Thu, Jan 25, 2018, 11:19 Gabor Grothendieck 
wrote:


I believe that the ordinary Windows installer for R can produce a
portable result by choosing the appropriate configuration options from

the

offered screens when you run the installer  Be sure to enter the desired
path in the Select Destination Location screen, choose Yes on the
Startup options screen and ensure that all boxes are unchecked on the
Select additional tasks screen.

On Wed, Jan 24, 2018 at 10:11 PM, Juan Manuel Truppia
 wrote:

I read a message from 2009 or 2010 where it mentioned the availability
of R
for Windows in a zip file, no installation required. It would be very
useful for me. Is this still available somewhere?

Thanks





--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com




--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com








Re: [R] Faster Subsetting

2016-09-28 Thread Dénes Tóth

Hi Harold,

Generally: you cannot beat data.table, unless you can represent your 
data in a matrix (or array or vector). For some specific cases, Hervé's 
suggestion might also be competitive.
Your problem is that you did not put any effort to read at least part of 
the very extensive documentation of the data.table package. You should 
start here: https://github.com/Rdatatable/data.table/wiki/Getting-started


In a nutshell: use a key, which allows binary search instead of 
the much slower vector scan. (With the auto-indexing feature 
of the data.table package, you may even skip this step.) The 
point is that creating the key must be done only once, and all 
subsequent subsetting operations which use the key become incredibly 
fast. You missed this point because you replicated the creation of the 
key as well, not only the subsetting in one of your examples.


Here is a version of Herve's example (OK, it is a bit biased because 
data.table has a highly optimized internal version of mean() for 
calculating the group means):


## create a keyed data.table
tmp_dt <- data.table(id = rep(1:2, each = 10), foo = rnorm(20), 
key = "id")

system.time(tmp_dt[, .(result = mean(foo)), by = id])
# user system elapsed
# 0.004 0.000 0.005

## subset a keyed data.table
all_ids <- tmp_dt[, unique(id)]
select_id <- sample(all_ids, 1)
system.time(tmp_dt[.(select_id)])
# user system elapsed
# 0.000 0.000 0.001

## or equivalently
system.time(tmp_dt[id == select_id])
# user system elapsed
# 0.000 0.000 0.001
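The same pay-the-cost-once idea also works in base R, with no data.table required: split the data.frame by id a single time, then later lookups are cheap list indexing:

```r
tmp <- data.frame(id = rep(1:200, each = 10), foo = rnorm(2000))

byId <- split(tmp, tmp$id)   # one upfront pass over the data
sub  <- byId[["7"]]          # subsequent subsetting is a named-list lookup

identical(sub$foo, tmp$foo[tmp$id == 7])   # TRUE
```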

Note: the CRAN version of the data.table package is already very fast, 
but you should try the development version ( 
devtools::install_github("Rdatatable/data.table") ) for multi-threaded 
subsetting.
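To make the "key once, subset many times" point concrete, here is a hedged sketch (sizes and seed are arbitrary) that separates the one-time keying cost from the repeated lookups:

```r
library(data.table)

set.seed(1)
tmp <- data.frame(id = sample(rep(1:200, each = 1000)), foo = rnorm(2e5))

## one-time cost: convert and key (sorts by id, enabling binary search)
tmp_dt <- as.data.table(tmp)
setkey(tmp_dt, id)

## repeated keyed subsetting is then very cheap ...
system.time(replicate(500, tmp_dt[.(1L)]))

## ... compared with repeated vector scans on the data.frame
system.time(replicate(500, tmp[tmp$id == 1L, ]))
```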



Cheers,
Denes


On 09/28/2016 08:53 PM, Hervé Pagès wrote:
> Hi,
>
> I'm surprised nobody suggested split(). Splitting the data.frame
> upfront is faster than repeatedly subsetting it:
>
>tmp <- data.frame(id = rep(1:2, each = 10), foo = rnorm(20))
>idList <- unique(tmp$id)
>
>system.time(for (i in idList) tmp[which(tmp$id == i),])
>#   user  system elapsed
># 16.286   0.000  16.305
>
>system.time(split(tmp, tmp$id))
>#   user  system elapsed
>#  5.637   0.004   5.647
>
> Cheers,
> H.
>
> On 09/28/2016 09:09 AM, Doran, Harold wrote:
>> I have an extremely large data frame (~13 million rows) that resembles
>> the structure of the object tmp below in the reproducible code. In my
>> real data, the variable, 'id' may or may not be ordered, but I think
>> that is irrelevant.
>>
>> I have a process that requires subsetting the data by id and then
>> running each smaller data frame through a set of functions. One
>> example below uses indexing and the other uses an explicit call to
>> subset(), both return the same result, but indexing is faster.
>>
>> Problem is in my real data, indexing must parse through millions of
>> rows to evaluate the condition and this is expensive and a bottleneck
>> in my code.  I'm curious if anyone can recommend an improvement that
>> would somehow be less expensive and faster?
>>
>> Thank you
>> Harold
>>
>>
>> tmp <- data.frame(id = rep(1:200, each = 10), foo = rnorm(2000))
>>
>> idList <- unique(tmp$id)
>>
>> ### Fast, but not fast enough
>> system.time(replicate(500, tmp[which(tmp$id == idList[1]),]))
>>
>> ### Not fast at all, a big bottleneck
>> system.time(replicate(500, subset(tmp, id == idList[1])))
>>
>>
>



Re: [R] Treating a vector of characters as object names to create list

2016-09-04 Thread Dénes Tóth



On 09/05/2016 12:07 AM, Bert Gunter wrote:

Time for an R tutorial or two to learn how to use the "apply" family
in R. I think what you want is:

merged_list <- lapply(merging, get)



Or even:
named_merged_list <- mget(merging)

Anyway, you could probably arrive at a list of parameters directly 
(e.g., if you import the parameter values from an external source, or if 
they are the return values of a function, etc.).
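For instance, a minimal sketch of building the named list in one step (values taken from Ryan's example; the split() variant at the end assumes a hypothetical two-column table `codes`):

```r
## build the named list directly, instead of creating standalone
## vectors and collecting them afterwards with mget()
merged.parameters <- list(
  alkalinity = c('39086', '29801', '90410', '00410'),
  iron       = c('01045', '01046')
)
merged.parameters$iron

## if the codes come from a two-column table (parameter, code),
## split() produces the same structure in one call:
## merged.parameters <- split(codes$code, codes$parameter)
```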




-- Bert



Bert Gunter

"The trouble with having an open mind is that people keep coming along
and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Sun, Sep 4, 2016 at 2:57 PM, Ryan Utz  wrote:

Hello,

I have a vector of characters that I know will be object names and I'd like
to treat this vector as a series of names to create a list. But, for the
life of me, I cannot figure out how to treat a vector of characters as a
vector of object names when creating a list.

For example, this does exactly what I want to do (with 'merged.parameters'
as the end goal):

###
merging=c('alkalinity','iron')
alkalinity=c('39086','29801','90410','00410')
iron=c('01045','01046')
merged.parameters=list(alkalinity,iron)
###

But, say I have many, many parameters in 'merging' beyond alkalinity and
iron and I'd like to just cleanly turn the elements in 'merging' into a
list. This does not work:

###
merged.parameters=list(get(merging))
###

because it's only grabbing the first element of 'merging', for some reason.
Any advice? This feels like it really should be easy...

--

Ryan Utz, Ph.D.
Assistant professor of water resources
*chatham**UNIVERSITY*
Home/Cell: (724) 272-7769









Re: [R] Matrix

2016-07-16 Thread Dénes Tóth



On 07/17/2016 01:39 AM, Duncan Murdoch wrote:

On 16/07/2016 6:25 PM, Ashta wrote:
 > Hi all,
 >
 > I have a large square matrix (60 x 60)  and found it hard to
 > visualize. Is it possible to change it  as shown below?
 >
 > Sample example (3 x 3)
 >
 > A   B   C
 > A  3   4   5
 > B  4   7   8
 > C  5   8   9
 >
 > Desired output
 > A A  3
 > A B  4
 > A C  5
 > B B  7
 > B C  8
 > C C  9

Yes, use matrix indexing.  I don't think the 3600 values are going to be
very easy to read, but here's how to produce them:

m <- matrix(1:3600, 60, 60)
indices <- expand.grid(row = 1:60, col = 1:60)
cbind(indices$row, indices$col, m[as.matrix(indices)])



Or use as.data.frame.table():

m <- matrix(1:9, 3, 3,
dimnames = list(dimA = letters[1:3],
dimB = letters[1:3]))
m
as.data.frame.table(m, responseName = "value")
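Since the example matrix is symmetric and the desired output lists each unordered pair only once, the long format can additionally be restricted to the upper triangle; a sketch based on the 3 x 3 example:

```r
m <- matrix(c(3, 4, 5,
              4, 7, 8,
              5, 8, 9), 3, 3,
            dimnames = list(LETTERS[1:3], LETTERS[1:3]))

long <- as.data.frame.table(m, responseName = "value")

## keep one row per unordered pair (row index <= column index)
long[upper.tri(m, diag = TRUE), ]
```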

---

I do not know what you mean by "visualize", but image() or heatmap() are 
good starting points if you need a plot of the values. If you really 
need to inspect the raw values, you can try interactive (scrollable) 
tables, e.g.:


library(DT)
m <- provideDimnames(matrix(1:3600, 60, 60))
datatable(m, options = list(pageLength = 60))


Cheers,
  Denes




Duncan Murdoch





Re: [R] Find mean of values in three-dimensional array

2016-06-15 Thread Dénes Tóth



On 06/15/2016 09:05 PM, peter dalgaard wrote:
>
>> On 15 Jun 2016, at 19:37 , Nick Tulli  wrote:
>>
>> Hey R-Help,
>>
>> I've got a three dimensional array which I pulled from a netcdf file.
>> The data in array are the humidity values of locations in the United
>> States over a time period. The three dimensions are [longitude,
>> latitude, days], 141x81x92. My goal is to find the mean value at each
>> longitude/latitude over the 92 day period.
>>
>> I could probably accomplish my goal by running a loop, but I'm sure
>> that there is a much easier and more efficient way to accomplish the
>> goal in R. Any suggestions?
>
> Dunno about fast, but the canonical way is apply(A, c(1,2), mean)

For "mean" and "sum", row/colMeans() is pretty fast and efficient. Note 
the 'dims' argument; you might also consider the aperm() function before 
the aggregation.


E.g.:

# create an array
x <- provideDimnames(array(rnorm(141*81*92), c(141, 81, 92)))
names(dimnames(x)) <- c("long", "lat", "days")

# collapse over days
str(rowMeans(x, dims = 2))

# collapse over lat
x_new <- aperm(x, c("lat", "long", "days"))
str(colMeans(x_new))
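The fast versions can be checked against the canonical apply() form from Peter's post; a quick sanity-check sketch:

```r
x <- provideDimnames(array(rnorm(141 * 81 * 92), c(141, 81, 92)))

## mean over days for each long/lat cell: both forms must agree
stopifnot(all.equal(rowMeans(x, dims = 2),
                    apply(x, c(1, 2), mean)))
```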

Cheers,
Denes


>
> E.g.
>
> (A <- array(1:24,c(2,3,4)))
> apply(A, c(1,2), mean)
> apply(A, c(1,3), mean)
>
> -pd
>
>>
>>
>> Thanks guys.
>>
>

__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Reshaping an array - how does it work in R

2016-03-22 Thread Dénes Tóth


Hi Martin,


On 03/22/2016 10:20 AM, Martin Maechler wrote:

>>>>> Dénes Tóth <toth.de...@ttk.mta.hu>
>>>>> on Fri, 18 Mar 2016 22:56:23 +0100 writes:

 > Hi Roy,
 > R (usually) makes a copy if the dimensionality of an array is modified,
 > even if you use this syntax:

 > x <- array(1:24, c(2, 3, 4))
 > dim(x) <- c(6, 4)

 > See also ?tracemem, ?data.table::address, ?pryr::address and other tools
 > to trace if an internal copy is done.

Well, without using strange (;-) packages,  indeed standard R's
tracemem(), notably the help page is a good pointer.

According to the help page memory tracing is enabled in the
default R binaries for Windows and OS X.
For Linux (where I, as R developer, compile R myself anyway),
one needs to configure with --enable-memory-profiling .

Now, let's try:

> x <- array(rnorm(47), dim = c(1000,50, 40))
> tracemem(x)
[1] "<0x7f79a498a010>"
> dim(x) <- c(1000* 50, 40)
> x[5] <- pi
> tracemem(x)
[1] "<0x7f79a498a010>"
>

So, *BOTH* the re-dimensioning *AND* the sub-assignment did
*NOT* make a copy.


This is interesting. First I wanted to demonstrate to Roy that recent R 
versions are smart enough not to make any copy when reshaping an 
array. Then I put together an example (similar to yours) and realized 
that after several reshapes, R starts to copy the array. So I had to 
modify my suggestion... And now I have realized that this was an 
RStudio issue. At least on Linux, a standard R terminal behaves as you 
described; however, RStudio (version 0.99.862, which is not the very 
latest) tends to create copies (quite randomly, as far as I can tell). If I 
have time I will test this more thoroughly and file a report to RStudio 
if it turns out to be a bug.


Denes



Indeed, R has become much smarter  in these things in recent
years ... not thanks to me, but very much thanks to
Luke Tierney (from R-core), and also thanks to contributions from "outside",
notably Tomas Kalibera.

And hence: *NO* such strange workarounds are needed in this specific case:

 > Workaround: use data.table::setattr or bit::setattr to modify the
 > dimensions in place (i.e., without making a copy). Risk: if you modify
 > an object by reference, all other objects which point to the same memory
 > address will be modified silently, too.

Martin Maechler, ETH Zurich  (and R-core)

 > HTH,
 > Denes

(generally, your contributions help indeed, Denes, thank you!)


 > On 03/18/2016 10:28 PM, Roy Mendelssohn - NOAA Federal wrote:
 >> Hi All:
 >>
 >> I am working with a very large array.  if noLat is the number of 
latitudes, noLon the number of longitudes and noTime the number of  time periods, the 
array is of the form:
 >>
 >> myData[noLat, noLon, noTime].
 >>
 >> It is read in this way because that is how it is stored in a (series) 
of netcdf files.  For the analysis I need to do, I need instead the array:
 >>
 >> myData[noLat*noLon, noTime].  Normally this would be easy:
 >>
 >> myData<- array(myData,dim=c(noLat*noLon,noTime))
 >>
 >> My question is how does this command work in R - does it make a copy of 
the existing array, with different indices for the dimensions, or does it just redo 
the indices and leave the given array as is?  The reason for this question is my 
array is 30GB in memory, and I don’t have enough space to have a copy of the array in 
memory.  If the latter I will have to figure out a work around to bring in only part 
of the data at a time and put it into the proper locations.
 >>
 >> Thanks,
 >>
 >> -Roy




Re: [R] Reshaping an array - how does it work in R

2016-03-19 Thread Dénes Tóth

Hi Roy,

R (usually) makes a copy if the dimensionality of an array is modified, 
even if you use this syntax:

x <- array(1:24, c(2, 3, 4))
dim(x) <- c(6, 4)

See also ?tracemem, ?data.table::address, ?pryr::address and other tools 
to trace if an internal copy is done.


Workaround: use data.table::setattr or bit::setattr to modify the 
dimensions in place (i.e., without making a copy). Risk: if you modify 
an object by reference, all other objects which point to the same memory 
address will be modified silently, too.
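A minimal illustration of that aliasing risk (a sketch; requires the data.table package):

```r
library(data.table)

x <- array(1:24, c(2, 3, 4))
y <- x                       # y is only another reference to the same data

setattr(x, "dim", c(6, 4))   # change dims in place, no copy is made ...
dim(y)                       # ... so y's dimensions have silently changed too
```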


HTH,
  Denes



On 03/18/2016 10:28 PM, Roy Mendelssohn - NOAA Federal wrote:

Hi All:

I am working with a very large array.  if noLat is the number of latitudes, 
noLon the number of longitudes and noTime the number of  time periods, the 
array is of the form:

myData[noLat, noLon, noTime].

It is read in this way because that is how it is stored in a (series) of netcdf 
files.  For the analysis I need to do, I need instead the array:

myData[noLat*noLon, noTime].  Normally this would be easy:

myData<- array(myData,dim=c(noLat*noLon,noTime))

My question is how does this command work in R - does it make a copy of the 
existing array, with different indices for the dimensions, or does it just redo 
the indices and leave the given array as is?  The reason for this question is 
my array is 30GB in memory, and I don’t have enough space to have a copy of the 
array in memory.  If the latter I will have to figure out a work around to 
bring in only part of the data at a time and put it into the proper locations.

Thanks,

-Roy



**
"The contents of this message do not reflect any position of the U.S. Government or 
NOAA."
**
Roy Mendelssohn
Supervisory Operations Research Analyst
NOAA/NMFS
Environmental Research Division
Southwest Fisheries Science Center
***Note new address and phone***
110 Shaffer Road
Santa Cruz, CA 95060
Phone: (831)-420-3666
Fax: (831) 420-3980
e-mail: roy.mendelss...@noaa.gov www: http://www.pfeg.noaa.gov/

"Old age and treachery will overcome youth and skill."
"From those who have been given much, much will be expected"
"the arc of the moral universe is long, but it bends toward justice" -MLK Jr.





Re: [R] How to extract same columns from identical dataframes in a list?

2016-02-08 Thread Dénes Tóth

Hi,

Although you did not provide any reproducible example, it seems you 
store the same type of values in your data.frames. If this is true, it 
is much more efficient to store your data in an array:


mylist <- list(a = data.frame(week1 = rnorm(24), week2 = rnorm(24)),
   b = data.frame(week1 = rnorm(24), week2 = rnorm(24)))

myarray <- unlist(mylist, use.names = FALSE)
dim(myarray) <- c(nrow(mylist$a), ncol(mylist$a), length(mylist))
dimnames(myarray) <- list(hour = rownames(mylist$a),
  week = colnames(mylist$a),
  other = names(mylist))
# now you can do:
mean(myarray[, "week1", "a"])

# or:
colMeans(myarray)
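If you prefer to keep the list of data.frames, the week columns can also be collected by name (same mylist as above); sapply() simplifies the result to a matrix:

```r
mylist <- list(a = data.frame(week1 = rnorm(24), week2 = rnorm(24)),
               b = data.frame(week1 = rnorm(24), week2 = rnorm(24)))

## all 'week1' columns as a 24 x 2 matrix, then hourly (row-wise) means
week1 <- sapply(mylist, `[[`, "week1")
rowMeans(week1)
```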


Cheers,
  Denes


On 02/08/2016 02:33 PM, Wolfgang Waser wrote:

Hello,

I have a list of 7 data frames, each data frame having 24 rows (hour of
the day) and 5 columns (weeks) with a total of 5 x 24 values

I would like to combine all 7 columns of week 1 (and 2 ...) in a
separate data frame for hourly calculations, e.g.

apply(new.data.frame,1,mean)


In some way sapply (lapply) works, but I cannot directly select columns
of the original data frames in the list. As a workaround I have to
select a range of values:


sapply(list_of_dataframes,"[",1:24)


Values 1:24 give the first column, 25:48 the second and so on.

Is there an easier / more direct way to select for specific columns
instead of selecting a range of values, avoiding loops?


Cheers,

Wolfgang






Re: [R] does save.image() also save the random state?

2016-02-05 Thread Dénes Tóth

On 02/05/2016 05:25 PM, Duncan Murdoch wrote:

On 05/02/2016 11:14 AM, Jinsong Zhao wrote:

Dear there,

Here is a snipped code,

  > rm(list = ls())
  > x <- 123
  > save.image("abc.RData")
  > rm(list = ls())
  > load("abc.RData")
  > sample(10)
   [1]  3  7  4  6 10  2  5  9  8  1
  > rm(list = ls())
  > load("abc.RData")
  > sample(10)
   [1]  3  7  4  6 10  2  5  9  8  1

you will see that, after loading a abc.RData file that is saved by
save.image(), sample(10) gives the same results. I am wondering whether
it's designed purposely. And if it is, how can I get a different results
of sample(10) every time after loading the saved image?


This happens because you are reloading the random number seed.  You can
tell R to ignore it by calling

set.seed(NULL)

just after you load the image.  See ?set.seed for more details.

Duncan Murdoch



Based on your problem description, it seems that you actually do not 
want to restore the whole workspace but only the objects that you worked 
with. If this is indeed the case, it is much better to use 
save(list=ls(), file = "abc.RData") instead of save.image("abc.RData").
(Actually it is almost always better to use an explicitly parametrized 
save() call instead of save.image()).


save.image() can cause a lot of trouble besides the one you faced 
recently (which is caused by the saving and restoring of the hidden 
.Random.seed object, as Duncan mentioned).
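The reason save(list = ls(), ...) avoids the problem is that ls() skips dot-names by default, so the hidden .Random.seed object is not on the list of saved objects; a quick sketch to verify:

```r
set.seed(1); runif(1)    # ensures the hidden .Random.seed object exists

".Random.seed" %in% ls()                  # FALSE: ls() omits dot-names
".Random.seed" %in% ls(all.names = TRUE)  # TRUE: the object is there

## hence this saves your objects but not the random state:
save(list = ls(), file = file.path(tempdir(), "abc.RData"))
```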


Cheers,
Denes









Re: [R] Efficient way to create new column based on comparison with another dataframe

2016-01-31 Thread Dénes Tóth

Hi,

I have not followed this thread from the beginning, but have you tried 
the foverlaps() function from the data.table package?


Something along the lines of:

---
# create the tables (use as.data.table() or setDT() if you
# start with a data.frame)
mapfile <- data.table(Name = c("S1", "S2", "S3"), Chr = 1,
  Position = c(3000, 6000, 1000))
Chr.Arms <- data.table(Chr = 1, Arm = c("p", "q"),
   Start = c(0, 5001), End = c(5000, 1))

# add a dummy variable to be able to define Position as an interval
mapfile[, Position2 := Position]

# add keys
setkey(mapfile, Chr, Position, Position2)
setkey(Chr.Arms, Chr, Start, End)

# use data.table::foverlaps (see ?foverlaps)
mapfile <- foverlaps(mapfile, Chr.Arms, type = "within")

# remove the dummy variable
mapfile[, Position2 := NULL]

# recreate original order
setorder(mapfile, Chr, Name)

---

BTW, there is a typo in your *SOLUTION*. I guess you wanted to write 
data.table(Name = c("S1", "S2", "S3"), Chr = 1, Position = c(3000, 6000, 
1000), key = "Chr") instead of data.frame(Name = c("S1", "S2", "S3"), 
Chr = 1, Position = c(3000, 6000, 1000), key = "Chr").


HTH,
  Denes



On 01/30/2016 07:48 PM, Gaius Augustus wrote:

I'll look into the Intervals idea.  The data.table code posted might not
work (because I don't believe it would put the rows in the correct order if
the chromosomes are interspersed), however, it did make me think about
possibly assigning based on values...

*SOLUTION*
mapfile <- data.frame(Name = c("S1", "S2", "S3"), Chr = 1, Position =
c(3000, 6000, 1000), key = "Chr")
Chr.Arms <- data.frame(Chr = 1, Arm = c("p", "q"), Start = c(0, 5001), End
= c(5000, 1), key = "Chr")

for(i in 1:nrow(Chr.Arms)){
   cur.row <- Chr.Arms[i, ]
   mapfile$Arm[ mapfile$Chr == cur.row$Chr & mapfile$Position >=
cur.row$Start & mapfile$Position <= cur.row$End] <- cur.row$Arm
}

This took out the need for the intermediate table/vector.  This worked for
me, and was VERY fast.  Took <5 minutes on a dataframe with 35 million rows.

Thanks for the help,
Gaius

On Sat, Jan 30, 2016 at 10:50 AM, Gaius Augustus 
wrote:


I'll look into the Intervals idea.  The data.table code posted might not
work (because I don't believe it would put the rows in the correct order if
the chromosomes are interspersed), however, it did make me think about
possibly assigning based on values...

Something like:
mapfile <- data.table(Name = c("S1", "S2", "S3"), Chr = 1, Position =
c(3000, 6000, 1000), key = "Chr")
Chr.Arms <- data.table(Chr = 1, Arm = c("p", "q"), Start = c(0, 5001), End
= c(5000, 1), key = "Chr")

for(i in 1:nrow(Chr.Arms)){
   cur.row <- Chr.Arms[i, ]
   mapfile[ Chr == cur.row$Chr & Position >= cur.row$Start & Position <=
cur.row$End] <- Chr.Arms$Arm
}

This might take out the need for the intermediate table/vector.  Not sure
yet if it'll work, but we'll see.  I'm interested to know if anyone else
has any ideas, too.

Thanks,
Gaius

On Fri, Jan 29, 2016 at 11:34 PM, Ulrik Stervbo 
wrote:


Hi Gaius,

Could you use data.table and loop over the small Chr.arms?

library(data.table)
mapfile <- data.table(Name = c("S1", "S2", "S3"), Chr = 1, Position =
c(3000, 6000, 1000), key = "Chr")
Chr.Arms <- data.table(Chr = 1, Arm = c("p", "q"), Start = c(0, 5001),
End = c(5000, 1), key = "Chr")

Arms <- data.table()
for(i in 1:nrow(Chr.Arms)){
   cur.row <- Chr.Arms[i, ]
   Arm <- mapfile[ Position >= cur.row$Start & Position <= cur.row$End]
   Arm <- Arm[ , Arm:=cur.row$Arm][]
   Arms <- rbind(Arms, Arm)
}

# Or use plyr to loop over each possible arm
library(plyr)
Arms <- ddply(Chr.Arms, .variables = "Arm", function(cur.row, mapfile){
   mapfile <- mapfile[ Position >= cur.row$Start & Position <= cur.row$End]
   mapfile <- mapfile[ , Arm:=cur.row$Arm][]
   return(mapfile)
}, mapfile = mapfile)

I have just started to use the data.table and I have the feeling the code
above can be greatly improved - maybe the loop can be dropped entirely?

Hope this helps
Ulrik

On Sat, 30 Jan 2016 at 03:29 Gaius Augustus 
wrote:


I have two dataframes. One has chromosome arm information, and the other
has SNP position information. I am trying to assign each SNP an arm
identity.  I'd like to create this new column based on comparing it to
the
reference file.

*1) Mapfile (has millions of rows)*

Name  Chr  Position
S1  1  3000
S2  1  6000
S3  1  1000

*2) Chr.Arms   file (has 39 rows)*

Chr  Arm  Start  End
1    p    0      5000
1    q    5001   1


*R Script that works, but slow:*
Arms <- c()
for (line in 1:nrow(Mapfile)){
   Arms[line] <- Chr.Arms$Arm[ Mapfile$Chr[line] == Chr.Arms$Chr &
  Mapfile$Position[line] > Chr.Arms$Start & Mapfile$Position[line] <
Chr.Arms$End]
}
Mapfile$Arm <- Arms


*Output Table:*

Name   Chr   Position   Arm
S1  1 3000  p
S2  1 6000  q
S3  1 1000  p



Re: [R] Lists heading in an array of lists in R

2016-01-22 Thread Dénes Tóth

Hi,

Provide a list of a list in the second assignment:

--
TunePar <- matrix(list(NULL), 2, 2)
TunePar[2,1] <- list(list(G = 2))
TunePar[2,1]
TunePar[2,1][[1]]$G
TunePar[[2]]$G
---

The point is that "[" returns a list element at the same level as the 
original object (TunePar in the present example). So if you extract one 
element on the LHS of the assignment, a one-element list is expected 
on the RHS, e.g., check this out:

---
# this throws an error
TunePar[1,2] <- list(H = 1:3, I = letters[1:2])

# this is fine
TunePar[1,2] <- list(list(H = 1:3, I = letters[1:2]))
TunePar[1,2]
TunePar[1,2][[1]]$I
---
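Alternatively, assignment through "[[" targets the element itself rather than a length-one sub-list, so the extra list() wrapping is not needed:

```r
TunePar <- matrix(list(NULL), 2, 2)

## "[[" replaces the element directly
TunePar[[2, 1]] <- list(G = 2)
TunePar[[2, 1]]$G   # 2
```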

HTH,
 Denes



On 01/22/2016 08:29 AM, TJUN KIAT TEO wrote:

TunePar<-matrix(list(Null),2,2)

TunePar[1,1]=list(G=2)




Re: [R] Transfer a 3-dimensional array to a matrix in R

2015-10-20 Thread Dénes Tóth

Hi,

Bill was faster than me in suggesting aperm() instead of apply(); 
however, his solution is still suboptimal. Try to avoid array(), and 
set the dimensions directly if possible.



fn1 <- function(x) {
apply(x, 3, t)
}


fn2 <- function(x) {
array(aperm(x, c(2, 1, 3)), c(prod(dim(x)[1:2]), dim(x)[3]))
}

fn3 <- function(x) {
x <- aperm(x, c(2, 1, 3))
dim(x) <- c(prod(dim(x)[1:2]), dim(x)[3])
x
}

# check that the functions return the same
x <- array(1:18, dim=c(3, 2, 3))
stopifnot(identical(fn1(x), fn2(x)))
stopifnot(identical(fn1(x), fn3(x)))

# create two larger arrays, play with the size of the 3rd dimension
x <- array(1:18e4, dim=c(3, 2e1, 3e3))
y <- array(1:18e4, dim=c(3e3, 2e1, 3))

# and the timing:
library(microbenchmark)
microbenchmark(fn1(x), fn2(x), fn3(x), fn1(y), fn2(y), fn3(y), times = 100L)

---

Conclusion:
fn3() is about 3x as fast as fn2(), and fn1() can be extremely 
inefficient if dim(x)[3] is large.



HTH,
  Denes




On 10/20/2015 08:48 PM, William Dunlap wrote:

Or use aperm() (array index permutation):
   > array(aperm(x, c(2,1,3)), c(6,3))
        [,1] [,2] [,3]
   [1,]    1    7   13
   [2,]    4   10   16
   [3,]    2    8   14
   [4,]    5   11   17
   [5,]    3    9   15
   [6,]    6   12   18

Bill Dunlap
TIBCO Software
wdunlap tibco.com


On Tue, Oct 20, 2015 at 11:31 AM, John Laing  wrote:

x <- array(1:18, dim=c(3, 2, 3))
x

, , 1

     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

, , 2

     [,1] [,2]
[1,]    7   10
[2,]    8   11
[3,]    9   12

, , 3

     [,1] [,2]
[1,]   13   16
[2,]   14   17
[3,]   15   18


apply(x, 3, t)

     [,1] [,2] [,3]
[1,]    1    7   13
[2,]    4   10   16
[3,]    2    8   14
[4,]    5   11   17
[5,]    3    9   15
[6,]    6   12   18


On Tue, Oct 20, 2015 at 12:39 PM, Chunyu Dong 
wrote:


Hello!


Recently I am trying to transfer a large 3-dimensional array to a matrix.
For example, an array like:
, , 1
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6
, , 2
     [,1] [,2]
[1,]    7   10
[2,]    8   11
[3,]    9   12
, , 3
     [,1] [,2]
[1,]   13   16
[2,]   14   17
[3,]   15   18


I would like to transfer it to a matrix like:
 1    7   13
 4   10   16
 2    8   14
 5   11   17
 3    9   15
 6   12   18


Could you tell me how to do it in R ? Thank you very much!


Best regards,
Chunyu

















Re: [R] Plotting EEG signals as a "head" using R

2015-10-19 Thread Dénes Tóth

Hi,

the eegkit package 
(https://cran.r-project.org/web/packages/eegkit/index.html) might help 
you if you happen to work with a standard electrode cap.


Best,
  Denes



On 10/19/2015 02:41 PM, Charles Novaes de Santana wrote:

Dear all,

I have .csv file with the evoked potential of different electrodes of a
human subject and I would like to plot a head figure representing the
evoked potentials. My csv file has 1000 lines (from 1ms to 1000 ms) and 12
columns (each column for an electrode I am studying).

Do you know a way to use this csv files to plot a head representing the
evoked potential at specific points (like P300, N100, etc) using R? (I know
a way to plot it in Matlab using ERPLAB and EEGLAB, but Matlab is not an
option in our Lab).

The idea is to have a head figure similar to the one in the following
picture:
http://d2avczb82rh8fa.cloudfront.net/content/jn/113/3/740/F3.large.jpg

Thanks for any help, sorry for not having a reproducible example.

Best,

Charles






Re: [R] fast way to create composite matrix based on mixed indices?

2015-09-17 Thread Dénes Tóth

Hi Matt,

you could use matrix indexing. Here is a possible solution, which could 
be optimized further (probably).


# The old matrix
(old.mat <- matrix(1:30,nrow=3,byrow=TRUE))
# matrix of indices
index <- matrix(c(1,1,1,4,
  1,3,5,10,
  2,2,1,3,
  2,1,4,8,
  2,3,9,10),
nrow=5,byrow=TRUE,
dimnames=list(NULL,
  c('new.mat.row','old.mat.row',
'old.mat.col.start','old.mat.col.end')))
# expected result
new.mat <- matrix(c(1:4,25:30,11:13,4:8,29:30),
  byrow=TRUE, nrow=2)
#
# column indices
ind <- mapply(seq, index[, 3], index[,4],
  SIMPLIFY = FALSE, USE.NAMES = FALSE)
ind_len <- vapply(ind, length, integer(1))
ind <- unlist(ind)

#
# old indices
old.ind <- cbind(rep(index[,2], ind_len), ind)
#
# new indices
new.ind <- cbind(rep(index[,1], ind_len), ind)
#
# create the new matrix
result <- matrix(NA_integer_, max(index[,1]), max(index[,4]))
#
# fill the new matrix
result[new.ind] <- old.mat[old.ind]
#
# check the results
identical(result, new.mat)


HTH,
  Denes




On 09/17/2015 10:36 PM, Matthew Keller wrote:

HI all,

Sorry for the title here but I find this difficult to describe succinctly.
Here's the problem.

I want to create a new matrix where each row is a composite of an old
matrix, but where the row & column indexes of the old matrix change for
different parts of the new matrix. For example, the second row of new
matrix (which has , e.g., 10 columns) might be columns 1 to 3 of row 2 of
old matrix, columns 4 to 8 of row 1 of old matrix, and columns 9 to 10 of
row 3 of old matrix.

Here's an example in code:

#The old matrix
(old.mat <- matrix(1:30,nrow=3,byrow=TRUE))

#matrix of indices to create the new matrix from the old one.
#The 1st column gives the row number of the new matrix
#the 2nd gives the row of the old matrix that we're going to copy into the
new matrix
#the 3rd gives the starting column of the old matrix for the row in col 2
#the 4th gives the end column of the old matrix for the row in col 2
index <- matrix(c(1,1,1,4,
   1,3,5,10,
   2,2,1,3,
   2,1,4,8,
   2,3,9,10),
 nrow=5,byrow=TRUE,

dimnames=list(NULL,c('new.mat.row','old.mat.row','old.mat.col.start','old.mat.col.end')))

I will be given old.mat and index and want to create new.mat from them.

I want to create a new.matrix of two rows that looks like this:
new.mat <- matrix(c(1:4,25:30,11:13,4:8,29:30),byrow=TRUE,nrow=2)

So here, the first row of new.mat is columns 1 to 4 of row 1 of the old.mat
and columns 5 to 10 of row 3 of old.mat.

new.mat and old.mat will always have the same number of columns but the
number of rows could differ.

I could accomplish this in a loop, but the real problem is quite large
(new.mat might have 1e8 elements), and so a for loop would be prohibitively
slow.

I may resort to unix tools and use a shell script, but wanted to first see
if this is doable in R in a fast way.

Thanks in advance!

Matt




__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Multiple if function

2015-09-16 Thread Dénes Tóth



On 09/16/2015 04:41 PM, Bert Gunter wrote:

Yes! Chuck's use of mapply is exactly the split/combine strategy I was
looking for. In retrospect, exactly how one should think about it.
Many thanks to all for a constructive discussion.

-- Bert


Bert Gunter



Use mapply like this on large problems:

unsplit(
   mapply(
   function(x,z) eval( x, list( y=z )),
   expression( A=y*2, B=y+3, C=sqrt(y) ),
   split( dat$Flow, dat$ASB ),
   SIMPLIFY=FALSE),
   dat$ASB)

Chuck




Is there any reason not to use data.table for this purpose, especially 
if efficiency is of concern?


---

# load data.table and microbenchmark
library(data.table)
library(microbenchmark)
#
# prepare data
DF <- data.frame(
ASB = factor(rep_len(LETTERS[1:3], 3e5)),
Flow = rnorm(3e5)^2)
DT <- as.data.table(DF)
DT[, ASB := as.character(ASB)]
#
# define functions
#
# Chuck's version
fnSplit <- function(dat) {
unsplit(
mapply(
function(x,z) eval( x, list( y=z )),
expression( A=y*2, B=y+3, C=sqrt(y) ),
split( dat$Flow, dat$ASB ),
SIMPLIFY=FALSE),
dat$ASB)
}
#
# data.table-way (IMHO, much easier to read)
fnDataTable <- function(dat) {
dat[,
result :=
if (.BY == "A") {
2 * Flow
} else if (.BY == "B") {
3 + Flow
} else if (.BY == "C") {
sqrt(Flow)
},
by = ASB]
}
#
# benchmark
#
microbenchmark(fnSplit(DF), fnDataTable(DT))
identical(fnSplit(DF), fnDataTable(DT)[, result])

---

Actually, in Chuck's version the unsplit() part is slow. If the order is 
not of concern (e.g., DF is reordered before calling fnSplit), fnSplit 
is comparable to the DT-version.
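To illustrate that last point, a small self-contained sketch (my addition) of the "skip unsplit()" variant: if the result may come back ordered by group rather than in input order, the expensive unsplit() step can simply be dropped.

```r
# Per-group evaluation without unsplit(): results are grouped by level.
set.seed(42)
DF <- data.frame(ASB  = factor(rep_len(LETTERS[1:3], 30)),
                 Flow = rnorm(30)^2)
res <- mapply(function(x, z) eval(x, list(y = z)),
              expression(A = y * 2, B = y + 3, C = sqrt(y)),
              split(DF$Flow, DF$ASB),
              SIMPLIFY = FALSE)
str(res)  # a list with one transformed vector per group: A, B, C
```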



Denes



Re: [R] Extracting elements out of list in list in list

2015-01-19 Thread Dénes Tóth


Hi,

Here is a solution which is restricted to lists with identically shaped 
branches (like your example). The idea is to transform the list to an 
array and make use of the fact that unlist(x, use.names=FALSE) is much 
much faster for large lists than unlist(x).


# function which transforms a list whose skeleton is appropriate, i.e.
# at all levels of the list, the elements have the same skeleton
# NOTE that no check is implemented for this (should be added)
# NOTE that it also works if the final node is not a scalar but a
# matrix or array
list2array <- function(x) {
  recfn <- function(xx, dims, nms) {
    if (is.recursive(xx)) {
      dims <- c(dims, length(xx))
      nms <- c(nms, list(names(xx)))
      recfn(xx[[1]], dims, nms)
    } else {
      dims <- c(dim(xx), rev(dims))
      nms <- c(dimnames(xx), rev(nms))
      return(list(dims, nms))
    }
  }
  temp <- recfn(x, integer(), list())
  # return
  array(unlist(x, use.names = FALSE),
        temp[[1]],
        temp[[2]])
}

# create a list which is a collection of
# moderately large matrices
dimdat <- c(1e3, 5e2)
datgen <- function() array(rnorm(prod(dimdat)),
                           dimdat,
                           lapply(dimdat, function(i) letters[1:i]))
exlist <- list(
  f1 = list(x1 = list(A = datgen(), B = datgen()),
            x2 = list(A = datgen(), B = datgen())),
  f2 = list(x1 = list(A = datgen(), B = datgen()),
            x2 = list(A = datgen(), B = datgen()))
)

# tranform the list to an array
system.time(exarray <- list2array(exlist))

# check if an arbitrary subview is identical
# to the original list element
identical(exarray[, , "B", "x2", "f1"], exlist$f1$x2$B)

# compare the time for unlist(x)
system.time(unlist(exlist))


HTH,
  Denes





Hi

Consider the following variable:

--8---cut here---start-8---
x1 <- list(
  A = 11,
  B = 21,
  C = 31
)

x2 <- list(
  A = 12,
  B = 22,
  C = 32
)

x3 <- list(
  A = 13,
  B = 23,
  C = 33
)

x4 <- list(
  A = 14,
  B = 24,
  C = 34
)

y1 <- list(
  x1 = x1,
  x2 = x2
)

y2 <- list(
  x3 = x3,
  x4 = x4
)

x <- list(
  f1 = y1,
  f2 = y2
)
--8---cut here---end---8---


To extract all fields named A from y1, I can do

,----
| > sapply(y1, "[[", "A")
| x1 x2
| 11 12
`----

But how can I do the same for x?

I could put an sapply into an sapply, but this would be less then
elegant.

Is there an easier way of doing this?

Thanks,

Rainer








Re: [R] Extract values from multiple lists

2014-12-17 Thread Dénes Tóth

Dear Jeff,

On 12/17/2014 01:46 AM, Jeff Newmiller wrote:

You are chasing ghosts of performance past, Denes.


In terms of memory efficiency, yes. In terms of CPU time, there can be 
significant difference, see below.



The data.frame

function causes no problems, and if it is used then the OP would not
need to presume they know the internal structure of the data frame.
See below. (I am using R3.1.2.)

a1 <- list(x = rnorm(1e6), y = rnorm(1e6))
a2 <- list(x = rnorm(1e6), y = rnorm(1e6))
a3 <- list(x = rnorm(1e6), y = rnorm(1e6))

# get names of the objects
out_names <- ls(pattern="a[[:digit:]]$")

# amount of memory allocated
gc(reset=TRUE)

# Explicitly call data frame
out2 <- data.frame( a1=a1[["x"]], a2=a2[["x"]], a3=a3[["x"]] )

# No copying.
gc()

# Your suggested retreival method
out3a <- lapply( lapply( out_names, get ), "[[", "x" )
names( out3a ) <- out_names
# The obvious way to finish the job works fine.
out3 <- do.call( data.frame, out3a )


BTW, the even more obvious as.data.frame() produces the same with an 
even more intuitive interface.


However, for lists with a larger number of elements the transformation 
to a data.frame can be pretty slow. In the toy example, we created only 
a three-element list. Let's increase it a little bit.


---

# this is not even that large
datlen <- 1e2
listlen <- 1e5

# create a toy list
mylist <- matrix(seq_len(datlen * listlen),
                 nrow = datlen, ncol = listlen)
mylist <- lapply(1:ncol(mylist), function(i) mylist[, i])
names(mylist) <- paste0("V", seq_len(listlen))


# define the more efficient function ---
# note that I put class(x) first so that setattr does not
# modify the attributes of the original input (see ?setattr,
# you have to be careful)
setAttrib <- function(x) {
  class(x) <- "data.frame"
  data.table::setattr(x, "row.names", seq_along(x[[1]]))
  x
}

# benchmarking
# (we do not need microbenchmark here, the differences are
# extremely large) - on my machine, 9.4 sec, 8.1 sec vs 0.15 sec
gc(reset=TRUE)
system.time(df1 <- do.call(data.frame, mylist))
gc()
system.time(df2 <- as.data.frame(mylist))
gc()
system.time(df3 <- setAttrib(mylist))
gc()

# check results
identical(df1, df2)
identical(df1, df3)



Of course for small datasets, one should use the built-in and safe 
functions (either do.call or as.data.frame). BTW, for the original 
three-element list, these are even faster than the workaround.
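A complementary way to verify such "no copy" claims is base R's tracemem(), which prints a message whenever the traced object is duplicated (a sketch I added; it requires R compiled with memory profiling, which is the default for CRAN builds, and the printed addresses are session-specific):

```r
# tracemem() reports copy-on-modify duplications explicitly.
set.seed(1)
x <- rnorm(10)
tracemem(x)   # start tracing duplications of x
y <- x        # no copy yet: x and y share the same memory
y[1] <- 0     # copy-on-modify: tracemem reports a duplication here
untracemem(x)
```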


All the best,
  Denes






# No copying... well, you do end up with a new list in out3, but the
data itself doesn't get copied.
gc()


On Tue, 16 Dec 2014, Dénes Tóth wrote:


On 12/16/2014 06:06 PM, SH wrote:

Dear List,

I hope this posting is not redundant.  I have several list outputs
with the
same components.  I ran a function with three different scenarios below
(e.g., scen1, scen2, and scen3,...,scenN).  I would like to extract the
same components and group them as a data frame.  For example,
pop.inf.r1 <- scen1[['pop.inf.r']]
pop.inf.r2 <- scen2[['pop.inf.r']]
pop.inf.r3 <- scen3[['pop.inf.r']]
...
pop.inf.rN <- scenN[['pop.inf.r']]
new.df <- data.frame(pop.inf.r1, pop.inf.r2, pop.inf.r3, ..., pop.inf.rN)

My final output would be 'new.df'.  Could you help me how I can do that
efficiently?


If efficiency is of concern, do not use data.frame() but create a list
and add the required attributes with data.table::setattr (the setattr
function of the data.table package). (You can also consider creating a
data.table instead of a data.frame.)

# some largish lists
a1 <- list(x = rnorm(1e6), y = rnorm(1e6))
a2 <- list(x = rnorm(1e6), y = rnorm(1e6))
a3 <- list(x = rnorm(1e6), y = rnorm(1e6))

# amount of memory allocated
gc(reset=TRUE)

# get names of the objects
out_names <- ls(pattern="a[[:digit:]]$")

# create a list
out <- lapply(lapply(out_names, get), "[[", "x")

# note that no copying occurred
gc()

# decorate the list
data.table::setattr(out, "names", out_names)
data.table::setattr(out, "row.names", seq_along(out[[1]]))
class(out) <- "data.frame"

# still no copy
gc()

# output
head(out)


HTH,
 Denes




Thanks in advance,

Steve

P.S.:  Below are some examples of summary outputs.



summary(scen1)

 Length Class  Mode
aql1   -none- numeric
rql1   -none- numeric
alpha  1   -none- numeric
beta   1   -none- numeric
n.sim  1   -none- numeric
N  1   -none- numeric
n.sample   1   -none- numeric
n.acc  1   -none- numeric
lot.inf.r  1   -none- numeric
pop.inf.n   2000   -none- list
pop.inf.r   2000   -none- list
pop.decision.t1 2000   -none- list
pop.decision.t2 2000   -none- list
sp.inf.n2000   -none- list
sp.inf.r2000   -none- list
sp.decision 2000   -none- list

summary(scen2)

 Length Class  Mode
aql1   -none- numeric
rql1   -none- numeric
alpha  1   -none- numeric
beta   1   -none- numeric
n.sim  1   -none- numeric
N  1   

Re: [R] Extract values from multiple lists

2014-12-16 Thread Dénes Tóth



On 12/16/2014 06:06 PM, SH wrote:

Dear List,

I hope this posting is not redundant.  I have several list outputs with the
same components.  I ran a function with three different scenarios below
(e.g., scen1, scen2, and scen3,...,scenN).  I would like to extract the
same components and group them as a data frame.  For example,
pop.inf.r1 <- scen1[['pop.inf.r']]
pop.inf.r2 <- scen2[['pop.inf.r']]
pop.inf.r3 <- scen3[['pop.inf.r']]
...
pop.inf.rN <- scenN[['pop.inf.r']]
new.df <- data.frame(pop.inf.r1, pop.inf.r2, pop.inf.r3, ..., pop.inf.rN)

My final output would be 'new.df'.  Could you help me how I can do that
efficiently?


If efficiency is of concern, do not use data.frame() but create a list 
and add the required attributes with data.table::setattr (the setattr 
function of the data.table package). (You can also consider creating a 
data.table instead of a data.frame.)


# some largish lists
a1 - list(x = rnorm(1e6), y = rnorm(1e6))
a2 - list(x = rnorm(1e6), y = rnorm(1e6))
a3 - list(x = rnorm(1e6), y = rnorm(1e6))

# amount of memory allocated
gc(reset=TRUE)

# get names of the objects
out_names - ls(pattern=a[[:digit:]]$)

# create a list
out - lapply(lapply(out_names, get), [[, x)

# note that no copying occured
gc()

# decorate the list
data.table::setattr(out, names, out_names)
data.table::setattr(out, row.names, seq_along(out[[1]]))
class(out) - data.frame

# still no copy
gc()

# output
head(out)


HTH,
  Denes




Thanks in advance,

Steve

P.S.:  Below are some examples of summary outputs.



summary(scen1)

 Length Class  Mode
aql1   -none- numeric
rql1   -none- numeric
alpha  1   -none- numeric
beta   1   -none- numeric
n.sim  1   -none- numeric
N  1   -none- numeric
n.sample   1   -none- numeric
n.acc  1   -none- numeric
lot.inf.r  1   -none- numeric
pop.inf.n   2000   -none- list
pop.inf.r   2000   -none- list
pop.decision.t1 2000   -none- list
pop.decision.t2 2000   -none- list
sp.inf.n2000   -none- list
sp.inf.r2000   -none- list
sp.decision 2000   -none- list

summary(scen2)

 Length Class  Mode
aql1   -none- numeric
rql1   -none- numeric
alpha  1   -none- numeric
beta   1   -none- numeric
n.sim  1   -none- numeric
N  1   -none- numeric
n.sample   1   -none- numeric
n.acc  1   -none- numeric
lot.inf.r  1   -none- numeric
pop.inf.n   2000   -none- list
pop.inf.r   2000   -none- list
pop.decision.t1 2000   -none- list
pop.decision.t2 2000   -none- list
sp.inf.n2000   -none- list
sp.inf.r2000   -none- list
sp.decision 2000   -none- list

summary(scen3)

 Length Class  Mode
aql1   -none- numeric
rql1   -none- numeric
alpha  1   -none- numeric
beta   1   -none- numeric
n.sim  1   -none- numeric
N  1   -none- numeric
n.sample   1   -none- numeric
n.acc  1   -none- numeric
lot.inf.r  1   -none- numeric
pop.inf.n   2000   -none- list
pop.inf.r   2000   -none- list
pop.decision.t1 2000   -none- list
pop.decision.t2 2000   -none- list
sp.inf.n2000   -none- list
sp.inf.r2000   -none- list
sp.decision 2000   -none- list



Re: [R] Equivalent to matlab .* operator in R

2014-11-19 Thread Dénes Tóth

Hi,

It is better to use sweep() for these kinds of problems, see ?sweep

y <- matrix(cbind(c(0, 0.5, 1), c(0, 0.5, 1)), ncol = 2)
z <- matrix(c(12, -6), ncol = 2)
sweep(y, 2, z, "*")
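The same column scaling can be spelled several ways in base R; a quick sketch I added for comparison (treating z as the plain vector c(12, -6)):

```r
y <- matrix(c(0, 0.5, 1, 0, 0.5, 1), ncol = 2)
z <- c(12, -6)
r1 <- sweep(y, 2, z, "*")   # scale each column by the matching element of z
r2 <- t(t(y) * z)           # transpose trick: recycling runs down columns of t(y)
r3 <- y %*% diag(z)         # right-multiplying by diag(z) scales the columns
stopifnot(all.equal(r1, r2), all.equal(r1, r3))
r1
```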


Best,
  Denes



On 11/19/2014 03:50 PM, Berend Hasselman wrote:

On 19-11-2014, at 15:22, Ruima E. ruimax...@gmail.com wrote:


Hi,

I have this:

y = matrix(cbind(c(0, 0.5, 1),c(0, 0.5, 1)),ncol=2)
z = matrix(c(12, -6),ncol=2)

In matlab I would do this


y .* x

I would get this in matlab


ans

 0   -0
 6   -3
12   -6

What is the equivalent in R?


One way of doing this could be:

y * rep(z,1,each=nrow(y))

Berend


Thanks





Re: [R] Equivalent to matlab .* operator in R

2014-11-19 Thread Dénes Tóth

Hi,

just for the records, your original code seems incorrect, see inline.

On 11/19/2014 03:22 PM, Ruima E. wrote:

Hi,

I have this:

y = matrix(cbind(c(0, 0.5, 1),c(0, 0.5, 1)),ncol=2)
z = matrix(c(12, -6),ncol=2)

In matlab I would do this


y .* x
Here you wrote 'x', which I guess refers to 'z'; in Matlab it should be
repmat(z, size(y, 1), 1) (assuming 'z' is a row vector).



I would get this in matlab


ans

0-0
6-3
12   -6

What is the equivalent in R?

Thanks



Re: [R] arrow egdes and lty

2011-11-14 Thread Dénes TÓTH

Hi,

 Dear R developers,

 I want to draw an arrow in a figure with lty=2. The
 lty argument also affects the edge of the arrow, which is
 unfortunate. Feature or bug?

 Is there some way to draw an arrow with intact edge, still
 with lty=2?

AFAIK there is no such option in the arrows() function, but you might try to
play around with the shape package and its various arrowhead options.


 Example code:

 plot(1:10)
 arrows(4, 5, 6, 7, lty=2)

library(shape)
plot(1:10)
Arrows(4, 5, 6, 7, lty=2)

HTH,
  Denes



 Best wishes,

 Matthias
 --



Re: [R] R-help Digest, Vol 104, Issue 19

2011-10-21 Thread Dénes TÓTH


 On Oct 21, 2011, at 09:01 , Martin Maechler wrote:

 ARE == Alex Ruiz Euler rruizeu...@ucsd.edu
on Wed, 19 Oct 2011 14:05:16 -0700 writes:

ARE Motion supported. Very.

ARE On Wed, 19 Oct 2011 15:40:14 +0200
ARE peter dalgaard pda...@gmail.com wrote:

 Argh!

 Someone please unsubscribe this guy?

 He did this over Summer too and still hasn't learned that the 10'000
 recipients of R-help do not care whether he is out of office!

 -pd

 Well, there are hundreds like him.
 The only difference being that he speaks Hungarian..


 You might filter on the Subject line being "Re: [R] R-help Digest.*", with
 no attention to content. That has an obvious side effect, but maybe not a
 harmful one...

 -pd


 Why?  I (as R-* mailing list site maintainer)
 have had (procmail) filters that automatically catch such 'out of
 office'
 messages, so the 10'000 readers don't have to get them.
 The current set of filters catches  a set of English, French,
 German,.. (and I don't know) messages
 So I have (many!!) filters like this:

 :0
 * ^Subject: (Re|Holiday|Vacation): .*[-A-za-z]+ Digest, Vol [1-9][0-9]*,
 Issue [1-9][0-9]*
 {
  :0B
  * I( will not be reading.*\e?[-]?mail|.* away .* attend to your
 message when I get)
  mlist-bounced.spool
 }

 ---
 but can't start doing that for Hungarian or Chinese or ...

FYI:
holiday = "szabadság"
on holiday = "szabadságon"
out of office = "nem tartózkodom az irodában" OR "irodán kívül vagyok"
(roughly, "I am not in the office" / "I am out of the office")

Expressions like "nem tartózkodom az irodában" or "irodán kívül vagyok"
will never occur in a real post sent to the R-list, so they could be used
for filtering.

HTH,
Dénes



 Martin

 --
 Peter Dalgaard, Professor
 Center for Statistics, Copenhagen Business School
 Solbjerg Plads 3, 2000 Frederiksberg, Denmark
 Phone: (+45)38153501
 Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [R] Structural equation modelling in R compared with Amos

2011-10-20 Thread Dénes TÓTH

Dear Ravi,

I would also suggest having a look at 'lavaan' (www.lavaan.org). It
has an extremely straightforward yet flexible model syntax and very good
documentation. Although its statistical capabilities are not yet comparable
to those of Mplus (which is the most powerful SEM software, I think), it is
getting closer and closer.
Personally I prefer lavaan over OpenMx, but it is also a matter of taste.
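For a flavour of that model syntax, here is a minimal CFA sketch using lavaan's bundled HolzingerSwineford1939 data set (my illustration, not part of the original exchange; assumes the lavaan package is installed):

```r
library(lavaan)

# Latent factors are declared with =~ ("is measured by"); one line per factor.
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'
fit <- cfa(model, data = HolzingerSwineford1939)
summary(fit, fit.measures = TRUE)
```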

HTH,
  Denes



 Hi Ravi,

 Look at openmx, it uses R to do matrix optimization especially for SEM
 (though more general too).  It does not have the draw paths interactively
 feature, but it is extremely powerful and flexible.  I have used Amos,
 EQS, Mplus, Lisrel, and OpenMx and I believe OpenMx is competitive against
 any of those if you know what you are doing.  I have found Amos to be
 particularly limited in its ability to handle complex models.  If you are
 looking for a simpler interface to SEM in R for some basic models, check
 out the SEM package by John Fox.

 HTH,

 Josh

 On Oct 20, 2011, at 4:56, Ravi Kulkarni ravi.k...@gmail.com wrote:

 Can anyone give me links to reviews/comparisons of R with Amos for SEM?
 I
 have found some but they are a little old (2009).

 Ravi



 --
 View this message in context:
 http://r.789695.n4.nabble.com/Structural-equation-modelling-in-R-compared-with-Amos-tp3921654p3921654.html
 Sent from the R help mailing list archive at Nabble.com.



Re: [R] Repeat a loop until...

2011-10-19 Thread Dénes TÓTH

If you want to generate truncated distributions, package 'tmvtnorm' can
help you out.
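If plain rejection sampling is preferred over a truncated-distribution sampler, the row-resampling idea from the question can be vectorised. A base-R sketch I added, using MASS::mvrnorm with hypothetical means and covariance (the original post's `sanad` and `covmat` are not available):

```r
# Resample only the offending rows until all simulated ages fall in [5, 86].
set.seed(1)
mu     <- c(50, 45, 40)          # hypothetical means
covmat <- diag(c(4, 100, 9))     # hypothetical covariance matrix
datamat <- MASS::mvrnorm(n = 1500, mu = mu, Sigma = covmat)

bad <- which(datamat[, 2] < 5 | datamat[, 2] > 86)
while (length(bad) > 0) {
  datamat[bad, ] <- MASS::mvrnorm(n = length(bad), mu = mu, Sigma = covmat)
  bad <- bad[datamat[bad, 2] < 5 | datamat[bad, 2] > 86]
}
stopifnot(all(datamat[, 2] >= 5 & datamat[, 2] <= 86))
```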

Regards,
  Denes




 Dear all,

 I know there have been various questions posted over the years about
 loops
 but I'm afraid that I'm still stuck.  I am using Windows XP and R 2.9.2.
 I am generating some data using the multivariate normal distribution
 (within the 'mnormt' package). [The numerical values of sanad and covmat

 are not important.]
 datamat <- rmnorm(n=1500, mean=c(mean(sanad[,1]), mean(sanad[,2]),
mean(sanad[,3])), varcov=covmat)

 The middle column of 'datamat' is simulated data for age.  Obviously
 some
 of the simulated values are not going to be sensible.  Therefore I'd
 like
 to set up a function that looks at each row of 'datamat' and if the
 value
 for the middle column is < 5 or > 86 then the whole row is replaced by
 another imputed row.  Of course, the problem is that the imputed value
 for
 age may be outside my acceptable range too.

 Is there a way to set up a loop such that it keeps checking each row
 until
 the values are within the range?

 So far I have the following but this doesn't repeat the process.
 ctstrunk <- function(data)
 {
   for (i in 1:nrow(data)) {
     if (data[i,2] < 5)
       data[i,] <- rmnorm(n=1, mean=c(mean(sanad[,1]), mean(sanad[,2]),
                                      mean(sanad[,3])), varcov=covmat)
     if (data[i,2] > 86)
       data[i,] <- rmnorm(n=1, mean=c(mean(sanad[,1]), mean(sanad[,2]),
                                      mean(sanad[,3])), varcov=covmat)
   }
   return(data)
 }

 I thought of perhaps a repeat loop such as the following but this loop
 never stops...
 ctstrunk <- function(data)
 {
   repeat {
     for (i in 1:nrow(data)) {
       if (data[i,2] < 5)
         data[i,] <- rmnorm(n=1, mean=c(mean(sanad[,1]), mean(sanad[,2]),
                                        mean(sanad[,3])), varcov=covmat)
       if (data[i,2] < 5) { break }
     }
   }
   repeat {
     for (i in 1:nrow(data)) {
       if (data[i,2] > 86)
         data[i,] <- rmnorm(n=1, mean=c(mean(sanad[,1]), mean(sanad[,2]),
                                        mean(sanad[,3])), varcov=covmat)
       if (data[i,2] > 86) { break }
     }
   }
   return(data)
 }

 I have also tried a while loop but again, the function didn't stop
 ctstrunk <- function(data)
 {
   for (i in 1:nrow(data)) {
     while (data[i,2] < 5)
       data[i,] <- rmnorm(n=1, mean=c(mean(sanad[,1]), mean(sanad[,2]),
                                      mean(sanad[,3])), varcov=covmat)
     while (data[i,2] > 86)
       data[i,] <- rmnorm(n=1, mean=c(mean(sanad[,1]), mean(sanad[,2]),
                                      mean(sanad[,3])), varcov=covmat)
   }
   return(data)
 }

 Many thanks for any assistance you can offer.

 Kind regards,
 Laura

 Laura Bonnett
 Research Assistant
 Department of Biostatistics
 University of Liverpool

 Telephone: 0151 7944059
 Email: l.j.bonn...@liv.ac.uk




Re: [R] plots of correlation matrices

2011-10-11 Thread Dénes TÓTH



 Hi,

 One way to do that is this  (avoiding the use of a for loop):


l.txt <- "id category attribute1 attribute2 attribute3 attribute4
661 SCHS 43.2 0 56.5 1
12202 SCHS 161.7 5.7 155 16
1182 SCHS 21.4 0 29 0
1356 SSS 8.8182 0.1818 10.6667 0.6667
1864 SCHS 443.7273 9.9091 537 46
12360 SOA 6.6364 0 10 0
3382 SOA 7.1667 0 26 0.5
1033 SOA 63.9231 1.5385 91.5 11.5
14742 SSS 4.3846 0 8 0
12760 SSS 425.0714 1.7857 297.5 3.5
"

dat.df <- read.table(textConnection(l.txt), header = TRUE, as.is = TRUE)
closeAllConnections()

dat.lt <- by(dat.df[, 3:6], dat.df$category, cor)

I guess Gawesh is looking for ?layout or ?par:

par(mfrow=c(2,2))
lapply(dat.lt,corrplot)


 lapply(dat.lt,corrplot)


 Regards,
 Carlos Ortega
 www.qualityexcellence.es

 2011/10/11 gj gaw...@gmail.com

 Hi,

 I want to do a visualisation of a matrix plot made up of several plots
 of
 correlation matrices (using corrplot()). My data is in csv format.
 Here's
 an
 example:

 id,category,attribute1,attribute2,attribute3,attribute4
 661,SCHS,43.2,0,56.5,1
 12202,SCHS,161.7,5.7,155,16
 1182,SCHS,21.4,0,29,0
 1356,SSS, 8.8182,0.1818,10.6667,0.6667
 1864,SCHS,443.7273,9.9091,537,46
 12360,SOA,6.6364,0,10,0
 3382,SOA,7.1667,0,26,0.5
 1033,SOA,63.9231,1.5385,91.5,11.5
 14742,SSS,4.3846,0,8,0
 12760,SSS,425.0714,1.7857,297.5,3.5

 I can get rid of the id. But I need the 'category' as a way of
 distinguishing the various correlation matrices.
 I can do a plot of the correlation matrix using corrplot() function in
 the
 corrplot package (ignoring the id and category). But what I need is a
 matrix
 of the plots of each correlation matrix based on the category, ie I have
 three categories in the data, hence I will need three plots of the
 correlation matrix  in one diagram (because the correlation matrix only
 makes sense if they are distinguished by category).

 Any help?

 Regards
 Gawesh



Re: [R] high and lowest with names

2011-10-11 Thread Dénes TÓTH

which.max is even faster:

dims <- c(1000, 1000)
tt <- array(rnorm(prod(dims)), dims)
# which
system.time(
  replicate(100, which(tt == max(tt), arr.ind = TRUE))
)
# which.max (+ arrayInd)
system.time(
  replicate(100, arrayInd(which.max(tt), dims))
)
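To also recover the row and column names (the original question in this thread), arrayInd() combines directly with dimnames. A sketch I added, built on the poster's example data (with 'e' as the fifth column name instead of the duplicated 'c' in the original post):

```r
x <- swiss$Education[1:25]
dat <- matrix(x, 5, 5,
              dimnames = list(c('z','y','x','w','v'), c('a','b','c','d','e')))

# single largest value, with its row and column name
ij <- arrayInd(which.max(dat), dim(dat))
c(row = rownames(dat)[ij[1]], col = colnames(dat)[ij[2]], value = max(dat))

# top 10 values together with their row/column names
o   <- order(dat, decreasing = TRUE)[1:10]
pos <- arrayInd(o, dim(dat))
top10 <- data.frame(row   = rownames(dat)[pos[, 1]],
                    col   = colnames(dat)[pos[, 2]],
                    value = dat[o])
top10
```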

Best,
Denes

 But it's simpler and probably faster to use R's built-in capabilities
 (see ?which; note the arr.ind argument!).

 As an example:

 test <- matrix(rnorm(24), nr = 4)
 which(test==max(test), arr.ind=TRUE)
  row col
 [1,]   2   6

 So this gives the row and column indices of the max, from which row and
 column names can easily be obtained from the dimnames attribute of the
 matrix.

 Note: This assumes that the object in question is a matrix, NOT a data
 frame, for which it would be slightly more complicated.

 -- Bert


 On Tue, Oct 11, 2011 at 3:06 PM, Carlos Ortega
 c...@qualityexcellence.es wrote:

 Hi,

 With this code you can find row and col names for the largest value
 applied
 to your example:

 r.m.tmp <- apply(dat, 1, max)
 r.max <- names(r.m.tmp)[r.m.tmp == max(r.m.tmp)]

 c.m.tmp <- apply(dat, 2, max)
 c.max <- names(c.m.tmp)[c.m.tmp == max(c.m.tmp)]

 It's inmediate how to get the same for the smallest and build a function
 to
 calculate everything and return a list.


 Regards,
 Carlos Ortega
 www.qualityexcellence.es

 2011/10/11 Ben qant ccqu...@gmail.com

  Hello,
 
  I'm looking to get the values, row names and column names of the
 largest
  and
  smallest values in a matrix.
 
  Example (except is does not include the names):
 
  x <- swiss$Education[1:25]
   dat = matrix(x,5,5)
   colnames(dat) = c('a','b','c','d','c')
   rownames(dat) = c('z','y','x','w','v')
   dat
a  b  c  d  c
  z 12  7  6  2 10
  y  9  7 12  8  3
  x  5  8  7 28 12
  w  7  7 12 20  6
  v 15 13  5  9  1
 
   #top 10
   sort(dat,partial=n-9:n)[(n-9):n]
   [1]  9 10 12 12 12 12 13 15 20 28
   # bottom 10
   sort(dat,partial=1:10)[1:10]
   [1] 1 2 3 5 5 6 6 7 7 7
 
  ...except I need the rownames and colnames to go along for the ride
 with
  the
  values...because of this, I am guessing the return value will need to
 be
 a
  list since all of the values have different row and col names (which
 is
  fine).
 
  Regards,
 
  Ben
 


Re: [R] correlation matrix

2011-10-10 Thread Dénes TÓTH

And you might also consider packages like corrplot, corrgram etc. for
other plotting options of a correlation matrix.
They can be more informative than simply invoking image(heat)
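For instance, a minimal corrplot sketch (my addition, with hypothetical data; assumes the corrplot package is installed):

```r
library(corrplot)

set.seed(100)
m <- cor(matrix(rnorm(90), ncol = 3,
                dimnames = list(NULL, c("x", "x1", "x2"))))

# Ellipses encode sign and strength; the numbers give the exact correlations.
corrplot(m, method = "ellipse", addCoef.col = "black")
```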



 What a pleasant post to respond to - with self-contained code. :)

 heat <- matrix(0, nrow = dim(xa)[1], ncol = dim(xa)[2])

 heat[lower.tri(heat)] <- xa[lower.tri(xa)]
 heat[upper.tri(heat)] <- xb[upper.tri(xb)]
 diag(heat) <- 1

 heat

 HTH,
 Daniel


 1Rnwb wrote:

 Hello Gurus
 I have two correlation matrices 'xa' and 'xb'
 set.seed(100)
 d=cbind(x=rnorm(20)+1,
 x1=rnorm(20)+1,
 x2=rnorm(20)+1)


 d1=cbind(x=rnorm(20)+2,
 x1=rnorm(20)+2,
 x2=rnorm(20)+2)

 xa=cor(d,use='complete')

 xb=cor(d1,use='complete')



 I want to combine these two to get a third matrix which should have half
 values from 'xa' and half values from 'xb'
            x          x1          x2
 x  1.0000000 -0.15157123 -0.23085308
 x1 0.3466155  1.00000000 -0.01061675
 x2 0.1234507  0.01775527  1.00000000

 I would like to generate a heatmap of the correlation values in the
 disease and non-disease phenotypes.

 I would appreciate if someone can point me in correct direction.
 Thanks
 sharad


 --
 View this message in context:
 http://r.789695.n4.nabble.com/correlation-matrix-tp3891085p3891685.html
 Sent from the R help mailing list archive at Nabble.com.





Re: [R] calc.relimp pmvd for US R-user

2011-09-13 Thread Dénes TÓTH


 Dear All:

 I am calculating  the relative importance of a regressor in a linear
 model.
 Does anyone know how I can obtain/install the 'pmvd' computation type? I
 am
 a US user.

 Regards,
 Y


Hi,

Do you have any specific reason to use the pmvd method, especially in
the light of
http://prof.beuth-hochschule.de/fileadmin/user/groemping/downloads/amstat07mayp139.pdf?
The default lmg method
- gives very similar results,
- is better known in the literature, and
- is included in the US version of the package.
Anyway, as a non-US user I tried both methods once upon a time, and they
really did not deviate that much (at least the ordering of the variables
was highly similar).


Hope that helps,
   Denes





 --
 View this message in context:
 http://r.789695.n4.nabble.com/calc-relimp-pmvd-for-US-R-user-tp3808752p3808752.html
 Sent from the R help mailing list archive at Nabble.com.





Re: [R] studentized and standarized residuals

2011-08-10 Thread Dénes TÓTH

Dear Jen,

Actually you can check out what R does by looking at the source.

# first type the name of the function
> rstandard
function (model, ...)
UseMethod("rstandard")
<environment: namespace:stats>

# ?methods will list the corresponding methods
> methods(rstandard)
[1] rstandard.glm rstandard.lm

# choose rstandard.lm
> rstandard.lm
function (model, infl = lm.influence(model, do.coef = FALSE),
    sd = sqrt(deviance(model)/df.residual(model)), ...)
{
    res <- infl$wt.res/(sd * sqrt(1 - infl$hat))
    res[is.infinite(res)] <- NaN
    res
}

# in case the function is not visible,
# you can use package-name:::function-name to display it
stats:::rstandard.lm
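
Written out, the quantity computed by rstandard.lm above is the textbook internally studentized (standardized) residual:

```
r_i = \frac{e_i}{\hat\sigma \sqrt{1 - h_{ii}}},
\qquad
\hat\sigma^2 = \frac{\mathrm{deviance}}{\mathrm{df.residual}}
```

where e_i is the (weighted) residual and h_ii is the i-th hat value, matching infl$wt.res, sd, and infl$hat in the source.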


Best,
  Denes






 Thanks Patrick - at least I know I wasn't being too silly :-)
 Jen

 --
 View this message in context:
 http://r.789695.n4.nabble.com/studentized-and-standarized-residuals-tp3732997p3733173.html
 Sent from the R help mailing list archive at Nabble.com.





Re: [R] How to count numbers of a vector and use them as index values?

2011-07-31 Thread Dénes TÓTH

See also ?tabulate.

tabulate(x,8)



 Hi Paul,

 I would use something like this:

 x <- c(2,2,3,3,4,6)
 table(x)
 x
 2 3 4 6
 2 2 1 1
 x <- factor(x, levels=1:8)
 table(x)
 x
 1 2 3 4 5 6 7 8
 0 2 2 1 0 1 0 0

 Sarah

 On Sun, Jul 31, 2011 at 5:41 PM, Paul Menzel
 paulepan...@users.sourceforge.net wrote:
 Dear R folks,


 I am sorry to ask this simple question, but my search for the right
 way/command was unsuccessful.

 I have a vector

 x <- c(2, 2, 3, 3, 4, 6)

 Now the values of x should be considered indices into another vector of
 possibly greater length, say 8, and each value should be how often that
 index appeared in the original vector x.

 length(result)
  [1] 8
 result
  [1] 0 2 2 1 0 1 0 0


 Thank you in advance,

 Paul




 --
 Sarah Goslee
 http://www.functionaldiversity.org





Re: [R] finding a faster way to run lm on rows of predictor matrix

2011-07-29 Thread Dénes TÓTH

Hi,

you can solve the task by simple matrix algebra.
Note that I find it really inconvenient to have a matrix with variables in
rows and cases in columns, so I transposed your predictors matrix.

# your data
regress.y <- rnorm(150)
predictors <- matrix(rnorm(6000*150), ncol=150, nrow=6000)
tpreds <- t(predictors)

# compute coefficients
coefs <- apply(tpreds, 2, function(x)
  solve(crossprod(x), crossprod(x, regress.y)))

# compute residuals
resids <- regress.y - sweep(tpreds, 2, coefs, "*")

(Note that resids shall be transposed back if you really insist on your
original matrix format.)
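
For this special case (one predictor, no intercept) the per-column coefficient is just sum(x*y)/sum(x^2), so the apply loop can be dropped entirely. A hedged sketch reusing the objects above (I have not benchmarked the timings):

```r
# vectorized over all 6000 predictors at once:
# crossprod(tpreds, regress.y) gives sum(x*y) per column,
# colSums(tpreds^2) gives sum(x^2) per column
coefs2 <- as.vector(crossprod(tpreds, regress.y)) / colSums(tpreds^2)

# residuals: subtract each column's fitted values
resids2 <- regress.y - sweep(tpreds, 2, coefs2, "*")
```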

HTH,
 Denes


 Hi, everyone.
 I need to run lm with the same response vector but with varying predictor
 vectors. (i.e. 1 response vector on each individual 6,000 predictor
 vectors)
 After looking through the R archive, I found roughly 3 methods that has
 been suggested.
 Unfortunately, I need to run this task multiple times(~ 5,000 times) and
 would like to find a faster way than the existing methods.
 All three methods I have bellow run on my machine timed with system.time
 13~20 seconds.

 The opposite direction of 6,000 response vectors and 1 predictor vectors,
 that is supported with lm runs really fast ~0.5 seconds.
 They are pretty much performing the same number of lm fits, so I was
 wondering if there was a faster way, before I try to code this up in c++.

 thanks!!

 ## sample data ###
 regress.y = rnorm(150)
 predictors = matrix(rnorm(6000*150), ncol=150, nrow=6000)

 ## method 1 ##
 data.residuals = t(apply(predictors, 1, function(x)( lm(regress.y ~ -1 +
 as.vector(x))$residuals)))

 user  system elapsed
  15.076   0.048  15.128

 ## method 2 ##
 data.residuals = matrix(rep(0, nrow(predictors) * ncol(predictors)),
 nrow=nrow(predictors), ncol=ncol(predictors) )

 for( i in 1:nrow(predictors)){
 pred = as.vector(predictors[i,])
 data.residuals[i, ] = lm(regress.y ~ -1 + pred )$residuals
 }

  user  system elapsed
  13.841   0.012  13.854

 ## method 3 ##
 library(nlme)

 all.data <- data.frame( y=rep(regress.y, nrow(predictors)),
 x=c(t(predictors)), g=gl(nrow(predictors), ncol(predictors)) )
 all.fits <- lmList( y ~ x | g, data=all.data)
 data.residuals = matrix( residuals(all.fits), nrow=nrow(predictors),
 ncol=ncol(predictors))

 user  system elapsed
  36.407   0.484  36.892


 ## the opposite direction, supported by lm ###
 lm(t(predictors) ~ -1 + regress.y)$residuals

  user  system elapsed
  0.500   0.120   0.613





Re: [R] Looping through data sets to change column from character to numeric

2011-07-28 Thread Dénes TÓTH

The problem is that you can not assign a variable to itself.

rm(list=ls())
df1 <- data.frame(ResultValue=as.character(1:5))
df2 <- data.frame(ResultValue=as.character(1:10))
frames <- ls()
for (frame in frames){
 temp <- get(frame)
 # as.character() guards against the column being a factor, where
 # as.numeric() alone would return the level codes, not the values
 temp[, "ResultValue"] <- as.numeric(as.character(temp[, "ResultValue"]))
 assign(frame, temp)
}
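
An equivalent sketch that avoids get/assign by round-tripping through a named list (it assumes every object returned by ls() is one of these data frames, which may need filtering in practice):

```r
frames <- mget(ls())                       # grab all objects as a named list
frames <- lapply(frames, function(d) {
  # as.character() first, in case the column is a factor
  d$ResultValue <- as.numeric(as.character(d$ResultValue))
  d
})
list2env(frames, envir = .GlobalEnv)       # write the modified frames back
```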

HTH,
  Denes



 Greetings to all --

 I am having a silly problem that I just can't solve.  Someone has given me
 an .RData file with hundreds of data frames.  Each data frame has a column
 named ResultValue, currently character when it should be numeric.  I want
 to loop through all of the data frames to change the variable to numeric,
 but I cannot figure out how to do it. My best guess was along the lines of:

 frames <- ls()
 for (frame in frames){
  assign(frame, get(frame), .GlobalEnv)
  frame[, "ResultValue"] <- as.numeric(frame[, "ResultValue"])
 }

 It doesn't work.  After the assign() the frame object remains the
 character
 name of the dataframe I am trying to change.  If I do the following, the
 TEST object comes out just fine.

 frames <- ls()
 for (frame in frames){
  assign("TEST", get(frame), .GlobalEnv)
  TEST[, "ResultValue"] <- as.numeric(TEST[, "ResultValue"])
 }

 Seems like it should be simple, but I am misunderstanding something and
 not following the logic.  Any insight?

 Thanks,

 Sarah





Re: [R] Looping through data sets to change column from character to numeric

2011-07-28 Thread Dénes TÓTH

Sorry, I was wrong. Of course you can assign a variable to itself, but it
does not make much sense...
What you misunderstood is that in the assignment you assign the data
frame (e.g. df1) to itself. You do not modify the `frame` object, which
remains a character string.



 The problem is that you can not assign a variable to itself.

 rm(list=ls())
 df1 <- data.frame(ResultValue=as.character(1:5))
 df2 <- data.frame(ResultValue=as.character(1:10))
 frames <- ls()
 for (frame in frames){
  temp <- get(frame)
  temp[, "ResultValue"] <- as.numeric(as.character(temp[, "ResultValue"]))
  assign(frame, temp)
 }

 HTH,
   Denes



 Greetings to all --

 I am having a silly problem that I just can't solve.  Someone has given
 me an .RData file with hundreds of data frames.  Each data frame has a
 column named ResultValue, currently character when it should be numeric.
 I want to loop through all of the data frames to change the variable to
 numeric, but I cannot figure out how to do it. My best guess was along
 the lines of:

 frames <- ls()
 for (frame in frames){
  assign(frame, get(frame), .GlobalEnv)
  frame[, "ResultValue"] <- as.numeric(frame[, "ResultValue"])
 }

 It doesn't work.  After the assign() the frame object remains the
 character
 name of the dataframe I am trying to change.  If I do the following, the
 TEST object comes out just fine.

 frames <- ls()
 for (frame in frames){
  assign("TEST", get(frame), .GlobalEnv)
  TEST[, "ResultValue"] <- as.numeric(TEST[, "ResultValue"])
 }

 Seems like it should be simple, but I am misunderstanding something and
 not following the logic.  Any insight?

 Thanks,

 Sarah







Re: [R] Finding/identifying a value within a factor

2011-07-25 Thread Dénes TÓTH

Hi,

you provided a character vector as an example. I guess you meant something
like:
x <- factor(c("<1", 2, 3, 4, "<1"))

# You can identify the elements containing a "<" with ?grep or ?grepl:
indices <- grep("<", as.character(x))

# You can convert those elements to numbers (see ?as.numeric);
# go through as.character() first, since as.numeric() on a factor
# returns the level codes
as.numeric(sub("<", "", as.character(x[indices])))
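
Since the stated goal in the question is to halve the below-detection-limit values, the whole pipeline might look like this; the "<1" coding is reconstructed from context, so treat it as an assumption:

```r
x <- factor(c("<1", 2, 3, 4, "<1"))

vals  <- as.character(x)
below <- grepl("<", vals)                # flag detection-limit entries
num   <- as.numeric(sub("<", "", vals))  # strip "<" and convert to numeric
num[below] <- num[below] / 2             # halve only the censored values
num
```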

HTH,
  Denes


 Hi all,

 I'm trying to identify a particular digit or value within a vector of
 factors. Specifically, this is environmental data where in some cases the
 minimum value reported is "<" a particular number (and I want to
 manipulate only these). For example:

  x <- c("<1", 2, 3, 4, "<1")

 For a dataset that is hundreds or thousands of lines long, I'd like to
 find or identify only those that have a "<" symbol (R automatically
 stores the entire vector in factor format due to these symbols when it
 imports the data; I don't mind converting if necessary). Eventually, I'd
 like to divide the number in half for these cases, but I think I have
 that coding lined up once I can just identify them from the stew.

 I've exhausted help and net resources so far...

 Thanks,
 Ryan


 --

 Ryan Utz, Ph.D.
 Aquatic Ecologist/STREON Scientist
 National Ecological Observatory Network

 Home/Cell: (724) 272-7769
 Work: (720) 746-4844 ext. 2488


