Re: [R] AUC/Volume under the surface

2018-12-17 Thread Jeff Newmiller
You need to grasp two concepts:

1) Models in R conventionally have predict methods. To plot your model, predict 
the dependent variable based on the model object and a grid of your independent 
variable(s). Whether you have interactions or logistic regression shouldn't be 
relevant to getting a plot. The "expand.grid" function is helpful with 
preparing your inputs for the predict method.

2) Surfaces generally need a regular grid of input values independent of the 
input values you built your model from. This grid is often input to the 
plotting functions with the set of unique x-values and the set of unique 
y-values, with your z-values in a matrix. That is, for most 3d surface plotting 
functions you do not provide the gridded x and y values that you gave to the 
predict function.

There are many functions that can do this plotting... the rgl package is quite 
nice, but the scatterplot3d package can be simpler to publish with. Search for 
tutorials... there are many. If you get stuck, post a reproducible example data 
set and what you did with it here and someone will likely offer some more 
specific assistance.
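
For example, a minimal sketch with made-up data (the data, the model, and the
grid below are placeholders, not your actual model):

set.seed(42)
dat <- data.frame(x1 = runif(100), x2 = runif(100))
dat$y <- rbinom(100, 1, plogis(1 + 2*dat$x1 - 3*dat$x2 + 4*dat$x1*dat$x2))
fit <- glm(y ~ x1 * x2, family = binomial, data = dat)

## regular grid of unique x- and y-values for the surface
x1s <- seq(0, 1, length.out = 25)
x2s <- seq(0, 1, length.out = 25)
grid <- expand.grid(x1 = x1s, x2 = x2s)

## predictions reshaped into the matrix that surface-plotting functions expect
z <- matrix(predict(fit, newdata = grid, type = "response"),
            nrow = length(x1s), ncol = length(x2s))
persp(x1s, x2s, z, theta = 30, phi = 25,
      xlab = "x1", ylab = "x2", zlab = "P(y = 1)")

The same z matrix can also be handed to rgl::persp3d if you prefer an
interactive view.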


On December 17, 2018 7:20:58 PM PST, Rehena Sultana via R-help 
 wrote:
>Dear Members,I am having trouble in plotting a 3D ROC curve based on
>multinomial logistic regression. I am also interested in finding
>AUC/Volume under the surface based on final multivariate model. I do
>have an interaction term in my final model. 
>I tried using "HUM" library. But I failed to figure out how to include
>interaction term in the model. 
>Looking forward for any help. Appreciate your help in advance. 
>Regards,rehena

-- 
Sent from my phone. Please excuse my brevity.



[R] AUC/Volume under the surface

2018-12-17 Thread Rehena Sultana via R-help
Dear Members,

I am having trouble plotting a 3D ROC curve based on multinomial logistic
regression. I am also interested in finding the AUC/volume under the surface
based on the final multivariate model. I do have an interaction term in my
final model.

I tried using the "HUM" package, but I failed to figure out how to include the
interaction term in the model.

Looking forward to any help. I appreciate your help in advance.

Regards,
rehena


Re: [R] Functional data anlysis for unequal length and unequal width time series

2018-12-17 Thread Jeff Newmiller
You will learn something useful if you search for "rolling join". The zoo 
package can handle this, as can the data.table package (read the vignette).
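
For example, a minimal sketch of a rolling join with data.table, using the
ser1/ser2 objects defined in your post (an illustration of the idea only):

library(data.table)
d1 <- data.table(t = ser1[, "t1"], V1 = ser1[, "V1"], key = "t")
d2 <- data.table(t = ser2[, "t2"], V2 = ser2[, "V2"], key = "t")
## for each time in d1, carry the most recent d2 observation forward
d2[d1, roll = TRUE]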

Your decision to pad with NA at the end was ill-considered... the first point 
of your first series is between the first two points of your second series... 
you need to interleave the points somehow.

You will need to decide whether you want to use piecewise linear approximation 
(as with the base "approx" function) or the more stable 
last-observation-carried-forward ("locf") or cubic splines or something more 
exotic like Fourier interpolation to identify the new interpolated "y" values 
in each series.

You can avoid the rolling join if you intend to resample the series to have 
points at regular intervals.  Just apply your preferred interpolation technique 
with your intended mesh of regular time values to each of your series in turn 
and then use cbind with the results.
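
A minimal sketch of that resampling approach with base "approx" and the
ser1/ser2 objects from your post (the mesh spacing here is arbitrary):

tt <- seq(24.5, 26.0, by = 0.05)            # common regular time mesh
y1 <- approx(ser1[, "t1"], ser1[, "V1"], xout = tt)$y
y2 <- approx(ser2[, "t2"], ser2[, "V2"], xout = tt)$y
aligned <- cbind(t = tt, V1 = y1, V2 = y2)  # NA where a series has no coverage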

I don't know anything about the package you mention, but getting time series 
data aligned is a common preprocessing step for many time series analyses.

Oh, and you should probably be familiar with the CRAN Time Series Task View 
[1].

PS you should provide a link back to your original posting when moving the 
conversation to a different venue in case the discussion doesn't stay dead 
there.

[1] https://cran.r-project.org/web/views/TimeSeries.html

On December 17, 2018 8:50:09 AM PST, so...@iastate.edu wrote:
>Dear All,
>I apologize if you have already seen in Stack Overflow. I
>have not got any response from there so I am posting for help here.
>
>I have data on 1318 time series. Many of these series are of unequal
>length. Apart from this also quite a few time points for each of the
>series are observed at different time points. For example consider the
>following four series
>
>t1 <- c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50, 25.55, 25.67)
>V1 <- c(-0.1710, -0.0824, -0.0419, -0.0416, -0.0216, -0.0792, -0.0656,-
>0.0273, -0.0589)
>ser1 <- cbind(t1, V1)
>
>t2 <- c(24.5, 24.67, 24.91, 24.98, 25.14, 25.38)
>V2 <- c(-0.0280, -0.1980, -0.2556, 0.3131, 0.3231, 0.2264)
>ser2 <- cbind(t2, V2)
>
>t3 <- c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50, 25.55, 25.65,
>25.88, 25.97, 25.99)
>V3 <- c(0.0897, -0.0533, -0.3497, -0.5684, -0.4294, -0.1109, 0.0352,
>0.0550, -0.0536, 0.0185, -0.0295, -0.0324)
>ser3 <- cbind(t3, V3)
>
>t4 <- c(24.5, 24.67, 24.71, 24.98, 25.17)
>V4 <- c(-0.0280, -0.1980, -0.2556, 0.3131, 0.3231)
>ser4 <- cbind(t4, V4)
>
>Here t1, t2, t3, t4 are the time points and V1, V2, V3, V4 are the
>observations made at over those time points. The time points in the
>actual data are Julian dates so they look like these, just that they
>are much larger decimal figures like 2452450.6225.
>
>I am trying to cluster these time series using functional data approach
>for which I am using the "funFEM" package in R. Th examples present are
>for equispaced and equal length time series so I am not sure how to use
>the package for my data. Initially I tried by making all the time
>series equal in length to the time series having the highest number of
>observations (here equal to ser3) by adding NA's to the time series. So
>following this example I made ser2 as
>
>t2_n <- c(24.5, 24.67, 24.91, 24.98, 25.14, 25.38, 25.50, 25.55, 25.65,
>25.88, 25.97, 25.99)
>V2_na <- c(V2, rep(NA, 6))
>ser2_na <- cbind(t2_n, V2_na)
>
>Note that to make t2 equal to length of t3 I grabbed the last 6 time
>points from t3. To make V2 equal in length to V3 I added NA's.
>
>Then I created my data matrix as
>
>dat <- rbind(V1_na, V2_na, V3, V4_na).
>
>The code I used was
>
>require(funFEM)
>basis<- create.fourier.basis(c(min(t3), max(t3)), nbasis = 25) 
>fdobj <- smooth.basis(c(min(t3), max(t3)) ,dat, basis)$fd
>
>Note that the range is constructed using the maximum and minumum time
>point of ser_3 series.
>
>res <- funFEM(fdobj, K = 2:9, model = "all", crit = "bic", init =
>"random") 
>
>But this gives me an error
>
>Error in svd(X) : infinite or missing values in 'x'.
>
>Can anyone tell please help me on how to deal with this dataset for
>this package or any alternative package?
>
>Sincerly,
>Souradeep
>

-- 
Sent from my phone. Please excuse my brevity.



Re: [R] R code for if-then-do code blocks

2018-12-17 Thread Gabor Grothendieck
There is some discussion of approaches to this here:

https://stackoverflow.com/questions/34096162/dplyr-mutate-replace-on-a-subset-of-rows/34096575#34096575


On Mon, Dec 17, 2018 at 10:30 AM Paul Miller via R-help
 wrote:
>
> Hello All,
>
> Season's greetings!
>
>  Am trying to replicate some SAS code in R. The SAS code uses if-then-do code 
> blocks. I've been trying to do likewise in R as that seems to be the most 
> reliable way to get the same result.
>
> Below is some toy data and some code that does work. There are some things I 
> don't necessarily like about the code though. So I was hoping some people 
> could help make it better. One thing I don't like is that the within function 
> reverses the order of the computed columns such that test1:test5 becomes 
> test5:test1. I've used a mutate to overcome that but would prefer not to have 
> to do so.
>
>  Another, perhaps very small thing, is the need to calculate an ID variable 
> that becomes the basis for a grouping.
>
> I did considerable Internet searching for R code that conditionally computes 
> blocks of code. I didn't find much though and so am wondering if my search 
> terms were not sufficient or if there is some other reason. It occurred to me 
> that maybe if-then-do code blocks like we often see in SAS as are frowned 
> upon and therefore not much implemented.
>
> I'd be interested in seeing more R-compatible approaches if this is the case. 
> I've learned that it's a mistake to try and make R be like SAS. It's better 
> to let R be R. Trouble is I'm not always sure how to do that.
>
> Thanks,
>
> Paul
>
>
> d1 <- data.frame(workshop=rep(1:2,4),
> gender=rep(c("f","m"),each=4))
>
> library(tibble)
> library(plyr)
>
> d2 <- d1 %>%
>   rownames_to_column("ID") %>%
>   mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
>   ddply("ID",
> within,
> if (gender == "f" & workshop == 1) {
>   test1 <- 1
>   test1 <- 6 + test1
>   test2 <- 2 + test1
>   test4 <- 1
>   test5 <- 1
> } else {
>   test1 <- test2 <- test4 <- test5 <- 0
> })
>



-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com



Re: [R] Functional data anlysis for unequal length and unequal width time series

2018-12-17 Thread Bert Gunter
Specialized: Probably need to email the maintainer. See ?maintainer
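
For example (a sketch, assuming the package is installed):

maintainer("funFEM")  # returns the package maintainer's name and email address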

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Mon, Dec 17, 2018 at 9:27 AM  wrote:

> Dear All,
> I apologize if you have already seen in Stack Overflow. I
> have not got any response from there so I am posting for help here.
>
> I have data on 1318 time series. Many of these series are of unequal
> length. Apart from this also quite a few time points for each of the
> series are observed at different time points. For example consider the
> following four series
>
> t1 <- c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50, 25.55, 25.67)
> V1 <- c(-0.1710, -0.0824, -0.0419, -0.0416, -0.0216, -0.0792, -0.0656,-
> 0.0273, -0.0589)
> ser1 <- cbind(t1, V1)
>
> t2 <- c(24.5, 24.67, 24.91, 24.98, 25.14, 25.38)
> V2 <- c(-0.0280, -0.1980, -0.2556, 0.3131, 0.3231, 0.2264)
> ser2 <- cbind(t2, V2)
>
> t3 <- c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50, 25.55, 25.65,
> 25.88, 25.97, 25.99)
> V3 <- c(0.0897, -0.0533, -0.3497, -0.5684, -0.4294, -0.1109, 0.0352,
> 0.0550, -0.0536, 0.0185, -0.0295, -0.0324)
> ser3 <- cbind(t3, V3)
>
> t4 <- c(24.5, 24.67, 24.71, 24.98, 25.17)
> V4 <- c(-0.0280, -0.1980, -0.2556, 0.3131, 0.3231)
> ser4 <- cbind(t4, V4)
>
> Here t1, t2, t3, t4 are the time points and V1, V2, V3, V4 are the
> observations made at over those time points. The time points in the
> actual data are Julian dates so they look like these, just that they
> are much larger decimal figures like 2452450.6225.
>
> I am trying to cluster these time series using functional data approach
> for which I am using the "funFEM" package in R. Th examples present are
> for equispaced and equal length time series so I am not sure how to use
> the package for my data. Initially I tried by making all the time
> series equal in length to the time series having the highest number of
> observations (here equal to ser3) by adding NA's to the time series. So
> following this example I made ser2 as
>
> t2_n <- c(24.5, 24.67, 24.91, 24.98, 25.14, 25.38, 25.50, 25.55, 25.65,
> 25.88, 25.97, 25.99)
> V2_na <- c(V2, rep(NA, 6))
> ser2_na <- cbind(t2_n, V2_na)
>
> Note that to make t2 equal to length of t3 I grabbed the last 6 time
> points from t3. To make V2 equal in length to V3 I added NA's.
>
> Then I created my data matrix as
>
> dat <- rbind(V1_na, V2_na, V3, V4_na).
>
> The code I used was
>
> require(funFEM)
> basis<- create.fourier.basis(c(min(t3), max(t3)), nbasis = 25)
> fdobj <- smooth.basis(c(min(t3), max(t3)) ,dat, basis)$fd
>
> Note that the range is constructed using the maximum and minumum time
> point of ser_3 series.
>
> res <- funFEM(fdobj, K = 2:9, model = "all", crit = "bic", init =
> "random")
>
> But this gives me an error
>
> Error in svd(X) : infinite or missing values in 'x'.
>
> Can anyone tell please help me on how to deal with this dataset for
> this package or any alternative package?
>
> Sincerly,
> Souradeep
>


Re: [R] error = FALSE causes knit2wp to throw duplicate label error

2018-12-17 Thread Nathan Parsons
Thanks for getting me pointed in the right direction. If I happen upon
a satisfactory solution, I will report back!

Nate Parsons
Pronouns: He, Him, His
Graduate Teaching Assistant
Department of Sociology
Portland State University
Portland, Oregon

Schedule an appointment: https://calendly.com/nate-parsons

503-893-8281
503-725-3957 FAX



On Sun, Dec 16, 2018 at 1:46 PM Jeff Newmiller  wrote:
>
> This seems a bit deep into knitr for R-help... you might have better luck on 
> StackExchange. I also suggest that posting an incomplete example is usually 
> the kiss of death for getting constructive assistance online.
>
> FWIW my guess is that executing knitr from within an Rmarkdown document is a 
> bad idea unless you are building using child documents. Try manipulating your 
> markdown from an R file.
>
> On December 16, 2018 11:48:44 AM PST, Nathan Parsons 
>  wrote:
> >Goal: post from R to Wordpress installation on server.
> >
> >Problem: R keeps returning the error “Error in parse_block(g[-1],
> >g[1], params.src) : duplicate label 'setup’” if error = FALSE in the
> >knitr options or in an r chunk. It works fine if error = TRUE. I could
> >just go through each post each time and remove any returned errors
> >manually, but I'd like to find a more permanent solution.
> >
> >I don't have any duplicate labels; is knit2wp somehow introducing a
> >duplicate label in the .Rmd
> >-> .md / upload process?
> >
> >My code:
> >
> >```{r setup, include=FALSE}
> >## Set the global chunk options for knitting reports
> >  knitr::opts_chunk$set(
> >echo = TRUE,
> >eval = TRUE,
> >message = TRUE,
> >error = FALSE,
> >warning = TRUE,
> >highlight = TRUE,
> >prompt = FALSE
> >  )
> >
> >## Load and activate libraries using 'pacman' package
> >  if (!require(pacman)) {
> >install.packages("pacman", repos = "http://cran.us.r-project.org")
> >  require(pacman)
> >  }
> >
> >  pacman::p_load_gh("duncantl/XMLRPC",
> >"duncantl/RWordPress")
> >  pacman::p_load("knitr")
> >```
> >
> >```{r chunk1, echo = FALSE}
> >## post information
> >  fileName <- "fancy_post.Rmd"
> >  postTitle <- "Fancy Post Title"
> >
> >```
> >
> >blah blah blah...
> >
> >```{r chunk2, echo = FALSE}
> >## Set working directory to correct location
> >  last_dir <- getwd()
> >  setwd("~/Sites/posts")
> >
> >## Tell knitr to create the html code and upload it to your wordpress
> >site
> >  knit2wp(input = fileName,
> >title = postTitle,
> >publish = FALSE,
> >action = 'newPost')
> >
> >  setwd(last_dir)
> >```
> >
> >
> >Traceback:
> >Error in parse_block(g[-1], g[1], params.src) : duplicate label 'setup'
> >26. stop("duplicate label '", label, "'")
> >25. parse_block(g[-1], g[1], params.src)
> >24. FUN(X[[i]], ...)
> >23. lapply(groups, function(g) { block = grepl(chunk.begin, g[1]) if
> >(!set.preamble && !parent_mode()) { return(if (block) "" else g) ...
> >22. split_file(lines = text)
> >21. process_file(text, output)
> >20. knit(input, encoding = encoding, envir = envir)
> >19. knit2wp(input = fileName, title = postTitle, publish = FALSE,
> >action = "newPost")
> >18. eval(expr, envir, enclos)
> >17. eval(expr, envir, enclos)
> >16. withVisible(eval(expr, envir, enclos))
> >15. withCallingHandlers(withVisible(eval(expr, envir, enclos)),
> >warning = wHandler, error = eHandler, message = mHandler)
> >14. handle(ev <- withCallingHandlers(withVisible(eval(expr, envir,
> >enclos)), warning = wHandler, error = eHandler, message = mHandler))
> >13. timing_fn(handle(ev <- withCallingHandlers(withVisible(eval(expr,
> >envir, enclos)), warning = wHandler, error = eHandler, message =
> >mHandler)))
> >12. evaluate_call(expr, parsed$src[[i]], envir = envir, enclos =
> >enclos, debug = debug, last = i == length(out), use_try =
> >stop_on_error != 2L, keep_warning = keep_warning, keep_message =
> >keep_message, output_handler = output_handler, include_timing =
> >include_timing)
> >11. evaluate::evaluate(...)
> >10. evaluate(code, envir = env, new_device = FALSE, keep_warning =
> >!isFALSE(options$warning), keep_message = !isFALSE(options$message),
> >stop_on_error = if (options$error && options$include) 0L else 2L,
> >output_handler = knit_handlers(options$render, options))
> >9. in_dir(input_dir(), evaluate(code, envir = env, new_device = FALSE,
> >keep_warning = !isFALSE(options$warning), keep_message =
> >!isFALSE(options$message), stop_on_error = if (options$error &&
> >options$include) 0L else 2L, output_handler =
> >knit_handlers(options$render, options)))
> >8. block_exec(params)
> >7. call_block(x)
> >6. process_group.block(group)
> >5. process_group(group)
> >4. withCallingHandlers(if (tangle) process_tangle(group) else
> >process_group(group), error = function(e) { setwd(wd) cat(res, sep =

Re: [R] [R studio] Plotting of line chart for each columns at 1 page

2018-12-17 Thread Jim Lemon
Hi Subhamitra,
As for the error that you mention, it was probably:

Error in axis(1, at = year_mids, labels = 3 - 1 - 1994:3 - 8 - 2017) :
  'at' and 'labels' lengths differ, 24 != 1992

Anything more than a passing glance reveals that you didn't read the
explanation I sent about the arguments passed to the "axis" function.
Perhaps it will be rewarding to read the help page for the "axis" function
in the "graphics" package.

Your confusion about the logic (really simple arithmetic) of assigning
positions for the year labels may be allayed by the following. Think back
to those grade school problems that read:

 "If I have m apples to give to n people, how many must I give each person
so that all will receive the same number and I will have the fewest apples
left?"

I'm sure that you remember that this can be solved in a number of ways. You
can divide m/n and drop the remainder. So, from 03-01-2002 to 03-08-2017 in
EMs2.1:

diff(as.Date(c("03-01-2002","03-08-2017"),"%d-%m-%Y"))
Time difference of 5691 days
# plus 1 for all of the days included
# calculate the number of years
5692/365.25
[1] 15.58385

So if there had been an observation each day, you would have the trivial
task of dividing the number of days by the number of years to get the tick
increments:

5692/15.58385
365.2499

Of course you don't have that many observations and you are trying to get
the number of observations, not days, in each year. By making the
assumption that the missing observations are spread evenly over the years,
you can simply replace the number of days with the number of observations.
At the moment I don't have that as I unrared your data at home. But you do
have it and I will call it nobs:

# this calculates the number of observations per year
obs_per_year <- nobs/15.58385


will yield the number of observations in each year. So you have your tick
increments. Now for the offset. If you want the year ticks to appear at the
middle of each year, you will want to start at 182 minus the two days
missing in January or 180. So your new year_mids will be:

year_mids<-seq(180,nobs,obs_per_year)

Your years are 2002:2017 for EMs2.1, so:

axis(1,year_mids,2002:2017)

may well be what you want for axis ticks. As you can see, the "m apples to
n people" approach gives you the answer. The only missing part was the
offset, or where to start handing out apples. You might want to have
another look at the help pages for "axis" and "seq" (or ":") which will
show you why your axis command failed badly. Good luck.
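
Putting the pieces together, a sketch (it assumes nobs holds the number of
observations in EMs2.1, places the first tick half a year's worth of
observations in rather than at day 180, and generates labels to match the
number of ticks so the "lengths differ" error cannot recur):

n_years <- 5692/365.25                    # ~15.58 years of data
obs_per_year <- nobs/n_years              # assumes observations spread evenly
year_mids <- seq(obs_per_year/2, nobs, by = obs_per_year)
axis(1, at = year_mids,
     labels = seq(2002, by = 1, length.out = length(year_mids)))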

Jim

On Mon, Dec 17, 2018 at 6:12 PM Subhamitra Patra 
wrote:

> Hello Sir,
>
> Thank you very much for your excellent guidance to a new R learner.
>
> I tried with your suggested code and got the expected results, but for the
> 2 CSV files (i.e. EMs2.1. and EMs.3.1), the date column is not coming in
> the X-axis (shown in the last row of the attached result Pdf file).  I
> think I need to increase more or less than 229 in the year-mids because for
> both the CSV files, starting date is 03-01-2002 and 04-07-2001
> (date-month-year) for EMs 2.1. and EMs 3.1. respectively. *Sir, hence I
> am quite confused for the logic behind the fixing of year_mids*. For your
> convenience, I am attaching both the code and result file.
>



Re: [R] R code for if-then-do code blocks

2018-12-17 Thread Richard M. Heiberger
I got another 10% savings with this example by using only one
subscripting adjustment.
I also fixed a typo in my previous posting (which didn't affect the timing).



microbenchmark(
 rmh={
d3 <-data.frame(ID=rownames(d1),
   d1,
   test1=0,
   test2=0,
   test4=0,
   test5=0)
myRowSubset <- d3$gender=="f" & d3$workshop==1
test1 <- 1
d3[myRowSubset, "test1"] <- test1 + 6
d3[myRowSubset, "test2"] <- test1 + 6 + 2
d3[myRowSubset, c("test4", "test5")] <- test1
  },
 rmh4={
   d4 <- data.frame(ID=rownames(d1),
d1,
test1=0,
test2=0,
test4=0,
test5=0)
   myRowSubset <- d4$gender=="f" & d4$workshop==1
   test1 <- 1
   d4[myRowSubset, c("test1", "test2", "test4", "test5")] <-
 matrix(test1 + c(6, 6+2, 0, 0), nrow=sum(myRowSubset), ncol=4, byrow=TRUE)
 }
)

Unit: microseconds
 expr min   lq mean   median   uq  max neval cld
  rmh 956.187 1183.304 1538.012 1617.985 1865.149 2177.071   100   b
 rmh4 850.729 1042.997 1380.842 1416.476 1700.307 2448.545   100  a


On Mon, Dec 17, 2018 at 12:49 PM Richard M. Heiberger  wrote:
>
> this can be dome even faster, and I think more easily read, using only base R
>
> d1 <- data.frame(workshop=rep(1:2,4),
> gender=rep(c("f","m"),each=4))
>
> ## needed by vector and rowbased, not needed by rmh
> library(tibble)
> library(plyr)
> library(magrittr)
>
> microbenchmark(
>   vector = {d1 %>%
> rownames_to_column("ID") %>%
> mutate(
>   test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
>   test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
>   test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
>   test5 = test4
> ) },
>   rowbased = {d1 %>%
>   rownames_to_column("ID") %>%
>   mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
>   ddply("ID",
> within,
> if (gender == "f" & workshop == 1) {
>   test1 <- 1
>   test1 <- 6 + test1
>   test2 <- 2 + test1
>   test4 <- 1
>   test5 <- 1
> } else {
>   test1 <- test2 <- test4 <- test5 <- 0
> })},
>   rmh={
> data.frame(ID=rownames(d1),
>d1,
>test1=0,
>test2=0,
>test4=0,
>test5=0)
> myRowSubset <- d3$gender=="f" & d3$workshop==1
> test1 <- 1
> d3[myRowSubset, "test1"] <- test1 + 6
> d3[myRowSubset, "test2"] <- test1 + 6 + 2
> d3[myRowSubset, c("test4", "test5")] <- test1
>   }
> )
>
> Unit: microseconds
>      expr      min       lq      mean   median        uq        max neval cld
>vector 1281.994 1468.102  1669.266 1573.043  1750.354   3171.777   100  a
>  rowbased 8131.230 8691.899 10894.700 9219.882 10435.642 133293.034   100   b
>   rmh  925.571 1056.530  1167.568 1116.425  1221.457   1968.199   100  a
> On Mon, Dec 17, 2018 at 12:15 PM Thierry Onkelinx via R-help
>  wrote:
> >
> > Dear Paul,
> >
> > R's power is that is works vectorised. Unlike SAS which is rowbased. Using
> > R in a SAS way will lead to very slow code.
> >
> > Your examples can be written vectorised
> >
> > d1 %>%
> >   rownames_to_column("ID") %>%
> >   mutate(
> > test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
> > test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
> > test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
> > test5 = test4
> >   )
> >
> > Here is a speed comparison.
> >
> > library(microbenchmark)
> > microbenchmark(
> >   vector = {d1 %>%
> > rownames_to_column("ID") %>%
> > mutate(
> >   test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
> >   test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
> >   test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
> >   test5 = test4
> > ) },
> >   rowbased = {d1 %>%
> >   rownames_to_column("ID") %>%
> >   mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
> >   ddply("ID",
> > within,
> > if (gender == "f" & workshop == 1) {
> >   test1 <- 1
> >   test1 <- 6 + test1
> >   test2 <- 2 + test1
> >   test4 <- 1
> >   test5 <- 1
> > } else {
> >   test1 <- test2 <- test4 <- test5 <- 0
> > })}
> > )
> >
> >
> > Best regards,
> >
> > Thierry
> >
> > ir. Thierry Onkelinx
> > Statisticus / Statistician
> >
> > Vlaamse Overheid / Government of Flanders
> > INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
> > FOREST
> > Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
> > thierry.onkel...@inbo.be
> > Havenlaan 88 bus 73, 1000 Brussel
> > www.inbo.be
> >
> > ///
> > To call in the statistician after the experiment is done may be no more
> > than asking him to perform a 

Re: [R] Quoting smooth random terms mccv::gam

2018-12-17 Thread Smith, Desmond
Thanks very much, Simon!

On 12/17/18, 8:23 AM, "Simon Wood"  wrote:

I would quote the p-value, but not the statistic (as it is not a 
standard F stat). The actual statistic is given here:


https://academic.oup.com/biomet/article-pdf/100/4/1005/566200/ast038.pdf

On 14/12/2018 04:33, Smith, Desmond wrote:
> Dear All,
>
> I have a mgcv::gam model of the form:
>
> m1 <- gam(Y ~ A + s(B, bs = "re"), data = dataframe, family = gaussian, 
method = "REML")
>
> The random term is quoted in summary(m1) as, for example,
>
> Approximate significance of smooth terms:
> # edf Ref.df  F p-value
> s(B)  4.486  5 97.195 6.7e-08 ***
>
> My question is, how would I quote this result (statistic and P value) in 
a formal document?
>
> For example, one possibility is F[4.486,5] = 97.195, P = 6.7e-08. 
However, arguing against this, “reverse engineering” of the result using
>
> pf(q= 97.195, df1= 4.486, df2= 5, lower.tail=FALSE)
>
> gives an incorrect p value:
>
> [1] 0.1435508
>
> I would be very grateful for your advice. Many thanks for your help!
>
> 
>


Re: [R] Trying to fix code that will find highest 5 column names and their associated values for each row in a data frame in R

2018-12-17 Thread David L Carlson
There are some problems with your example. Your code does not produce anything 
like your example data frame because you draw only 9 values without 
replacement. Your code produces 10 columns, each with the same permutation of 
the values 1:9.

Then your desired output does not make sense in terms of your example data. The 
first entry is V2:9 but 9 does not appear in row 1.

Using your posted example:
DF <- structure(list(V1 = c(3L, 1L, 2L, 3L, 6L, 8L, 1L, 9L, 1L),
V2 = c(2L, 4L, 3L, 8L, 2L, 2L, 5L, 3L, 2L), V3 = c(5L, 7L, 4L, 3L,
3L, 4L, 3L, 5L, 4L), V4 = c(6L, 8L, 7L, 4L, 7L, 8L, 6L, 8L, 8L),
V5 = c(5L, 7L, 5L, 5L, 2L, 3L, 8L, 4L, 3L), V6 = c(2L, 7L, 8L, 6L,
1L, 2L, 3L, 9L, 2L), V7 = c(6L, 3L, 9L, 7L, 8L, 9L, 8L, 7L, 1L),
V8 = c(8L, 4L, 1L, 4L, 3L, 7L, 9L, 8L, 2L), V9 = c(1L, 2L, 3L, 6L,
2L, 6L, 1L, 1L, 5L), V10 = c(3L, 9L, 5L, 5L, 4L, 5L, 3L, 2L, 6L)), 
class = "data.frame", row.names = c(NA, -9L))

Your code produces:

     V1    V2   V3    V4    V5
1  V8:8  V4:6 V7:6  V3:5  V5:5
2 V10:9  V4:8 V3:7  V5:7  V6:7
3  V7:9  V6:8 V4:7  V5:5 V10:5
4  V2:8  V7:7 V6:6  V9:6  V5:5
5  V7:8  V4:7 V1:6 V10:4  V3:3
6  V7:9  V1:8 V4:8  V8:7  V9:6
7  V8:9  V5:8 V7:8  V4:6  V2:5
8  V1:9  V6:9 V4:8  V8:8  V7:7
9  V4:8 V10:6 V9:5  V3:4  V5:3

Which seems to be what you wanted.
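
If you also want a toy data set that really has 10 columns of 9 rows (your
sample() call draws only 9 values and recycles them), something like the
following works -- a sketch only:

set.seed(1)
DF <- as.data.frame(matrix(sample(1:9, 90, replace = TRUE), nrow = 9, ncol = 10))
out <- t(apply(DF, 1, function(x) {
  o <- head(order(-x), 5)          # positions of the 5 largest values in the row
  paste0(names(x)[o], ":", x[o])   # "name:value" pairs
}))
as.data.frame(out)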

-
David L. Carlson
Department of Anthropology
Texas A&M University


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Tom Woolman
Sent: Monday, December 17, 2018 11:34 AM
To: r-help@r-project.org
Subject: [R] Trying to fix code that will find highest 5 column names and their 
associated values for each row in a data frame in R


I have a data frame each with 10 variables of integer data for various
  attributes about each row of data, and I need to know the highest 5  
variables related to each of
  row in this data frame and output that to a new data frame. In addition to
  the 5 highest variable names, I also need to know the corresponding 5
  highest variable values for each row.

  A simple code example to generate a sample data frame for this is:

  set.seed(1)
  DF <- matrix(sample(1:9,9),ncol=10,nrow=9)
  DF <- as.data.frame.matrix(DF)


This would result in an example data frame like this:

  #   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
  # 1  3  2  5  6  5  2  6  8  1   3
  # 2  1  4  7  8  7  7  3  4  2   9
  # 3  2  3  4  7  5  8  9  1  3   5
  # 4  3  8  3  4  5  6  7  4  6   5
  # 5  6  2  3  7  2  1  8  3  2   4
  # 6  8  2  4  8  3  2  9  7  6   5
  # 7  1  5  3  6  8  3  8  9  1   3
  # 8  9  3  5  8  4  9  7  8  1   2
  # 9  1  2  4  8  3  2  1  2  5   6


  My ideal output would be something like this:


  #  V1   V2   V3   V4   V5
  # 1  V2:9 V7:8 V8:7 V4:6 V3:5
  # 2  V9:9 V3:8 V5:7 V7:6 V4:5
  # 3  V5:9 V3:8 V2:7 V9:6 V7:5
  # 4  V8:9 V4:8 V2:7 V5:6 V9:5
  # 5  V9:9 V1:8 V6:7 V3:6 V5:5
  # 6  V8:9 V1:8 V5:7 V9:6 V4:5
  # 7  V2:9 V8:8 V7:7 V5:6 V9:5
  # 8  V4:9 V7:8 V9:7 V2:6 V8:5
  # 9  V3:9 V7:8 V8:7 V4:6 V5:5
  # 10 V6:9 V8:8 V1:7 V9:6 V4:5


  I was trying to use code, but this doesn't seem to work:

  out <- t(apply(DF, 1, function(x){
o <- head(order(-x), 5)
paste0(names(x[o]), ':', x[o])
  }))
  as.data.frame(out)



  Thanks everyone!



Re: [R] R code for if-then-do code blocks

2018-12-17 Thread Richard M. Heiberger
this can be dome even faster, and I think more easily read, using only base R

d1 <- data.frame(workshop=rep(1:2,4),
gender=rep(c("f","m"),each=4))

## needed by vector and rowbased, not needed by rmh
library(tibble)
library(plyr)
library(magrittr)

microbenchmark(
  vector = {d1 %>%
rownames_to_column("ID") %>%
mutate(
  test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
  test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
  test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
  test5 = test4
) },
  rowbased = {d1 %>%
  rownames_to_column("ID") %>%
  mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
  ddply("ID",
within,
if (gender == "f" & workshop == 1) {
  test1 <- 1
  test1 <- 6 + test1
  test2 <- 2 + test1
  test4 <- 1
  test5 <- 1
} else {
  test1 <- test2 <- test4 <- test5 <- 0
})},
  rmh={
data.frame(ID=rownames(d1),
   d1,
   test1=0,
   test2=0,
   test4=0,
   test5=0)
myRowSubset <- d3$gender=="f" & d3$workshop==1
test1 <- 1
d3[myRowSubset, "test1"] <- test1 + 6
d3[myRowSubset, "test2"] <- test1 + 6 + 2
d3[myRowSubset, c("test4", "test5")] <- test1
  }
)

Unit: microseconds
     expr      min       lq      mean   median        uq        max neval cld
   vector 1281.994 1468.102  1669.266 1573.043  1750.354   3171.777   100  a
 rowbased 8131.230 8691.899 10894.700 9219.882 10435.642 133293.034   100   b
  rmh  925.571 1056.530  1167.568 1116.425  1221.457   1968.199   100  a
On Mon, Dec 17, 2018 at 12:15 PM Thierry Onkelinx via R-help
 wrote:
>
> Dear Paul,
>
> R's power is that is works vectorised. Unlike SAS which is rowbased. Using
> R in a SAS way will lead to very slow code.
>
> Your examples can be written vectorised
>
> d1 %>%
>   rownames_to_column("ID") %>%
>   mutate(
> test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
> test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
> test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
> test5 = test4
>   )
>
> Here is a speed comparison.
>
> library(microbenchmark)
> microbenchmark(
>   vector = {d1 %>%
> rownames_to_column("ID") %>%
> mutate(
>   test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
>   test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
>   test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
>   test5 = test4
> ) },
>   rowbased = {d1 %>%
>   rownames_to_column("ID") %>%
>   mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
>   ddply("ID",
> within,
> if (gender == "f" & workshop == 1) {
>   test1 <- 1
>   test1 <- 6 + test1
>   test2 <- 2 + test1
>   test4 <- 1
>   test5 <- 1
> } else {
>   test1 <- test2 <- test4 <- test5 <- 0
> })}
> )
>
>
> Best regards,
>
> Thierry
>
> ir. Thierry Onkelinx
> Statisticus / Statistician
>
> Vlaamse Overheid / Government of Flanders
> INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
> FOREST
> Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
> thierry.onkel...@inbo.be
> Havenlaan 88 bus 73, 1000 Brussel
> www.inbo.be
>
> ///
> To call in the statistician after the experiment is done may be no more
> than asking him to perform a post-mortem examination: he may be able to say
> what the experiment died of. ~ Sir Ronald Aylmer Fisher
> The plural of anecdote is not data. ~ Roger Brinner
> The combination of some data and an aching desire for an answer does not
> ensure that a reasonable answer can be extracted from a given body of data.
> ~ John Tukey
> ///
>
> 
>
>
> Op ma 17 dec. 2018 om 16:30 schreef Paul Miller via R-help <
> r-help@r-project.org>:
>
> > Hello All,
> >
> > Season's greetings!
> >
> >  Am trying to replicate some SAS code in R. The SAS code uses if-then-do
> > code blocks. I've been trying to do likewise in R as that seems to be the
> > most reliable way to get the same result.
> >
> > Below is some toy data and some code that does work. There are some things
> > I don't necessarily like about the code though. So I was hoping some people
> > could help make it better. One thing I don't like is that the within
> > function reverses the order of the computed columns such that test1:test5
> > becomes test5:test1. I've used a mutate to overcome that but would prefer
> > not to have to do so.
> >
> >  Another, perhaps very small thing, is the need to calculate an ID
> > variable that becomes the basis for a grouping.
> >
> > I did considerable Internet searching for R code that conditionally
> > computes blocks of code. I didn't find much though and so am 

[R] Trying to fix code that will find highest 5 column names and their associated values for each row in a data frame in R

2018-12-17 Thread Tom Woolman



I have a data frame with 10 variables of integer data describing various
attributes of each row, and I need to know the highest 5 variables for each
row in this data frame and output that to a new data frame. In addition to
the 5 highest variable names, I also need to know the corresponding 5
highest variable values for each row.

 A simple code example to generate a sample data frame for this is:

 set.seed(1)
 DF <- matrix(sample(1:9,9),ncol=10,nrow=9)
 DF <- as.data.frame.matrix(DF)


This would result in an example data frame like this:

 #   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
 # 1  3  2  5  6  5  2  6  8  1   3
 # 2  1  4  7  8  7  7  3  4  2   9
 # 3  2  3  4  7  5  8  9  1  3   5
 # 4  3  8  3  4  5  6  7  4  6   5
 # 5  6  2  3  7  2  1  8  3  2   4
 # 6  8  2  4  8  3  2  9  7  6   5
 # 7  1  5  3  6  8  3  8  9  1   3
 # 8  9  3  5  8  4  9  7  8  1   2
 # 9  1  2  4  8  3  2  1  2  5   6


 My ideal output would be something like this:


 #  V1   V2   V3   V4   V5
 # 1  V2:9 V7:8 V8:7 V4:6 V3:5
 # 2  V9:9 V3:8 V5:7 V7:6 V4:5
 # 3  V5:9 V3:8 V2:7 V9:6 V7:5
 # 4  V8:9 V4:8 V2:7 V5:6 V9:5
 # 5  V9:9 V1:8 V6:7 V3:6 V5:5
 # 6  V8:9 V1:8 V5:7 V9:6 V4:5
 # 7  V2:9 V8:8 V7:7 V5:6 V9:5
 # 8  V4:9 V7:8 V9:7 V2:6 V8:5
 # 9  V3:9 V7:8 V8:7 V4:6 V5:5
 # 10 V6:9 V8:8 V1:7 V9:6 V4:5


 I was trying to use code, but this doesn't seem to work:

 out <- t(apply(DF, 1, function(x){
   o <- head(order(-x), 5)
   paste0(names(x[o]), ':', x[o])
 }))
 as.data.frame(out)



 Thanks everyone!



[R] Functional data anlysis for unequal length and unequal width time series

2018-12-17 Thread soura
Dear All,
I apologize if you have already seen this on Stack Overflow. I
have not got any response there, so I am posting for help here.

I have data on 1318 time series. Many of these series are of unequal
length. In addition, the series are observed at different sets of time
points. For example, consider the following four series:

t1 <- c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50, 25.55, 25.67)
V1 <- c(-0.1710, -0.0824, -0.0419, -0.0416, -0.0216, -0.0792, -0.0656,-
0.0273, -0.0589)
ser1 <- cbind(t1, V1)

t2 <- c(24.5, 24.67, 24.91, 24.98, 25.14, 25.38)
V2 <- c(-0.0280, -0.1980, -0.2556, 0.3131, 0.3231, 0.2264)
ser2 <- cbind(t2, V2)

t3 <- c(24.51, 24.67, 24.91, 24.95, 25.10, 25.35, 25.50, 25.55, 25.65,
25.88, 25.97, 25.99)
V3 <- c(0.0897, -0.0533, -0.3497, -0.5684, -0.4294, -0.1109, 0.0352,
0.0550, -0.0536, 0.0185, -0.0295, -0.0324)
ser3 <- cbind(t3, V3)

t4 <- c(24.5, 24.67, 24.71, 24.98, 25.17)
V4 <- c(-0.0280, -0.1980, -0.2556, 0.3131, 0.3231)
ser4 <- cbind(t4, V4)

Here t1, t2, t3, t4 are the time points and V1, V2, V3, V4 are the
observations made at those time points. The time points in the
actual data are Julian dates so they look like these, just that they
are much larger decimal figures like 2452450.6225.

I am trying to cluster these time series using functional data approach
for which I am using the "funFEM" package in R. The examples presented are
for equispaced, equal-length time series, so I am not sure how to use
the package for my data. Initially I tried making all the time
series equal in length to the time series having the highest number of
observations (here equal to ser3) by adding NA's to the time series. So
following this example I made ser2 as

t2_n <- c(24.5, 24.67, 24.91, 24.98, 25.14, 25.38, 25.50, 25.55, 25.65,
25.88, 25.97, 25.99)
V2_na <- c(V2, rep(NA, 6))
ser2_na <- cbind(t2_n, V2_na)

Note that to make t2 equal to length of t3 I grabbed the last 6 time
points from t3. To make V2 equal in length to V3 I added NA's.

Then I created my data matrix as

dat <- rbind(V1_na, V2_na, V3, V4_na).

The code I used was

require(funFEM)
basis<- create.fourier.basis(c(min(t3), max(t3)), nbasis = 25) 
fdobj <- smooth.basis(c(min(t3), max(t3)) ,dat, basis)$fd

Note that the range is constructed using the maximum and minimum time
points of the ser3 series.

res <- funFEM(fdobj, K = 2:9, model = "all", crit = "bic", init =
"random") 

But this gives me an error

Error in svd(X) : infinite or missing values in 'x'.

Can anyone please help me with how to handle this dataset in
this package, or suggest an alternative package?

Sincerely,
Souradeep



Re: [R] R code for if-then-do code blocks

2018-12-17 Thread Thierry Onkelinx via R-help
Dear Paul,

R's power is that it works vectorised, unlike SAS, which is row-based. Using
R in a SAS way will lead to very slow code.

Your examples can be written vectorised

d1 %>%
  rownames_to_column("ID") %>%
  mutate(
test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
test5 = test4
  )

Here is a speed comparison.

library(microbenchmark)
microbenchmark(
  vector = {d1 %>%
rownames_to_column("ID") %>%
mutate(
  test1 = ifelse(gender == "f" & workshop == 1, 7, 0),
  test2 = ifelse(gender == "f" & workshop == 1, test1 + 2, 0),
  test4 = ifelse(gender == "f" & workshop == 1, 1, 0),
  test5 = test4
) },
  rowbased = {d1 %>%
  rownames_to_column("ID") %>%
  mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
  ddply("ID",
within,
if (gender == "f" & workshop == 1) {
  test1 <- 1
  test1 <- 6 + test1
  test2 <- 2 + test1
  test4 <- 1
  test5 <- 1
} else {
  test1 <- test2 <- test4 <- test5 <- 0
})}
)


Best regards,

Thierry

ir. Thierry Onkelinx
Statisticus / Statistician

Vlaamse Overheid / Government of Flanders
INSTITUUT VOOR NATUUR- EN BOSONDERZOEK / RESEARCH INSTITUTE FOR NATURE AND
FOREST
Team Biometrie & Kwaliteitszorg / Team Biometrics & Quality Assurance
thierry.onkel...@inbo.be
Havenlaan 88 bus 73, 1000 Brussel
www.inbo.be

///
To call in the statistician after the experiment is done may be no more
than asking him to perform a post-mortem examination: he may be able to say
what the experiment died of. ~ Sir Ronald Aylmer Fisher
The plural of anecdote is not data. ~ Roger Brinner
The combination of some data and an aching desire for an answer does not
ensure that a reasonable answer can be extracted from a given body of data.
~ John Tukey
///




Op ma 17 dec. 2018 om 16:30 schreef Paul Miller via R-help <
r-help@r-project.org>:

> Hello All,
>
> Season's greetings!
>
>  Am trying to replicate some SAS code in R. The SAS code uses if-then-do
> code blocks. I've been trying to do likewise in R as that seems to be the
> most reliable way to get the same result.
>
> Below is some toy data and some code that does work. There are some things
> I don't necessarily like about the code though. So I was hoping some people
> could help make it better. One thing I don't like is that the within
> function reverses the order of the computed columns such that test1:test5
> becomes test5:test1. I've used a mutate to overcome that but would prefer
> not to have to do so.
>
>  Another, perhaps very small thing, is the need to calculate an ID
> variable that becomes the basis for a grouping.
>
> I did considerable Internet searching for R code that conditionally
> computes blocks of code. I didn't find much though and so am wondering if
> my search terms were not sufficient or if there is some other reason. It
> occurred to me that maybe if-then-do code blocks like we often see in SAS
> as are frowned upon and therefore not much implemented.
>
> I'd be interested in seeing more R-compatible approaches if this is the
> case. I've learned that it's a mistake to try and make R be like SAS. It's
> better to let R be R. Trouble is I'm not always sure how to do that.
>
> Thanks,
>
> Paul
>
>
> d1 <- data.frame(workshop=rep(1:2,4),
> gender=rep(c("f","m"),each=4))
>
> library(tibble)
> library(plyr)
>
> d2 <- d1 %>%
>   rownames_to_column("ID") %>%
>   mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
>   ddply("ID",
> within,
> if (gender == "f" & workshop == 1) {
>   test1 <- 1
>   test1 <- 6 + test1
>   test2 <- 2 + test1
>   test4 <- 1
>   test5 <- 1
> } else {
>   test1 <- test2 <- test4 <- test5 <- 0
> })
>


Re: [R] Quoting smooth random terms mccv::gam

2018-12-17 Thread Simon Wood
I would quote the p-value, but not the statistic (as it is not a 
standard F stat). The actual statistic is given here:


https://academic.oup.com/biomet/article-pdf/100/4/1005/566200/ast038.pdf
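
A small sketch of pulling the reported numbers straight from the summary
object (assuming m1 is the fitted model), so they can be quoted without
re-deriving them:

s <- summary(m1)$s.table      # matrix: edf, Ref.df, test statistic, p-value
s["s(B)", ]                   # the row for the random-effect smooth
s["s(B)", "p-value"]          # just the (approximate) p-value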

On 14/12/2018 04:33, Smith, Desmond wrote:

Dear All,

I have a mgcv::gam model of the form:

m1 <- gam(Y ~ A + s(B, bs = "re"), data = dataframe, family = gaussian, method = 
"REML")

The random term is quoted in summary(m1) as, for example,

Approximate significance of smooth terms:
# edf Ref.df  F p-value
s(B)  4.486  5 97.195 6.7e-08 ***

My question is, how would I quote this result (statistic and P value) in a 
formal document?

For example, one possibility is F[4.486,5] = 97.195, P = 6.7e-08. However, 
arguing against this, “reverse engineering” of the result using

pf(q= 97.195, df1= 4.486, df2= 5, lower.tail=FALSE)

gives an incorrect p value:

[1] 0.1435508

I would be very grateful for your advice. Many thanks for your help!





Re: [R] RTerm.exe

2018-12-17 Thread Duncan Murdoch

On 17/12/2018 9:41 AM, pe...@mybetterlife.co.uk wrote:

Hi,

I have recently downloaded R and accessed it.  I am using Zorro (a development
platform for ForeX) and have been requested to add my path for RTerm.exe

I can't find RTerm.exe and confirm that I appear to load R using "C:\Program
Files\R\R-3.5.1\bin\x64\Rgui.exe"

Not sure what your question is, but the executable is called Rterm.exe, 
and should be in the same directory as Rgui.exe.
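
A quick way to print the full path from within R itself, for pasting into
Zorro (a sketch; on Windows this includes the x64/i386 subdirectory):

normalizePath(file.path(R.home("bin"), "Rterm.exe"))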


Duncan Murdoch



[R] R code for if-then-do code blocks

2018-12-17 Thread Paul Miller via R-help
Hello All,

Season's greetings!

I am trying to replicate some SAS code in R. The SAS code uses if-then-do code 
blocks. I've been trying to do likewise in R as that seems to be the most 
reliable way to get the same result. 

Below is some toy data and some code that does work. There are some things I 
don't necessarily like about the code though. So I was hoping some people could 
help make it better. One thing I don't like is that the within function 
reverses the order of the computed columns such that test1:test5 becomes 
test5:test1. I've used a mutate to overcome that but would prefer not to have 
to do so. 

 Another, perhaps very small thing, is the need to calculate an ID variable 
that becomes the basis for a grouping. 

I did considerable Internet searching for R code that conditionally computes 
blocks of code. I didn't find much though and so am wondering if my search 
terms were not sufficient or if there is some other reason. It occurred to me 
that maybe if-then-do code blocks like we often see in SAS are frowned upon 
and therefore not much implemented. 

I'd be interested in seeing more R-compatible approaches if this is the case. 
I've learned that it's a mistake to try and make R be like SAS. It's better to 
let R be R. Trouble is I'm not always sure how to do that. 

Thanks,

Paul


d1 <- data.frame(workshop=rep(1:2,4),
    gender=rep(c("f","m"),each=4))

library(tibble)
library(plyr)

d2 <- d1 %>%
  rownames_to_column("ID") %>%
  mutate(test1 = NA, test2 = NA, test4 = NA, test5 = NA) %>%
  ddply("ID",
    within,
    if (gender == "f" & workshop == 1) {
  test1 <- 1
  test1 <- 6 + test1
  test2 <- 2 + test1
  test4 <- 1
  test5 <- 1
    } else {
  test1 <- test2 <- test4 <- test5 <- 0
    })



[R] RTerm.exe

2018-12-17 Thread peter
Hi,

I have recently downloaded R and accessed it.  I am using Zorro (a development
platform for ForeX) and have been requested to add my path for RTerm.exe

I can't find RTerm.exe and confirm that I appear to load R using "C:\Program
Files\R\R-3.5.1\bin\x64\Rgui.exe"

Regards

Peter H. Williams
113 Skelmorlie Castle Road
Skelmorlie
Ayrshire
PA17 5AL
TEL:- 01475 529946
Mobile:- 07773 348117
Skype:- peterhwuk



Re: [R-es] Random Forest with small "n" and many predictors

2018-12-17 Thread rubenfcasal
Hello again,

I forgot to mention that, in addition, RF selects the explanatory variables
at random in each fit. For more details I would recommend the book
An Introduction to Statistical Learning
(http://www-bcf.usc.edu/~gareth/ISL; available for free as a PDF), and even
taking the corresponding free course:
https://lagunita.stanford.edu/courses/HumanitiesSciences/StatLearning/Winter2016/about.

Best regards, Rubén.


On 17/12/2018 at 12:50, Rubén Fernández Casal wrote:
> Hi Gemma,
>
> In principle you would not have much of a problem with the random forest. In
> general, with few data, statistical / machine learning methods that require
> a training sample and a separate validation sample can run into trouble. In
> those cases it would be advisable to do bagging, i.e. resampling the
> training data set, and that is exactly what the standard RF algorithms,
> such as the one implemented in randomForest, already do...
>
> Best regards, Rubén.
>
>
> On Thu, 13 December 2018 at 10:01, Gemma Ruiz-Olalla
> mailto:gemma.ruizola...@gmail.com>> wrote:
>
> Hi,
>
> I got started with Machine Learning a short while ago, and I have a question
> about my data sets: the first one has 37 explanatory variables and 116
> instances, and the second one 140 explanatory variables and 195 instances.
> The first one looks fine to me, since there are 3 times as many cases as
> explanatory variables, but I think the second case could be a problem, since
> there are almost as many predictors as cases, right?
>
> To "fix" this (in a Random Forest), would it make sense to iterate train()
> some 50-100 times? Storing the resulting (trained) models in a list, then
> taking a kind of average of them, and using that result (its ntree and mtry
> parameters) to generate the definitive randomForest() model.
>
> Does that make sense, or what else could I do?
>
> Many thanks!
>
> --
> Gemma Ruiz-Olalla
> gemma.ruizola...@gmail.com
>
>


-- 
Ruben Fernandez Casal
https://rubenfcasal.github.io
Department of Mathematics
Faculty of Computer Science
Universidade da Coruña
Corporate email: ruben.fcasal  udc  es
--




Re: [R-es] Random Forest with small "n" and many predictors

2018-12-17 Thread Rubén Fernández Casal
Hi Gemma,

In principle you would not have much of a problem with the random forest. In
general, with few data, statistical / machine learning methods that require a
training sample and a separate validation sample can run into trouble. In
those cases it would be advisable to do bagging, i.e. resampling the training
data set, and that is exactly what the standard RF algorithms, such as the one
implemented in randomForest, already do...
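
A minimal sketch of that point, with made-up data of the size you describe for
your second set (195 rows, 140 predictors; not your actual data):

library(randomForest)
set.seed(1)
X <- as.data.frame(matrix(rnorm(195 * 140), nrow = 195))
y <- factor(sample(c("a", "b"), 195, replace = TRUE))
## randomForest already bags (ntree bootstrap samples of the rows) and draws
## a random subset of predictors at each split (mtry), so no manual train()
## loop is needed; the out-of-bag error is an internal validation estimate
fit <- randomForest(x = X, y = y, ntree = 500, mtry = floor(sqrt(140)))
fit$err.rate[fit$ntree, "OOB"]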

Best regards, Rubén.


On Thu, 13 December 2018 at 10:01, Gemma Ruiz-Olalla <
gemma.ruizola...@gmail.com> wrote:

> Hi,
>
> I got started with Machine Learning a short while ago, and I have a question
> about my data sets: the first one has 37 explanatory variables and 116
> instances, and the second one 140 explanatory variables and 195 instances.
> The first one looks fine to me, since there are 3 times as many cases as
> explanatory variables, but I think the second case could be a problem, since
> there are almost as many predictors as cases, right?
>
> To "fix" this (in a Random Forest), would it make sense to iterate train()
> some 50-100 times? Storing the resulting (trained) models in a list, then
> taking a kind of average of them, and using that result (its ntree and mtry
> parameters) to generate the definitive randomForest() model.
>
> Does that make sense, or what else could I do?
>
> Many thanks!
>
> --
> Gemma Ruiz-Olalla
> gemma.ruizola...@gmail.com
>
>



Re: [R] display of ECDF

2018-12-17 Thread PIKAL Petr
Hi

You should add limits and maybe some rotation to the x axis:

ggplot(df, aes(x, colour = g)) + stat_ecdf()  +
scale_x_log10(breaks=breaks, limits=c(0.01, 100))+
theme(axis.text.x = element_text(size=8, angle=45))

Cheers
Petr

> -Original Message-
> From: R-help  On Behalf Of Bogdan Tanasa
> Sent: Monday, December 17, 2018 10:42 AM
> To: r-help 
> Subject: [R] display of ECDF
>
> Dear all,
>
> please could you advise me on the following : I would like to display a few 
> CDF
> data (the R code is below), by using a set of numerical BREAKS on a X axis to 
> be
> shown at EQUAL DISTANCE from each other (although numerically, the BREAKS
> are on log10 axis and do not reflecting an equal distance):
>
> df <- data.frame(
>   x = c(rnorm(100, 0, 3), rnorm(100, 0, 10)),
>   g = gl(2, 100)
> )
>
> breaks=c(0.001, 0.01, 0.1, 1, 5, 10, 20, 30, 100)
>
> ggplot(df, aes(x, colour = g)) + stat_ecdf()  + scale_x_log10(breaks=breaks),
>
> how shall I do it ? thanks a lot !
>
> -- bogdan
>


[R] display of ECDF

2018-12-17 Thread Bogdan Tanasa
Dear all,

please could you advise me on the following: I would like to display a few
CDF curves (the R code is below), using a set of numerical BREAKS on the X
axis shown at EQUAL DISTANCE from each other (although numerically the
BREAKS are on a log10 axis and do not reflect equal distances):

library(ggplot2)

df <- data.frame(
  x = c(rnorm(100, 0, 3), rnorm(100, 0, 10)),
  g = gl(2, 100)
)

breaks=c(0.001, 0.01, 0.1, 1, 5, 10, 20, 30, 100)

ggplot(df, aes(x, colour = g)) + stat_ecdf() +
  scale_x_log10(breaks = breaks)

How shall I do it? Thanks a lot!

-- bogdan
