Re: [R] Efficient way to subset rows in R for dataset with 10^7 columns

2018-04-13 Thread Jeff Newmiller
Oh, there are ways, but the constraining issue here is moving data (memory bandwidth), and data.table is probably already the fastest mechanism for doing that. If you have a computer with four or more real cores, you can try setting up a subset of the columns in each task and cbind the results.
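The column-chunking idea above can be sketched in base R (the original thread uses data.table; a plain data.frame and a toy 40-column table stand in here, and the chunk count of 4 is an arbitrary choice). The `parallel` package ships with R; mclapply() forks on unix-alikes, so on Windows set mc.cores = 1.

```r
library(parallel)

# Toy stand-in for the wide table: 100 rows, 40 columns.
df <- as.data.frame(matrix(rnorm(100 * 40), nrow = 100))

# The row subset we want (e.g. a 90% training index).
set.seed(1)
rows <- sample(nrow(df), 90)

# Split the column indices into 4 contiguous chunks, subset each chunk's
# rows in its own task, then cbind the pieces back together.
chunks <- split(seq_len(ncol(df)), cut(seq_len(ncol(df)), 4))
pieces <- mclapply(chunks, function(cols) df[rows, cols, drop = FALSE],
                   mc.cores = 2)
train <- do.call(cbind, unname(pieces))
```

Whether this beats a single `df[rows, ]` depends on memory bandwidth, as the reply notes; the sketch only shows the mechanics.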

Re: [R] Bivariate Normal Distribution Plots

2018-04-13 Thread Marc Girondot via R-help
Try this code:

# Standard deviations and correlation
sig_x <- 1
sig_y <- 2
rho_xy <- 0.7
# Covariance between X and Y
sig_xy <- rho_xy * sig_x * sig_y
# Covariance matrix
Sigma_xy <- matrix(c(sig_x ^ 2, sig_xy, sig_xy, sig_y ^ 2), nrow = 2, ncol = 2)
# Load the mvtnorm package
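The preview cuts off before the mvtnorm step; a self-contained sketch of the same plot needs only base R, evaluating the bivariate normal density from its closed form on a grid and drawing a contour plot (the grid ranges and resolution here are arbitrary choices, not from the original post).

```r
# Standard deviations and correlation, as in the thread.
sig_x <- 1
sig_y <- 2
rho_xy <- 0.7

# Closed-form bivariate normal density (zero means).
dbvn <- function(x, y) {
  z <- (x / sig_x)^2 -
       2 * rho_xy * (x / sig_x) * (y / sig_y) +
       (y / sig_y)^2
  exp(-z / (2 * (1 - rho_xy^2))) /
    (2 * pi * sig_x * sig_y * sqrt(1 - rho_xy^2))
}

# Evaluate on a grid and plot contours.
x <- seq(-4, 4, length.out = 101)
y <- seq(-8, 8, length.out = 101)
dens <- outer(x, y, dbvn)
contour(x, y, dens, xlab = "x", ylab = "y")
```

With mvtnorm installed, `mvtnorm::dmvnorm(cbind(x, y), sigma = Sigma_xy)` gives the same densities without writing the formula by hand.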

Re: [R] Efficient way to subset rows in R for dataset with 10^7 columns

2018-04-13 Thread Jeff Newmiller
You have 10^7 columns? That process is bound to be slow.

On April 13, 2018 5:31:32 PM PDT, Jack Arnestad wrote:
> I have a data.table with dimensions 100 by 10^7.
>
> When I do
>
> trainIndex <-
>   caret::createDataPartition(
>     df$status,
>     p = .9,
>

[R] Efficient way to subset rows in R for dataset with 10^7 columns

2018-04-13 Thread Jack Arnestad
I have a data.table with dimensions 100 by 10^7. When I do

trainIndex <-
  caret::createDataPartition(
    df$status,
    p = .9,
    list = FALSE,
    times = 1
  )
outerTrain <- df[trainIndex]
outerTest <- df[-trainIndex]

Subsetting the rows of df takes
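The partition step above can be reproduced in base R without caret (note that caret::createDataPartition additionally stratifies on df$status, while plain sample() below does an unstratified split; the toy data.frame is a stand-in for the real table):

```r
# Toy stand-in for the 100 x 10^7 table.
set.seed(1)
df <- data.frame(status = rbinom(100, 1, 0.5), x1 = rnorm(100))

# 90/10 row split, analogous to createDataPartition(p = .9).
trainIndex <- sample(nrow(df), size = floor(0.9 * nrow(df)))
outerTrain <- df[trainIndex, ]
outerTest  <- df[-trainIndex, ]
```

With a data.table, `df[trainIndex]` subsets rows directly; a data.frame needs the trailing comma as shown.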

[R] Extracting specified pages from a lattice ("trellis") object.

2018-04-13 Thread Rolf Turner
Suppose that (e.g.) xyplot() returns an object "xxx" with (say) 3 pages. I would like to extract/plot (print) just one of these pages, e.g. page 2. Here's a toy example:

x <- rep(seq(0, 1, length = 11), 12)
set.seed(42)
y <- rnorm(3 * 44)
a <- rep(letters[1:12], each = 11)
dta <-
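One route (a sketch, completing the toy example with an assumed data.frame and an assumed layout of 4 panels per page): trellis objects support "[" subsetting over packets, so page 2 of a 12-panel, 4-per-page plot is packets 5 through 8.

```r
library(lattice)

# Rebuild the toy data from the post.
x <- rep(seq(0, 1, length = 11), 12)
set.seed(42)
y <- rnorm(3 * 44)
a <- rep(letters[1:12], each = 11)
dta <- data.frame(x = x, y = y, a = a)

# 12 panels at 4 per page -> 3 pages.
xxx <- xyplot(y ~ x | a, data = dta, layout = c(2, 2))

# "[" on a trellis object keeps only the selected packets,
# so this is page 2 on its own.
page2 <- xxx[5:8]
print(page2)
```

An alternative for finer control is the packet.panel argument of plot.trellis, which maps packets to page positions.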

[R] cvTools for 2 models not working

2018-04-13 Thread varin sacha via R-help
Dear R-experts, I am trying to do cross-validation for different models using the cvTools package. I can't get the CV for "FastTau" and "hbrfit". I guess I have to write my own functions, at least for hbrfit. What is going wrong with FastTau? Below is the reproducible example. It is a
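For fitters that cvTools does not support directly, a hand-rolled K-fold loop is straightforward; this sketch uses lm() as a stand-in for FastTau or hbrfit (swap in any fit/predict pair), with RMSPE as the cost:

```r
# Toy data.
set.seed(1)
d <- data.frame(x = rnorm(100))
d$y <- 2 * d$x + rnorm(100)

# Assign each row to one of K folds at random.
K <- 5
folds <- sample(rep(1:K, length.out = nrow(d)))

# Fit on K-1 folds, predict the held-out fold, record RMSPE.
rmspe <- sapply(1:K, function(k) {
  fit  <- lm(y ~ x, data = d[folds != k, ])      # stand-in fitter
  pred <- predict(fit, newdata = d[folds == k, ])
  sqrt(mean((d$y[folds == k] - pred)^2))
})
mean(rmspe)
```

Wrapping a robust fitter this way only requires that it have a predict() method (or that you compute fitted values from its coefficients by hand).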

Re: [R] Reading xpt files into R

2018-04-13 Thread David Winsemius
> On Apr 13, 2018, at 10:01 AM, WRAY NICHOLAS via R-help wrote:
>
> Hello R folk
>
> I have an xpt file which I have been trying to open into R in RStudio.
>
> On the net I found guidance which says that I need the packages Hmisc and
> SASxport, which I have successfully

[R] SparksR

2018-04-13 Thread Jeff Reichman
R-Help, I'm working with my first large database (53,098,492,383 records). When I select from the db via something like

library(SparkR)
mydata <- sql("SELECT * FROM ")

is "mydata" a SparkDataFrame, and do I work with SparkDataFrames like I would a regular df (per se)? Because I can't imagine I

[R] Longitudinal and Multilevel Data in R and Stan: 5-day workshop May 28 to June 1, 2018

2018-04-13 Thread Georges Monette
Longitudinal and Multilevel Data in R and Stan ICPSR short course: May 28 to June 1, 2018 May 28: Introduction to R by John Fox May 29 to June 1: Longitudinal and Multilevel Data in R and Stan by Georges Monette Sponsored and organized by ICPSR, University of Michigan and held at York

[R] Reading xpt files into R

2018-04-13 Thread WRAY NICHOLAS via R-help
Hello R folk. I have an xpt file which I have been trying to open into R in RStudio. On the net I found guidance which says that I need the packages Hmisc and SASxport, which I have successfully loaded. I had also found some code which says that this would allow me to read the xpt file into R:
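Besides Hmisc/SASxport, the recommended package 'foreign' (which ships with R) also reads SAS transport files; a minimal sketch, assuming the file sits at "data.xpt" (a hypothetical path, the original path is not in the post):

```r
library(foreign)

# read.xport() returns a data.frame, or a list of data.frames if the
# transport file holds more than one SAS data set.
if (file.exists("data.xpt")) {
  d <- read.xport("data.xpt")
  str(d)
}
```

Hmisc::sasxport.get() is the Hmisc route the guidance refers to, and it handles value labels as well; both need the path to the .xpt file, not just its name, unless it is in the working directory.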

Re: [R] Fwd: R Timeseries tsoutliers:tso

2018-04-13 Thread William Dunlap via R-help
You can record the time to evaluate each line by wrapping each line in a call to system.time(). E.g.,

expressions <- quote({
  # paste your commands here, or put them into a file and use
  # exprs <- parse("thatFile")
  d.dir <- '/Users/darshanpandya/xx'
  FNAME <- 'my_data.csv'
  d.input <-
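The loop that evaluates and times each expression is cut off in the preview; a runnable sketch of the same idea, with toy commands standing in for the original script, looks like this:

```r
# Quoted block of the commands to profile (toy stand-ins here).
expressions <- quote({
  x <- rnorm(1e5)
  s <- sum(x)
})

# Evaluate each expression in turn, printing it and its timing.
# expressions[[1]] is the `{` symbol, so skip it.
env <- new.env()
for (i in seq_along(expressions)[-1]) {
  cat("##", deparse(expressions[[i]]), "\n")
  print(system.time(eval(expressions[[i]], envir = env)))
}
```

Evaluating in a dedicated environment keeps the profiled script's variables out of the global workspace; `eval(expressions[[i]])` alone would also work.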

[R] Fwd: R Timeseries tsoutliers:tso

2018-04-13 Thread Darshan Pandya
Hello, writing to seek help regarding an unexpected performance anomaly I am observing when using tsoutliers::tso on my Mac vs. on an AWS cloud server. I am running the following code with a very small dataset of about 208 records.

d.dir <- '/Users/darshanpandya/xx'
FNAME <- 'my_data.csv'