Re: [R] Error message cs_lu(A) failed: near-singular A (or out of memory)
Dear Rui,

If you impose the homogeneity (adding-up) restrictions, your system becomes singular, because the error terms of the share equations always sum to zero. Therefore, you can arbitrarily delete one of the share equations and recover the coefficients that were not directly estimated from the homogeneity restrictions. Furthermore, you can impose the homogeneity restriction within each single equation by normalising with one input price (numéraire).

Finally, I suggest imposing the cross-equation restrictions with the argument restrict.regMat rather than with the argument restrict.matrix, because the documentation says: the advantage of imposing restrictions on the coefficients by 'restrict.regMat' is that the matrix that has to be inverted during the estimation gets smaller by this procedure, while it gets larger if the restrictions are imposed by 'restrict.matrix' and 'restrict.rhs'.

I will send you my lecture notes on econometric production analysis with R by private mail. Please do not forget to cite R and the R packages that you use in your publications. If you have further questions regarding system estimation, microeconomic analysis, or stochastic frontier (efficiency) analysis with R, you can use the forums at the R-Forge sites of the systemfit [1,2], micEcon [3,4], or frontier [5] packages/projects, respectively.
[1] http://www.systemfit.org/
[2] http://r-forge.r-project.org/projects/systemfit/
[3] http://www.micEcon.org/
[4] http://r-forge.r-project.org/projects/micecon/
[5] http://r-forge.r-project.org/projects/frontier/

Best (holiday) wishes from Copenhagen,
Arne

On 9 December 2012 23:31, Rui Neiva ruiqne...@gmail.com wrote:

Hi there everyone,

I have the following model (this is naturally a simplified version just to show my problem; in case you are wondering, this is a translog cost function with the associated cost-share equations):

C ~ α + β1 log X + β2 log Y + γ1 log Z + γ2 log XX
C1 ~ β1 + β2 log YY + γ1 log ZZ

Then I have some restrictions on the coefficients, namely that the β coefficients sum to one and the γ coefficients sum to zero. So I have added the following equations to the model:

C2 ~ 0 + β1.Y1 + β2.Y2
C3 ~ 0 + γ1.Y3 + γ2.Y4

I have created columns in my data frame with values of 0 for variable C3 and values of 1 for Y1, Y2, Y3, Y4 and C2. I am using the systemfit package to solve a multiple-equation system with the SUR method, using a matrix to impose the restrictions on the coefficients (i.e., that β1 is the same value in all equations, and likewise for all the other coefficients). When I run the model without the restricting equations (C2, C3) it runs just fine, but when I add these two equations I get the error:

Error in solve(xtOmegaInv %*% xMat2, xtOmegaInv %*% yVec, tol = solvetol) :
  cs_lu(A) failed: near-singular A (or out of memory)

Any ideas on what the problem might be?

All the best,
Rui Neiva

P.S.: I have also posted this question on the Matrix help forum, but since I do not know how active that forum is, I decided to see whether anyone on the mailing list could help.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
--
Arne Henningsen
http://www.arne-henningsen.name
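For readers following Arne's restrict.regMat suggestion, here is a minimal hypothetical sketch (the data, variable names, and transformation matrix are invented for illustration, not taken from the thread). restrict.regMat expects a matrix that maps the smaller vector of free coefficients onto the full coefficient vector, so beta_full = regMat %*% beta_free:

```r
library(systemfit)

# Hypothetical data: two equations sharing one regressor
set.seed(1)
dat <- data.frame(lp = rnorm(50), s1 = rnorm(50), s2 = rnorm(50))

eqs <- list(share1 = s1 ~ lp, share2 = s2 ~ lp)

# Full coefficient vector: (a1, b1, a2, b2).  Suppose we require b1 = b2 = b.
# regMat maps the free coefficients (a1, b, a2) onto all four:
regMat <- rbind(c(1, 0, 0),   # a1
                c(0, 1, 0),   # b1 = b
                c(0, 0, 1),   # a2
                c(0, 1, 0))   # b2 = b

fit <- systemfit(eqs, method = "SUR", data = dat, restrict.regMat = regMat)
coef(fit)  # the two slope coefficients are now identical
```

The same restriction could be written as a row of restrict.matrix with restrict.rhs = 0, but, as quoted above, the matrix to be inverted is then larger.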
Re: [R] apply with missing values
A bit of data might be useful. Make a small example and post the data with dput().

On 24.12.2012, at 20:21, jenna wrote:

I am trying to get the means of each participant's average saccade amplitude as a function of the group they were in (designated by shape_), but there are missing values in the dataset. This is what I tried:

with(data055, tapply(AVERAGE_SACCADE_AMPLITUDE, shape_, mean))

I get an error saying the argument is not numerical or logical. Next I tried:

055 <- tapply(data055$AVERAGE_SACCADE_AMPLITUDE, data055$shape_, mean, na.rm = TRUE)

Still nothing. Help? Thanks!

--
View this message in context: http://r.789695.n4.nabble.com/apply-with-missing-values-tp4653889.html
Sent from the R help mailing list archive at Nabble.com.
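For reference, here is a minimal reproducible version of what the poster is attempting (the data are made up): when the column really is numeric, tapply with na.rm = TRUE handles the missing values, since extra arguments are passed through to mean().

```r
d <- data.frame(
  amplitude = c(1.2, NA, 3.4, 2.0, NA, 5.0),
  shape_    = c("a", "a", "a", "b", "b", "b")
)

# Group means, ignoring NAs; '...' arguments to tapply are passed on to mean()
means <- with(d, tapply(amplitude, shape_, mean, na.rm = TRUE))
means
#   a   b
# 2.3 3.5
```

Note also that `055` is not a syntactically valid R object name (names cannot start with a digit), so the second assignment would fail regardless of the NA issue.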
[R] as.Date to as.POSIXct
Hello,

I have converted some UNIX time stamps with as.Date as follows:

dates_unix <- seq(as.Date(convertTimeToString(timeStart)), length = length(data), by = "1 mon")

and now I would like to convert dates_unix from type Date to type POSIXct. How can I do that?

Thanks
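A sketch of an answer (convertTimeToString and timeStart are the poster's own objects, so made-up dates stand in for them here): going through format() with an explicit tz keeps the clock time at midnight in that zone, whereas as.POSIXct(dates) directly treats the Date as midnight UTC but prints in the local time zone, which can look like the previous day.

```r
dates <- seq(as.Date("2012-01-01"), length.out = 3, by = "1 mon")

# Date -> POSIXct; a Date has no time-of-day, so midnight in an explicit
# time zone is a common choice
stamps <- as.POSIXct(format(dates), tz = "UTC")
class(stamps)  # "POSIXct" "POSIXt"
```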
Re: [R] R and Matlab
The simplest way is to call a system command using R CMD; see: http://stackoverflow.com/questions/6695105/call-r-scripts-in-matlab

More involved solutions have also been proposed: http://www.mathworks.co.uk/matlabcentral/fileexchange/5051 (this one uses R-(D)-COM).

In my opinion, the most robust integration can be achieved via Java code that runs R, which you can then use easily from Matlab.

On 24 December 2012 20:42, Amirehsan Ranginkaman aeranginka...@gmail.com wrote:

Hi, how can I call R functions or a package in MatLab? Is there any way?

Thanks,
Regards,
Ranginkaman
[R] Sampling data without having infinite numbers after doing a transformation
Hello R-helpers,

I want to ask how I can sample data sets without infinite numbers coming out. For example:

set.seed(1234)
a <- rnorm(15, 0, 1)
b <- rnorm(15, 0, 1)
c <- rnorm(15, 0, 1)
d <- rnorm(15, 0, 36)

After drawing the samples, I need to apply a transformation (Hoaglin, 1985) to each data set. I actually need to measure the skewness and kurtosis, which is why I need the transformation. After the transformation there are 'Inf' values in my data sets, and I cannot proceed to the next step, where I need to compute the trimmed mean and the sum of squared deviations. Can anyone help with how to obtain better data sets so that my programme will work?

Thank you.

Best regards,
Hyo Min
UPM Malaysia
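Without the exact Hoaglin transformation it is hard to say where the Inf values arise, but one defensive pattern (a sketch, with a made-up log transformation standing in for the real one) is to drop non-finite values before computing summaries:

```r
set.seed(1234)
a <- c(rnorm(15, 0, 1), 0)    # include a zero so the stand-in transform produces -Inf

z <- log(abs(a))              # stand-in transformation; log(0) gives -Inf
z_ok <- z[is.finite(z)]       # keep only finite values

# trimmed mean and sum of squared deviations on the cleaned vector
tm  <- mean(z_ok, trim = 0.2)
ssd <- sum((z_ok - mean(z_ok))^2)
```

If Inf appears for many observations rather than a few edge cases, the better fix is usually to revisit the transformation itself rather than to filter.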
[R] aggregate / collapse big data frame efficiently
Hi,

I need to aggregate rows of a data.frame by computing the mean for rows with the same level of one factor variable; here is sample code:

x <- data.frame(rep(letters, 2), rnorm(52), rnorm(52), rnorm(52))
aggregate(x, list(x[, 1]), mean)

Now my problem is that the actual data set is much bigger (120 rows and approximately 100,000 columns), and it takes very, very long (at some point I just stopped it). Is there anything that can be done to make the aggregate routine more efficient? Or is there a different approach that would work faster?

Thanks for any suggestions!
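Besides the suggestions in the replies, base R's rowsum() is worth knowing for exactly this shape of problem: it computes group-wise column sums on a matrix in compiled code, and dividing by the group sizes gives group means. A minimal sketch on data shaped like the example (not from the thread):

```r
set.seed(1)
x <- data.frame(g = rep(letters, 2), matrix(rnorm(52 * 3), nrow = 52))

m <- as.matrix(x[, -1])                 # numeric part as a matrix
sums  <- rowsum(m, group = x$g)         # group-wise column sums, compiled code
means <- sums / as.vector(table(x$g))   # divide by group sizes -> group means
```

Because rowsum() loops in C over all columns at once, its cost grows gently with the 100,000-column width, unlike calling mean() once per group and column.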
Re: [R] as.Date to as.POSIXct
Hi,

You should have provided a reproducible example. To convert in general, see ?as.POSIXct

A.K.

----- Original Message -----
From: Antonio Rodriges antonio@gmail.com
To: r-help@r-project.org
Sent: Tuesday, December 25, 2012 7:28 AM
Subject: [R] as.Date to as.POSIXct

Hello, I have converted some UNIX time stamps with as.Date as follows:

dates_unix <- seq(as.Date(convertTimeToString(timeStart)), length = length(data), by = "1 mon")

and now I would like to convert dates_unix from type Date to type POSIXct. How can I do that?

Thanks
Re: [R] for loop not working
Dear Arun,

Thank you very much. I am extremely sorry for spoiling your Christmas.

eliza

Date: Tue, 25 Dec 2012 08:51:05 -0800
From: smartpink...@yahoo.com
Subject: Re: [R] for loop not working

HI Eliza,

Try this:

set.seed(15)
mat1 <- matrix(sample(1:2000, 1776, replace = TRUE), ncol = 444)
colnames(mat1) <- paste("Col", 1:444, sep = "")
res1 <- lapply(1:37, function(i) mat1[, seq(i, 444, 37)])
res2 <- lapply(1:37, function(i) {a <- mat1[, i:444]; a[, c(TRUE, rep(FALSE, 36))]})  # your code
identical(res1, res2)
#[1] TRUE

From: eliza botto eliza_bo...@hotmail.com
Sent: Tuesday, December 25, 2012 1:57 AM
Subject: RE: [R] for loop not working

Dear Arun, as usual you were spot on. I tried the following

lapply(seq_len(ncol(e)), function(i) {
  a <- e[, (e[i]:444)]
  a[, c(TRUE, rep(FALSE, 36))]
})

but it never worked. Thanks for your kind help.

lots of love
elisa

Date: Mon, 24 Dec 2012 22:40:08 -0800
From: smartpink...@yahoo.com
Subject: Re: [R] for loop not working

HI Eliza,

You could try this:

set.seed(15)
mat1 <- matrix(sample(1:2000, 1776, replace = TRUE), ncol = 444)
colnames(mat1) <- paste("Col", 1:444, sep = "")
res <- lapply(seq_len(ncol(mat1)), function(i) mat1[, seq(i, 444, 37)])
# If you want only this from 1:37, then
res1 <- lapply(1:37, function(i) mat1[, seq(i, 444, 37)])

A.K.

----- Original Message -----
From: eliza botto eliza_bo...@hotmail.com
To: r-help@r-project.org
Sent: Tuesday, December 25, 2012 12:03 AM
Subject: [R] for loop not working

Dear R family,

I have a matrix of 444 columns. What I want to do is the following:
1. Starting from column 1, I want to select every 37th column along the way; more precisely, columns 1, 38, 75, 112, 149 and so on.
2. Starting from column 2, I again want to select every 37th column, which means columns 2, 39, 76, 113, 150 and so on.
Similarly starting from 3 up to the 37th column. I have tried the following loop command, which is not working. Can anyone please see what is wrong with it?

for (i in 1:37) {
  a <- e[, e[i]:444]
}
lapply(seq_len(1), function(i) {
  a[, c(TRUE, rep(FALSE, 1))]
})

Extremely sorry for bothering you once again.

eliza
Re: [R] Sampling data without having infinite numbers after doing a transformation
Perhaps you should read the help file for rnorm more carefully:

?rnorm

Keep in mind that the normal probability distribution is a density function, so the smaller the standard deviation, the greater the magnitude of the density function.

--
Jeff Newmiller
Sent from my phone. Please excuse my brevity.

Agnes Ayang agnes.ay...@yahoo.com wrote:

Hello R-helpers,

I want to ask how I can sample data sets without infinite numbers coming out. For example:

set.seed(1234)
a <- rnorm(15, 0, 1)
b <- rnorm(15, 0, 1)
c <- rnorm(15, 0, 1)
d <- rnorm(15, 0, 36)

After drawing the samples, I need to apply a transformation (Hoaglin, 1985) to each data set. I actually need to measure the skewness and kurtosis, which is why I need the transformation. After the transformation there are 'Inf' values in my data sets, and I cannot proceed to the next step, where I need to compute the trimmed mean and the sum of squared deviations. Can anyone help with how to obtain better data sets so that my programme will work?

Thank you.

Best regards,
Hyo Min
UPM Malaysia
Re: [R] aggregate / collapse big data frame efficiently
You might consider using the sqldf package.

--
Jeff Newmiller
Sent from my phone. Please excuse my brevity.

Martin Batholdy batho...@googlemail.com wrote:

Hi,

I need to aggregate rows of a data.frame by computing the mean for rows with the same level of one factor variable; here is sample code:

x <- data.frame(rep(letters, 2), rnorm(52), rnorm(52), rnorm(52))
aggregate(x, list(x[, 1]), mean)

Now my problem is that the actual data set is much bigger (120 rows and approximately 100,000 columns), and it takes very, very long (at some point I just stopped it). Is there anything that can be done to make the aggregate routine more efficient? Or is there a different approach that would work faster?

Thanks for any suggestions!
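A minimal sqldf sketch of that suggestion (the column names are invented for illustration; sqldf runs the SQL against an in-memory SQLite copy of the data frame):

```r
library(sqldf)

set.seed(1)
x <- data.frame(g = rep(letters, 2), v1 = rnorm(52), v2 = rnorm(52))

# Group means via SQL: one avg() per numeric column
res <- sqldf("select g, avg(v1) as v1, avg(v2) as v2 from x group by g")
head(res)
```

Note that for the poster's ~100,000 columns the SQL string would have to be generated programmatically, and SQLite limits the number of columns per table (2000 by default), so this approach fits moderate widths best.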
Re: [R] aggregate / collapse big data frame efficiently
I'd suggest the 'data.table' package. That is one of the prime uses it was created for.

Pat

On 25/12/2012 16:34, Martin Batholdy wrote:

Hi,

I need to aggregate rows of a data.frame by computing the mean for rows with the same level of one factor variable; here is sample code:

x <- data.frame(rep(letters, 2), rnorm(52), rnorm(52), rnorm(52))
aggregate(x, list(x[, 1]), mean)

Now my problem is that the actual data set is much bigger (120 rows and approximately 100,000 columns), and it takes very, very long (at some point I just stopped it). Is there anything that can be done to make the aggregate routine more efficient? Or is there a different approach that would work faster?

Thanks for any suggestions!

--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of 'Some hints for the R beginner' and 'The R Inferno')
Re: [R] aggregate / collapse big data frame efficiently
According to the way that you have used 'aggregate', you are taking the column means. A couple of suggestions for faster processing:

1. Use matrices instead of data.frames (I converted your example just before using it).
2. Use 'colMeans'.

I created a 120 x 10 matrix with 10 levels, and it does the computation in less than 2 seconds:

n <- 10
nLevels <- 10
nRows <- 120
Cols <- list(rep(list(sample(nRows)), n))
df <- data.frame(levels = sample(nLevels, nRows, TRUE), Cols)
colnames(df)[-1] <- paste0('col', 1:n)

# convert to matrix for faster processing
df.m <- as.matrix(df[, -1])  # remove levels column

str(df.m)
# int [1:120, 1:10] 111 13 106 61 16 39 25 94 53 38 ...
# - attr(*, "dimnames")=List of 2
#  ..$ : NULL
#  ..$ : chr [1:10] "col1" "col2" "col3" "col4" ...

system.time({
  # split the indices of rows for each level
  x <- split(seq(nrow(df)), df$levels)
  result <- sapply(x, function(a) colMeans(df.m[a, ]))
})
#  user  system elapsed
#  1.33    0.00    1.35

str(result)
# num [1:10, 1:10] 57 57 57 57 57 57 57 57 57 57 ...
# - attr(*, "dimnames")=List of 2
#  ..$ : chr [1:10] "col1" "col2" "col3" "col4" ...
#  ..$ : chr [1:10] "1" "2" "3" "4" ...

On Tue, Dec 25, 2012 at 11:34 AM, Martin Batholdy batho...@googlemail.com wrote:

Hi,

I need to aggregate rows of a data.frame by computing the mean for rows with the same level of one factor variable; here is sample code:

x <- data.frame(rep(letters, 2), rnorm(52), rnorm(52), rnorm(52))
aggregate(x, list(x[, 1]), mean)

Now my problem is that the actual data set is much bigger (120 rows and approximately 100,000 columns), and it takes very, very long (at some point I just stopped it). Is there anything that can be done to make the aggregate routine more efficient? Or is there a different approach that would work faster?

Thanks for any suggestions!
--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
Re: [R] apply with missing values
On Dec 24, 2012, at 11:21 AM, jenna wrote:

I am trying to get the means of each participant's average saccade amplitude as a function of the group they were in (designated by shape_), but there are missing values in the dataset. This is what I tried:

with(data055, tapply(AVERAGE_SACCADE_AMPLITUDE, shape_, mean))

I get an error saying the argument is not numerical or logical.

The error message appears informative (and would NOT be expected to be solved by adding na.rm=TRUE). Your construction of the data055 dataframe was flawed in some way. You have not shown how it was done, nor given dput(head(data055)), which would have provided a basis for further assessment. Based on past experience, my guess is that there was a character value in the column of data you thought was numeric during a read.* operation and it is now a factor.

Next I tried:

055 <- tapply(data055$AVERAGE_SACCADE_AMPLITUDE, data055$shape_, mean, na.rm = TRUE)

Still nothing.

--
David Winsemius, MD
Alameda, CA, USA
Re: [R] pscore.dist problem when running optmatch
Hello,

Did you contact the package maintainer? Mark M. Fredrickson mark.m.fredrick...@gmail.com

There is also a webpage: https://github.com/markmfredrickson/optmatch

Regards,
Pascal

On 18/12/2012 21:37, MA wrote:

Hello,

My optmatch package is loaded and otherwise running fine. I get an error after lcds successfully completes logistic regression, when I try to obtain a propensity score:

pdist <- pscore.dist(lcds)
Error: could not find function "pscore.dist"

I searched the help files and other online sources but could find no answer for this. Any advice would be greatly appreciated!

Thank you,
Michael Adolph
Re: [R] aggregate / collapse big data frame efficiently
aggregate() is not efficient; try by().

On Tue, Dec 25, 2012 at 11:34 AM, Martin Batholdy batho...@googlemail.com wrote:

Hi,

I need to aggregate rows of a data.frame by computing the mean for rows with the same level of one factor variable; here is sample code:

x <- data.frame(rep(letters, 2), rnorm(52), rnorm(52), rnorm(52))
aggregate(x, list(x[, 1]), mean)

Now my problem is that the actual data set is much bigger (120 rows and approximately 100,000 columns), and it takes very very long (at some point I just stopped it). Is there anything that can be done to make the aggregate routine more efficient? Or is there a different approach that would work faster?

Thanks for any suggestions!

--
WenSui Liu
Credit Risk Manager, 53 Bancorp
wensui@53.com
513-295-4370
Re: [R] for loop not working
HI Eliza,

Try this:

set.seed(15)
mat1 <- matrix(sample(1:2000, 1776, replace = TRUE), ncol = 444)
colnames(mat1) <- paste("Col", 1:444, sep = "")
res1 <- lapply(1:37, function(i) mat1[, seq(i, 444, 37)])
res2 <- lapply(1:37, function(i) {a <- mat1[, i:444]; a[, c(TRUE, rep(FALSE, 36))]})  # your code
identical(res1, res2)
#[1] TRUE

From: eliza botto eliza_bo...@hotmail.com
Sent: Tuesday, December 25, 2012 1:57 AM
Subject: RE: [R] for loop not working

Dear Arun, as usual you were spot on. I tried the following

lapply(seq_len(ncol(e)), function(i) {
  a <- e[, (e[i]:444)]
  a[, c(TRUE, rep(FALSE, 36))]
})

but it never worked. Thanks for your kind help.

lots of love
elisa

Date: Mon, 24 Dec 2012 22:40:08 -0800
From: smartpink...@yahoo.com
Subject: Re: [R] for loop not working

HI Eliza,

You could try this:

set.seed(15)
mat1 <- matrix(sample(1:2000, 1776, replace = TRUE), ncol = 444)
colnames(mat1) <- paste("Col", 1:444, sep = "")
res <- lapply(seq_len(ncol(mat1)), function(i) mat1[, seq(i, 444, 37)])
# If you want only this from 1:37, then
res1 <- lapply(1:37, function(i) mat1[, seq(i, 444, 37)])

A.K.

----- Original Message -----
From: eliza botto eliza_bo...@hotmail.com
To: r-help@r-project.org
Sent: Tuesday, December 25, 2012 12:03 AM
Subject: [R] for loop not working

Dear R family,

I have a matrix of 444 columns. What I want to do is the following:
1. Starting from column 1, I want to select every 37th column along the way; more precisely, columns 1, 38, 75, 112, 149 and so on.
2. Starting from column 2, I again want to select every 37th column, which means columns 2, 39, 76, 113, 150 and so on. Similarly starting from 3 up to the 37th column. I have tried the following loop command, which is not working. Can anyone please see what is wrong with it?
for (i in 1:37) {
  a <- e[, e[i]:444]
}
lapply(seq_len(1), function(i) {
  a[, c(TRUE, rep(FALSE, 1))]
})

Extremely sorry for bothering you once again.

eliza
Re: [R] aggregate / collapse big data frame efficiently
Hi,

Jim's method was found to be faster than data.table():

n <- 1
nLevels <- 10
nRows <- 120
Cols <- list(rep(list(sample(nRows)), n))
df <- data.frame(levels = sample(nLevels, nRows, TRUE), Cols)
colnames(df)[-1] <- paste0('col', 1:n)

# convert to matrix for faster processing
df.m <- as.matrix(df[, -1])  # remove levels column

system.time({
  # split the indices of rows for each level
  x <- split(seq(nrow(df)), df$levels)
  result <- sapply(x, function(a) colMeans(df.m[a, ]))
})
#  user  system elapsed
# 0.056   0.000   0.056

library(data.table)
df.dt <- data.table(df)
setkey(df.dt, levels)
system.time({result1 <- df.dt[, lapply(.SD, mean), by = levels]})
#  user  system elapsed
# 7.756   0.000   7.771

system.time({result2 <- df.dt[, list(Mean = colMeans(.SD)), by = levels]})
#  user  system elapsed
# 2.188   0.000   2.193

A.K.

----- Original Message -----
From: jim holtman jholt...@gmail.com
To: Martin Batholdy batho...@googlemail.com
Cc: r-help@r-project.org
Sent: Tuesday, December 25, 2012 1:20 PM
Subject: Re: [R] aggregate / collapse big data frame efficiently

According to the way that you have used 'aggregate', you are taking the column means. A couple of suggestions for faster processing:

1. Use matrices instead of data.frames (I converted your example just before using it).
2. Use 'colMeans'.

I created a 120 x 10 matrix with 10 levels, and it does the computation in less than 2 seconds:

n <- 10
nLevels <- 10
nRows <- 120
Cols <- list(rep(list(sample(nRows)), n))
df <- data.frame(levels = sample(nLevels, nRows, TRUE), Cols)
colnames(df)[-1] <- paste0('col', 1:n)

# convert to matrix for faster processing
df.m <- as.matrix(df[, -1])  # remove levels column

str(df.m)
# int [1:120, 1:10] 111 13 106 61 16 39 25 94 53 38 ...
# - attr(*, "dimnames")=List of 2
#  ..$ : NULL
#  ..$ : chr [1:10] "col1" "col2" "col3" "col4" ...
system.time({
  # split the indices of rows for each level
  x <- split(seq(nrow(df)), df$levels)
  result <- sapply(x, function(a) colMeans(df.m[a, ]))
})
#  user  system elapsed
#  1.33    0.00    1.35

str(result)
# num [1:10, 1:10] 57 57 57 57 57 57 57 57 57 57 ...
# - attr(*, "dimnames")=List of 2
#  ..$ : chr [1:10] "col1" "col2" "col3" "col4" ...
#  ..$ : chr [1:10] "1" "2" "3" "4" ...

On Tue, Dec 25, 2012 at 11:34 AM, Martin Batholdy batho...@googlemail.com wrote:

Hi,

I need to aggregate rows of a data.frame by computing the mean for rows with the same level of one factor variable; here is sample code:

x <- data.frame(rep(letters, 2), rnorm(52), rnorm(52), rnorm(52))
aggregate(x, list(x[, 1]), mean)

Now my problem is that the actual data set is much bigger (120 rows and approximately 100,000 columns), and it takes very, very long (at some point I just stopped it). Is there anything that can be done to make the aggregate routine more efficient? Or is there a different approach that would work faster?

Thanks for any suggestions!

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
Re: [R] aggregate / collapse big data frame efficiently
Hi,

You could use library(data.table):

x <- data.frame(A = rep(letters, 2), B = rnorm(52), C = rnorm(52), D = rnorm(52))
res <- with(x, aggregate(cbind(B, C, D), by = list(A), mean))
colnames(res)[1] <- "A"
x1 <- data.table(x)
res2 <- x1[, list(B = mean(B), C = mean(C), D = mean(D)), by = A]
identical(res, data.frame(res2))
#[1] TRUE

Just for comparison:

set.seed(25)
xnew <- data.frame(A = rep(letters, 1500), B = rnorm(39000), C = rnorm(39000), D = rnorm(39000))
system.time(resnew <- with(xnew, aggregate(cbind(B, C, D), by = list(A), mean)))
#  user  system elapsed
# 0.152   0.000   0.152

xnew1 <- data.table(xnew)
system.time(resnew1 <- xnew1[, list(B = mean(B), C = mean(C), D = mean(D)), by = A])
#  user  system elapsed
# 0.004   0.000   0.005

A.K.

----- Original Message -----
From: Martin Batholdy batho...@googlemail.com
To: r-help@r-project.org
Sent: Tuesday, December 25, 2012 11:34 AM
Subject: [R] aggregate / collapse big data frame efficiently

Hi,

I need to aggregate rows of a data.frame by computing the mean for rows with the same level of one factor variable; here is sample code:

x <- data.frame(rep(letters, 2), rnorm(52), rnorm(52), rnorm(52))
aggregate(x, list(x[, 1]), mean)

Now my problem is that the actual data set is much bigger (120 rows and approximately 100,000 columns), and it takes very, very long (at some point I just stopped it). Is there anything that can be done to make the aggregate routine more efficient? Or is there a different approach that would work faster?

Thanks for any suggestions!
[R] splitting a long dataframe
Dear all... Merry Christmas!

I would like to split a long dataframe. The dataframe looks like this:

x <- c('0:00:00', '0:30:00', '1:00:00', '1:30:00', '2:00:00', '2:30:00', '3:00:00',
       '0:00:00', '0:30:00', '1:00:00', '1:30:00', '2:00:00', '2:30:00', '3:00:00', '3:30:00', '4:00:00',
       '0:00:00', '0:30:00', '1:00:00', '1:30:00', '2:00:00', '2:30:00', '3:00:00',
       '0:00:00', '0:30:00', '1:00:00', '1:30:00', '2:00:00', '2:30:00', '3:00:00', '3:30:00', '4:00:00')
y <- seq(1:32)
data1 <- data.frame(x, y)

I want to split it in such a way that the output looks like:

0:00:00   1   8  17  24
0:30:00   2   9  18  25
1:00:00   3  10  19  26
1:30:00   4  11  20  27
2:00:00   5  12  21  28
2:30:00   6  13  22  29
3:00:00   7  14  23  30
3:30:00  NA  15  NA  31
4:00:00  NA  16  NA  32

Any ideas or functions that I should look into for doing this?

Thanks a lot for your help and time.

Cheers,
Swagath
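One possible base-R approach (a sketch, not taken from the thread): mark where each block restarts at '0:00:00', split the row indices on that marker, and align each block's y values against the full set of times, so that shorter blocks get NA.

```r
x <- c('0:00:00', '0:30:00', '1:00:00', '1:30:00', '2:00:00', '2:30:00', '3:00:00',
       '0:00:00', '0:30:00', '1:00:00', '1:30:00', '2:00:00', '2:30:00', '3:00:00', '3:30:00', '4:00:00',
       '0:00:00', '0:30:00', '1:00:00', '1:30:00', '2:00:00', '2:30:00', '3:00:00',
       '0:00:00', '0:30:00', '1:00:00', '1:30:00', '2:00:00', '2:30:00', '3:00:00', '3:30:00', '4:00:00')
data1 <- data.frame(x, y = 1:32)

grp   <- cumsum(data1$x == '0:00:00')      # block id: increments at each 0:00:00
idx   <- split(seq_len(nrow(data1)), grp)  # row indices per block
times <- sort(unique(data1$x))             # safe here: single-digit hours sort correctly as text

# for each block, place its y values at the matching time slots (NA elsewhere)
wide <- sapply(idx, function(i) data1$y[i][match(times, data1$x[i])])
result <- data.frame(time = times, wide)
result
```

match() does the padding automatically: times that a block lacks get no match and therefore NA, which reproduces the desired layout.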
[R] path analysis
What function can I use to do path analysis in R? Please help me. Thanks.
Re: [R] path analysis
First, hello.

Second: http://r.789695.n4.nabble.com/path-analysis-td2528558.html#a2530207

Last, regards.

On 26/12/2012 04:11, Ali Mahmoudi wrote:

What function can I use to do path analysis in R? Please help me. Thanks.
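As one hedged illustration beyond the linked thread (not an answer endorsed there): the lavaan package fits path models through its sem() function, with the model written as regression formulas in a text string. All data and variable names below are invented:

```r
library(lavaan)

# Made-up data for a simple mediation-style path model: x -> m -> y, plus x -> y
set.seed(42)
dat <- data.frame(x = rnorm(100))
dat$m <- 0.5 * dat$x + rnorm(100)
dat$y <- 0.3 * dat$x + 0.6 * dat$m + rnorm(100)

model <- '
  m ~ x        # path from x to m
  y ~ x + m    # direct and mediated paths to y
'
fit <- sem(model, data = dat)
summary(fit)
```

The sem package offers similar functionality with a different model-specification syntax.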
Re: [R] splitting a long dataframe
On Dec 25, 2012, at 9:52 AM, Swagath wrote:

Dear all... Merry Christmas!

I would like to split a long dataframe. The dataframe looks like this:

x <- c('0:00:00', '0:30:00', '1:00:00', '1:30:00', '2:00:00', '2:30:00', '3:00:00',
       '0:00:00', '0:30:00', '1:00:00', '1:30:00', '2:00:00', '2:30:00', '3:00:00', '3:30:00', '4:00:00',
       '0:00:00', '0:30:00', '1:00:00', '1:30:00', '2:00:00', '2:30:00', '3:00:00',
       '0:00:00', '0:30:00', '1:00:00', '1:30:00', '2:00:00', '2:30:00', '3:00:00', '3:30:00', '4:00:00')
y <- seq(1:32)
data1 <- data.frame(x, y)

I want to split it in such a way that the output looks like:

0:00:00   1   8  17  24
0:30:00   2   9  18  25
1:00:00   3  10  19  26
1:30:00   4  11  20  27
2:00:00   5  12  21  28
2:30:00   6  13  22  29
3:00:00   7  14  23  30
3:30:00  NA  15  NA  31
4:00:00  NA  16  NA  32

Any ideas or functions that I should look into for doing this?

You already have 3 distinct solutions on StackOverflow.

--
David Winsemius, MD
Alameda, CA, USA