[R] vectorization
Hi there,

I have a data frame (mydata) with 1 numeric variable (income) and 1 factor (education). I want a new column in this data frame with the median income for each education level. An obviously inefficient way to do this is

for (k in 1:nrow(mydata)) {
  l <- mydata$education[k]
  mydata$md[k] <- median(mydata$income[mydata$education == l], na.rm = TRUE)
}

Since mydata has nearly 30,000 rows, this will not be done until the end of this month. I thus need some help vectorizing this, please.

Thanks,
Dimitri

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
Re: [R] vectorization
Here I go again with ave():

mydata$md <- ave(mydata$income, mydata$education, FUN=median, na.rm=TRUE)

IMHO it's one of the most under-rated helper functions in R.

Andy
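A small check of the ave() approach on made-up data (the values are illustrative only, not from the thread). One caveat worth knowing: ave() has the signature ave(x, ..., FUN = mean), and every argument in ... is treated as a grouping variable, so a na.rm=TRUE handed to ave() is not forwarded to median(). If income contains NAs, wrapping median() in a small function (med_na below is a hypothetical name) is one way to pass the argument through:

```r
# Toy data (hypothetical): one NA in the "b" group
mydata <- data.frame(
  income    = c(10, 20, 30, NA, 50, 70),
  education = factor(c("a", "a", "b", "b", "b", "c"))
)

# ave() treats every argument in `...` as a grouping variable,
# so na.rm has to be supplied inside FUN itself
med_na <- function(v) median(v, na.rm = TRUE)
mydata$md <- ave(mydata$income, mydata$education, FUN = med_na)

# Without the wrapper, median() sees the NA and the whole "b" group gets NA
md_plain <- ave(mydata$income, mydata$education, FUN = median)

print(mydata)
print(md_plain)
```

Every row, including the one whose own income is NA, receives its group's median.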
Re: [R] vectorization
You can use tapply() to compute the medians, as in

meds <- tapply(mydata$inc, INDEX=mydata$ed, FUN=median)

then create a new column with the medians as

medianEd <- meds[mydata$ed]

Reid Huntsinger

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Dimitri Joe
Sent: Friday, June 17, 2005 1:01 PM
To: R-Help
Subject: [R] vectorization
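The lookup meds[mydata$ed] works because indexing by a factor uses its integer codes, and tapply() returns one value per level in level order, so codes and positions line up. A short sketch on made-up data of why indexing by the character labels is the more robust habit when a lookup vector's order is not guaranteed to follow the levels (lookup below is a hypothetical, deliberately reordered vector):

```r
mydata <- data.frame(
  income    = c(10, 20, 30, NA, 50, 70),
  education = factor(c("a", "a", "b", "b", "b", "c"))
)

# One median per level, in level order; extra arguments are passed on to FUN
meds <- tapply(mydata$income, mydata$education, median, na.rm = TRUE)

# Indexing by the factor uses its integer codes (1, 1, 2, 2, 2, 3)
md_by_code <- as.vector(meds[mydata$education])
# Indexing by the labels matches on names instead
md_by_name <- as.vector(meds[as.character(mydata$education)])

# A lookup vector in a different order breaks code-based indexing
lookup <- c(c = 70, a = 15, b = 40)
wrong <- unname(lookup[mydata$education])
right <- unname(lookup[as.character(mydata$education)])
```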
Re: [R] vectorization
These two lines worked for me:

rst <- tapply(mydata$income, mydata$education, median)
mydata$md <- rst[mydata$education]

Here's my cheesy example:

> mydata <- data.frame(income = round(rnorm(30000, 55000, 10000)),
+                      education = letters[rbinom(30000, 4, 1/2) + 1])
> rst <- tapply(mydata$income, mydata$education, median)
> mydata$md <- rst[mydata$education]
> head(mydata)
  income education      md
1  66223         e 55094.5
2  56830         c 54966.0
3  58035         b 54937.5
4  74045         a 55213.5
5  61327         b 54937.5
6  64150         b 54937.5

Is this what you wanted?

Kevin
Re: [R] vectorization
try this:

> x.1 <- data.frame(income=runif(100)*10000, educ=sample(c('hs','col','none'),100,T))
> x.1
      income educ
1 5930.30882  col
2 5528.83222   hs
3 5967.04041   hs
4 3926.30682   hs
5 2603.75924 none
...
> x.2 <- tapply(x.1$income, x.1$educ, mean)
> x.2
     col       hs     none
5575.310 4994.921 5481.962
> x.1$median <- x.2[x.1$educ]
> x.1
      income educ   median
1 5930.30882  col 5575.310
2 5528.83222   hs 4994.921
3 5967.04041   hs 4994.921
4 3926.30682   hs 4994.921
5 2603.75924 none 5481.962
6 7398.83325  col 5575.310
  7265.06895   hs 4994.921
...

Jim
__
James Holtman        "What is the problem you are trying to solve?"
Executive Technical Consultant -- Convergys Labs
[EMAIL PROTECTED]
+1 (513) 723-2929
Re: [R] vectorization
Hi,

I guess the attached code (incl. simulating your data structure) is not the most efficient way to do this, but at least (I hope so!) it does what you wanted it to do:

### Beginning of Example Code
income <- runif(100)
education <- as.factor(sample(c("high", "middle", "low"), size=length(income), replace=TRUE))
mydata <- data.frame(inc=income, edu=education)

mymedians <- tapply(X=mydata$inc, INDEX=mydata$edu, FUN=median)

mydata$medians <- ifelse(mydata$edu=="high", mymedians["high"], 0)
mydata$medians <- ifelse(mydata$edu=="middle", mymedians["middle"], mydata$medians)
mydata$medians <- ifelse(mydata$edu=="low", mymedians["low"], mydata$medians)

head(mydata)
mymedians
### End of Example Code

Maybe one can increase the speed, but I think it is sufficient for your case of 30,000 cases, as you can see from the timing on my desktop computer here (WinXP Pro SP2, P4, 3GHz, 512MB RAM):

> time.check <- function(){
+   income <- runif(30000)
+   education <- as.factor(sample(c("high", "middle", "low"), size=length(income), replace=TRUE))
+   mydata <- data.frame(inc=income, edu=education)
+
+   mymedians <- tapply(X=mydata$inc, INDEX=mydata$edu, FUN=median)
+
+   mydata$medians <- ifelse(mydata$edu=="high", mymedians["high"], 0)
+   mydata$medians <- ifelse(mydata$edu=="middle", mymedians["middle"], mydata$medians)
+   mydata$medians <- ifelse(mydata$edu=="low", mymedians["low"], mydata$medians)
+   return(NULL)
+ }
> system.time(time.check())
[1] 0.36 0.02 0.38   NA   NA
> version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status   beta
major    2
minor    1.0
year     2005
month    04
day      04
language R

Best,
Roland
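The chained ifelse() calls above hard-code the three level names. As a sketch (same simulated structure, smaller n, and a seed added only for reproducibility), match() can do the lookup for any number of levels in one step; idx and via_ifelse are hypothetical helper names:

```r
set.seed(1)  # hypothetical seed, for reproducibility only
income <- runif(100)
education <- as.factor(sample(c("high", "middle", "low"),
                              size = length(income), replace = TRUE))
mydata <- data.frame(inc = income, edu = education)

mymedians <- tapply(X = mydata$inc, INDEX = mydata$edu, FUN = median)

# One lookup covering every level, without listing the levels by hand
idx <- match(as.character(mydata$edu), names(mymedians))
mydata$medians <- as.vector(mymedians[idx])

# The same result via nested ifelse(), for comparison
via_ifelse <- ifelse(mydata$edu == "high", mymedians["high"],
              ifelse(mydata$edu == "middle", mymedians["middle"],
                     mymedians["low"]))
```

The match() version keeps working unchanged if a fourth education level appears.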
[R] Vectorization
Greetings,

Can anyone suggest whether the following problem can be vectorized effectively?

I have two datasets: one is a portfolio of stock returns on a historical basis, and the other consists of a bunch of factors (again on a historical basis). I intend to compute rolling n-day sensitivities of each stock to each factor, so the output will be a data frame with columns ticker, dt and sensitivities. How would you go about vectorizing this effectively? I end up with pseudocode like this:

# For each date
For curr dt in all dates
    # Get universe of stocks as of that date
    Get Universe for curr date
    # Calculate sensitivity to each factor between n_days_back_dt and curr date
    sensitivity = sapply(univ{ticker}, CalcSensitivity, n_days_back_dt, dt)
Next date

I would highly appreciate it if the above logic could be improved (if at all) by a more effective solution, since I get into such situations on a regular basis.

Thanks in advance
Cheers
Manoj
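One way to remove the loop over dates is to compute the rolling statistics for all end dates at once with cumulative sums. The sketch below rests on several assumptions not in the post: that a "sensitivity" is the univariate regression slope of a stock's returns on one factor over the last n days, that returns sit in a dates-by-tickers matrix with no missing values, and that the names rolling_beta(), factor_ret and the tickers are hypothetical. A changing stock universe or a multi-factor regression would need more work.

```r
# Rolling n-day OLS slope of y on x for every end date at once.
# Hypothetical helper: assumes complete (NA-free) numeric vectors.
rolling_beta <- function(y, x, n) {
  roll_sum <- function(v) {
    cs <- cumsum(c(0, v))
    # sum of v over each window ending at t = n, ..., length(v)
    s <- cs[(n + 1):length(cs)] - cs[1:(length(cs) - n)]
    c(rep(NA_real_, n - 1), s)  # pad so the result aligns with the dates
  }
  sx  <- roll_sum(x)
  sy  <- roll_sum(y)
  sxx <- roll_sum(x * x)
  sxy <- roll_sum(x * y)
  # slope = cov(x, y) / var(x), written in terms of window sums
  (n * sxy - sx * sy) / (n * sxx - sx^2)
}

# Simulated data (hypothetical): 60 dates, 3 tickers, 1 factor
set.seed(42)
factor_ret <- rnorm(60)
returns <- sapply(c(0.5, 1, 1.5), function(b) b * factor_ret + rnorm(60, sd = 0.5))
colnames(returns) <- c("AAA", "BBB", "CCC")

n <- 20
# dates x tickers matrix of rolling sensitivities (first n - 1 rows are NA)
betas <- apply(returns, 2, rolling_beta, x = factor_ret, n = n)

# Stack into the (dt, ticker, sensitivity) layout; dates are hypothetical
dates <- seq(as.Date("2005-01-03"), by = "day", length.out = 60)
long <- data.frame(
  dt          = rep(dates, times = ncol(betas)),
  ticker      = rep(colnames(betas), each = nrow(betas)),
  sensitivity = as.vector(betas)
)
```

With several factors, the apply() call can be repeated per factor column; the per-date loop disappears either way.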
[Fwd: Re: [R] vectorization of a data-aggregation loop]
great! many thanks, Phil

Cheers
christoph

Phil Spector wrote:

Christoph -
   I think reshape is the function you're looking for:

> tt <- data.frame(cbind(c(1,1,1,1,1,2,2,2,3,3,3,3),
+ c(10,12,8,33,34,3,27,77,34,45,4,39), c('a', 'b', 'b', 'a', 'c', 'c', 'c',
+ 'a', 'b', 'a', 'b', 'c')))
> reshape(aggregate(as.numeric(tt$iwv),list(id=tt$id,type=tt$type),sum),idvar="id",timevar="type",direction="wide")
  id x.a x.b x.c
1  1   6  13   6
2  2  10  NA   7
3  3   9  14   7

                                      - Phil Spector
                                        Statistical Computing Facility
                                        Department of Statistics
                                        UC Berkeley
                                        [EMAIL PROTECTED]

On Tue, 1 Feb 2005, Christoph Lehmann wrote:
> [...] is there a way to do this more efficiently, eg. in using tapply
> (or something similar)- since I have lot many rows?
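A caveat about the example data in this thread: data.frame(cbind(...)) first builds a character matrix, and in R versions before 4.0.0 data.frame() then converted those strings to factors by default, so as.numeric() on the iwv column returned factor codes rather than the values 10, 12, 8, ... The 6, 13, 6 shown for id 1 are sums of those codes; the actual values give 43, 20 and 34. A sketch that builds the columns with their intended types and then applies the aggregate() plus reshape() idea (the data frame name dat is hypothetical; the column names follow the question):

```r
# Build the columns with their intended types (no cbind(), so no coercion)
dat <- data.frame(
  id   = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  iwv  = c(10, 12, 8, 33, 34, 3, 27, 77, 34, 45, 4, 39),
  type = c("a", "b", "b", "a", "c", "c", "c", "a", "b", "a", "b", "c")
)

# Sum iwv within each (id, type) cell; the formula method keeps the column names
sums <- aggregate(iwv ~ id + type, data = dat, FUN = sum)

# Spread type into columns iwv.a, iwv.b, iwv.c (NA where a cell is empty)
wide <- reshape(sums, idvar = "id", timevar = "type", direction = "wide")
wide[is.na(wide)] <- 0   # the question asks for 0 rather than NA
print(wide)
```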
[R] vectorization of a data-aggregation loop
Hi

I have a simple question: the following data.frame

   id iwv type
1   1   1    a
2   1   2    b
3   1  11    b
4   1   5    a
5   1   6    c
6   2   4    c
7   2   3    c
8   2  10    a
9   3   6    b
10  3   9    a
11  3   8    b
12  3   7    c

shall be aggregated into the form:

  id t.a t.b t.c
1  1   6  13   6
6  2  10   0   7
9  3   9  14   7

i.e. for each 'type' (a, b, c) a new column is introduced, which gets the sum of iwv for the respective observations 'id'.

Of course I can do this transformation/aggregation in a loop (see below), but is there a way to do this more efficiently, e.g. using tapply (or something similar), since I have a lot of rows?

thanks for a hint
christoph

#--
# the loop-way
t <- data.frame(cbind(c(1,1,1,1,1,2,2,2,3,3,3,3),
                      c(10,12,8,33,34,3,27,77,34,45,4,39),
                      c('a', 'b', 'b', 'a', 'c', 'c', 'c', 'a', 'b', 'a', 'b', 'c')))
names(t) <- c("id", "iwv", "type")
t$iwv <- as.numeric(t$iwv)
t

# define the additional columns (type.a, type.b, type.c)
tt <- rep(0, nrow(t) * length(levels(t$type)))
dim(tt) <- c(nrow(t), length(levels(t$type)))
tt <- data.frame(tt)
dimnames(tt)[[2]] <- paste("t.", levels(t$type), sep = "")
t <- cbind(t, tt)
t

obs <- 0
obs.previous <- 0
row.elim <- rep(FALSE, nrow(t))
ta <- which((names(t) == "t.a")) #number of column which codes the first type
r.ctr <- 0
for (i in 1:nrow(t)){
    obs <- t[i,]$id
    if (obs == obs.previous) {
        row.elim[i] <- TRUE
        r.ctr <- r.ctr + 1 #increment
        type.col <- as.numeric(t[i,]$type)
        t[i - r.ctr, ta - 1 + type.col] <- t[i - r.ctr, ta - 1 + type.col] + t[i,]$iwv
    } else {
        r.ctr <- 0 #record counter
        type.col <- as.numeric(t[i,]$type)
        t[i, ta - 1 + type.col] <- t[i,]$iwv
    }
    obs.previous <- obs
}
t <- t[!row.elim,]
t <- subset(t, select = -c(iwv, type))
t
#--
Re: [R] vectorization of a data-aggregation loop
On Tue, 2005-02-01 at 23:28 +0100, Christoph Lehmann wrote:
> [...] is there a way to do this more efficiently, eg. in using tapply
> (or something similar)- since I have lot many rows?
>
> thanks for a hint

Well, I'll get you started using the sample data you have above. Presuming that your data is in a data frame called 'df':

# Use aggregate to get the summations data by id and type
> df.a <- aggregate(df$iwv, by = list(df$id, df$type), sum)

# Show the results
> df.a
  Group.1 Group.2  x
1       1       a  6
2       2       a 10
3       3       a  9
4       1       b 13
5       3       b 14
6       1       c  6
7       2       c  7
8       3       c  7

# Now use xtabs() to create a contingency table from df.a
> xtabs(x ~ Group.1 + Group.2, data = df.a)
       Group.2
Group.1  a  b  c
      1  6 13  6
      2 10  0  7
      3  9 14  7

You can now modify the colnames in the result of the xtabs step as you desire. It's a little easier in two steps.

See ?aggregate and ?xtabs for more information.

HTH,

Marc Schwartz
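The aggregate() step can also be skipped: when xtabs() is given a left-hand side, it sums that variable over the cells defined by the right-hand side, and empty cells come out as 0. A sketch on the question's data with iwv kept numeric (the sums therefore differ from the tables in this thread, which were computed from factor codes); the names tab, tab2 and m are hypothetical:

```r
df <- data.frame(
  id   = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  iwv  = c(10, 12, 8, 33, 34, 3, 27, 77, 34, 45, 4, 39),
  type = c("a", "b", "b", "a", "c", "c", "c", "a", "b", "a", "b", "c")
)

# Sum iwv by id (rows) and type (columns) in one call; empty cells are 0
tab <- xtabs(iwv ~ id + type, data = df)
print(tab)

# The same table from tapply(); here the empty cell is NA instead of 0
tab2 <- tapply(df$iwv, list(df$id, df$type), sum)

# As a data frame with the column names used in the question
m <- as.data.frame.matrix(tab)
names(m) <- paste("t.", names(m), sep = "")
```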
Re: [R] vectorization question
Thank you very much to Tony Plate for his really clear explanation, and to Prof Ripley for his time solving this deficiency (IMHO).

On Friday 15 August 2003 08:44, Martin Maechler wrote:
> Thank you, Tony. This certainly was the most precise explanation on this thread.
> Everyone note however, that this has been improved (by Brian Ripley) in the
> current R-devel {which should become R 1.8 in October}. [...]

--
Alberto G. Murta
Institute for Agriculture and Fisheries Research (INIAP-IPIMAR)
Av. Brasilia, 1449-006 Lisboa, Portugal | Phone: +351 213027062
Fax: +351 213015948 | http://www.ipimar-iniap.ipimar.pt/pelagicos/
Re: [R] vectorization question
>>>>> "Tony" == Tony Plate [EMAIL PROTECTED]
>>>>>     on Thu, 14 Aug 2003 11:43:11 -0600 writes:

    Tony> [...] For adding a new column to a data frame, the expressions
    Tony> x[, "new.column.name"] <- value and x[["new.column.name"]] <- value
    Tony> will replicate the value so that the new column is the same length
    Tony> as the existing ones, while the $ operator in an assignment will
    Tony> not replicate the value. [...]

Thank you, Tony. This certainly was the most precise explanation on this thread.

Everyone note however, that this has been improved (by Brian Ripley) in the current R-devel {which should become R 1.8 in October}. There, $<- assignment of data frames also checks things and in this case will do the same replication as the [,] or [[]] assignments do.

For back compatibility (with S-plus and earlier R versions), I'd still recommend using bracket [ rather than $ assignment for data frames.

Martin Maechler [EMAIL PROTECTED]  http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16    Leonhardstr. 27
ETH (Federal Inst. Technology)  8092 Zurich SWITZERLAND
phone: x-41-1-632-3408  fax: ...-1228
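As a check of the change described above, Tony's example can be rerun in a current R session, where all three assignment forms are expected to recycle the scalar to the full column length, so that as.matrix() succeeds on every copy. The name lens is hypothetical:

```r
x1 <- data.frame(a = 1:3)
x2 <- x1
x3 <- x1

x1$b <- 0          # $<- assignment
x2[, "b"] <- 0     # matrix-style [<- assignment
x3[["b"]] <- 0     # list-style [[<- assignment

# Each new column should have been recycled to length 3
lens <- sapply(list(x1, x2, x3), function(d) length(d$b))
print(lens)
print(as.matrix(x1))
```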
Re: [R] vectorization question
Thank you very much. I just would expect that 'as.matrix' would have the same behaviour as 'data.matrix' when all columns in a data frame are numeric.

Regards
Alberto

On Thursday 14 August 2003 16:41, Liaw, Andy wrote:
If you look at the structure, you'll see:

> x$V4 <- 0
> str(x)
`data.frame':	4 obs. of  4 variables:
 $ V1: int  1 2 3 4
 $ V2: int  5 6 7 8
 $ V3: int  9 10 11 12
 $ V4: num 0

Don't know if this is the intended result. In any case, you're probably better off using data.matrix, as

> data.matrix(x)
  V1 V2 V3 V4
1  1  5  9  0
2  2  6 10  0
3  3  7 11  0
4  4  8 12  0

HTH,
Andy

-----Original Message-----
From: Alberto Murta [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 14, 2003 12:50 PM
To: [EMAIL PROTECTED]
Subject: [R] vectorization question

Dear all

I recently noticed the following error when coercing a data.frame into a matrix:

> example <- matrix(1:12,4,3)
> example <- as.data.frame(example)
> example$V4 <- 0
> example
  V1 V2 V3 V4
1  1  5  9  0
2  2  6 10  0
3  3  7 11  0
4  4  8 12  0
> example <- as.matrix(example)
Error in as.matrix.data.frame(example) : dim<- length of dims do not match the length of object

However, if the column to be added has the right number of rows, there's no error:

> example <- matrix(1:12,4,3)
> example <- as.data.frame(example)
> example$V4 <- rep(0,4)
> example
  V1 V2 V3 V4
1  1  5  9  0
2  2  6 10  0
3  3  7 11  0
4  4  8 12  0
> example <- as.matrix(example)
> example
  V1 V2 V3 V4
1  1  5  9  0
2  2  6 10  0
3  3  7 11  0
4  4  8 12  0

Shouldn't it work well both ways? I checked the attributes and dims of the data frame and they are the same in both cases. Where's the difference that originates the error message?

Thanks in advance
Alberto

platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status
major    1
minor    7.1
year     2003
month    06
day      16
language R
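On the expectation that as.matrix() and data.matrix() should agree: for a well-formed data frame whose columns are all numeric they are expected to give the same numbers. They diverge once a factor column is present, where as.matrix() returns a character matrix and data.matrix() substitutes the factor's integer codes. A small sketch on made-up data (the object names are hypothetical):

```r
example <- as.data.frame(matrix(1:12, 4, 3))
example[["V4"]] <- 0   # recycled to all 4 rows

num_am <- as.matrix(example)
num_dm <- data.matrix(example)

# With a factor column the two functions diverge
mixed <- data.frame(x = 1:2, f = factor(c("b", "a")))
mixed_am <- as.matrix(mixed)    # character matrix
mixed_dm <- data.matrix(mixed)  # numeric; f replaced by its level codes
```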
Re: [R] vectorization question
From ?data.frame:

Details:
     A data frame is a list of variables of the same length with unique row names, given class `data.frame'.

Your example constructs an object that does not conform to the definition of a data frame (the new column is not the same length as the old columns). Some data frame functions may work OK with such an object, but others will not. For example, the print function for data.frame silently handles such an illegal data frame (which could be described as unfortunate). It would probably be far easier to construct a correct data frame in the first place than to try to find and fix functions that don't handle illegal data frames.

For adding a new column to a data frame, the expressions x[, "new.column.name"] <- value and x[["new.column.name"]] <- value will replicate the value so that the new column is the same length as the existing ones, while the $ operator in an assignment will not replicate the value. (One could argue that this is a deficiency, but I think it has been that way for a long time, and the behavior is the same in the current version of S-plus.)

> x1 <- data.frame(a=1:3)
> x2 <- x1
> x3 <- x1
> x1$b <- 0
> x2[,"b"] <- 0
> x3[["b"]] <- 0
> sapply(x1, length)
a b
3 1
> sapply(x2, length)
a b
3 3
> sapply(x3, length)
a b
3 3
> as.matrix(x2)
  a b
1 1 0
2 2 0
3 3 0
> as.matrix(x1)
Error in as.matrix.data.frame(x1) : dim<- length of dims do not match the length of object

At Thursday 04:50 PM 8/14/2003 +0000, Alberto Murta wrote:
> [...] Shouldn't it work well both ways? I checked the attributes and dims of
> the data frame and they are the same in both cases. Where's the difference
> that originates the error message?