[R] vectorization

2005-06-17 Thread Dimitri Joe
Hi there,

I have a data frame (mydata) with 1 numeric variable (income) and 1 factor 
(education). I want a new column in this data with the median income for each 
education level. An obviously inefficient way to do this is

for (k in 1:nrow(mydata)) {
  l <- mydata$education[k]
  mydata$md[k] <- median(mydata$income[mydata$education == l], na.rm = TRUE)
}

Since mydata has nearly 30,000 rows, this will not finish until the end of
this month. I would therefore appreciate some help vectorizing it, please.

Thanks,

Dimitri

__
R-help@stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html


Re: [R] vectorization

2005-06-17 Thread Liaw, Andy
Here I go again with ave():

mydata$md <- ave(mydata$income, mydata$education,
                 FUN=function(x) median(x, na.rm=TRUE))

IMHO it's one of the most under-rated helper functions in R.
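
Because ave() hands everything in `...` to interaction() as grouping variables, an option such as na.rm has to be bound inside FUN. A minimal sketch on made-up data (the values below are illustrative, not from the original post):

```r
# Made-up data: one income is missing in group "b".
mydata <- data.frame(
  income    = c(10, 20, 30, 40, NA, 60),
  education = factor(c("a", "a", "a", "b", "b", "b"))
)

# Bind na.rm inside FUN; an extra na.rm = TRUE argument to ave() would be
# treated as another grouping variable instead of reaching median().
mydata$md <- ave(mydata$income, mydata$education,
                 FUN = function(x) median(x, na.rm = TRUE))

print(mydata)
```

Every row receives its group's median, including the row whose own income is missing.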

Andy



Re: [R] vectorization

2005-06-17 Thread Huntsinger, Reid
You can use tapply() to compute the medians, as in

meds <- tapply(mydata$inc, INDEX=mydata$ed, FUN=median)

then create a new column with the medians as

medianEd <- meds[mydata$ed]
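
A small self-checking sketch of this tapply() lookup, on made-up data. Indexing by level name through as.character() is an assumption added here so that the lookup behaves the same whether the grouping column is a factor or a character vector:

```r
mydata <- data.frame(
  income    = c(12, 30, 18, 45, 27, 50, 21),
  education = c("hs", "col", "hs", "col", "none", "col", "none")
)

# One median per education level; the result is named by level.
meds <- tapply(mydata$income, mydata$education, median)

# Look each row's level up by name; as.vector() drops the array attributes.
mydata$md <- as.vector(meds[as.character(mydata$education)])

# Cross-check against the explicit row-by-row loop from the question.
loop_md <- numeric(nrow(mydata))
for (k in seq_len(nrow(mydata))) {
  same <- mydata$education == mydata$education[k]
  loop_md[k] <- median(mydata$income[same])
}
stopifnot(isTRUE(all.equal(mydata$md, loop_md)))
print(mydata)
```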


Reid Huntsinger



Re: [R] vectorization

2005-06-17 Thread Kevin Bartz
These two lines worked for me:

rst <- tapply(mydata$income, mydata$education, median)
mydata$md <- rst[mydata$education]

Here's my cheesy example:

> mydata <- data.frame(income = round(rnorm(3, 55000, 1)),
+                      education = letters[rbinom(3, 4, 1/2)+1])
> rst <- tapply(mydata$income, mydata$education, median)
> mydata$md <- rst[mydata$education]
> head(mydata)
  income education      md
1  66223         e 55094.5
2  56830         c 54966.0
3  58035         b 54937.5
4  74045         a 55213.5
5  61327         b 54937.5
6  64150         b 54937.5

Is this what you wanted?

Kevin



Re: [R] vectorization

2005-06-17 Thread james . holtman




try this:

> x.1 <- data.frame(income=runif(100)*1,
+                   educ=sample(c('hs','col','none'),100,TRUE))
> x.1
       income educ
1  5930.30882  col
2  5528.83222   hs
3  5967.04041   hs
4  3926.30682   hs
5  2603.75924 none
...
> x.2 <- tapply(x.1$income, x.1$educ, mean)
> x.2
     col       hs     none
5575.310 4994.921 5481.962
> x.1$median <- x.2[x.1$educ]
> x.1
       income educ   median
1  5930.30882  col 5575.310
2  5528.83222   hs 4994.921
3  5967.04041   hs 4994.921
4  3926.30682   hs 4994.921
5  2603.75924 none 5481.962
6  7398.83325  col 5575.310
7   265.06895   hs 4994.921
.


Jim
__
James Holtman        "What is the problem you are trying to solve?"
Executive Technical Consultant  --  Convergys Labs
[EMAIL PROTECTED]
+1 (513) 723-2929



   


Re: [R] vectorization

2005-06-17 Thread Rau, Roland
Hi,


I guess the attached code (incl. simulating your data structure) is not
the most efficient way to do this, but at least (I hope so!) it does
what you wanted it to do:


### Beginning of Example Code

income <- runif(100)
education <- as.factor(sample(c("high", "middle", "low"),
                              size=length(income), replace=TRUE))
mydata <- data.frame(inc=income, edu=education)


mymedians <- tapply(X=mydata$inc, INDEX=mydata$edu, FUN=median)

mydata$medians <- ifelse(mydata$edu=="high", mymedians["high"], 0)
mydata$medians <- ifelse(mydata$edu=="middle", mymedians["middle"],
                         mydata$medians)
mydata$medians <- ifelse(mydata$edu=="low", mymedians["low"],
                         mydata$medians)

head(mydata)
mymedians

### End of Example Code

Maybe one can increase the speed, but I think it is sufficient for your
case of 30,000 cases as you can see from the timing on my desktop
computer here (WinXP Pro SP2, P4, 3GHz, 512MB RAM):

> time.check <- function(){
+   income <- runif(3)
+   education <- as.factor(sample(c("high", "middle", "low"),
+                                 size=length(income), replace=TRUE))
+   mydata <- data.frame(inc=income, edu=education)
+
+   mymedians <- tapply(X=mydata$inc, INDEX=mydata$edu, FUN=median)
+
+   mydata$medians <- ifelse(mydata$edu=="high", mymedians["high"], 0)
+   mydata$medians <- ifelse(mydata$edu=="middle", mymedians["middle"],
+                            mydata$medians)
+   mydata$medians <- ifelse(mydata$edu=="low", mymedians["low"],
+                            mydata$medians)
+   return(NULL)
+ }
> system.time(time.check())
[1] 0.36 0.02 0.38   NA   NA

> version
         _
platform i386-pc-mingw32
arch     i386
os       mingw32
system   i386, mingw32
status   beta
major    2
minor    1.0
year     2005
month    04
day      04
language R
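
For comparison, a rough timing sketch of two of the other approaches from this thread on simulated data of about the size in question. The seed, the column names and the as.character() lookup are assumptions made for illustration; timings will differ by machine, so none are quoted here:

```r
set.seed(42)
n <- 30000  # roughly the number of rows mentioned in the question
mydata <- data.frame(
  inc = runif(n),
  edu = factor(sample(c("high", "middle", "low"), n, replace = TRUE))
)

# Approach 1: grouped medians, then a single indexed lookup for all rows.
t_lookup <- system.time({
  meds <- tapply(mydata$inc, mydata$edu, median)
  via_lookup <- as.vector(meds[as.character(mydata$edu)])
})

# Approach 2: ave() does the split / apply / recombine in one call.
t_ave <- system.time({
  via_ave <- ave(mydata$inc, mydata$edu, FUN = median)
})

# Both approaches must agree before the timings mean anything.
stopifnot(isTRUE(all.equal(via_lookup, via_ave)))
print(c(lookup = t_lookup[["elapsed"]], ave = t_ave[["elapsed"]]))
```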


Best,
Roland




[R] Vectorization

2005-06-15 Thread ManojW
Greetings,
Can anyone suggest whether the following problem can be vectorized
effectively?

I have two datasets: one is a portfolio of stock returns on a historical
basis, and the other consists of a set of factors (again on a historical
basis). I intend to compute rolling n-day sensitivities for each stock to
each factor, so the output will be a data frame with the columns
ticker, dt, sensitivities.

How would you go about vectorizing this effectively?

I end up with pseudocode like this:

# For each date
For curr dt in all dates
    # Get universe of stocks as of that date
    Get Universe for curr date
    # Calculate sensitivity for each factor between n days back dt to curr date
    sensitivity = sapply(univ{ticker}, CalcSensitivity, n_days_back_dt, dt)
Next date

I would highly appreciate it if the above logic could be improved (if at
all) by a more effective solution, since I run into such situations on a
regular basis.
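
One possible sketch of this, under assumptions the post does not state: "sensitivity" is taken to mean the slope coefficients of an ordinary least-squares regression of a stock's returns on the factors, the universe of tickers is fixed over time, and all data and names (`factors`, `returns`, `roll_sens`) are made up. The date loop stays, but the per-ticker loop disappears because lm.fit() accepts a matrix of responses:

```r
set.seed(1)
n_dates <- 60   # length of the made-up history
n_days  <- 20   # rolling window length

# Made-up inputs: rows are dates, columns are factors / tickers.
factors <- matrix(rnorm(n_dates * 2), n_dates, 2,
                  dimnames = list(NULL, c("f1", "f2")))
returns <- matrix(rnorm(n_dates * 3), n_dates, 3,
                  dimnames = list(NULL, c("AAA", "BBB", "CCC")))

# Sensitivities for the window ending at date index i (hypothetical helper).
roll_sens <- function(i) {
  idx <- (i - n_days + 1):i
  X   <- cbind(intercept = 1, factors[idx, , drop = FALSE])
  b   <- lm.fit(X, returns[idx, , drop = FALSE])$coefficients
  b   <- b[-1, , drop = FALSE]            # drop intercept: factors x tickers
  data.frame(ticker      = rep(colnames(b), each = nrow(b)),
             dt          = i,
             factor      = rep(rownames(b), times = ncol(b)),
             sensitivity = as.vector(b))
}

# One regression call per date covers every ticker at once.
out <- do.call(rbind, lapply(n_days:n_dates, roll_sens))
print(head(out))
```

If the universe really changes by date, the column subset of `returns` would have to be chosen inside `roll_sens`.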

Thanks in advance

Cheers

Manoj



[Fwd: Re: [R] vectorization of a data-aggregation loop]

2005-02-02 Thread Christoph Lehmann
great! many thanks, Phil

Cheers
christoph

Phil Spector wrote:

Christoph -
   I think reshape is the function you're looking for:

> tt <- data.frame(cbind(c(1,1,1,1,1,2,2,2,3,3,3,3),
+ c(10,12,8,33,34,3,27,77,34,45,4,39), c('a', 'b', 'b', 'a', 'c', 'c', 'c',
+ 'a', 'b', 'a', 'b', 'c')))
> names(tt) <- c("id", "iwv", "type")
> reshape(aggregate(as.numeric(tt$iwv), list(id=tt$id, type=tt$type), sum),
+         idvar="id", timevar="type", direction="wide")
  id x.a x.b x.c
1  1   6  13   6
2  2  10  NA   7
3  3   9  14   7

   - Phil Spector
     Statistical Computing Facility
     Department of Statistics
     UC Berkeley
     [EMAIL PROTECTED]
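
A self-contained sketch of the same aggregate() plus reshape() idea, assuming the iwv values are entered directly as the numbers printed in the question. The NA-to-zero step and the `t.` column prefix are additions here to match the requested output:

```r
# The data as printed in the question, entered directly as numbers.
d <- data.frame(
  id   = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  iwv  = c(1, 2, 11, 5, 6, 4, 3, 10, 6, 9, 8, 7),
  type = c("a", "b", "b", "a", "c", "c", "c", "a", "b", "a", "b", "c")
)

# Step 1: long format, one row per (id, type) combination that occurs.
sums <- aggregate(iwv ~ id + type, data = d, FUN = sum)

# Step 2: long -> wide, one column per type.
wide <- reshape(sums, idvar = "id", timevar = "type", direction = "wide")

# Combinations that never occur come out as NA; the question asked for 0.
wide[is.na(wide)] <- 0
names(wide) <- sub("^iwv\\.", "t.", names(wide))
print(wide)
```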


[R] vectorization of a data-aggregation loop

2005-02-01 Thread Christoph Lehmann
Hi
I have a simple question:
the following data.frame

   id iwv type
1   1   1    a
2   1   2    b
3   1  11    b
4   1   5    a
5   1   6    c
6   2   4    c
7   2   3    c
8   2  10    a
9   3   6    b
10  3   9    a
11  3   8    b
12  3   7    c

shall be aggregated into the form:

  id t.a t.b t.c
1  1   6  13   6
6  2  10   0   7
9  3   9  14   7

That is, for each 'type' (a, b, c) a new column is introduced that holds
the sum of iwv over the observations with the respective 'id'.

Of course I can do this transformation/aggregation in a loop (see
below), but is there a way to do it more efficiently, e.g. using
tapply (or something similar), since I have a great many rows?

thanks for a hint
christoph
#--
# the loop-way
t <- data.frame(cbind(c(1,1,1,1,1,2,2,2,3,3,3,3),
  c(10,12,8,33,34,3,27,77,34,45,4,39), c('a', 'b', 'b', 'a', 'c', 'c',
  'c', 'a', 'b', 'a', 'b', 'c')))
names(t) <- c("id", "iwv", "type")
t$iwv <- as.numeric(t$iwv)
t

# define the additional columns (type.a, type.b, type.c)
tt <- rep(0, nrow(t) * length(levels(t$type)))
dim(tt) <- c(nrow(t), length(levels(t$type)))
tt <- data.frame(tt)
dimnames(tt)[[2]] <- paste("t.", levels(t$type), sep = "")
t <- cbind(t, tt)
t
obs <- 0
obs.previous <- 0
row.elim <- rep(FALSE, nrow(t))
ta <- which((names(t) == "t.a")) # number of the column which codes the first type
r.ctr <- 0
for (i in 1:nrow(t)){
  obs <- t[i,]$id
  if (obs == obs.previous) {
    row.elim[i] <- TRUE
    r.ctr <- r.ctr + 1 # increment
    type.col <- as.numeric(t[i,]$type)
    t[i - r.ctr, ta - 1 + type.col] <- t[i - r.ctr, ta - 1 +
      type.col] + t[i,]$iwv
  }
  else {
    r.ctr <- 0 # record counter
    type.col <- as.numeric(t[i,]$type)
    t[i, ta - 1 + type.col] <- t[i,]$iwv
  }
  obs.previous <- obs
}

t <- t[!row.elim,]
t <- subset(t, select = -c(iwv, type))
t
#--
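
Since the question mentions tapply, here is a minimal sketch of how it could replace the loop, assuming the iwv values are the numbers printed in the table above. Passing a list of two grouping vectors makes tapply() return an id-by-type matrix; the variable names `d`, `m` and `result` are illustrative:

```r
d <- data.frame(
  id   = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  iwv  = c(1, 2, 11, 5, 6, 4, 3, 10, 6, 9, 8, 7),
  type = c("a", "b", "b", "a", "c", "c", "c", "a", "b", "a", "b", "c")
)

# Rows are ids, columns are types, cells are sums of iwv.
m <- tapply(d$iwv, list(id = d$id, type = d$type), sum)

# Empty cells (here id 2, type b) are NA by default; the target has 0.
m[is.na(m)] <- 0

colnames(m) <- paste0("t.", colnames(m))
result <- data.frame(id = as.numeric(rownames(m)), m, row.names = NULL)
print(result)
```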


Re: [R] vectorization of a data-aggregation loop

2005-02-01 Thread Marc Schwartz


Well, I'll get you started using the sample data you have above.

Presuming that your data is in a data frame called 'df':

# Use aggregate to get the summations data by id and type
> df.a <- aggregate(df$iwv, by = list(df$id, df$type), sum)

# Show the results
> df.a
  Group.1 Group.2  x
1       1       a  6
2       2       a 10
3       3       a  9
4       1       b 13
5       3       b 14
6       1       c  6
7       2       c  7
8       3       c  7

# Now use xtabs() to create a contingency table from df.a

> xtabs(x ~ Group.1 + Group.2, data = df.a)
       Group.2
Group.1  a  b  c
      1  6 13  6
      2 10  0  7
      3  9 14  7

You can now modify the colnames in the result of the xtabs step as you
desire.

It's a little easier in two steps. See ?aggregate and ?xtabs for more
information.
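
Because xtabs() itself sums the left-hand side of the formula within each cell, the two steps can also be collapsed into one. A sketch under that reading, with the data entered as printed in the question and the renaming done as suggested above (variable names are illustrative):

```r
d <- data.frame(
  id   = c(1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3),
  iwv  = c(1, 2, 11, 5, 6, 4, 3, 10, 6, 9, 8, 7),
  type = c("a", "b", "b", "a", "c", "c", "c", "a", "b", "a", "b", "c")
)

# Sum iwv within each id/type cell; cells with no observations are 0.
tab <- xtabs(iwv ~ id + type, data = d)

# Turn the two-way table into a plain data frame with the requested names.
wide <- as.data.frame.matrix(tab)
names(wide) <- paste0("t.", names(wide))
wide <- cbind(id = as.numeric(rownames(wide)), wide)
rownames(wide) <- NULL
print(wide)
```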

HTH,

Marc Schwartz



Re: [R] vectorization question

2003-08-18 Thread Alberto Murta
Thank you very much to Tony Plate for his really clear explanation, and to 
Prof Ripley for his time solving this deficiency (IMHO)


-- 
 Alberto G. Murta
Institute for Agriculture and Fisheries Research (INIAP-IPIMAR) 
Av. Brasilia, 1449-006 Lisboa, Portugal | Phone: +351 213027062
Fax:+351 213015948 | http://www.ipimar-iniap.ipimar.pt/pelagicos/

__
[EMAIL PROTECTED] mailing list
https://www.stat.math.ethz.ch/mailman/listinfo/r-help


Re: [R] vectorization question

2003-08-15 Thread Martin Maechler
Thank you, Tony.  This certainly was the most precise
explanation on this thread.

Everyone note, however, that this has been improved (by Brian
Ripley) in the current R-devel {which should become R 1.8 in October}.
There, "$<-" assignment for data frames also checks things
and in this case will do the same replication as the [,] or [[]]
assignments do.
For back compatibility (with S-plus and earlier R versions), I'd
still recommend using bracket [ rather than $ assignment for
data frames.
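
A quick way to check this on one's own installation, assuming an R release that includes the change described above (any reasonably recent version should): all three assignment forms should yield a full-length column.

```r
x1 <- data.frame(a = 1:3)
x2 <- x1
x3 <- x1

x1$b      <- 0   # $ assignment
x2[, "b"] <- 0   # [ assignment
x3[["b"]] <- 0   # [[ assignment

# Length of the new column under each form of assignment.
print(sapply(list(dollar = x1, bracket = x2, double_bracket = x3),
             function(d) length(d$b)))

# With a full-length column, conversion to a matrix works as well.
print(as.matrix(x1))
```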

Martin Maechler [EMAIL PROTECTED] http://stat.ethz.ch/~maechler/
Seminar fuer Statistik, ETH-Zentrum  LEO C16Leonhardstr. 27
ETH (Federal Inst. Technology)  8092 Zurich SWITZERLAND
phone: x-41-1-632-3408  fax: ...-1228   



Re: [R] vectorization question

2003-08-14 Thread Alberto Murta
Thank you very much. I just would expect that 'as.matrix' would have the same 
behaviour as 'data.matrix' when all columns in a data frame are numeric.
Regards

Alberto

On Thursday 14 August 2003 16:41, Liaw, Andy wrote:
> If you look at the structure, you'll see:
>
> > x$V4 <- 0
> > str(x)
> `data.frame':   4 obs. of  4 variables:
>  $ V1: int  1 2 3 4
>  $ V2: int  5 6 7 8
>  $ V3: int  9 10 11 12
>  $ V4: num 0
>
> Don't know if this is the intended result.  In any case, you're probably
> better off using data.matrix, as
>
> > data.matrix(x)
>   V1 V2 V3 V4
> 1  1  5  9  0
> 2  2  6 10  0
> 3  3  7 11  0
> 4  4  8 12  0
>
> HTH,
> Andy



Re: [R] vectorization question

2003-08-14 Thread Tony Plate
From ?data.frame:

Details:

     A data frame is a list of variables of the same length with unique
     row names, given class `data.frame'.

Your example constructs an object that does not conform to the definition
of a data frame (the new column is not the same length as the old
columns).  Some data frame functions may work OK with such an object, but
others will not.  For example, the print function for data.frame silently
handles such an illegal data frame (which could be described as
unfortunate).  It would probably be far easier to construct a correct data
frame in the first place than to try to find and fix functions that don't
handle illegal data frames.

For adding a new column to a data frame, the expressions
x[,"new.column.name"] <- value and x[["new.column.name"]] <- value will
replicate the value so that the new column is the same length as the
existing ones, while the $ operator in an assignment will not replicate
the value.  (One could argue that this is a deficiency, but I think it has
been that way for a long time, and the behavior is the same in the current
version of S-plus.)

> x1 <- data.frame(a=1:3)
> x2 <- x1
> x3 <- x1
> x1$b <- 0
> x2[,"b"] <- 0
> x3[["b"]] <- 0
> sapply(x1, length)
a b
3 1
> sapply(x2, length)
a b
3 3
> sapply(x3, length)
a b
3 3
> as.matrix(x2)
  a b
1 1 0
2 2 0
3 3 0
> as.matrix(x1)
Error in as.matrix.data.frame(x1) : dim<- length of dims do not match the
length of object
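
A sketch of how such a malformed data frame can be detected, and avoided, in a current R session. Two assumptions: the too-short column is appended by bypassing the data-frame methods with unclass(), since newer versions of R replicate the value themselves, and `cols_match_rows` is a hypothetical helper, not a base R function:

```r
x <- data.frame(a = 1:3)

# Bypass the data-frame assignment methods to append a too-short column,
# mimicking the behaviour of $ assignment described above.
bad <- unclass(x)
bad$b <- 0
class(bad) <- "data.frame"

# Hypothetical helper: TRUE when every column has as many elements as
# there are row names.
cols_match_rows <- function(df) {
  all(lengths(unclass(df)) == length(attr(df, "row.names")))
}

print(cols_match_rows(x))
print(cols_match_rows(bad))

# Building the column at full length up front avoids the problem.
good <- x
good$b <- rep(0, nrow(good))
print(cols_match_rows(good))
```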


At Thursday 04:50 PM 8/14/2003 +, Alberto Murta wrote:
Dear all

I recently noticed the following error when coercing a data.frame into a
matrix:

> example <- matrix(1:12,4,3)
> example <- as.data.frame(example)
> example$V4 <- 0
> example
  V1 V2 V3 V4
1  1  5  9  0
2  2  6 10  0
3  3  7 11  0
4  4  8 12  0
> example <- as.matrix(example)
Error in as.matrix.data.frame(example) : dim<- length of dims do not match
the length of object

However, if the column to be added has the right number of rows, there's no
error:

> example <- matrix(1:12,4,3)
> example <- as.data.frame(example)
> example$V4 <- rep(0,4)
> example
  V1 V2 V3 V4
1  1  5  9  0
2  2  6 10  0
3  3  7 11  0
4  4  8 12  0
> example <- as.matrix(example)
> example
  V1 V2 V3 V4
1  1  5  9  0
2  2  6 10  0
3  3  7 11  0
4  4  8 12  0

Shouldn't it work well both ways? I checked the attributes and dims of the
data frame and they are the same in both cases. Where's the difference that
gives rise to the error message?
Thanks in advance
Alberto

platform i686-pc-linux-gnu
arch     i686
os       linux-gnu
system   i686, linux-gnu
status
major    1
minor    7.1
year     2003
month    06
day      16
language R