Sorry - I should have clarified:
My identifiers (in column "item") will always be unique. In other words,
one entry in column "item" will never be repeated - neither in x nor in y.
Dimitri

On Wed, Jan 30, 2013 at 1:27 PM, Dimitri Liakhovitski <
dimitri.liakhovit...@gmail.com> wrote:

> Thank you, everyone! I'll try to test those different approaches. Really
> appreciate your help!
> Dimitri
>
>  On Wed, Jan 30, 2013 at 11:03 AM, arun <smartpink...@yahoo.com> wrote:
>
>> Hi,
>>
>> Sorry, my previous solution doesn't work.
>> This should work for your dataset:
>> set.seed(1851)
>> x <- data.frame(item=sample(letters[1:5],20,replace=TRUE),a=sample(1:15,20,replace=TRUE),b=sample(20:30,20,replace=TRUE),stringsAsFactors=FALSE)
>> y <- data.frame(item="f",a=3,b=10,stringsAsFactors=FALSE)
>> x[x$a%in%which.min(x[x$a<y$a,]$a),] <- y  # if there are multiple minimum values
>>
>> set.seed(1241)
>> x1 <- data.frame(item=sample(letters[1:10],1e4,replace=TRUE),a=sample(1:30,1e4,replace=TRUE),b=sample(1:100,1e4,replace=TRUE),stringsAsFactors=FALSE)
>> y1 <- data.frame(item="f",a=3,b=10,stringsAsFactors=FALSE)
>> length(x1$a[x1$a==1])
>> #[1] 330
>>  system.time({x1[x1$a%in%which.min(x1[x1$a<y1$a,]$a),]<- y1})
>> #   user  system elapsed
>>  # 0.000   0.000   0.001
>> length(x1$a[x1$a==1])
>> #[1] 0
>>
>>
>> # For some reason, it does not work when the number of tied minimum values exceeds some threshold
>>
>> set.seed(1241)
>> x1 <- data.frame(item=sample(letters[1:10],1e5,replace=TRUE),a=sample(1:30,1e5,replace=TRUE),b=sample(1:100,1e5,replace=TRUE),stringsAsFactors=FALSE)
>> y1 <- data.frame(item="f",a=3,b=10,stringsAsFactors=FALSE)
>> length(x1$a[x1$a==1])
>> #[1] 3404
>> x1[x1$a%in%which.min(x1[x1$a<y1$a,]$a),]<- y1
>>  length(x1$a[x1$a==1])
>> #[1] 3404 #not getting replaced
>>
>> #However, if I try:
>> set.seed(1241)
>> x1 <- data.frame(item=sample(letters[1:10],1e6,replace=TRUE),a=sample(1:5000,1e6,replace=TRUE),b=sample(1:100,1e6,replace=TRUE),stringsAsFactors=FALSE)
>> y1 <- data.frame(item="f",a=3,b=10,stringsAsFactors=FALSE)
>>  length(x1$a[x1$a==1])
>> #[1] 208
>>  system.time(x1[x1$a%in%which.min(x1[x1$a<y1$a,]$a),]<- y1)
>> #user  system elapsed
>>  # 0.124   0.016   0.138
>>   length(x1$a[x1$a==1])
>> #[1] 0
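
A possible explanation for the inconsistent results in the runs above, offered as a sketch and not a confirmed diagnosis: which.min() returns a position inside the filtered subset, not a value of a. The expression x$a %in% which.min(...) therefore matches rows whose a happens to equal that position, which is the minimum only by coincidence. The toy data frame d below is made up for illustration and is not taken from the thread.

```r
# Toy data (hypothetical); the minimum of a is 1, at rows 3 and 5.
d <- data.frame(item = c("p", "q", "r", "s", "t"),
                a    = c(2, 5, 1, 4, 1),
                b    = 11:15,
                stringsAsFactors = FALSE)
y <- data.frame(item = "f", a = 3, b = 10, stringsAsFactors = FALSE)

# which.min() gives the POSITION of the first minimum inside the subset.
sub_a <- d[d$a < y$a, ]$a          # values 2, 1, 1
pos   <- which.min(sub_a)          # a position, not a value of a

# Matching values of a against that position selects unrelated rows.
wrong_rows <- which(d$a %in% pos)

# Comparing against the minimum VALUE selects every tied minimum.
m <- min(d$a)
right_rows <- which(d$a == m)

# Replace all tied minimum rows, but only if y$a exceeds the minimum.
fixed <- d
if (y$a > m) fixed[right_rows, ] <- y[rep(1, length(right_rows)), ]
```

Here pos is 2, so the %in% test picks the row where a equals 2 and leaves both rows with a equal to 1 untouched, while the comparison against min(d$a) picks rows 3 and 5.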
>>
>>
>> #Tried Jessica's solution:
>> set.seed(1851)
>> x <- data.frame(item=sample(letters[1:5],20,replace=TRUE),a=sample(1:15,20,replace=TRUE),b=sample(20:30,20,replace=TRUE),stringsAsFactors=FALSE)
>> y <- data.frame(item="f",a=3,b=10,stringsAsFactors=FALSE)
>>  x[intersect(which(x$a < y$a),which.min(x$a)),] <- y
>>  x
>> #   item  a  b
>> #1     a  8 25
>> #2     a 10 26
>> #3     f  3 10 #replaced
>> #4     e 15 26
>> #5     b 13 20
>> #6     a  5 23
>> #7     d  4 29
>> #8     e  2 24
>> #9     c  7 30
>> #10    e 14 24
>> #11    d  2 20
>> #12    e 10 21
>> #13    c 13 27
>> #14    d 12 23
>> #15    b 11 26
>> #16    e  5 22
>> #17    c  1 26  #it is not replaced
>> #18    a  8 21
>> #19    e 10 26
>> #20    c  2 22
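
For reference, which.min() reports only the first position of the minimum, so an intersect() expression of the form used above can select at most one row even when several rows tie. A minimal sketch on a made-up vector (a and threshold are illustrative names, with threshold standing in for y$a):

```r
a <- c(8, 2, 7, 1, 5, 1)   # made-up values; the minimum 1 occurs twice
threshold <- 3             # stands in for y$a

# which.min() yields one index, so at most one row is selected.
first_only <- intersect(which(a < threshold), which.min(a))

# Comparing against the minimum value yields every tied index.
all_ties <- which(a == min(a) & min(a) < threshold)
```

With these values first_only holds index 4 only, while all_ties holds indices 4 and 6.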
>>
>>
>>
>> A.K.
>>
>>
>>
>>
>>
>> ----- Original Message -----
>> From: Dimitri Liakhovitski <dimitri.liakhovit...@gmail.com>
>> To: r-help <r-help@r-project.org>
>> Cc:
>> Sent: Tuesday, January 29, 2013 4:11 PM
>> Subject: [R] Fastest way to compare a single value with all values in one column of a data frame
>>
>>  Hello!
>>
>> I have a large data frame x:
>> x<-data.frame(item=letters[1:5],a=1:5,b=11:15)  # in actuality, x has 1000 rows
>> x$item<-as.character(x$item)
>> I also have a small data frame y with just 1 row:
>> y<-data.frame(item="f",a=3,b=10)
>> y$item<-as.character(y$item)
>>
>> I have to decide if y$a is larger than the smallest of all the values in
>> x$a. If it is, I want y to replace the whole row in x that has the lowest
>> value in column a.
>> This is how I'd do it.
>>
>> if(y$a>min(x$a)){
>>   whichmin<-which(x$a==min(x$a))
>>   x[whichmin,]<-y[1,]
>> }
>>
>>
>> I am wondering if there is a faster way of doing it. What would be the
>> fastest possible way? I'd have to do it, unfortunately, many-many times.
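
One direction that may be worth benchmarking for the repeated case, sketched under assumptions: compute the minimum once and assign column by column instead of replacing a whole data-frame row, since row-wise data-frame assignment tends to carry more overhead than vector assignment. The helper name replace_min is hypothetical, and it assumes x and y share the same column names and that y has exactly one row.

```r
# Hypothetical helper: replace the minimum-a row(s) of x with the single
# row y, but only when y$a exceeds that minimum.
replace_min <- function(x, y) {
  m <- min(x$a)                    # computed once
  if (y$a > m) {
    idx <- which(x$a == m)         # all rows tied for the minimum
    # Column-wise assignment instead of row-wise data-frame replacement.
    for (col in names(x)) x[[col]][idx] <- y[[col]][1]
  }
  x
}

x <- data.frame(item = letters[1:5], a = 1:5, b = 11:15,
                stringsAsFactors = FALSE)
y <- data.frame(item = "f", a = 3, b = 10, stringsAsFactors = FALSE)
x2 <- replace_min(x, y)
```

On this example the first row of x2 becomes the row from y and the other four rows are unchanged. Whether this is actually faster on 1000-row data would need to be checked with system.time() on the real workload.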
>>
>> Thank you very much!
>>
>> --
>> Dimitri Liakhovitski
>>  gfk.com <http://marketfusionanalytics.com/>
>>
>>     [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>



