Re: [R] which(df$name==A) takes ~1 second! (df is very large), but can it be speeded up?

2008-08-13 Thread Emmanuel Levy
Dear Peter and Henrik, Thanks for your replies - this helps speed up a bit, but I thought there would be something much faster. What I mean is that I thought that a particular value of a level could be accessed instantly, similarly to a hash key. Since I've got about 6000 levels in that data

Re: [R] which(df$name==A) takes ~1 second! (df is very large), but can it be speeded up?

2008-08-13 Thread Erik Iverson
I still don't understand what you are doing. Can you make a small example that shows what you have and what you want? Is ?split what you are after?

Emmanuel Levy wrote:
> Dear Peter and Henrik, Thanks for your replies - this helps speed up a bit, but I thought there would be something much

Re: [R] which(df$name==A) takes ~1 second! (df is very large), but can it be speeded up?

2008-08-13 Thread Emmanuel Levy
Sorry for being unclear, I thought the example above was clear enough. I have a data frame of the form:

      name info
1  YAL001C    1
2  YAL001C    1
3  YAL001C    1
4  YAL001C    1
5  YAL001C    0
6  YAL001C    1
7  YAL001C    1
8  YAL001C    1
9  YAL001C    1
10 YAL001C    1

Re: [R] which(df$name==A) takes ~1 second! (df is very large), but can it be speeded up?

2008-08-13 Thread Emmanuel Levy
Wow great! Split was exactly what was needed. It takes about 1 second for the whole operation :D Thanks again - I can't believe I never used this function in the past. All the best, Emmanuel

2008/8/13 Erik Iverson [EMAIL PROTECTED]:
> I still don't understand what you are doing. Can you make

Re: [R] which(df$name==A) takes ~1 second! (df is very large), but can it be speeded up?

2008-08-13 Thread jim holtman
split is probably what you are after. Here is an example:

n <- 2700000
x <- data.frame(name=sample(1:6000,n,TRUE), value=runif(n))
# split it into 6000 lists
system.time(y <- split(x$value, x$name))
   user  system elapsed
   0.80    0.20    1.07
str(y[1:10])
List of 10
 $ 1 : num [1:454]
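A scaled-down sketch of the same idea may help show why a single split() call replaces thousands of separate which() scans. The sizes and the seed here are illustrative assumptions, not the poster's data; the check at the end confirms that a group pulled from the split list matches what a repeated scan would return.

```r
# Reduced-size version of the split() example (n and the number of names are
# illustrative assumptions, chosen so this runs instantly).
set.seed(1)
n <- 1000
x <- data.frame(name = sample(1:6, n, TRUE), value = runif(n))

# One pass over the data: a named list with one numeric vector per name.
y <- split(x$value, x$name)

# Look up a single group by name; list names are character strings.
grp3 <- y[["3"]]

# Same values, in the same order, as scanning the whole column for that name.
stopifnot(identical(grp3, x$value[which(x$name == 3)]))
```

After the one-off split, each lookup such as y[["3"]] is a list access by name rather than a comparison against every row.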

Re: [R] which(df$name==A) takes ~1 second! (df is very large), but can it be speeded up?

2008-08-13 Thread jim holtman
If you want the index, then use:

system.time(y <- split(seq(nrow(x)), x$name))
   user  system elapsed
   0.81    0.06    0.88
str(y[1:10])
List of 10
 $ 1 : int [1:454] 6924 17503 26880 39197 42881 50835 57896 62624 65767 75359 ...
 $ 2 : int [1:440] 9954 25619 25761 33776 56651 60372 61042
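Once the row indices are split by name, they can be used to pull complete rows for any one name. The following is a minimal sketch on a toy data frame; the names other than YAL001C and all the info values are made up for illustration, and seq_len() is used in place of seq() only because it is also safe for zero-row input.

```r
# Toy data frame; "YAL002W", "YAL003W" and the info values are illustrative.
df <- data.frame(
  name = factor(rep(c("YAL001C", "YAL002W", "YAL003W"), times = c(10, 6, 4))),
  info = rep(c(1, 0), 10)
)

# Build the name -> row-index list once.
idx <- split(seq_len(nrow(df)), df$name)

# Use the stored indices to extract all rows for one name.
rows <- df[idx[["YAL001C"]], ]

# The stored indices agree with a direct which() scan.
stopifnot(identical(idx[["YAL001C"]], which(df$name == "YAL001C")))
```

The list idx plays the role of the hash-like lookup asked about earlier in the thread: the cost of scanning the column is paid once, not once per name.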

[R] which(df$name==A) takes ~1 second! (df is very large), but can it be speeded up?

2008-08-12 Thread Emmanuel Levy
Dear All, I have a large data frame (about 2.7 million lines and 14 columns), and I would like to extract the information in a particular way illustrated below: Given a data frame df:

col1=sample(c(0,1),10, rep=T)
names = factor(c(rep("A",5),rep("B",5)))
df = data.frame(names,col1)
df
  names col1
1

Re: [R] which(df$name==A) takes ~1 second! (df is very large), but can it be speeded up?

2008-08-12 Thread Peter Cowan
Emmanuel,

On Tue, Aug 12, 2008 at 4:35 PM, Emmanuel Levy [EMAIL PROTECTED] wrote:
> Dear All, I have a large data frame (about 2.7 million lines and 14 columns), and I would like to extract the information in a particular way illustrated below: Given a data frame df:
> col1=sample(c(0,1),10, rep=T)

Re: [R] which(df$name==A) takes ~1 second! (df is very large), but can it be speeded up?

2008-08-12 Thread Henrik Bengtsson
To simplify:

n <- 2.7e6;
x <- factor(c(rep("A", n/2), rep("B", n/2)));
# Identify 'A':s
t1 <- system.time(res <- which(x == "A"));
# To compare a factor to a string, the factor is in practice
# coerced to a character vector.
t2 <- system.time(res <- which(as.character(x) == "A"));
# Interestingly enough,
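One way to sidestep the string comparison described in the comments above is to compare the factor's integer codes against the code of the wanted level. This is only a sketch of that general technique, not necessarily what the truncated message goes on to suggest; the variable names code, res_codes and res_plain are introduced here for illustration, and n is reduced so the snippet runs quickly. Whether it is actually faster depends on the R version, so it is worth timing both forms with system.time() on real data.

```r
# Reduced n for a quick run (the message above uses 2.7e6).
n <- 1e5
x <- factor(c(rep("A", n / 2), rep("B", n / 2)))

# Find the integer code that the level "A" has in this factor.
code <- match("A", levels(x))

# Compare integer codes instead of comparing against a string.
res_codes <- which(as.integer(x) == code)

# Reference result from the plain comparison.
res_plain <- which(x == "A")

# Both approaches select exactly the same positions.
stopifnot(identical(res_codes, res_plain))
```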