Dear Peter and Henrik,
Thanks for your replies - this helps speed up a bit, but I thought
there would be something much faster.
What I mean is that I thought that a particular value of a level
could be accessed instantly, similarly to a hash key.
Since I've got about 6000 levels in that data
I still don't understand what you are doing. Can you make a small
example that shows what you have and what you want?
Is ?split what you are after?
Emmanuel Levy wrote:
Dear Peter and Henrik,
Thanks for your replies - this helps speed up a bit, but I thought
there would be something much
Sorry for being unclear, I thought the example above was clear enough.
I have a data frame of the form:
      name info
1  YAL001C    1
2  YAL001C    1
3  YAL001C    1
4  YAL001C    1
5  YAL001C    0
6  YAL001C    1
7  YAL001C    1
8  YAL001C    1
9  YAL001C    1
10 YAL001C    1
Wow great! Split was exactly what was needed. It takes about 1 second
for the whole operation :D
Thanks again - I can't believe I never used this function in the past.
All the best,
Emmanuel
2008/8/13 Erik Iverson [EMAIL PROTECTED]:
I still don't understand what you are doing. Can you make
split is probably what you are after. Here is an example:
n <- 2700000
x <- data.frame(name=sample(1:6000,n,TRUE), value=runif(n))
# split it into 6000 lists
system.time(y <- split(x$value, x$name))
   user  system elapsed
   0.80    0.20    1.07
str(y[1:10])
List of 10
$ 1 : num [1:454]
If you want the index, then use:
system.time(y <- split(seq(nrow(x)), x$name))
   user  system elapsed
   0.81    0.06    0.88
str(y[1:10])
List of 10
$ 1 : int [1:454] 6924 17503 26880 39197 42881 50835 57896 62624
65767 75359 ...
$ 2 : int [1:440] 9954 25619 25761 33776 56651 60372 61042
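As a minimal sketch of how the index list returned by split() can then be used much like a hash lookup: the toy data frame below stands in for the large one, and the names YAL002W and YAL003W are illustrative assumptions, not taken from the thread.

```r
# Toy data frame standing in for the large one (values are illustrative)
x <- data.frame(name  = c("YAL001C", "YAL002W", "YAL001C", "YAL003W", "YAL002W"),
                value = c(1, 0, 1, 1, 0),
                stringsAsFactors = TRUE)

# Build the index once: a named list mapping each level to its row numbers
idx <- split(seq_len(nrow(x)), x$name)

# Afterwards each level is a lookup by name, without rescanning the column
rows <- idx[["YAL001C"]]
rows        # row numbers belonging to this level
x[rows, ]   # the corresponding rows of the data frame
```

The split is paid for once; each later access such as idx[["YAL001C"]] only fetches an element of the list by name.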
Dear All,
I have a large data frame (about 2.7 million lines and 14 columns), and I would like to
extract the information in a particular way illustrated below:
Given a data frame df:
col1=sample(c(0,1),10, rep=T)
names = factor(c(rep("A",5),rep("B",5)))
df = data.frame(names,col1)
df
names col1
1
Emmanuel,
On Tue, Aug 12, 2008 at 4:35 PM, Emmanuel Levy [EMAIL PROTECTED] wrote:
Dear All,
I have a large data frame (about 2.7 million lines and 14 columns), and I would like to
extract the information in a particular way illustrated below:
Given a data frame df:
col1=sample(c(0,1),10, rep=T)
To simplify:
n <- 2.7e6;
x <- factor(c(rep("A", n/2), rep("B", n/2)));
# Identify 'A':s
t1 <- system.time(res <- which(x == "A"));
# To compare a factor to a string, the factor is in practice
# coerced to a character vector.
t2 <- system.time(res <- which(as.character(x) == "A"));
# Interestingly enough,
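The message above is cut off at this point. Separately from whatever followed, here is a hedged sketch of a related idea: comparing the factor's underlying integer codes rather than strings. The smaller n and the variable names code_A, res_codes and res_direct are assumptions made for illustration. No timings are claimed here; system.time() can be wrapped around either call to compare them on your own machine.

```r
# Small stand-in for the 2.7e6-element factor used in the thread
n <- 1e5
x <- factor(c(rep("A", n/2), rep("B", n/2)))

# Look up the integer code of level "A" once
code_A <- match("A", levels(x))

# Compare the underlying integer codes instead of strings
res_codes <- which(as.integer(x) == code_A)

# The direct comparison yields the same indices
res_direct <- which(x == "A")
stopifnot(identical(res_codes, res_direct))
```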