Hi:
Here's one approach:
# Function to process a list component into a data frame
ff <- function(x) {
    data.frame(time = x[1], partitioning_mode = x[2], workload = x[3],
               runtime = as.numeric(x[4:length(x)]))
}
# Apply it to each element of the list:
do.call(rbind, lapply(data, ff))
or equivalently, using the plyr package,
library('plyr')
ldply(data, ff)
# Example:
L <- list(c("1", "sharding", "query", "607", "85", "52", "79", "77",
            "67", "98"),
          c("1", "sharding", "refresh", "2932", "2870", "2877", "2868"),
          c("1", "replication", "query", "2891", "2907", "2922", "2937"))
do.call(rbind, lapply(L, ff))
   time partitioning_mode workload runtime
1     1          sharding    query     607
2     1          sharding    query      85
3     1          sharding    query      52
4     1          sharding    query      79
5     1          sharding    query      77
6     1          sharding    query      67
7     1          sharding    query      98
8     1          sharding  refresh    2932
9     1          sharding  refresh    2870
10    1          sharding  refresh    2877
11    1          sharding  refresh    2868
12    1       replication    query    2891
13    1       replication    query    2907
14    1       replication    query    2922
15    1       replication    query    2937
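
One caveat: ff() leaves time as text (and, in R versions before 4.0, converts the string columns to factors), whereas the question asks for an integer time and character labels. Below is a minimal sketch of a typed variant. The name ff_typed is hypothetical, and it assumes every list element has three labels followed by at least one runtime. The aggregate() call at the end is just one base-R way to check that the result is easy to summarize by group.

```r
# Hypothetical variant of ff(); the name ff_typed is not from the original post.
# Assumes x = c(time, partitioning_mode, workload, runtime_1, ..., runtime_n), n >= 1.
ff_typed <- function(x) {
  data.frame(time = as.integer(x[1]),
             partitioning_mode = x[2],
             workload = x[3],
             runtime = as.numeric(x[-(1:3)]),  # everything after the 3 labels
             stringsAsFactors = FALSE)         # keep labels as character
}

# Same small example as above
L <- list(c("1", "sharding", "query", "607", "85", "52", "79", "77", "67", "98"),
          c("1", "sharding", "refresh", "2932", "2870", "2877", "2868"),
          c("1", "replication", "query", "2891", "2907", "2922", "2937"))

# One long data frame: one row per runtime, labels repeated
df <- do.call(rbind, lapply(L, ff_typed))
str(df)

# Mean runtime per (partitioning_mode, workload) group
means <- aggregate(runtime ~ partitioning_mode + workload, data = df, FUN = mean)
print(means)
```

The long format (one row per runtime) is also what ggplot2 expects, so df can be passed to it directly.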
HTH,
Dennis
On Sun, Oct 23, 2011 at 8:38 AM, Giovanni Azua <[email protected]> wrote:
> Hello,
>
> I used R a lot one year ago and now I am a bit rusty :)
>
> I have my raw data, which corresponds to the list of runtimes per minute
> (minutes "1", "2", "3" in two database modes, "sharding" and "replication",
> and two workload types, "query" and "refresh"), stored as a list of char
> arrays that looks like this:
>
>> str(data)
> List of 122
> $ : chr [1:163] "1" "sharding" "query" "607" "85" "52" "79" "77" "67" "98" ...
> $ : chr [1:313] "1" "sharding" "refresh" "2932" "2870" "2877" "2868" ...
> $ : chr [1:57] "1" "replication" "query" "2891" "2907" "2922" "2937" ...
> $ : chr [1:278] "1" "replication" "refresh" "79" "79" "89" "79" "89" "79" "79" "79" ...
> $ : chr [1:163] "2" "sharding" "query" "607" "85" "52" "79" "77" "67" "98" ...
> $ : chr [1:313] "2" "sharding" "refresh" "2932" "2870" "2877" "2868" ...
> $ : chr [1:57] "2" "replication" "query" "2891" "2907" "2922" "2937" ...
> $ : chr [1:278] "2" "replication" "refresh" "79" "79" "89" "79" "89" "79" "79" "79" ...
> $ : chr [1:163] "3" "sharding" "query" "607" "85" "52" "79" "77" "67" "98" ...
> $ : chr [1:313] "3" "sharding" "refresh" "2932" "2870" "2877" "2868" ...
> $ : chr [1:57] "3" "replication" "query" "2891" "2907" "2922" "2937" ...
> $ : chr [1:278] "3" "replication" "refresh" "79" "79" "89" "79" "89" "79" "79" "79" ...
>
> I would like to transform the one above into a data frame where this
> structure is unfolded in the following way:
>
> 'data.frame': N obs. of 4 variables:
> $ time : int 1 1 1 1 1 1 1 1 1 1 1 ...
> $ partitioning_mode : chr "sharding" "sharding" "sharding" "sharding"
> "sharding" "sharding" "sharding" "sharding" "sharding" "sharding" ...
> $ workload : chr "query" "query" "query" "query" "query" "query" "query"
> "refresh" "refresh" "refresh" "refresh" ...
> $ runtime : num 607 85 52 79 77 67 98 2932 2870 2877 2868...
>
> So instead of having an associative array (variable number of columns) it
> should become a simple list where the group or factors are repeated for every
> occurrence of the specific runtime. Basically my ultimate goal is to get a
> data frame structure that is "summarizeBy"-friendly and "ggplot2-friendly"
> i.e. using this data frame format.
>
> Help greatly appreciated!
>
> TIA,
> Best regards,
> Giovanni
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>