On Sun, Jan 5, 2014 at 6:55 AM, David Rosenberg <david.dav...@gmail.com> wrote:
>> Maybe we could have an option that would indicate the splitting char.
>> The default would be none = don't split:
>>
>> > load_parallel_results(file,split="\t")
>>   myvar1 myvar2 V1          V2
>> 1 1      A      Hello       1
>> 2 1      A      Bye         2
>> 3 1      A      Wow         3
>> 4 2      A      Interesting 9
>> 5 1      B      NewYork     3
>>
>> > load_parallel_results(file)
>>   myvar1 myvar2 stdout                       stderr
>> 1 1      A      "Hello\t1\nBye\t2\nWow\t3\n" ""
>> 2 2      A      "Interesting\t9\n"           ""
>> 3 1      B      "NewYork\t3\n"               ""
>>
>
> That seems reasonable.

Giving it some more thought I think we also want a way to split on
newlines but not on tabs:

  > load_parallel_results(file,splitnewline=T)
    myvar1 myvar2 stdout
  1 1      A      "Hello\t1"
  2 1      A      "Bye\t2"
  3 1      A      "Wow\t3"
  4 2      A      "Interesting\t9"
  5 1      B      "NewYork\t3"

>> I believe I would prefer returning a data-structure, that you could
>> select the relevant records from based on the arguments. And when you
>> have the records you want, you can ask to have the stdout/stderr read
>> in and possibly expanded as rows. This would be able to scale to much
>> bigger stdout/stderr and many more jobs.

So something like:

load_parallel_results_filenames <- function(resdir) {
  # return
  ##      myvar1 myvar2 stdout              stderr
  ## [1,] "1"    "A"    "my/dir/1/A/stdout" "my/dir/1/A/stderr"
}

load_parallel_results_raw_content <- function(filenametable) {
  # return
  ##      myvar1 myvar2 stdout                  stderr
  ## [1,] "1"    "A"    `cat my/dir/1/A/stdout` `cat my/dir/1/A/stderr`
}

load_parallel_results_split_on_newline <- function(filenametable, linesep="\n") {
  # return
  ##      myvar1 myvar2 stdout1
  ## [1,] "1"    "A"    "stdout-line1"
  ## [2,] "1"    "A"    "stdout-line2"
}

load_parallel_results_split_to_columns <- function(filenametable, linesep="\n", colsep="\t") {
  # return
  ##      myvar1 myvar2 stdout1      stdout2
  ## [1,] "1"    "A"    "col1-line1" "col2-line1"
  ## [2,] "1"    "A"    "col1-line2" "col2-line2"
}

Maybe it makes sense that all these functions can be called from a
single function by setting options:

load_parallel_results <- function(x, output=NULL, linesep="\n", colsep="\t") {
  if(is.null(dim(x))) {
    ## x is a string: the results dir
    filenametable <- load_parallel_results_filenames(x)
  } else {
    ## x is already a table of filenames
    filenametable <- x
  }
  if(identical(output, "raw")) {
    return(load_parallel_results_raw_content(filenametable))
  }
  if(identical(output, "newline")) {
    return(load_parallel_results_split_on_newline(filenametable, linesep))
  }
  if(identical(output, "columns")) {
    return(load_parallel_results_split_to_columns(filenametable, linesep, colsep))
  }
  ## Default (output=NULL): just the filenames
  return(filenametable)
}

I have made (see below):

  load_parallel_results_raw_content(filenametable)
  load_parallel_results_filenames(resdir)

But I would appreciate help with:

  load_parallel_results_split_on_newline(filenametable)
  load_parallel_results_split_to_columns(filenametable)


/Ole

load_parallel_results_filenames <- function(resdir) {
  ## Find files called .../stdout (anchored, so e.g. "stdout.bak" is skipped)
  stdoutnames <- list.files(path=resdir, pattern="^stdout$", recursive=TRUE)
  ## Find files called .../stderr
  stderrnames <- list.files(path=resdir, pattern="^stderr$", recursive=TRUE)
  if(length(stdoutnames) == 0) {
    ## Return empty data frame if no files found
    return(data.frame())
  }
  ## One row per job; path components alternate name/value/.../stdout
  m <- matrix(unlist(strsplit(stdoutnames, "/")),
              nrow=length(stdoutnames), byrow=TRUE)
  ## Even columns hold the argument values; drop=FALSE keeps a matrix
  ## even when there is only one job or one variable
  tbl <- as.table(m[, c(FALSE,TRUE), drop=FALSE])
  ## Append the stdout and stderr filenames, prefixed with resdir so
  ## they can be read later without knowing resdir
  tbl <- cbind(tbl, file.path(resdir, stdoutnames), file.path(resdir, stderrnames))
  ## Odd path components are the variable names (the last one is "stdout")
  colnames(tbl) <- c(strsplit(stdoutnames[1], "/")[[1]][c(TRUE,FALSE)], "stderr")
  return(tbl)
}

load_parallel_results_raw_content <- function(tbl) {
  ## Read a whole file into a single string
  ## (the table holds full paths, so resdir is not needed here)
  slurp <- function(x) {
    return(paste(readLines(x), collapse="\n"))
  }
  stdoutcontents <- lapply(tbl[,"stdout"], slurp)
  stderrcontents <- lapply(tbl[,"stderr"], slurp)
  ## Replace filenames with file contents
  tbl[,c("stdout","stderr")] <-
    c(as.character(stdoutcontents), as.character(stderrcontents))
  return(tbl)
}
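
For the two functions still missing, a minimal sketch could look like the following. It is only one possible approach, and it rests on some assumptions: filenametable is a character matrix with at least one argument column, its "stdout" column holds paths that can be read as-is, stderr is simply dropped, and lines with fewer fields than the widest line are padded with NA. The column names stdout1, stdout2, ... follow the outline above.

```r
## Sketch: one output row per line of each job's stdout.
## Assumes the "stdout" column of filenametable holds readable paths.
load_parallel_results_split_on_newline <- function(filenametable, linesep="\n") {
  argcols <- setdiff(colnames(filenametable), c("stdout", "stderr"))
  ## Read each stdout file and split its content on linesep
  lines <- lapply(filenametable[, "stdout"], function(path) {
    content <- paste(readLines(path, warn=FALSE), collapse="\n")
    strsplit(content, linesep, fixed=TRUE)[[1]]
  })
  ## Repeat each job's argument values once per line of its output
  idx <- rep(seq_len(nrow(filenametable)), times=vapply(lines, length, integer(1)))
  cbind(filenametable[idx, argcols, drop=FALSE],
        stdout1=unlist(lines, use.names=FALSE))
}

## Sketch: as above, but each line is also split on colsep into
## columns stdout1, stdout2, ...
load_parallel_results_split_to_columns <- function(filenametable, linesep="\n", colsep="\t") {
  tbl <- load_parallel_results_split_on_newline(filenametable, linesep)
  if(nrow(tbl) == 0) {
    ## No output lines at all: nothing to split
    return(tbl)
  }
  argcols <- setdiff(colnames(tbl), "stdout1")
  fields <- strsplit(unname(tbl[, "stdout1"]), colsep, fixed=TRUE)
  ncols <- max(vapply(fields, length, integer(1)))
  ## Pad short lines with NA so every row has ncols fields
  padded <- do.call(rbind, lapply(fields, function(f) {
    c(f, rep(NA_character_, ncols - length(f)))
  }))
  colnames(padded) <- paste0("stdout", seq_len(ncols))
  cbind(tbl[, argcols, drop=FALSE], padded)
}
```

Because both functions take the filename table rather than the results dir, you can first select the rows you want from the table and only then read and expand the output of those jobs.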