> > Your idea requires the user to make sure the output is \t separated.
>
> Yes... I've been doing that for years and life has been better ever
> since. But sure, the separator should be a parameter.
> Maybe we could have an option that would indicate the splitting char.
> The default would be none = don't split:
>
> > load_parallel_results(file, split="\t")
>   myvar1 myvar2          V1 V2
> 1      1      A       Hello  1
> 2      1      A         Bye  2
> 3      1      A         Wow  3
> 4      2      A Interesting  9
> 5      1      B     NewYork  3
>
> > load_parallel_results(file)
>   myvar1 myvar2                       stdout stderr
> 1      1      A "Hello\t1\nBye\t2\nWow\t3\n"     ""
> 2      2      A           "Interesting\t9\n"     ""
> 3      1      B               "NewYork\t3\n"     ""

That seems reasonable.

> I am also somewhat concerned that the current function loads all
> stdout/stderr files - even if they are never used. It would be better
> if that could be done lazily - see
> http://stackoverflow.com/questions/20923089/r-store-functions-in-a-data-frame

I'm not sure there's a 'right' answer here. I think it depends on how
you'll use the results.

> I believe I would prefer returning a data structure that you could
> select the relevant records from based on the arguments. And when you
> have the records you want, you can ask to have the stdout/stderr read
> in and possibly expanded as rows. This would be able to scale to much
> bigger stdout/stderr and many more jobs.

Seems reasonable.

> Maybe the trivial solution is to simply return a table of the args + the
> filenames of stdout/stderr, and then have a function that turns that
> table into the read-in files, which you can run either immediately or
> after you have selected the relevant rows.

Yes -- I often do this: first go to the file system to collect all the
file paths I might be interested in and the relevant metadata (for me,
it's typically creation date). Then I figure out which paths I want to
load, and then load them all in.

David

/Ole
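For concreteness, here is a minimal R sketch of the two-step approach
discussed above. It is not GNU Parallel's shipped code: the directory
layout is an assumption for illustration (one directory per job whose
path encodes the argument values, e.g. res/myvar1/1/myvar2/A/, holding
"stdout" and "stderr" files), and the names list_parallel_results and
read_parallel_results are hypothetical.

    # Step 1: walk the file system and return only the args plus the
    # file paths. Nothing is read yet, so this stays cheap even for
    # many jobs with large stdout/stderr.
    list_parallel_results <- function(dir) {
      stdout_files <- list.files(dir, pattern = "^stdout$",
                                 recursive = TRUE, full.names = TRUE)
      rows <- lapply(stdout_files, function(f) {
        # Recover variable names/values from the path components,
        # e.g. "myvar1/1/myvar2/A/stdout" -> myvar1 = 1, myvar2 = A.
        rel   <- substring(f, nchar(dir) + 2)  # assumes no trailing "/" on dir
        parts <- strsplit(rel, "/")[[1]]
        parts <- parts[-length(parts)]         # drop the trailing "stdout"
        vals  <- parts[seq(2, length(parts), 2)]
        names(vals) <- parts[seq(1, length(parts), 2)]
        as.data.frame(as.list(vals), stringsAsFactors = FALSE)
      })
      tab <- do.call(rbind, rows)
      tab$stdout_path <- stdout_files
      tab$stderr_path <- file.path(dirname(stdout_files), "stderr")
      tab
    }

    # Step 2: read the files for the (possibly subsetted) rows. With
    # split = NULL the raw stdout/stderr strings are attached; with
    # e.g. split = "\t" each stdout line becomes a row and each field
    # a column (the *_path columns are kept; drop them if unwanted).
    read_parallel_results <- function(tab, split = NULL) {
      slurp <- function(f) paste(readLines(f), collapse = "\n")
      if (is.null(split)) {
        tab$stdout <- vapply(tab$stdout_path, slurp, character(1))
        tab$stderr <- vapply(tab$stderr_path, slurp, character(1))
        return(tab)
      }
      do.call(rbind, lapply(seq_len(nrow(tab)), function(i) {
        lines  <- readLines(tab$stdout_path[i])
        fields <- do.call(rbind, strsplit(lines, split, fixed = TRUE))
        cbind(tab[rep(i, nrow(fields)), , drop = FALSE],
              as.data.frame(fields, stringsAsFactors = FALSE))
      }))
    }

Hypothetical usage, matching the workflow described above (collect
paths first, select rows cheaply, then read only those files):

    tab    <- list_parallel_results("res")
    wanted <- tab[tab$myvar2 == "A", ]
    read_parallel_results(wanted, split = "\t")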