[R] selecting a subset of files to be processed
Dear R People:

I am using a Linux system in which I have about 3000 files. I would like to randomly select about 45 of those files to be processed in R.

Could I make the selection in R or should I do it in Linux, please?

This is with R-2.15.1.

Thanks,
erin

--
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] selecting a subset of files to be processed
Hi Erin,

It is easy to do this either in R or via the shell. If the files are all in the same directory, I would lean towards R, because you can set the seed and record it, which lets you reproduce your random selection. If the working directory is the directory containing the files, then from R just run:

    set.seed(10)
    toprocess <- sample(list.files(), size = 45)

Cheers,
Josh

On Sat, Jul 28, 2012 at 10:49 AM, Erin Hodgess erinm.hodg...@gmail.com wrote:
> I would like to randomly select about 45 of those files to be processed in R. Could I make the selection in R or should I do it in Linux, please?

--
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/
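To illustrate why recording the seed matters, here is a minimal sketch of Josh's two-line suggestion. The temporary directory, the file names, and the output file `selected_files.txt` are hypothetical stand-ins, not part of the original thread.

```r
# Hypothetical setup: a temporary directory with 3000 empty files stands in
# for the real data directory.
data_dir <- file.path(tempdir(), "sample_demo")
dir.create(data_dir, showWarnings = FALSE)
invisible(file.create(file.path(data_dir, sprintf("file%04d.txt", 1:3000))))

# Fix the seed, then draw 45 file names without replacement.
set.seed(10)
toprocess <- sample(list.files(data_dir), size = 45)

# Resetting the same seed over the same listing repeats the draw.
set.seed(10)
again <- sample(list.files(data_dir), size = 45)
stopifnot(identical(toprocess, again))

# Saving the chosen names is an extra safeguard that does not depend on the
# random number generator at all (the output location is an assumption).
writeLines(toprocess, file.path(tempdir(), "selected_files.txt"))
```

The seed only reproduces the selection if the directory listing is unchanged and the same sampling algorithm is in use, so keeping the written list of names alongside the seed is probably the safer record.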
Re: [R] selecting a subset of files to be processed
Hello,

If the files are to be processed in R, select the random sample in R as well. list.files() returns a character vector of the filenames of interest, and you can then sample from that vector. See ?list.files.

    filenames <- list.files(path, pattern)
    rand.sampl <- sample(filenames, 45)

Hope this helps,

Rui Barradas
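As a sketch of how the `path` and `pattern` arguments in Rui's suggestion might be filled in, the example below restricts the listing to CSV files and then reads only the sampled ones. The directory, the file names, and the `.csv` pattern are assumptions made up for illustration.

```r
# Hypothetical setup: 100 small CSV files plus two unrelated files.
path <- file.path(tempdir(), "pattern_demo")
dir.create(path, showWarnings = FALSE)
for (i in 1:100) {
  write.csv(data.frame(x = i),
            file.path(path, sprintf("run%03d.csv", i)),
            row.names = FALSE)
}
invisible(file.create(file.path(path, c("README", "notes.log"))))

# pattern is a regular expression; full.names = TRUE returns paths that can
# be passed to read functions without changing the working directory.
filenames <- list.files(path, pattern = "\\.csv$", full.names = TRUE)

# Guard against directories holding fewer files than requested.
rand.sampl <- sample(filenames, min(45, length(filenames)))

# Process only the sampled files.
results <- lapply(rand.sampl, read.csv)
```

Without the `min()` guard, `sample()` stops with an error when asked for more items than the vector holds, which may or may not be the behaviour you want.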
Re: [R] selecting a subset of files to be processed
And, in addition to the tip from Rui (and similar from Joshua) below, I would advise that there is one good reason not to try doing it in pure Linux.

The only source (that I know of) in Linux itself for random numbers can be tapped by something like

    cat /dev/random > filename

/dev/random stores noise generated by the timings of system events (keyboard presses, mouse-clicks, disk accesses, interrupts, etc.) after subjecting them to a high-entropy stirring process. See:

    man random

It yields them in the form of random bytes (each of 8 random 0/1 bits), and you would have to devise some means of converting those into a form suitable for accessing a directory listing at random. Not a pretty task!

There is also the command 'rand' available in the OpenSSL toolkit, but that still outputs the results in the same format as /dev/random.

If you really want to do this outside R, then I would suggest writing a little C program (to be run from the Linux command line). C can do its own random number generation, with results returned as real (double); these can then be applied to select at random from the contents of a file generated by something like

    ls filesdir > filelist.txt

and to output the random selection.

Ted.

On 28-Jul-2012 18:00:38 Rui Barradas wrote:
> If the files are to be processed in R select a random sample in R. Using list.files() you can assign a character vector with the filenames of interest and then sample from that vector.

E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 28-Jul-2012  Time: 19:32:26
This message was sent by XFMail
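For comparison, the "select from a listing file" step Ted describes for a C program can be sketched in R in a few lines. This is not Ted's C approach; it only shows the same idea applied to a file such as `filelist.txt`. The listing contents, the seed, and the output file `selection.txt` are hypothetical.

```r
# Stand-in for the output of `ls filesdir > filelist.txt`: one name per line.
listing <- file.path(tempdir(), "filelist.txt")
writeLines(sprintf("file%04d.dat", 1:3000), listing)

# Read the listing and draw 45 names without replacement.
set.seed(2012)
all_files <- readLines(listing)
chosen <- sample(all_files, 45)

# Write the selection out, one name per line, so a shell script could use it.
selection <- file.path(tempdir(), "selection.txt")
writeLines(chosen, selection)
```

A script like this could also be run non-interactively with Rscript, which keeps the random selection in R while the rest of the workflow stays in the shell.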
Re: [R] selecting a subset of files to be processed
Thanks so much!

--
Erin Hodgess