[R] selecting a subset of files to be processed

2012-07-28 Thread Erin Hodgess
Dear R People:

I am using a Linux system in which I have about 3000 files.

I would like to randomly select about 45 of those files to be processed in R.

Could I make the selection in R or should I do it in Linux, please?

This is with R-2.15.1.

Thanks,
erin


-- 
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] selecting a subset of files to be processed

2012-07-28 Thread Joshua Wiley
Hi Erin,

It is not difficult to imagine doing it either in R or via the shell.
If they are all in the same directory, I would tend towards R, just
because you can easily set the seed and keep that information so you
can reproduce your random selection.

If the working directory is the one containing the files, then from R just:

set.seed(10)
toprocess <- sample(list.files(), size = 45)
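
To illustrate the reproducibility point, here is a minimal sketch that fixes the seed and also writes the chosen names to a log file. The demo directory, the dummy file names, and the log file name selected_files.txt are assumptions made up so the example runs on its own; none of them come from the original post. Note that R 3.6.0 changed the default algorithm behind sample(), so a seed alone may not reproduce the same draw on a different R version, which is one reason to keep the saved list as well.

```r
# Hypothetical setup: 100 empty dummy files in a temporary directory,
# standing in for the real directory of about 3000 files.
demo_dir <- file.path(tempdir(), "demo_files")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, sprintf("file%03d.txt", 1:100))))

set.seed(10)                        # fixed seed so the draw can be repeated
all_files <- list.files(demo_dir)   # character vector of file names

# Never ask for more files than exist; sample() would stop with an error.
toprocess <- sample(all_files, size = min(45, length(all_files)))

# Record the selection outside the sampled directory (hypothetical log name).
log_file <- file.path(tempdir(), "selected_files.txt")
writeLines(toprocess, log_file)
```

Reading the log back with readLines(log_file) recovers exactly the files that were selected, independently of the seed.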

Cheers,

Josh




-- 
Joshua Wiley
Ph.D. Student, Health Psychology
Programmer Analyst II, Statistical Consulting Group
University of California, Los Angeles
https://joshuawiley.com/



Re: [R] selecting a subset of files to be processed

2012-07-28 Thread Rui Barradas

Hello,

If the files are to be processed in R, select a random sample in R.
Using list.files() you can assign a character vector with the filenames 
of interest and then sample from that vector.


?list.files
filenames <- list.files(path, pattern)

rand.sampl <- sample(filenames, 45)
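
For what it is worth, a fuller sketch of the same idea might look like the following. The temporary directory, the CSV file names, and the choice of read.csv() as the "processing" step are all assumptions for illustration. The two details worth noting are that pattern is a regular expression (not a shell glob) and that full.names = TRUE returns paths that can be passed directly to a reader.

```r
# Hypothetical setup: ten small CSV files plus one file that should be ignored.
path <- file.path(tempdir(), "csv_demo")
dir.create(path, showWarnings = FALSE)
for (i in 1:10) {
  write.csv(data.frame(x = i),
            file.path(path, sprintf("data%02d.csv", i)),
            row.names = FALSE)
}
writeLines("not data", file.path(path, "README.txt"))

# Keep only names ending in ".csv"; full.names = TRUE prepends the directory.
filenames <- list.files(path, pattern = "\\.csv$", full.names = TRUE)

# Draw 4 of the 10 files without replacement (45 in the original question).
rand.sampl <- sample(filenames, 4)

# Process each selected file; the names record which file each result came from.
results <- lapply(rand.sampl, read.csv)
names(results) <- basename(rand.sampl)
```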

Hope this helps,

Rui Barradas



Re: [R] selecting a subset of files to be processed

2012-07-28 Thread Ted Harding
And, in addition to the tip from Rui (and similar from Joshua) below,
I would advise that there is one good reason not to try doing it
in pure Linux.

The only source (that I know of) in Linux itself for random numbers
can be tapped by something like

  cat /dev/random > filename

/dev/random stores noise generated by the timings of system events
(keyboard presses, mouse-clicks, disk accesses, interrupts, etc.)
after subjecting them to a high-entropy stirring process. See:

  man random

It yields them in the form of random bytes (each of 8 random 0/1 bits)
and you would have to devise some means of converting those into a
form suitable for accessing a directory listing at random. Not a
pretty task!
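
To give a feel for what such a conversion involves, here is a rough sketch, written in R only for consistency with the rest of the thread. The function name bytes_to_index is hypothetical. It combines pairs of bytes into 16-bit values and discards values that would make some indices more likely than others (rejection sampling), which is the step that is easy to get wrong.

```r
# Map raw random bytes to indices in 1..n without modulo bias.
# bytes: raw vector of even length; n: number of files (at most 65536).
bytes_to_index <- function(bytes, n) {
  n <- as.integer(n)
  stopifnot(is.raw(bytes), length(bytes) %% 2 == 0, n >= 1L, n <= 65536L)
  b <- as.integer(bytes)
  # Combine consecutive byte pairs into 16-bit values in 0..65535.
  vals <- b[c(TRUE, FALSE)] * 256L + b[c(FALSE, TRUE)]
  # Largest multiple of n not exceeding 65536; values at or above it are dropped.
  limit <- (65536L %/% n) * n
  vals <- vals[vals < limit]
  vals %% n + 1L
}

# Deterministic demo with hand-picked bytes instead of real random input.
demo_bytes <- as.raw(c(0, 0, 0, 1, 255, 255))
idx <- bytes_to_index(demo_bytes, 3000)
```

In the demo, the third byte pair gives 65535, which is above the cut-off of 63000 for n = 3000 and is therefore discarded. Real input could be obtained with readBin() on a binary connection to the random device, and duplicate indices would still have to be removed before picking 45 distinct files.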

There is also the command 'rand' available in the openSSL toolkit,
but that still outputs the results in the same format as /dev/random.

If you really want to do this outside R, then I would suggest writing
a little C program (to be run from the Linux command line). C can
do its own random number generation, with results returned as
real (double), and then apply these to select at random from the
contents of a file generated by something like

  ls filesdir > filelist.txt

and output the random selection.
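
For comparison, the same "select from a saved listing" step can be sketched in R rather than C. This is a substitute for, not a version of, the C program suggested above. The file names filelist.txt and selection.txt and the generated run names are assumptions; here the listing is faked so the example is self-contained.

```r
# Stand-in for the output of `ls filesdir > filelist.txt` (hypothetical names).
filelist <- file.path(tempdir(), "filelist.txt")
writeLines(sprintf("run%04d.dat", 1:3000), filelist)

# One file name per line, as ls writes them when output is redirected.
candidates <- readLines(filelist)

set.seed(2012)                     # arbitrary seed, kept for repeatability
chosen <- sample(candidates, 45)   # 45 distinct names, without replacement

# Output the random selection, one name per line.
selection_file <- file.path(tempdir(), "selection.txt")
writeLines(chosen, selection_file)
```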

Ted.


-
E-Mail: (Ted Harding) ted.hard...@wlandres.net
Date: 28-Jul-2012  Time: 19:32:26
This message was sent by XFMail



Re: [R] selecting a subset of files to be processed

2012-07-28 Thread Erin Hodgess
Thanks so much!





-- 
Erin Hodgess
Associate Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
mailto: erinm.hodg...@gmail.com
