Re: select N random lines in a file
On Sun, Aug 22, 2004 at 02:35:14PM -0500, Lance Hoffmeyer wrote:
> I would like to write a script that will select N number of random lines in a file. Any suggestions on how to do this?

This program has the advantage that it doesn't read the whole file into memory, which is important if the file is large. Save it as an executable file, randomlines, and then type

    randomlines N file

to get N random lines from file (without repetition). The lines will be in the same order they were in the file. Type

    randomlines -r N file

to get N random lines in random order.

    #! /usr/bin/perl -s
    $N = shift;    # first arg is N
    srand;
    while (<>) {
        # keep line number $. with probability N/$. (always, while $. <= N)
        if (rand($.) < $N) {
            if (@lines == $N) {    # drop one random element
                splice @lines, int rand $N, 1;
            }
            if ($r) {              # -r: insert at a random position
                splice @lines, int rand @lines+1, 0, $_;
            }
            else {                 # default: keep file order
                push @lines, $_;
            }
        }
    }
    print $_ for @lines;
    __END__

The proof that the algorithm is correct is by induction on the number of lines in the file (also, see Knuth reference below). It is based on a program in the perl documentation that returns 1 random line from a file, which I found by typing perldoc -q 'random line':

    How do I select a random line from a file?

    Here's an algorithm from the Camel Book:

        srand;
        rand($.) < 1 && ($line = $_) while <>;

    This has a significant advantage in space over reading the whole file in. You can find a proof of this method in The Art of Computer Programming, Volume 2, Section 3.4.2, by Donald E. Knuth.

    You can use the File::Random module which provides a function for that algorithm:

        use File::Random qw/random_line/;
        my $line = random_line($filename);

    Another way is to use the Tie::File module, which treats the entire file as an array. Simply access a random array element.

Winston Smith, [EMAIL PROTECTED] where x=winstonsmith, y=ispwest.com

-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
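
For readers who want the induction spelled out, here is a sketch of the inclusion-probability argument for the N-line program above. It is my own reconstruction rather than Knuth's text. It assumes the acceptance test is rand(k) < N for line number k, and it only shows that every line ends up held with the same probability, which is weaker than showing every N-line subset is equally likely.

```latex
\textbf{Claim.} After $k \ge N$ lines have been read, each of those $k$ lines
is held with probability $N/k$.

\textbf{Base case} ($k = N$). Every line passes the test
$\mathrm{rand}(k) < N$, so each is held with probability $1 = N/N$.

\textbf{Inductive step.} Assume the claim for $k$. Line $k+1$ is accepted
with probability $N/(k+1)$, and when it is accepted one of the $N$ held
lines is dropped uniformly at random. A line that is already held therefore
survives with probability
\[
  1 - \frac{N}{k+1}\cdot\frac{1}{N} = \frac{k}{k+1},
\]
so its probability of being held after line $k+1$ is
\[
  \frac{N}{k}\cdot\frac{k}{k+1} = \frac{N}{k+1}.
\]
The new line is held with the same probability $N/(k+1)$, which completes
the induction.
```

With N = 1 this reduces to the Camel Book one-liner: line k replaces the current choice with probability 1/k.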
select N random lines in a file
Hello all,

I would like to write a script that will select N number of random lines in a file. Any suggestions on how to do this?

Lance
Re: select N random lines in a file
On Sun, Aug 22, 2004 at 02:35:14PM -0500, Lance Hoffmeyer wrote:
> Hello all,
>
> I would like to write a script that will select N number of random lines in a file. Any suggestions on how to do this?

Here's one that prints random lines from a file, taking care not to overshoot the number of lines in the file:

    sed -n $[ $RANDOM % $(wc -l $FILE) + 1 ],$[ $RANDOM % $(wc -l ./foo) + 1 ]p $FILE

Where $FILE is the filename. If you want to control the number of lines it will print, change the values. :)

-- Thomas Adam
Re: select N random lines in a file
On Sun, 2004-08-22 at 20:35, Lance Hoffmeyer wrote:
> Hello all,
>
> I would like to write a script that will select N number of random lines in a file. Any suggestions on how to do this?

Use the rand() function in awk.

-- 
Oliver Elphick [EMAIL PROTECTED]
Isle of Wight http://www.lfix.co.uk/oliver
GPG: 1024D/A54310EA 92C8 39E7 280E 3631 3F0E 1EC0 5664 7A2F A543 10EA
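
The awk suggestion above comes without code, so here is one possible reading of it: a minimal sketch that uses awk's rand() for one-pass reservoir sampling, wrapped in a shell function. The name reservoir_sample is hypothetical, and this variant overwrites a random slot instead of preserving file order, so it is not a port of the Perl program posted earlier in the thread.

```shell
# reservoir_sample N [FILE...]: hypothetical helper, not from the thread.
# Prints N lines chosen at random in a single pass, holding only N lines
# in memory. Output order is NOT the file order.
reservoir_sample() {
    n=$1
    shift
    awk -v n="$n" '
        BEGIN { srand() }
        NR <= n { keep[NR] = $0; next }      # the first n lines fill the reservoir
        rand() * NR < n {                    # keep line NR with probability n/NR
            keep[int(rand() * n) + 1] = $0   # overwrite one slot chosen at random
        }
        END {
            count = (NR < n) ? NR : n        # short input: print what there is
            for (i = 1; i <= count; i++) print keep[i]
        }
    ' "$@"
}
```

Usage would look like `reservoir_sample 10 /var/log/syslog` or `some_command | reservoir_sample 10`. Many awk implementations seed srand() from the clock in whole seconds, so two runs within the same second can return the same sample.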
Re: select N random lines in a file
Thomas Adam wrote:
> On Sun, Aug 22, 2004 at 02:35:14PM -0500, Lance Hoffmeyer wrote:
>> Hello all,
>>
>> I would like to write a script that will select N number of random lines in a file. Any suggestions on how to do this?
>
> Here's one that prints random lines from a file, taking care not to overshoot the number of lines in the file:
>
>     sed -n $[ $RANDOM % $(wc -l $FILE) + 1 ],$[ $RANDOM % $(wc -l ./foo) + 1 ]p $FILE
>
> Where $FILE is the filename. If you want to control the number of lines it will print, change the values. :)
>
> -- Thomas Adam

That only prints one line whenever the second range number is less than the first (which happens roughly half the time), ignoring the fact that ./foo should also be $FILE.
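
Beyond the two problems pointed out above, `wc -l $FILE` prints the file name after the count, so the arithmetic fails unless wc reads from a redirect. `$RANDOM` is also a bash feature that never exceeds 32767. As a hedged sketch of how the sed idea could be reduced to something that works for a single line, the function below (random_line is a hypothetical name) asks awk for the random number and lets sed print that one line.

```shell
# random_line FILE: hypothetical helper, not from the thread.
# Prints one line of FILE chosen at random.
random_line() {
    file=$1
    total=$(wc -l < "$file")             # redirect so wc prints only the count
    [ "$total" -gt 0 ] || return 1       # nothing to pick from an empty file
    # awk supplies a line number in 1..total without relying on bash $RANDOM
    n=$(awk -v max="$total" 'BEGIN { srand(); print int(rand() * max) + 1 }')
    sed -n "${n}p" "$file"
}
```

Calling this N times in a loop can return the same line more than once, so it does not answer the original question on its own.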
Re: select N random lines in a file
On Sun, Aug 22, 2004 at 02:35:14PM -0500, Lance Hoffmeyer wrote:
> Hello all,
>
> I would like to write a script that will select N number of random lines in a file. Any suggestions on how to do this?
>
> Lance

One way:

- Count lines in file
- Pick N line numbers
- Show file, filtering out unpicked lines

    # implementation: $1 is N, $2 is the file
    rnd () { perl -e 'print int(rand()*'$1') + 1, "\n"'; }   # line number in 1..$1
    LCT=`wc -l < "$2"`    # redirect so wc prints only the count
    PAT=tommy             # dummy first alternative; never matches a line number
    for x in `seq 1 $1`; do
        PAT=$PAT\\|$(rnd $LCT)    # repeated picks mean fewer than N lines are shown
    done
    # number the lines, keep those whose number was picked, strip the number again
    sed = "$2" | sed -n -e 'N; s/\n/ /; /^\('$PAT'\) /p' | sed 's/[^ ]* //'

Another way:

    #! /usr/bin/perl
    @lines = <STDIN>;
    for ($i = 0; $i < $ARGV[0]; $i++) {
        print $lines[ rand @lines ];    # may print the same line more than once
    }

Endless variations...
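
The three steps listed above (count, pick, filter) can also be written with a single awk call that rejects repeated line numbers, so exactly N distinct lines come out. This is only a sketch under my own assumptions: pick_lines is a hypothetical name, and the file must end with a newline for wc -l to count its last line.

```shell
# pick_lines N FILE: hypothetical helper, not from the thread.
# Prints N distinct random lines of FILE, in file order.
pick_lines() {
    n=$1
    file=$2
    total=$(wc -l < "$file")                 # step 1: count lines
    awk -v n="$n" -v total="$total" '
        BEGIN {
            srand()
            if (n > total) n = total         # cannot pick more lines than exist
            while (count < n) {              # step 2: pick n distinct line numbers
                k = int(rand() * total) + 1
                if (!(k in picked)) { picked[k] = 1; count++ }
            }
        }
        (NR in picked)                       # step 3: print only the picked lines
    ' "$file"
}
```

The retry loop is fast while N is small compared with the file. When N approaches the line count, most draws are repeats and a shuffle is the better tool.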
Re: select N random lines in a file
On Sun, Aug 22, 2004 at 04:11:39PM -0400, Travis Crump wrote:
> That only prints one line whenever the second range number is less than the first (which happens roughly half the time), ignoring the fact that ./foo should also be $FILE.

Yeah, it's not perfect, but it was just off the top of my head. You can do all of this in perl, awk, etc. If you can suggest something, do so.

-- Thomas Adam
Re: select N random lines in a file
Lance Hoffmeyer [EMAIL PROTECTED] writes:
> Hello all,
>
> I would like to write a script that will select N number of random lines in a file. Any suggestions on how to do this?

    bogosort -n file | head -N

.Henrik

-- 
Henrik Christian Grove [EMAIL PROTECTED]
Systems administrator
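
The same shuffle-then-truncate idea is available on systems without bogosort, assuming GNU coreutils is installed: shuf can shuffle and stop after N lines in one step. The wrapper name random_head below is hypothetical and exists only to make the sketch self-contained.

```shell
# random_head N FILE: hypothetical wrapper around GNU shuf.
# Prints N distinct lines of FILE in random order.
random_head() {
    shuf -n "$1" "$2"
}
```

For example, `random_head 5 /etc/services`. With GNU sort, `sort -R file | head -n N` is similar, but sort -R groups identical lines together, so the result differs when the file contains duplicates.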
Re: select N random lines in a file
On Sunday 22 August 2004 20:40, Lance Hoffmeyer wrote:
> Hello all,
>
> I would like to write a script that will select N number of random lines in a file. Any suggestions on how to do this?

I can't guarantee that this is the best approach, but I would write a small Perl program. If you don't mind duplicates in the output, read them into an array of strings then pick N items at random from the array. If you want to avoid duplicates, read them into a hash so you can delete each item as it is picked.
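
The no-duplicates variant described above can be sketched as follows. Note that this is awk inside a shell function rather than the Perl program the poster has in mind, and that it uses a swap trick on an array rather than deleting hash keys. The name sample_in_memory is hypothetical.

```shell
# sample_in_memory N [FILE...]: hypothetical helper, not from the thread.
# Loads every line, then draws N distinct lines in random order.
sample_in_memory() {
    n=$1
    shift
    awk -v n="$n" '
        { line[NR] = $0 }                        # read every line into an array
        END {
            srand()
            remaining = NR
            if (n > remaining) n = remaining     # cannot draw more lines than exist
            for (i = 0; i < n; i++) {
                k = int(rand() * remaining) + 1  # index among the unpicked lines
                print line[k]
                line[k] = line[remaining]        # remove the pick: move the last line into its slot
                remaining--
            }
        }
    ' "$@"
}
```

Dropping the two lines after the print statement gives the with-duplicates behaviour instead. Either way the whole file sits in memory, which is the cost the one-pass approaches elsewhere in the thread avoid.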
Re: select N random lines in a file
On Sun, Aug 22, 2004 at 02:35:14PM -0500, Lance Hoffmeyer wrote:
> Hello all,
>
> I would like to write a script that will select N number of random lines in a file. Any suggestions on how to do this?
>
> Lance

Hi Lance,

Any language preference? C, perl, bash? There are 1 x 10^6 ways. Is this for a one-time use or part of a bigger project?

-Kev
Re: select N random lines in a file
On Sun, Aug 22, 2004 at 09:16:37PM +0100, Thomas Adam wrote:
> On Sun, Aug 22, 2004 at 04:11:39PM -0400, Travis Crump wrote:
>> That only prints one line whenever the second range number is less than the first (which happens roughly half the time), ignoring the fact that ./foo should also be $FILE.
>
> Yeah, it's not perfect, but it was just off the top of my head. You can do all of this in perl, awk, etc. If you can suggest something, do so.
>
> -- Thomas Adam

Hi Thom,

ok.

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $lines = 5;
    my $file  = "/home/kevin/Mail/backup";

    # load file in line array (0..N-1), counting lines
    my @line;
    open(FH, $file) or die "cannot open $file: $!";
    my $count = 0;
    while (<FH>) {
        $line[$count++] = $_;
    }
    close(FH);

    # never ask for more lines than the file has, or the loop below cannot end
    $lines = $count if $lines > $count;

    # generate N random, non-duplicate numbers stored in a hash
    sub ran {    # 0..N-1
        my $max = shift;
        return int($max * rand());
    }

    my %out;
    my $i = 0;
    while ($i < $lines) {
        my $value = ran($count);
        if (not($out{$value})) {
            $out{$value} = 1;
            $i++;
            print $line[$value];    # print the line out in RANDOM order
        }
    }

-Kev