Re: select N random lines in a file

2004-08-23 Thread Winston Smith
On Sun, Aug 22, 2004 at 02:35:14PM -0500, Lance Hoffmeyer wrote:
> I would like to write a script that will select N number of
> random lines in a file.  Any suggestions on how to do this?

This program has the advantage that it doesn't read the whole file into
memory, which is important if the file is large. Save it as an executable
file, randomlines, and then type randomlines N file to get N random
lines from file (without repetition). The lines will be in the same
order they were in the file. Type randomlines -r N file to get N random
lines in random order.

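#! /usr/bin/perl -s
# usage: randomlines [-r] N [file ...]
# the -s switch turns a -r on the command line into the variable $r

$N=shift; #first arg is N
srand;
while(<>){
    # keep line number $. with probability N/$.
    if(rand($.) < $N){
        if(@lines == $N){
            # drop one random element
            splice @lines,int rand $N,1;
        }
        if($r){
            # -r: insert the new line at a random position
            splice @lines, int rand @lines+1, 0, $_;
        }
        else{
            # default: append, so the original file order is kept
            push @lines, $_;
        }
    }
}

print $_ for @lines;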

__END__

Correctness (each line is selected with the same probability, N divided by
the number of lines in the file) follows by induction on the number of
lines; see also the Knuth reference below.
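Here is a sketch of that induction for the program above, assuming the file has at least N lines. P_t(i), the probability that line i is in @lines after t lines have been read, is notation introduced only for this sketch. The -r branch changes where a kept line is placed, not whether it is kept, so the same argument covers both modes.

```latex
\begin{aligned}
&\text{Claim: } P_t(i) = \tfrac{N}{t} \text{ for } 1 \le i \le t,\ t \ge N.\\
&\text{Base } (t = N)\text{: every line is kept, so } P_N(i) = 1 = \tfrac{N}{N}.\\
&\text{Step: line } t+1 \text{ enters with probability } \tfrac{N}{t+1} \text{ and evicts one of the } N \text{ held lines, so}\\
&P_{t+1}(i) = \frac{N}{t}\left[\Bigl(1-\frac{N}{t+1}\Bigr) + \frac{N}{t+1}\Bigl(1-\frac{1}{N}\Bigr)\right]
 = \frac{N}{t}\cdot\frac{t}{t+1} = \frac{N}{t+1} \quad (i \le t),\\
&P_{t+1}(t+1) = \frac{N}{t+1}.
\end{aligned}
```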

It is based on a program in the Perl documentation that returns one random
line from a file, which I found by typing perldoc -q 'random line':

  How do I select a random line from a file?
  
Here's an algorithm from the Camel Book:
  
srand;
rand($.) < 1 && ($line = $_) while <>;
  
This has a significant advantage in space over reading the whole file
in.  You can find a proof of this method in The Art of Computer
Programming, Volume 2, Section 3.4.2, by Donald E. Knuth.
  
You can use the File::Random module which provides a function for that
algorithm:
  
use File::Random qw/random_line/;
my $line = random_line($filename);
  
Another way is to use the Tie::File module, which treats the entire
file as an array.  Simply access a random array element.
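The Camel Book one-liner quoted above carries over to awk almost unchanged. This is a minimal sketch assuming a POSIX awk; the function name random_line is made up for the example. Line number NR replaces the current pick with probability 1/NR, which is the same test as rand($.) < 1 in the Perl version.

```shell
# random_line [file ...]  -- hypothetical helper, a sketch only.
# Prints one line chosen uniformly at random, holding only one line in memory.
random_line() {
    awk 'BEGIN { srand() }            # seed; typically from the time of day
         rand() * NR < 1 { line = $0 } # replace the pick with probability 1/NR
         END { if (NR > 0) print line }' "$@"
}
```

Since awk's srand() without an argument usually seeds from the clock, two runs in the same second may well return the same line.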


Winston Smith, [EMAIL PROTECTED] where x=winstonsmith, y=ispwest.com


-- 
To UNSUBSCRIBE, email to [EMAIL PROTECTED] 
with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]



select N random lines in a file

2004-08-22 Thread Lance Hoffmeyer
Hello all,
I would like to write a script that will select N number of
random lines in a file.  Any suggestions on how to do this?
Lance



Re: select N random lines in a file

2004-08-22 Thread Thomas Adam
On Sun, Aug 22, 2004 at 02:35:14PM -0500, Lance Hoffmeyer wrote:
> Hello all,
>
> I would like to write a script that will select N number of
> random lines in a file.  Any suggestions on how to do this?

Here's one that prints random lines of files, ensuring not to overshoot the
known length of the number of lines in the file:

sed -n $[ $RANDOM % $(wc -l < $FILE) + 1 ],$[ $RANDOM % $(wc -l < ./foo) + 1 ]p $FILE

Where $FILE is the filename. If you want to control the number of lines it
will print, change the values. :)

-- Thomas Adam

--
Frankly, Mr. Shankly, since you ask. You are a flatulent pain in 
the arse. -- Morrissey.





Re: select N random lines in a file

2004-08-22 Thread Oliver Elphick
On Sun, 2004-08-22 at 20:35, Lance Hoffmeyer wrote:
> Hello all,
>
> I would like to write a script that will select N number of
> random lines in a file.  Any suggestions on how to do this?
>

Use the rand() function in awk
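Along those lines, here is a sketch (not Oliver's code) of the same reservoir idea as the randomlines script at the top of this thread, written in awk; random_lines and K are names invented for this example. It holds at most K lines, keeps line NR with probability K/NR, and prints the survivors in file order.

```shell
# random_lines K [file ...]  -- hypothetical helper, a sketch only.
# Reservoir sampling: at most K lines are held in memory at any time.
random_lines() {
    local k=$1
    shift
    awk -v k="$k" '
        BEGIN { srand(); n = 0 }                # n = lines currently held
        {
            if (n < k) {                        # reservoir not full: keep it
                n++; slot[n] = NR; text[NR] = $0
            } else if (rand() * NR < k) {       # keep line NR w.p. K/NR
                j = int(rand() * k) + 1         # evict a random held line
                delete text[slot[j]]
                slot[j] = NR; text[NR] = $0
            }
        }
        END {
            for (i = 1; i <= NR; i++)           # print survivors in file order
                if (i in text) print text[i]
        }' "$@"
}
```

For example, random_lines 10 /var/log/syslog would print ten lines of that file in their original order.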

-- 
Oliver Elphick  [EMAIL PROTECTED]
Isle of Wight  http://www.lfix.co.uk/oliver
GPG: 1024D/A54310EA  92C8 39E7 280E 3631 3F0E  1EC0 5664 7A2F A543 10EA
 
 For yourselves know perfectly that the day of the Lord
  so cometh as a thief in the night. For when they shall
  say, Peace and safety; then sudden destruction cometh 
  upon them, as travail upon a woman with child; and 
  they shall not escape.  I Thessalonians 5:2,3 





Re: select N random lines in a file

2004-08-22 Thread Travis Crump
Thomas Adam wrote:
> On Sun, Aug 22, 2004 at 02:35:14PM -0500, Lance Hoffmeyer wrote:
>> Hello all,
>> I would like to write a script that will select N number of
>> random lines in a file.  Any suggestions on how to do this?
>
> Here's one that prints random lines of files, ensuring not to overshoot the
> known length of the number of lines in the file:
> sed -n $[ $RANDOM % $(wc -l < $FILE) + 1 ],$[ $RANDOM % $(wc -l < ./foo) + 1 ]p $FILE
> Where $FILE is the filename. If you want to control the number of lines it
> will print, change the values. :)
> -- Thomas Adam

That only prints one line whenever the second range number is less than
the first (which happens roughly half the time), and that's ignoring the
fact that ./foo should also be $FILE.
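For the record, here is a sketch of the same one-liner with both problems fixed: the file name is used in both places, and the two endpoints are put in order before sed sees them. It still prints one contiguous block of lines, not N independent random lines, and bash's $RANDOM only goes up to 32767, so longer files are not fully covered. The function name random_range is made up for this example.

```shell
# random_range FILE  -- hypothetical helper; needs bash for $RANDOM.
random_range() {
    local file=$1 lines a b t
    lines=$(( $(wc -l < "$file") ))        # arithmetic strips any padding from wc
    [ "$lines" -gt 0 ] || return 0          # empty file: nothing to print
    a=$(( RANDOM % lines + 1 ))             # both endpoints in 1..lines
    b=$(( RANDOM % lines + 1 ))
    if [ "$a" -gt "$b" ]; then              # sed needs start <= end
        t=$a; a=$b; b=$t
    fi
    sed -n "${a},${b}p" "$file"
}
```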




Re: select N random lines in a file

2004-08-22 Thread Stefan O'Rear
On Sun, Aug 22, 2004 at 02:35:14PM -0500, Lance Hoffmeyer wrote:
> Hello all,
>
> I would like to write a script that will select N number of
> random lines in a file.  Any suggestions on how to do this?
>
> Lance

One way:

- Count lines in file
- Pick N line numbers
- Show file, filtering out unpicked lines

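# implementation: save as a script and run it as  script N file

# rnd MAX: print a random integer in 1..MAX (sed = numbers lines from 1)
rnd () {
   perl -e 'print 1+int(rand()*'$1'), "\n"'
}

LCT=`wc -l < $2`

# "tommy" is a dummy alternative that never matches a line number
PAT=tommy

for x in `seq 1 $1`; do
   PAT="$PAT\\|$(rnd $LCT)"
done

# number the lines, join each number with its line, keep the picked
# numbers, then strip the numbers (duplicate picks give fewer than N lines)
sed = < $2 | sed -n -e 'N; s/\
/ /; /^\('$PAT'\) /p' | sed 's/[^ ]* //'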
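A sketch of the same three steps with awk doing both the picking and the filtering; pick_lines is a hypothetical name. In this version the numbers are drawn from 1..LINES, and a number that was already picked is simply drawn again, so you get exactly N lines (or every line if the file is shorter), printed in file order. Redrawing can get slow when N is close to the line count, but it always finishes.

```shell
# pick_lines N FILE  -- hypothetical helper, a sketch of count / pick / filter.
pick_lines() {
    local n=$1 file=$2 total
    total=$(( $(wc -l < "$file") ))         # step 1: count lines
    awk -v n="$n" -v total="$total" '
        BEGIN {
            srand()
            if (n > total) n = total
            while (picked < n) {             # step 2: N distinct line numbers
                r = int(rand() * total) + 1  # uniform in 1..total
                if (!(r in want)) { want[r] = 1; picked++ }
            }
        }
        (NR in want)                         # step 3: print only picked lines
    ' "$file"
}
```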

Another way:

#! /usr/bin/perl
# usage: script N < file   (the same line may be printed more than once)

@lines = <STDIN>;

for ($i = 0; $i < $ARGV[0]; $i++) {
    print $lines[ rand @lines ];
}

Endless variations...





Re: select N random lines in a file

2004-08-22 Thread Thomas Adam
On Sun, Aug 22, 2004 at 04:11:39PM -0400, Travis Crump wrote:
>
> That only prints one line whenever the second range number is less than
> the first[which happens roughly half the time][ignoring the fact that
> ./foo should also be $FILE].

Yeah, it's not perfect, but it was just off the top of my head. You can do all
of this in perl, awk, etc.  If you can suggest something, do so.

-- Thomas Adam



Re: select N random lines in a file

2004-08-22 Thread Henrik Christian Grove
Lance Hoffmeyer <[EMAIL PROTECTED]> writes:

> Hello all,
>
> I would like to write a script that will select N number of
> random lines in a file.  Any suggestions on how to do this?

bogosort -n file | head -N
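bogosort may not be installed everywhere. Here is a sketch of the same shuffle-then-head idea using only awk, sort, cut and head; shuffle_head is a made-up name. Each line gets a random key, the lines are sorted on that key, and the key is cut off again before head takes the first N.

```shell
# shuffle_head N [file ...]  -- hypothetical helper, a sketch only.
shuffle_head() {
    local n=$1
    shift
    awk 'BEGIN { srand() } { printf "%.17f\t%s\n", rand(), $0 }' "$@" |
        LC_ALL=C sort -k1,1 |   # keys are fixed-width "0.ddd...", so text order = numeric order
        cut -f2- |              # drop the key and the tab after it
        head -n "$n"
}
```

Newer GNU coreutils also ship shuf, where shuf -n N file does the same job in a single command.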

.Henrik

-- 
Henrik Christian Grove
[EMAIL PROTECTED]
Systems administrator





Re: select N random lines in a file

2004-08-22 Thread Adam Funk
On Sunday 22 August 2004 20:40, Lance Hoffmeyer wrote:

> Hello all,
>
> I would like to write a script that will select N number of
> random lines in a file.  Any suggestions on how to do this?

I can't guarantee that this is the best approach, but I would write a
small Perl program. If you don't mind duplicates in the output, read
the lines into an array of strings, then pick N items at random from the
array. If you want to avoid duplicates, read them into a hash so you
can delete each item as it is picked.
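A sketch of the no-duplicates variant, done in awk rather than Perl; pick_distinct is a hypothetical name. Instead of deleting picked entries from a hash, it moves the last unpicked line into the hole left by each pick (a partial Fisher-Yates shuffle), which likewise guarantees that no line is chosen twice. The whole file is held in memory, and the lines come out in random order.

```shell
# pick_distinct N [file ...]  -- hypothetical helper, a sketch only.
pick_distinct() {
    local n=$1
    shift
    awk -v n="$n" '
        { line[NR] = $0 }                     # read every line into an array
        END {
            srand()
            m = NR                            # unpicked lines are line[1..m]
            for (i = 0; i < n && m > 0; i++) {
                r = int(rand() * m) + 1       # uniform in 1..m
                print line[r]
                line[r] = line[m]; m--        # move last unpicked line into the hole
            }
        }' "$@"
}
```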





Re: select N random lines in a file

2004-08-22 Thread Kevin Mark
On Sun, Aug 22, 2004 at 02:35:14PM -0500, Lance Hoffmeyer wrote:
> Hello all,
>
> I would like to write a script that will select N number of
> random lines in a file.  Any suggestions on how to do this?
>
>
> Lance
Hi Lance,
Any language preference? C, Perl, bash? There are 1 x 10^6 ways.
Is this for a one-time-use or part of a bigger project?
-Kev
-- 

(__)
(oo)
  /--\/
 / |||
*  /\---/\
   ~~   ~~
Have you mooed today?...




Re: select N random lines in a file

2004-08-22 Thread Kevin Mark
On Sun, Aug 22, 2004 at 09:16:37PM +0100, Thomas Adam wrote:
> On Sun, Aug 22, 2004 at 04:11:39PM -0400, Travis Crump wrote:
> >
> > That only prints one line whenever the second range number is less than
> > the first[which happens roughly half the time][ignoring the fact that
> > ./foo should also be $FILE].
>
> Yeah, it's not perfect, but it was just off the top of my head. You can do all
> of this in perl, awk, etc.  If you can suggest something, do so.
>
> -- Thomas Adam
> --
Hi Thom,
ok.
-
#!/usr/bin/perl
use strict;
use warnings;

my $lines = 5;
my $file = "/home/kevin/Mail/backup";

# load file in line array (0..N-1), counting lines
my @line;
open (FH, $file) or die "can't open $file: $!";
my $count=0;
while (<FH>){
    $line[$count++]=$_;
}
close(FH);

# never ask for more distinct lines than the file has (avoids an endless loop)
$lines = $count if $lines > $count;

# generate N random, non-duplicate numbers stored in a hash

sub ran { # 0..N-1
    my $max = shift;
    return int($max * rand());
}

my %out;
my $i=0;
while($i < $lines){
    my $value = ran($count);
    if (not($out{$value})) {
        $out{$value}=1;
        $i++;
        print $line[$value]; #print the line out in RANDOM order
    }
}
--
-Kev