Re: Comparing two lists

2011-05-07 Thread Matthew Seaman
On 07/05/2011 01:09, Rolf Nielsen wrote:
> I have two text files, quite extensive ones. They have some lines in
> common and some lines are unique to one of the files. The lines that do
> exist in both files are not necessarily in the same location. Now I need
> to compare the files and output a list of lines that exist in both
> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
> combination of two or more of them?

comm(1)

Which does exactly what you want -- showing lines that belong to one
file or another, and lines that belong to both.  The limitation is that
the files need to be sorted before being compared.
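Here is a minimal sketch of the three-way split Matthew describes. It assumes a POSIX shell and inputs that are already sorted in the current locale. With no options, comm(1) prints three tab-indented columns: lines only in the first file, lines only in the second file, and lines in both. The -1, -2 and -3 flags each suppress one of those columns. The helper name `split3` and its `.only1`/`.only2`/`.both` output files are hypothetical.

```sh
#!/bin/sh
# split3 FILE1 FILE2 PREFIX
# Hypothetical helper: write each of comm(1)'s three columns to its own file.
# Both inputs must already be sorted in the current locale, as comm requires.
split3() {
    comm -23 "$1" "$2" > "$3.only1"   # column 1: lines only in FILE1
    comm -13 "$1" "$2" > "$3.only2"   # column 2: lines only in FILE2
    comm -12 "$1" "$2" > "$3.both"    # column 3: lines in both files
}

# Without options comm prints all three columns at once: column 1 is not
# indented, column 2 follows one tab, and column 3 follows two tabs.
```

For example, `split3 sorted1 sorted2 out` would put the lines common to both files in `out.both`.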

Cheers,

Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.   7 Priory Courtyard
  Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey Ramsgate
JID: matt...@infracaninophile.co.uk   Kent, CT11 9PW





Re: Comparing two lists [SOLVED (at least it looks like that)]

2011-05-07 Thread Rolf Nielsen

2011-05-07 05:11, Yuri Pankov wrote:

> On Sat, May 07, 2011 at 04:23:40AM +0200, Rolf Nielsen wrote:
>
>> 2011-05-07 02:09, Rolf Nielsen wrote:
>>
>>> Hello all,
>>>
>>> I have two text files, quite extensive ones. They have some lines in
>>> common and some lines are unique to one of the files. The lines that do
>>> exist in both files are not necessarily in the same location. Now I need
>>> to compare the files and output a list of lines that exist in both
>>> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
>>> combination of two or more of them?
>>>
>>> TIA,
>>>
>>> Rolf
>>
>>
>> sort file1 file2 | uniq -d
>
>
> I very seriously doubt that this line does what you want...
>
> $ printf 'a\na\na\nb\n' > file1; printf 'c\nc\nb\n' > file2; sort file1 file2 | uniq -d
> a
> b
> c


Ok. I do understand the problem. The files I have do not have any
duplicate lines, though, so that possibility didn't even cross my mind.

> Try this instead (probably bloated):
>
> sort file1 | uniq | tr -s '\n' '\0' | xargs -0 -I % grep -Fx % file2 | sort | uniq
>
> There is comm(1), of course, but it expects files to be already sorted.


The files are sorted, so comm would work. Several people have already
suggested comm, though I haven't tried it, as combining sort and uniq
does what I want with my specific files.

> HTH,
> Yuri
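Both the `sort | uniq -d` shortcut and comm(1) depend on the inputs being sorted and free of repeated lines, so it may be worth checking that assumption instead of trusting it. This is a sketch that assumes a sort(1) following POSIX, where `-c -u` verifies strict ordering without sorting. The names `is_set` and `common_checked` are hypothetical.

```sh
#!/bin/sh
# is_set FILE
# Hypothetical helper: succeed only if FILE is sorted in the current locale
# and no line appears twice. sort -c checks the order without sorting;
# together with -u it also rejects lines with equal keys (here, whole lines).
is_set() {
    sort -c -u "$1" 2>/dev/null
}

# common_checked FILE1 FILE2
# Run comm -12 only when both inputs meet comm's precondition, so that
# unsorted or duplicated input is reported instead of silently mishandled.
common_checked() {
    if is_set "$1" && is_set "$2"; then
        comm -12 "$1" "$2"
    else
        echo "common_checked: $1 or $2 is unsorted or has repeated lines" >&2
        return 1
    fi
}
```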



___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to freebsd-questions-unsubscr...@freebsd.org


Re: Comparing two lists [SOLVED (at least it looks like that)]

2011-05-07 Thread Rolf Nielsen

2011-05-07 05:16, b. f. wrote:

>> 2011-05-07 02:09, Rolf Nielsen wrote:
>>
>>> Hello all,
>>>
>>> I have two text files, quite extensive ones. They have some lines in
>>> common and some lines are unique to one of the files. The lines that do
>>> exist in both files are not necessarily in the same location. Now I need
>>> to compare the files and output a list of lines that exist in both
>>> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
>>> combination of two or more of them?
>
> ...
>
>> sort file1 file2 | uniq -d
>
>
> If the lines aren't repeated in only one file...


They aren't (see my reply to Yuri Pankov). :)


> For future reference, comm(1) exists to handle problems like this,
> although (of course) TIMTOWTDI.
>
> b.





Re: Comparing two lists

2011-05-07 Thread Rolf Nielsen

2011-05-07 07:28, Robert Bonomi wrote:

>> From listrea...@lazlarlyricon.com  Fri May  6 20:14:09 2011
>> Date: Sat, 07 May 2011 03:13:39 +0200
>> From: Rolf Nielsen <listrea...@lazlarlyricon.com>
>> To: Robert Bonomi <bon...@mail.r-bonomi.com>
>> CC: freebsd-questions@freebsd.org
>> Subject: Re: Comparing two lists
>>
>> 2011-05-07 02:54, Robert Bonomi wrote:
>>
>>>> From owner-freebsd-questi...@freebsd.org  Fri May  6 19:27:54 2011
>>>> Date: Sat, 07 May 2011 02:09:26 +0200
>>>> From: Rolf Nielsen <listrea...@lazlarlyricon.com>
>>>> To: FreeBSD <freebsd-questions@freebsd.org>
>>>> Subject: Comparing two lists
>>>>
>>>> Hello all,
>>>>
>>>> I have two text files, quite extensive ones. They have some lines in
>>>> common and some lines are unique to one of the files. The lines that do
>>>> exist in both files are not necessarily in the same location. Now I need
>>>> to compare the files and output a list of lines that exist in both
>>>> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
>>>> combination of two or more of them?
>>>
>>>
>>> If the files have only 'minor' differences -- i.e. no long runs of lines
>>> that are in only one file -- *and* the common lines are in the same order
>>> in each file, you can use diff(1), without any other shenanigans.
>>>
>>> If the above is -not- true, and if you need _only_ the common lines, AND
>>> order is not important, then sort(1) both files, and use diff(1) on the
>>> two sorted versions.
>>>
>>>
>>> Beyond that it depends on what you mean by 'extensive' ones.  Megabytes?
>>> Gigabytes? Or what??
>>
>>
>> Some 10,000 to 20,000 lines each. I do need only the common lines. Order
>> is not essential, but would make life easier. I've tried a little with
>> uniq, as suggested by Polytropon, but I guess 3am is not quite the right
>> time to do these things. Anyway, thanks.
>
>
> Ok, 20k lines is only a medium-size file. There's no problem in fitting
> the entire file 'in memory'.  ('big' files are ones that are larger than
> available memory. :)


By quite extensive I was referring to the number of lines rather than
the byte size, and 20k lines is, by my standards, quite a lot for a
plain text file. :P

But that's beside the point. :)

> Using uniq:
>   sort {{file1}} {{file2}} | uniq -d


Yes, I found that solution on
http://www.catonmat.net/blog/set-operations-in-unix-shell
which is mainly about comm, but also lists other ways of doing things. I
also found

grep -xF -f file1 file2

there, and I've tested that one too. Both seem to be doing what I want.
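The `grep -xF -f` form needs no sorting at all. This sketch shows how it behaves and assumes any grep with the standard `-x`, `-F` and `-f` options. The wrapper name `common_in_order` and the trailing awk de-duplication step are additions for illustration; nobody in the thread proposed them.

```sh
#!/bin/sh
# common_in_order FILE1 FILE2
# Print the lines of FILE2 that also occur in FILE1, in FILE2's order.
#   -F: each line of FILE1 is a fixed string, not a regular expression
#   -x: a pattern must match a whole line, not just a substring
#   -f: read the patterns from FILE1
# grep alone repeats a common line as often as it occurs in FILE2; the awk
# stage (an addition) keeps only the first occurrence of each line.
common_in_order() {
    grep -xF -f "$1" "$2" | awk '!seen[$0]++'
}
```

Because of `-x`, an empty line in FILE1 matches only empty lines in FILE2, not every line.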



> To maintain order, put the following in a file, call it 'common.awk'
>
>   NR==FNR { array[$0]=1; next; }
>           { if (array[$0] == 1) print $0; }
>
> then use the command:
>
>   awk -f common.awk {{file1}} {{file2}}
>
> This will output common lines, in the order they occur in _file2_.
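If you want file2's order but each common line should appear only once, the awk program can forget a line after printing it. This is a sketch under that assumption, and the function name `common_once` is hypothetical. Testing `($0 in want)` also avoids a side effect of `array[$0] == 1`, which creates an empty entry for every file2 line it looks up.

```sh
#!/bin/sh
# common_once FILE1 FILE2
# Variant of common.awk: output follows FILE2's order, but each common
# line is printed only once even if FILE2 repeats it.
common_once() {
    awk '
        NR == FNR    { want[$0] = 1; next }       # FILE1: remember each line
        ($0 in want) { print; delete want[$0] }   # FILE2: print once, then forget
    ' "$1" "$2"
}
```

If file1 is empty, `NR == FNR` also holds while file2 is read, so nothing is printed. That is still the correct (empty) intersection.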




I took the liberty of sending a copy of this to the list although you 
replied privately.



Re: Comparing two lists

2011-05-07 Thread Chad Perrin
On Sat, May 07, 2011 at 02:09:26AM +0200, Rolf Nielsen wrote:
 
 I have two text files, quite extensive ones. They have some lines in 
 common and some lines are unique to one of the files. The lines that do 
 exist in both files are not necessarily in the same location. Now I need 
 to compare the files and output a list of lines that exist in both 
 files. Is there a simple way to do this? diff? awk? sed? cmp? Or a 
 combination of two or more of them?

Disclaimer:

This should probably be done with Unix command line utilities, and most
likely by way of comm, as others explain here.  On the other hand, the
others explaining that have done an admirable job of giving you some
pretty comprehensive advice on that front before I got here, so I'll give
you an alternative approach that is probably *not* how you should do it.

Alternative Approach:

You could always use a programming language reasonably well-suited to
admin scripting.  The following is a one-liner in Ruby.

ruby -e 'foo = File.open("foo.txt").readlines.map {|l| l.chomp}; \
bar = File.open("bar.txt").readlines.map {|l| l.chomp }; \
foo.each {|num| puts num if bar.include? num }'

Okay, so I'm kinda stretching the definition of one-liner if I'm
using semicolons and escaping newlines.  If you really want to cram it
all into one line of code, you could do something like replace the
semicolons (and newline escapes) with the "and" keyword in each case.

http://pastebin.com/nPR42760

-- 
Chad Perrin [ original content licensed OWL: http://owl.apotheon.org ]




Re: Comparing two lists

2011-05-07 Thread Chip Camden
Quoth Chad Perrin on Saturday, 07 May 2011:
> On Sat, May 07, 2011 at 02:09:26AM +0200, Rolf Nielsen wrote:
>>
>> I have two text files, quite extensive ones. They have some lines in
>> common and some lines are unique to one of the files. The lines that do
>> exist in both files are not necessarily in the same location. Now I need
>> to compare the files and output a list of lines that exist in both
>> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
>> combination of two or more of them?
>
> Disclaimer:
>
> This should probably be done with Unix command line utilities, and most
> likely by way of comm, as others explain here.  On the other hand, the
> others explaining that have done an admirable job of giving you some
> pretty comprehensive advice on that front before I got here, so I'll give
> you an alternative approach that is probably *not* how you should do it.
>
> Alternative Approach:
>
> You could always use a programming language reasonably well-suited to
> admin scripting.  The following is a one-liner in Ruby.
>
> ruby -e 'foo = File.open("foo.txt").readlines.map {|l| l.chomp}; \
> bar = File.open("bar.txt").readlines.map {|l| l.chomp }; \
> foo.each {|num| puts num if bar.include? num }'
>
> Okay, so I'm kinda stretching the definition of one-liner if I'm
> using semicolons and escaping newlines.  If you really want to cram it
> all into one line of code, you could do something like replace the
> semicolons (and newline escapes) with the "and" keyword in each case.
>
> http://pastebin.com/nPR42760
>
> -- 
> Chad Perrin [ original content licensed OWL: http://owl.apotheon.org ]


You could even just output the intersection of the two lists:

  ruby -e 'puts File.open("foo.txt").readlines.map {|l| l.chomp} & \
  File.open("bar.txt").readlines.map {|l| l.chomp }'

And to comply with DRY:

  ruby -e 'def fl(f) File.open(f).readlines.map {|l| l.chomp}; end; \
  puts fl("foo.txt") & fl("bar.txt")'

-- 
.O. | Sterling (Chip) Camden  | http://camdensoftware.com
..O | sterl...@camdensoftware.com | http://chipsquips.com
OOO | 2048R/D6DBAF91  | http://chipstips.com




Comparing two lists

2011-05-06 Thread Rolf Nielsen

Hello all,

I have two text files, quite extensive ones. They have some lines in 
common and some lines are unique to one of the files. The lines that do 
exist in both files are not necessarily in the same location. Now I need 
to compare the files and output a list of lines that exist in both 
files. Is there a simple way to do this? diff? awk? sed? cmp? Or a 
combination of two or more of them?


TIA,

Rolf


Re: Comparing two lists

2011-05-06 Thread Polytropon
On Sat, 07 May 2011 02:09:26 +0200, Rolf Nielsen <listrea...@lazlarlyricon.com> wrote:
> Hello all,
>
> I have two text files, quite extensive ones. They have some lines in
> common and some lines are unique to one of the files. The lines that do
> exist in both files are not necessarily in the same location. Now I need
> to compare the files and output a list of lines that exist in both
> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
> combination of two or more of them?

I would suggest using a combination of sort, uniq and diff.
Those are base system tools.


-- 
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...


Re: Comparing two lists

2011-05-06 Thread Rolf Nielsen

2011-05-07 02:33, Polytropon wrote:

> On Sat, 07 May 2011 02:09:26 +0200, Rolf Nielsen <listrea...@lazlarlyricon.com>
> wrote:
>
>> Hello all,
>>
>> I have two text files, quite extensive ones. They have some lines in
>> common and some lines are unique to one of the files. The lines that do
>> exist in both files are not necessarily in the same location. Now I need
>> to compare the files and output a list of lines that exist in both
>> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
>> combination of two or more of them?
>
>
> I would suggest using a combination of sort, uniq and diff.
> Those are base system tools.




Ah. I didn't know about uniq. That sure helped. :)
Thanks.


Re: Comparing two lists

2011-05-06 Thread Robert Bonomi
> From owner-freebsd-questi...@freebsd.org  Fri May  6 19:27:54 2011
> Date: Sat, 07 May 2011 02:09:26 +0200
> From: Rolf Nielsen <listrea...@lazlarlyricon.com>
> To: FreeBSD <freebsd-questions@freebsd.org>
> Subject: Comparing two lists
>
> Hello all,
>
> I have two text files, quite extensive ones. They have some lines in
> common and some lines are unique to one of the files. The lines that do
> exist in both files are not necessarily in the same location. Now I need
> to compare the files and output a list of lines that exist in both
> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
> combination of two or more of them?


If the files have only 'minor' differences -- i.e. no long runs of lines
that are in only one file -- *and* the common lines are in the same order
in each file, you can use diff(1), without any other shenanigans.

If the above is -not- true, and if you need _only_ the common lines, AND
order is not important, then sort(1) both files, and use diff(1) on the
two sorted versions.


Beyond that it depends on what you mean by 'extensive' ones.  Megabytes?
Gigabytes? Or what??
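The second suggestion leaves open how to read the common lines out of diff's output. This sketch shows one way to do it. It assumes a diff(1) that supports unified output with `-U`, and it makes the context large enough that every line of both sorted files falls into a single hunk. The unchanged lines are then the ones prefixed with a single space. The name `common_via_diff` is hypothetical. comm(1), suggested elsewhere in the thread, does the same job more directly.

```sh
#!/bin/sh
# common_via_diff FILE1 FILE2
# "Sort both files, then diff the sorted versions", reduced to the common
# lines. In unified output, unchanged lines carry one leading space; with a
# context of at least the combined line count, all lines land in one hunk.
common_via_diff() {
    s1=$(mktemp) || return 1
    s2=$(mktemp) || { rm -f "$s1"; return 1; }
    sort "$1" > "$s1"
    sort "$2" > "$s2"
    # wc may pad its count with spaces; arithmetic expansion tolerates that.
    n=$(( $(cat "$s1" "$s2" | wc -l) + 1 ))
    if cmp -s "$s1" "$s2"; then
        cat "$s1"    # identical after sorting: diff would print nothing
    else
        # Keep only unchanged lines and strip their one-space prefix;
        # "---", "+++" and "@@" header lines never start with a space.
        diff -U "$n" "$s1" "$s2" | sed -n 's/^ //p'
    fi
    rm -f "$s1" "$s2"
}
```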




Re: Comparing two lists

2011-05-06 Thread Rolf Nielsen

2011-05-07 02:54, Robert Bonomi wrote:

>> From owner-freebsd-questi...@freebsd.org  Fri May  6 19:27:54 2011
>> Date: Sat, 07 May 2011 02:09:26 +0200
>> From: Rolf Nielsen <listrea...@lazlarlyricon.com>
>> To: FreeBSD <freebsd-questions@freebsd.org>
>> Subject: Comparing two lists
>>
>> Hello all,
>>
>> I have two text files, quite extensive ones. They have some lines in
>> common and some lines are unique to one of the files. The lines that do
>> exist in both files are not necessarily in the same location. Now I need
>> to compare the files and output a list of lines that exist in both
>> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
>> combination of two or more of them?
>
>
> If the files have only 'minor' differences -- i.e. no long runs of lines
> that are in only one file -- *and* the common lines are in the same order
> in each file, you can use diff(1), without any other shenanigans.
>
> If the above is -not- true, and if you need _only_ the common lines, AND
> order is not important, then sort(1) both files, and use diff(1) on the
> two sorted versions.
>
>
> Beyond that it depends on what you mean by 'extensive' ones.  Megabytes?
> Gigabytes? Or what??


Some 10,000 to 20,000 lines each. I do need only the common lines. Order
is not essential, but would make life easier. I've tried a little with
uniq, as suggested by Polytropon, but I guess 3am is not quite the right
time to do these things. Anyway, thanks.



Re: Comparing two lists [SOLVED (at least it looks like that)]

2011-05-06 Thread Rolf Nielsen

2011-05-07 02:09, Rolf Nielsen wrote:

> Hello all,
>
> I have two text files, quite extensive ones. They have some lines in
> common and some lines are unique to one of the files. The lines that do
> exist in both files are not necessarily in the same location. Now I need
> to compare the files and output a list of lines that exist in both
> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
> combination of two or more of them?
>
> TIA,
>
> Rolf


sort file1 file2 | uniq -d


Re: Comparing two lists

2011-05-06 Thread Eitan Adler

> They have some lines in common
> and some lines are unique to one of the files.

Use comm whenever you are dealing with set operations (in your case
the intersection operation):
http://www.catonmat.net/blog/set-operations-in-unix-shell
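The linked article covers more than intersection. Here is a minimal sketch of the common set operations on two line lists, assuming POSIX sh, sort(1) and comm(1). The `set_op` helper and its operation names are hypothetical. Temporary files stand in for bash's process substitution.

```sh
#!/bin/sh
# set_op OP FILE1 FILE2, where OP is one of: union inter diff symdiff
# Hypothetical helper. Each input is first reduced with sort -u, giving
# comm(1) the sorted, duplicate-free input it expects.
set_op() {
    t1=$(mktemp) || return 1
    t2=$(mktemp) || { rm -f "$t1"; return 1; }
    sort -u "$2" > "$t1"
    sort -u "$3" > "$t2"
    tab=$(printf '\t')
    status=0
    case $1 in
        union)   sort -u "$t1" "$t2" ;;                    # in either file
        inter)   comm -12 "$t1" "$t2" ;;                   # in both files
        diff)    comm -23 "$t1" "$t2" ;;                   # only in FILE1
        symdiff) comm -3 "$t1" "$t2" | sed "s/^$tab//" ;;  # in exactly one
        *)       echo "set_op: unknown operation: $1" >&2; status=2 ;;
    esac
    rm -f "$t1" "$t2"
    return "$status"
}
```

For `symdiff`, `comm -3` indents lines from the second file with one tab, and the sed stage removes that tab so both halves line up.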


-- 
Eitan Adler


Re: Comparing two lists

2011-05-06 Thread John Levine
> Some 10,000 to 20,000 lines each. I do need only the common lines. Order
> is not essential, but would make life easier. I've tried a little with
> uniq, as suggested by Polytropon, but I guess 3am is not quite the right
> time to do these things. Anyway, thanks.

  sort -u file1 > sorted-file1
  sort -u file2 > sorted-file2
  comm -12 sorted-file1 sorted-file2 > result

R's,
John



Re: Comparing two lists [SOLVED (at least it looks like that)]

2011-05-06 Thread Yuri Pankov
On Sat, May 07, 2011 at 04:23:40AM +0200, Rolf Nielsen wrote:
> 2011-05-07 02:09, Rolf Nielsen wrote:
>> Hello all,
>>
>> I have two text files, quite extensive ones. They have some lines in
>> common and some lines are unique to one of the files. The lines that do
>> exist in both files are not necessarily in the same location. Now I need
>> to compare the files and output a list of lines that exist in both
>> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
>> combination of two or more of them?
>>
>> TIA,
>>
>> Rolf
>
> sort file1 file2 | uniq -d

I very seriously doubt that this line does what you want...

$ printf 'a\na\na\nb\n' > file1; printf 'c\nc\nb\n' > file2; sort file1 file2 | uniq -d
a
b
c


Try this instead (probably bloated):

sort file1 | uniq | tr -s '\n' '\0' | xargs -0 -I % grep -Fx % file2 | sort | uniq

There is comm(1), of course, but it expects files to be already sorted.


HTH,
Yuri
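The counterexample above can be turned into a small check. This sketch uses hypothetical function names. It contrasts the one-liner from the thread with a variant that removes duplicates inside each file before merging. That variant is one way to handle lines repeated within a single file.

```sh
#!/bin/sh
# naive_common FILE1 FILE2: the "sort | uniq -d" one-liner from the thread.
# uniq -d reports any line that occurs more than once in the merged,
# sorted stream, including lines repeated inside a single file.
naive_common() {
    sort "$1" "$2" | uniq -d
}

# dedup_common FILE1 FILE2: drop duplicates within each file first, so a
# line can occur twice in the merged stream only if it is in both files.
dedup_common() {
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}
```

With the sample file1 and file2 above, `naive_common` prints a, b and c, while `dedup_common` prints only b.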


Re: Comparing two lists [SOLVED (at least it looks like that)]

2011-05-06 Thread b. f.
> 2011-05-07 02:09, Rolf Nielsen wrote:
>> Hello all,
>>
>> I have two text files, quite extensive ones. They have some lines in
>> common and some lines are unique to one of the files. The lines that do
>> exist in both files are not necessarily in the same location. Now I need
>> to compare the files and output a list of lines that exist in both
>> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
>> combination of two or more of them?
...
> sort file1 file2 | uniq -d

If the lines aren't repeated in only one file...

For future reference, comm(1) exists to handle problems like this,
although (of course) TIMTOWTDI.

b.