Re: Comparing two lists
Quoth Chad Perrin on Saturday, 07 May 2011:
> On Sat, May 07, 2011 at 02:09:26AM +0200, Rolf Nielsen wrote:
> >
> > I have two text files, quite extensive ones. They have some lines in
> > common and some lines are unique to one of the files. The lines that do
> > exist in both files are not necessarily in the same location. Now I need
> > to compare the files and output a list of lines that exist in both
> > files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
> > combination of two or more of them?
>
> Disclaimer:
>
> This should probably be done with Unix command line utilities, and most
> likely by way of comm, as others explain here. On the other hand, the
> others explaining that have done an admirable job of giving you some
> pretty comprehensive advice on that front before I got here, so I'll give
> you an alternative approach that is probably *not* how you should do it.
>
> Alternative Approach:
>
> You could always use a programming language reasonably well-suited to
> admin scripting. The following is a one-liner in Ruby.
>
> ruby -e 'foo = File.open("foo.txt").readlines.map {|l| l.chomp}; \
> bar = File.open("bar.txt").readlines.map {|l| l.chomp }; \
> foo.each {|num| puts num if bar.include? num }'
>
> Okay, so I'm kinda stretching the definition of "one-liner" if I'm
> using semicolons and escaping newlines. If you really want to cram it
> all into one line of code, you could do something like replace the
> semicolons (and newline escapes) with the "and" keyword in each case.
>
> http://pastebin.com/nPR42760
>
> --
> Chad Perrin [ original content licensed OWL: http://owl.apotheon.org ]

You could even just output the intersection of the two lists:

ruby -e 'puts File.open("foo.txt").readlines.map {|l| l.chomp} & \
File.open("bar.txt").readlines.map {|l| l.chomp }'

And to comply with DRY:

ruby -e 'def fl(f) File.open(f).readlines.map {|l| l.chomp}; end; \
puts fl("foo.txt") & fl("bar.txt")'

--
.O. | Sterling (Chip) Camden      | http://camdensoftware.com
..O | sterl...@camdensoftware.com | http://chipsquips.com
OOO | 2048R/D6DBAF91              | http://chipstips.com
Re: Comparing two lists
On Sat, May 07, 2011 at 02:09:26AM +0200, Rolf Nielsen wrote:
>
> I have two text files, quite extensive ones. They have some lines in
> common and some lines are unique to one of the files. The lines that do
> exist in both files are not necessarily in the same location. Now I need
> to compare the files and output a list of lines that exist in both
> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
> combination of two or more of them?

Disclaimer:

This should probably be done with Unix command line utilities, and most
likely by way of comm, as others explain here. On the other hand, the
others explaining that have done an admirable job of giving you some
pretty comprehensive advice on that front before I got here, so I'll give
you an alternative approach that is probably *not* how you should do it.

Alternative Approach:

You could always use a programming language reasonably well-suited to
admin scripting. The following is a one-liner in Ruby.

ruby -e 'foo = File.open("foo.txt").readlines.map {|l| l.chomp}; \
bar = File.open("bar.txt").readlines.map {|l| l.chomp }; \
foo.each {|num| puts num if bar.include? num }'

Okay, so I'm kinda stretching the definition of "one-liner" if I'm
using semicolons and escaping newlines. If you really want to cram it
all into one line of code, you could do something like replace the
semicolons (and newline escapes) with the "and" keyword in each case.

http://pastebin.com/nPR42760

--
Chad Perrin [ original content licensed OWL: http://owl.apotheon.org ]
Re: Comparing two lists
2011-05-07 07:28, Robert Bonomi wrote:
>> From listrea...@lazlarlyricon.com Fri May 6 20:14:09 2011
>> Date: Sat, 07 May 2011 03:13:39 +0200
>> From: Rolf Nielsen
>> To: Robert Bonomi
>> CC: freebsd-questions@freebsd.org
>> Subject: Re: Comparing two lists
>>
>> 2011-05-07 02:54, Robert Bonomi wrote:
>>>> From owner-freebsd-questi...@freebsd.org Fri May 6 19:27:54 2011
>>>> Date: Sat, 07 May 2011 02:09:26 +0200
>>>> From: Rolf Nielsen
>>>> To: FreeBSD
>>>> Subject: Comparing two lists
>>>>
>>>> Hello all,
>>>>
>>>> I have two text files, quite extensive ones. They have some lines in
>>>> common and some lines are unique to one of the files. The lines that do
>>>> exist in both files are not necessarily in the same location. Now I need
>>>> to compare the files and output a list of lines that exist in both
>>>> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
>>>> combination of two or more of them?
>>>
>>> If the files have only 'minor' differences -- i.e. no long runs of
>>> lines that are in only one file -- *and* the common lines are in the
>>> same order in each file, you can use diff(1), without any other
>>> shenanigans.
>>>
>>> If the above is -not- true, and if you need _only_ the common lines,
>>> AND order is not important, then sort(1) both files, and use diff(1)
>>> on the two sorted versions.
>>>
>>> Beyond that it depends on what you mean by 'extensive' ones.
>>> Megabytes? Gigabytes? Or what??
>>
>> Some 10,000 to 20,000 lines each. I do need only the common lines. Order
>> is not essential, but would make life easier. I've tried a little with
>> uniq, as suggested by Polytropon, but I guess 3am is not quite the right
>> time to do these things. Anyway, thanks.
>
> Ok, 20k lines is only a medium-size file. There's no problem in fitting
> the entire file 'in memory'. ('big' files are ones that are larger than
> available memory. :)

By "quite extensive" I was referring to the number of lines rather than
the byte size, and 20k lines is, by my standards, quite a lot for a plain
text file. :P But that's beside the point. :)

> Using uniq:
>
> sort {{file1}} {{file2}} |uniq -d

Yes, I found that solution on
http://www.catonmat.net/blog/set-operations-in-unix-shell which is mainly
about comm, but also lists other ways of doing things. I also found

grep -xF -f file1 file2

there, and I've tested that one too. Both seem to be doing what I want.

> To maintain order, put the following in a file, call it 'common.awk':
>
> NR==FNR { array[$0]=1; next; }
> { if (array[$0] == 1) print $0; }
>
> then use the command:
>
> awk -f common.awk {{file1}} {{file2}}
>
> This will output common lines, in the order they occur in _file2_.

I took the liberty of sending a copy of this to the list although you
replied privately.
___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"
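Robert Bonomi's common.awk, quoted above, can be tried without creating a separate script file. This is a minimal sketch, not his exact script. The function name `common_in_order` and the sample data are made up for illustration. It also tests membership with awk's `in` operator instead of comparing against 1, which avoids creating empty array entries. As in the original, the output follows the order of the second file.

```sh
# Hypothetical helper: print the lines of the second file that also occur
# in the first file, in the order they appear in the second file.
common_in_order() {
    awk '
        NR == FNR { seen[$0] = 1; next }   # first file: remember each line
        ($0 in seen)                       # second file: print lines seen before
    ' "$1" "$2"
}

dir=$(mktemp -d "${TMPDIR:-/tmp}/common.XXXXXX")
printf 'pear\napple\nplum\n' > "$dir/file1"
printf 'plum\nkiwi\npear\n' > "$dir/file2"
common_in_order "$dir/file1" "$dir/file2"   # prints plum, then pear
rm -r "$dir"
```

Like the original script, a line that repeats in the second file is printed once per occurrence.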
Re: Comparing two lists [SOLVED (at least it looks like that)]
2011-05-07 05:16, b. f. wrote:
>> 2011-05-07 02:09, Rolf Nielsen wrote:
>>> Hello all,
>>>
>>> I have two text files, quite extensive ones. They have some lines in
>>> common and some lines are unique to one of the files. The lines that do
>>> exist in both files are not necessarily in the same location. Now I need
>>> to compare the files and output a list of lines that exist in both
>>> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
>>> combination of two or more of them?
> ...
>> sort file1 file2 | uniq -d
>
> If the lines aren't repeated in only one file...

They aren't (see my reply to Yuri Pankov). :)

> For future reference, comm(1) exists to handle problems like this,
> although (of course) TIMTOWTDI.
>
> b.
Re: Comparing two lists [SOLVED (at least it looks like that)]
2011-05-07 05:11, Yuri Pankov wrote:
> On Sat, May 07, 2011 at 04:23:40AM +0200, Rolf Nielsen wrote:
>> 2011-05-07 02:09, Rolf Nielsen wrote:
>>> Hello all,
>>>
>>> I have two text files, quite extensive ones. They have some lines in
>>> common and some lines are unique to one of the files. The lines that do
>>> exist in both files are not necessarily in the same location. Now I need
>>> to compare the files and output a list of lines that exist in both
>>> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
>>> combination of two or more of them?
>>>
>>> TIA,
>>>
>>> Rolf
>>
>> sort file1 file2 | uniq -d
>
> I very seriously doubt that this line does what you want...
>
> $ printf "a\na\na\nb\n" > file1; printf "c\nc\nb\n" > file2; sort file1 file2 | uniq -d
> a
> b
> c

Ok. I do understand the problem. However, the files I have do not contain
any duplicate lines, so that possibility didn't even cross my mind.

> Try this instead (probably bloated):
>
> sort < file1 | uniq | tr -s '\n' '\0' | xargs -0 -I % grep -Fx % file2 | sort | uniq
>
> There is comm(1), of course, but it expects files to be already sorted.

The files are sorted, so comm would work. Several people have already
suggested comm, though I haven't tried it, as combining sort and uniq
does what I want with my specific files.

> HTH,
> Yuri
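Yuri's xargs pipeline runs one grep per unique line of file1. The goal is the same: lines of file2 that exactly match some line of file1, printed without repeats. A single grep that reads all patterns at once with -f (the form Rolf found on catonmat) should be equivalent for ordinary non-empty lines. A sketch, where `common_grep` is a hypothetical name:

```sh
# Hypothetical helper: print each line that occurs in both files, once.
#   -F  treat every line of the pattern file as a fixed string, not a regex
#   -x  a pattern must match the whole line, not just part of it
#   -f  read the patterns from a file (here: the first argument)
common_grep() {
    grep -Fx -f "$1" "$2" | sort -u
}

# Demo with the data from Yuri's counterexample.
dir=$(mktemp -d "${TMPDIR:-/tmp}/common.XXXXXX")
printf 'a\na\na\nb\n' > "$dir/file1"
printf 'c\nc\nb\n' > "$dir/file2"
common_grep "$dir/file1" "$dir/file2"   # prints: b
rm -r "$dir"
```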
Re: Comparing two lists
On 07/05/2011 01:09, Rolf Nielsen wrote:
> I have two text files, quite extensive ones. They have some lines in
> common and some lines are unique to one of the files. The lines that do
> exist in both files are not necessarily in the same location. Now I need
> to compare the files and output a list of lines that exist in both
> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
> combination of two or more of them?

comm(1)

Which does exactly what you want -- showing lines that belong to one
file or another, and lines that belong to both. The limitation is that
the files need to be sorted before being compared.

Cheers,

Matthew

--
Dr Matthew J Seaman MA, D.Phil.                   7 Priory Courtyard
                                                  Flat 3
PGP: http://www.infracaninophile.co.uk/pgpkey     Ramsgate
JID: matt...@infracaninophile.co.uk               Kent, CT11 9PW
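The following shows the three columns Matthew describes, using two already-sorted sample files whose names and contents are invented for the example. Options `-1`, `-2` and `-3` suppress the corresponding column, so `-12` leaves only the lines common to both files:

```sh
# Demo of comm(1) output on two files that are already sorted.
# Column 1: only in the first file; column 2: only in the second file;
# column 3: in both. Column 2 lines are preceded by one tab, column 3 by two.
dir=$(mktemp -d "${TMPDIR:-/tmp}/comm.XXXXXX")
printf 'apple\nbanana\ncherry\n' > "$dir/left"
printf 'banana\ncherry\ndate\n'  > "$dir/right"

comm "$dir/left" "$dir/right"       # all three columns
comm -12 "$dir/left" "$dir/right"   # common lines only: banana, cherry
comm -23 "$dir/left" "$dir/right"   # only in "left": apple
comm -13 "$dir/left" "$dir/right"   # only in "right": date
rm -r "$dir"
```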
Re: Comparing two lists [SOLVED (at least it looks like that)]
> 2011-05-07 02:09, Rolf Nielsen wrote:
> > Hello all,
> >
> > I have two text files, quite extensive ones. They have some lines in
> > common and some lines are unique to one of the files. The lines that do
> > exist in both files are not necessarily in the same location. Now I need
> > to compare the files and output a list of lines that exist in both
> > files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
> > combination of two or more of them?
...
> sort file1 file2 | uniq -d

If the lines aren't repeated in only one file...

For future reference, comm(1) exists to handle problems like this,
although (of course) TIMTOWTDI.

b.
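b.'s caveat can be removed by de-duplicating each file before merging them. After `sort -u`, a line can occur at most twice in the combined stream, and only when it is present in both files, so `uniq -d` then reports exactly the intersection. A minimal sketch using the data from Yuri Pankov's counterexample; `common_uniq` is a made-up name:

```sh
# Hypothetical helper: intersection of two files via sort and uniq -d.
# sort -u removes repeats inside each file first, so a line reaches
# uniq -d twice only if it occurs in both files.
common_uniq() {
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -d
}

dir=$(mktemp -d "${TMPDIR:-/tmp}/common.XXXXXX")
printf 'a\na\na\nb\n' > "$dir/file1"
printf 'c\nc\nb\n' > "$dir/file2"

sort "$dir/file1" "$dir/file2" | uniq -d   # naive version: prints a, b and c
common_uniq "$dir/file1" "$dir/file2"      # prints only b
rm -r "$dir"
```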
Re: Comparing two lists [SOLVED (at least it looks like that)]
On Sat, May 07, 2011 at 04:23:40AM +0200, Rolf Nielsen wrote:
> 2011-05-07 02:09, Rolf Nielsen wrote:
> > Hello all,
> >
> > I have two text files, quite extensive ones. They have some lines in
> > common and some lines are unique to one of the files. The lines that do
> > exist in both files are not necessarily in the same location. Now I need
> > to compare the files and output a list of lines that exist in both
> > files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
> > combination of two or more of them?
> >
> > TIA,
> >
> > Rolf
>
> sort file1 file2 | uniq -d

I very seriously doubt that this line does what you want...

$ printf "a\na\na\nb\n" > file1; printf "c\nc\nb\n" > file2; sort file1 file2 | uniq -d
a
b
c

Try this instead (probably bloated):

sort < file1 | uniq | tr -s '\n' '\0' | xargs -0 -I % grep -Fx % file2 | sort | uniq

There is comm(1), of course, but it expects files to be already sorted.

HTH,
Yuri
Re: Comparing two lists
>Some 10,000 to 20,000 lines each. I do need only the common lines. Order
>is not essential, but would make life easier. I've tried a little with
>uniq, as suggested by Polytropon, but I guess 3am is not quite the right
>time to do these things. Anyway, thanks.

sort -u file1 > sorted-file1
sort -u file2 > sorted-file2
comm -12 sorted-file1 sorted-file2 > result

R's,
John
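John's three commands leave sorted-file1 and sorted-file2 behind in the current directory. If that is unwanted, they can be wrapped in a function that works in a private temporary directory and removes it afterwards. This is only a sketch; the function name `common_comm` and the mktemp template are illustrative assumptions:

```sh
# Hypothetical wrapper around: sort -u, sort -u, comm -12.
# Prints the lines common to both files, sorted and without repeats.
common_comm() {
    common_tmp=$(mktemp -d "${TMPDIR:-/tmp}/common.XXXXXX") || return 1
    sort -u "$1" > "$common_tmp/sorted-file1"
    sort -u "$2" > "$common_tmp/sorted-file2"
    # -1 and -2 hide the lines unique to either file; column 3 remains.
    comm -12 "$common_tmp/sorted-file1" "$common_tmp/sorted-file2"
    rm -r "$common_tmp"
}

dir=$(mktemp -d "${TMPDIR:-/tmp}/demo.XXXXXX")
printf 'b\na\nc\na\n' > "$dir/file1"
printf 'c\nd\na\n' > "$dir/file2"
common_comm "$dir/file1" "$dir/file2"   # prints a, then c
rm -r "$dir"
```

Because sort and comm run in the same environment here, they use the same collation order, which comm relies on. To keep John's output file, redirect as before: `common_comm file1 file2 > result`.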
Re: Comparing two lists
> They have some lines in common
> and some lines are unique to one of the files.

Use comm whenever you are dealing with set operations (in your case the
intersection operation):
http://www.catonmat.net/blog/set-operations-in-unix-shell

--
Eitan Adler
Re: Comparing two lists [SOLVED (at least it looks like that)]
2011-05-07 02:09, Rolf Nielsen wrote:
> Hello all,
>
> I have two text files, quite extensive ones. They have some lines in
> common and some lines are unique to one of the files. The lines that do
> exist in both files are not necessarily in the same location. Now I need
> to compare the files and output a list of lines that exist in both
> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
> combination of two or more of them?
>
> TIA,
>
> Rolf

sort file1 file2 | uniq -d
Re: Comparing two lists
2011-05-07 02:54, Robert Bonomi wrote:
>> From owner-freebsd-questi...@freebsd.org Fri May 6 19:27:54 2011
>> Date: Sat, 07 May 2011 02:09:26 +0200
>> From: Rolf Nielsen
>> To: FreeBSD
>> Subject: Comparing two lists
>>
>> Hello all,
>>
>> I have two text files, quite extensive ones. They have some lines in
>> common and some lines are unique to one of the files. The lines that do
>> exist in both files are not necessarily in the same location. Now I need
>> to compare the files and output a list of lines that exist in both
>> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
>> combination of two or more of them?
>
> If the files have only 'minor' differences -- i.e. no long runs of
> lines that are in only one file -- *and* the common lines are in the
> same order in each file, you can use diff(1), without any other
> shenanigans.
>
> If the above is -not- true, and if you need _only_ the common lines,
> AND order is not important, then sort(1) both files, and use diff(1)
> on the two sorted versions.
>
> Beyond that it depends on what you mean by 'extensive' ones.
> Megabytes? Gigabytes? Or what??

Some 10,000 to 20,000 lines each. I do need only the common lines. Order
is not essential, but would make life easier. I've tried a little with
uniq, as suggested by Polytropon, but I guess 3am is not quite the right
time to do these things. Anyway, thanks.
Re: Comparing two lists
> From owner-freebsd-questi...@freebsd.org Fri May 6 19:27:54 2011
> Date: Sat, 07 May 2011 02:09:26 +0200
> From: Rolf Nielsen
> To: FreeBSD
> Subject: Comparing two lists
>
> Hello all,
>
> I have two text files, quite extensive ones. They have some lines in
> common and some lines are unique to one of the files. The lines that do
> exist in both files are not necessarily in the same location. Now I need
> to compare the files and output a list of lines that exist in both
> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
> combination of two or more of them?

If the files have only 'minor' differences -- i.e. no long runs of
lines that are in only one file -- *and* the common lines are in the
same order in each file, you can use diff(1), without any other
shenanigans.

If the above is -not- true, and if you need _only_ the common lines,
AND order is not important, then sort(1) both files, and use diff(1)
on the two sorted versions.

Beyond that it depends on what you mean by 'extensive' ones.
Megabytes? Gigabytes? Or what??
Re: Comparing two lists
2011-05-07 02:33, Polytropon wrote:
> On Sat, 07 May 2011 02:09:26 +0200, Rolf Nielsen wrote:
>> Hello all,
>>
>> I have two text files, quite extensive ones. They have some lines in
>> common and some lines are unique to one of the files. The lines that do
>> exist in both files are not necessarily in the same location. Now I need
>> to compare the files and output a list of lines that exist in both
>> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
>> combination of two or more of them?
>
> I would suggest using a combination of sort, uniq and diff. Those are
> base system tools.

Ah. I didn't know about uniq. That sure helped. :) Thanks.
Re: Comparing two lists
On Sat, 07 May 2011 02:09:26 +0200, Rolf Nielsen wrote:
> Hello all,
>
> I have two text files, quite extensive ones. They have some lines in
> common and some lines are unique to one of the files. The lines that do
> exist in both files are not necessarily in the same location. Now I need
> to compare the files and output a list of lines that exist in both
> files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
> combination of two or more of them?

I would suggest using a combination of sort, uniq and diff. Those are
base system tools.

--
Polytropon
Magdeburg, Germany
Happy FreeBSD user since 4.0
Andra moi ennepe, Mousa, ...
Comparing two lists
Hello all,

I have two text files, quite extensive ones. They have some lines in
common and some lines are unique to one of the files. The lines that do
exist in both files are not necessarily in the same location. Now I need
to compare the files and output a list of lines that exist in both
files. Is there a simple way to do this? diff? awk? sed? cmp? Or a
combination of two or more of them?

TIA,

Rolf