Re: perl script question.
On Sat, Jan 10, 2004 at 11:02:18PM +0000, Matthew Seaman wrote:
> On Sat, Jan 10, 2004 at 11:39:07PM +0100, Björn Andersson wrote:
> > On Sat, Jan 10, 2004 at 10:33:08PM +0000, Matthew Seaman wrote:
> > > On Sat, Jan 10, 2004 at 02:10:36PM -0800, Gary Kline wrote:
> > > > Folks,
> > > >
> > > > Let's see if perl can do this one; it's as obscure a task as
> > > > I've run into. I have scores of files with:
> > > >
> > > >     A regular sentence, or phrase.
> > > >     then_one_containing_underscores_- between_each_word
> > > >     Followed by another regular, space-delimited sentence.
> > > >     Followed_by_another_string_with_underscores.
> > > >
> > > > Is there a perl way to get rid of the
> > > > string_containing_underscores and leave the regular sentences??
> > >
> > >     perl -pi.bak -e 's/\s+\w+_\w+\.?//;' filename
> >
> > If this occurs more than once on a line we should have the line
> > as this:
> >
> >     perl -pi.bak -e 's/\s+\w+_\w+\.?//g;' filename
>
> Good point. Also, if the stuff_separated_by_underscores wraps around
> onto more than one line, then there may not be any leading whitespace:
>
>     perl -pi.bak -e 's/\s*\w+_\w+\.?//g;' filename

The lines do indeed wrap, so this does the job on a test file. I do
have the re-exp book but this one is far over my head. What does the
\s* mean, and also the \.?/ ? Man, I'd never have gotten this one; at
least not in *one* line :-)

Wow. Thanks to everyone,

gary

--
Gary Kline [EMAIL PROTECTED] www.thought.org Public service Unix

___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: perl script question.
Matthew Seaman wrote:
> On Sat, Jan 10, 2004 at 06:26:30PM -0500, Marty Landman wrote:
> > At 06:02 PM 1/10/2004, Matthew Seaman wrote:
> > > On Sat, Jan 10, 2004 at 11:39:07PM +0100, Björn Andersson wrote:
> > > > If this occurs more than once on a line we should have the
> > > > line as this:
> > > >
> > > >     perl -pi.bak -e 's/\s+\w+_\w+\.?//g;' filename
> > >
> > > Good point. Also, if the stuff_separated_by_underscores wraps
> > > around onto more than one line, then there may not be any
> > > leading whitespace:
> >
> > I don't see why the translate solution that Gary Kline gave first
> > isn't adequate.
>
> Err --- Gary Kline was the OP asking how to do this: I think you mean
> Bernard El-Hagin's solution?
>
>     % perl -i.bak -pe 'tr/_/ /' files
>
> That doesn't do the right thing. It turns:
>
>     This is a sample ordinary sentence. This_is_joined_up_with_underscores.
>
> into:
>
>     This is a sample ordinary sentence. This is joined up with underscores.
>
> but the requirement is to produce:
>
>     This is a sample ordinary sentence.

Yes, I completely misread the question. Sorry.

--
Cheers,
Bernard
Re: perl script question.
On Sat, Jan 10, 2004 at 05:34:34PM -0800, Gary Kline wrote:
> On Sat, Jan 10, 2004 at 11:02:18PM +0000, Matthew Seaman wrote:
> >     perl -pi.bak -e 's/\s*\w+_\w+\.?//g;' filename
>
> The lines do indeed wrap, so this does the job on a test file. I do
> have the re-exp book but this one is far over my head. What does the
> \s* mean, and also the \.?/ ?

OK. Time to dissect a regular expression. Let's just isolate the RE
bits from the surrounding stuff:

    \s*\w+_\w+\.?

There are 5 parts to this:

    1: \s*   2: \w+   3: _   4: \w+   5: \.?

1) \s* -- '\s' is a metacharacter for matching whitespace: it's
   equivalent to saying [ \t\n\r\f]. The '*' operator says any number
   of these, including zero.

2) \w+ -- '\w' is a metacharacter for matching 'word' characters. What
   it means is locale dependent, but if you're using the ASCII locale
   it corresponds to [a-zA-Z_0-9]. The '+' operator means one or more
   of these. Note that while \w+ matches character sequences
   containing _, it will also match words that don't: hence

3) _ -- match a literal '_' character, i.e. this forces the matched
   text to contain at least one underscore.

4) \w+ -- as (2); matches the rest of the
   stuff_separated_by_underscores after the underscore we've forced a
   match to[1].

5) \.? -- \. matches a literal '.'. It has to be escaped (with a \)
   because a plain '.' on its own is used as the wildcard to match any
   character. The '?' operator means optional, or more precisely,
   either zero or one of those.

Now, the whole command:

    perl -pi.bak -e 's/${re}//g;' filename

scans through the file line_by_line, matching strings_connected_with
underscores on each line. Björn Andersson noticed that you would need
the 'g' option to the s/// substitution command, which means repeat
this substitution more than once, if necessary. Like in the first
line_of_this_paragraph. Then I realised that there were situations,
like the last line of the previous paragraph, where there wouldn't be
any leading whitespace to match.
Of course, this all depends on the sequences of words_connected_with_
underscores not wrapping around onto more than one line, as in this
contrived example, where the word 'underscores' on the second line of
this paragraph wouldn't be deleted. There are several other edge cases
like that, if word-wrap is permitted. But it was never specified if
that was the case or not, and I've assumed not, because coping with
that sort of thing is a bit trickier.

Cheers,

Matthew

[1] In fact, due to the way regular expressions work, the literal
underscore (3) will actually match at the last underscore out of all
the stuff we're matching, and the stuff matched by chunk (4) won't
contain any underscores.

--
Dr Matthew J Seaman MA, D.Phil.
26 The Paddocks, Savill Way, Marlow, Bucks., SL7 1TH UK
PGP: http://www.infracaninophile.co.uk/pgpkey
Tel: +44 1628 476614
Re: perl script question.
On Sun, Jan 11, 2004 at 11:52:37AM +0000, Matthew Seaman wrote:
> On Sat, Jan 10, 2004 at 05:34:34PM -0800, Gary Kline wrote:
> > On Sat, Jan 10, 2004 at 11:02:18PM +0000, Matthew Seaman wrote:
> > >     perl -pi.bak -e 's/\s*\w+_\w+\.?//g;' filename
> >
> > The lines do indeed wrap, so this does the job on a test file. I do
> > have the re-exp book but this one is far over my head. What does
> > the \s* mean, and also the \.?/ ?

Thanks for your tutorial. Time to re-read Jeff Friedl's book. I'd
forgotten some of perl's regex rules--specifically, 's' and 'w'; was
head-scratching over what symbolized whitespace. Also did not realize
the \w+_ would match one-or-more underscores. To me, this is the
genius of the expression.

I have a 994 perl script called reflow that does an outstanding job of
formatting std ASCII|8859-N text. I filter any essay through a
program, joinlines, and reflow before handing it off to OpenOffice.
What reflow doesn't do is put two spaces between sentences. That's on
my to-hack list :)

Have a good one,

gary
Re: perl script question.
Gary Kline wrote:
> Folks,
>
> Let's see if perl can do this one; it's as obscure a task as I've
> run into. I have scores of files with:
>
>     A regular sentence, or phrase.
>     then_one_containing_underscores_- between_each_word
>     Followed by another regular, space-delimited sentence.
>     Followed_by_another_string_with_underscores.
>
> Is there a perl way to get rid of the string_containing_underscores
> and leave the regular sentences?? Any thoughts very welcome!!

Perhaps this will be enough:

    % perl -i.bak -pe 'tr/_/ /' files

--
Cheers,
Bernard
Re: perl script question.
On Sat, Jan 10, 2004 at 02:10:36PM -0800, Gary Kline wrote:
> Folks,
>
> Let's see if perl can do this one; it's as obscure a task as I've
> run into. I have scores of files with:
>
>     A regular sentence, or phrase.
>     then_one_containing_underscores_- between_each_word
>     Followed by another regular, space-delimited sentence.
>     Followed_by_another_string_with_underscores.
>
> Is there a perl way to get rid of the string_containing_underscores
> and leave the regular sentences??

    perl -pi.bak -e 's/\s+\w+_\w+\.?//;' filename

Cheers,

Matthew
Re: perl script question.
If this occurs more than once on a line we should have the line as
this:

    perl -pi.bak -e 's/\s+\w+_\w+\.?//g;' filename

Notice the added g. :-)

On Sat, Jan 10, 2004 at 10:33:08PM +0000, Matthew Seaman wrote:
> On Sat, Jan 10, 2004 at 02:10:36PM -0800, Gary Kline wrote:
> > Folks,
> >
> > Let's see if perl can do this one; it's as obscure a task as I've
> > run into. I have scores of files with:
> >
> >     A regular sentence, or phrase.
> >     then_one_containing_underscores_- between_each_word
> >     Followed by another regular, space-delimited sentence.
> >     Followed_by_another_string_with_underscores.
> >
> > Is there a perl way to get rid of the
> > string_containing_underscores and leave the regular sentences??
>
>     perl -pi.bak -e 's/\s+\w+_\w+\.?//;' filename
>
> Cheers,
>
> Matthew
Re: perl script question.
On Sat, Jan 10, 2004 at 11:39:07PM +0100, Björn Andersson wrote:
> On Sat, Jan 10, 2004 at 10:33:08PM +0000, Matthew Seaman wrote:
> > On Sat, Jan 10, 2004 at 02:10:36PM -0800, Gary Kline wrote:
> > > Folks,
> > >
> > > Let's see if perl can do this one; it's as obscure a task as
> > > I've run into. I have scores of files with:
> > >
> > >     A regular sentence, or phrase.
> > >     then_one_containing_underscores_- between_each_word
> > >     Followed by another regular, space-delimited sentence.
> > >     Followed_by_another_string_with_underscores.
> > >
> > > Is there a perl way to get rid of the
> > > string_containing_underscores and leave the regular sentences??
> >
> >     perl -pi.bak -e 's/\s+\w+_\w+\.?//;' filename
>
> If this occurs more than once on a line we should have the line as
> this:
>
>     perl -pi.bak -e 's/\s+\w+_\w+\.?//g;' filename

Good point. Also, if the stuff_separated_by_underscores wraps around
onto more than one line, then there may not be any leading whitespace:

    perl -pi.bak -e 's/\s*\w+_\w+\.?//g;' filename

Cheers,

Matthew
Re: perl script question.
At 06:02 PM 1/10/2004, Matthew Seaman wrote:
> On Sat, Jan 10, 2004 at 11:39:07PM +0100, Björn Andersson wrote:
> > If this occurs more than once on a line we should have the line
> > as this:
> >
> >     perl -pi.bak -e 's/\s+\w+_\w+\.?//g;' filename
>
> Good point. Also, if the stuff_separated_by_underscores wraps around
> onto more than one line, then there may not be any leading
> whitespace:

I don't see why the translate solution that Gary Kline gave first
isn't adequate.

Marty Landman
Face 2 Interface Inc 845-679-9387
Sign On Required: Web membership software for your site
Make a Website: http://face2interface.com/Home/Demo.shtml
Re: perl script question.
On Sat, Jan 10, 2004 at 06:26:30PM -0500, Marty Landman wrote:
> At 06:02 PM 1/10/2004, Matthew Seaman wrote:
> > On Sat, Jan 10, 2004 at 11:39:07PM +0100, Björn Andersson wrote:
> > > If this occurs more than once on a line we should have the line
> > > as this:
> > >
> > >     perl -pi.bak -e 's/\s+\w+_\w+\.?//g;' filename
> >
> > Good point. Also, if the stuff_separated_by_underscores wraps
> > around onto more than one line, then there may not be any leading
> > whitespace:
>
> I don't see why the translate solution that Gary Kline gave first
> isn't adequate.

Err --- Gary Kline was the OP asking how to do this: I think you mean
Bernard El-Hagin's solution?

    % perl -i.bak -pe 'tr/_/ /' files

That doesn't do the right thing. It turns:

    This is a sample ordinary sentence. This_is_joined_up_with_underscores.

into:

    This is a sample ordinary sentence. This is joined up with underscores.

but the requirement is to produce:

    This is a sample ordinary sentence.

Cheers,

Matthew
Re: perl script question.
At 06:36 PM 1/10/2004, Matthew Seaman wrote:
> Err --- Gary Kline was the OP asking how to do this: I think you mean
> Bernard El-Hagin's solution?
>
>     % perl -i.bak -pe 'tr/_/ /' files
>
> That doesn't do the right thing.

Woops, not only can't I read the question right, I can't read the
poster's name right either. Maybe it really is time to start thinking
about reading glasses.

Marty Landman
Re: perl script question.
On Sat, Jan 10, 2004 at 11:36:45PM +0000, Matthew Seaman wrote:
> On Sat, Jan 10, 2004 at 06:26:30PM -0500, Marty Landman wrote:
> > At 06:02 PM 1/10/2004, Matthew Seaman wrote:
> > > On Sat, Jan 10, 2004 at 11:39:07PM +0100, Björn Andersson wrote:
> > > > If this occurs more than once on a line we should have the
> > > > line as this:
> > > >
> > > >     perl -pi.bak -e 's/\s+\w+_\w+\.?//g;' filename
> > >
> > > Good point. Also, if the stuff_separated_by_underscores wraps
> > > around onto more than one line, then there may not be any
> > > leading whitespace:
> >
> > I don't see why the translate solution that Gary Kline gave first
> > isn't adequate.
>
> Err --- Gary Kline was the OP asking how to do this: I think you mean
> Bernard El-Hagin's solution?
>
>     % perl -i.bak -pe 'tr/_/ /' files
>
> That doesn't do the right thing. It turns:
>
>     This is a sample ordinary sentence. This_is_joined_up_with_underscores.
>
> into:
>
>     This is a sample ordinary sentence. This is joined up with underscores.
>
> but the requirement is to produce:
>
>     This is a sample ordinary sentence.

Exactly so. I could easily tr '_' to ' ', but not delete (//g) an
entire string that contained underscores.

BTW, this kind of technique would be useful in filtering ^Subject:
lines like get.a.bigger.bustline or other such garbage. --But then
the people who hack the antispam programs are no doubt expert at
this... .

gary