Re: perl script question.

2004-01-26 Thread Gary Kline
On Sat, Jan 10, 2004 at 11:02:18PM +, Matthew Seaman wrote:
 On Sat, Jan 10, 2004 at 11:39:07PM +0100, Björn Andersson wrote:
  On Sat, Jan 10, 2004 at 10:33:08PM +, Matthew Seaman wrote:
   On Sat, Jan 10, 2004 at 02:10:36PM -0800, Gary Kline wrote:
 
Folks,
 
Let's see if perl can do this one; it's as obscure a task
as I've run into.  I have scores of files with:
 
A regular sentence, or phrase. then_one_containing_underscores_-
between_each_word  Followed by another regular, space-delimited
sentence.  Followed_by_another_string_with_underscores.
 
Is there a perl way to get rid of the
string_containing_underscores and leave the regular sentences??
 
   perl -pi.bak -e 's/\s+\w+_\w+\.?//;' filename
 
  If this occurs more than once on a line, the command should be:
perl -pi.bak -e 's/\s+\w+_\w+\.?//g;' filename
 
 Good point.  Also, if the stuff_separated_by_underscores wraps around
 onto more than one line, then there may not be any leading whitespace:
 
 perl -pi.bak -e 's/\s*\w+_\w+\.?//g;' filename
 

The lines do indeed wrap so this does the job on a test file.
I do have the regexp book but this one is far over my head.
What does the \s* mean, and also the \.?/ ?

Man, I'd never have gotten this one; at least not in *one*
line :-)  Wow.

thanks to everyone,

gary


-- 
   Gary Kline [EMAIL PROTECTED]   www.thought.org Public service Unix

___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: perl script question.

2004-01-11 Thread Bernard El-Hagin
Matthew Seaman wrote:
 On Sat, Jan 10, 2004 at 06:26:30PM -0500, Marty Landman wrote:
  At 06:02 PM 1/10/2004, Matthew Seaman wrote:
  On Sat, Jan 10, 2004 at 11:39:07PM +0100, Björn Andersson wrote:
  
   If this occurs more than once on a line, the command should be:
 perl -pi.bak -e 's/\s+\w+_\w+\.?//g;' filename
  
  Good point.  Also, if the stuff_separated_by_underscores wraps around
  onto more than one line, then there may not be any leading whitespace:
  
  I don't see why the translate sol'tn that Gary Kline gave first isn't 
  adequate.
 
 Err --- Gary Kline was the OP asking how to do this: I think you mean 
 Bernard El-Hagin's solution?
 
 % perl -i.bak -pe 'tr/_/ /' files
 
 That doesn't do the right thing.  It turns:
 
 This is a sample ordinary sentence.  This_is_joined_up_with_underscores.
 
 into:
 
 This is a sample ordinary sentence.  This is joined up with underscores.
 
 but the requirement is to produce:
 
 This is a sample ordinary sentence.


Yes, I completely misread the question. Sorry.

-- 
Cheers,
Bernard


Re: perl script question.

2004-01-11 Thread Matthew Seaman
On Sat, Jan 10, 2004 at 05:34:34PM -0800, Gary Kline wrote:
 On Sat, Jan 10, 2004 at 11:02:18PM +, Matthew Seaman wrote:

  perl -pi.bak -e 's/\s*\w+_\w+\.?//g;' filename

   The lines do indeed wrap so this does the job on a test file.
   I do have the regexp book but this one is far over my head.
   What does the \s* mean, and also the \.?/ ?

OK.  Time to dissect a regular expression.  Let's just isolate the RE
bits from the surrounding stuff:

\s*\w+_\w+\.?

There are 5 parts to this:

   1   \s*
   2   \w+
   3   _
   4   \w+
   5   \.?

1) \s* -- '\s' is a metacharacter for matching whitespace: it's equivalent
   to saying [ \t\n\r\f].  The '*' operator says any number of these,
   including zero.

2) \w+ -- '\w' is a metacharacter for matching 'word' characters.
   What it means is locale dependent, but if you're using the ASCII
   locale it corresponds to [a-zA-Z_0-9].  The '+' operator means one
   or more of these.  Note that while \w+ matches character sequences
   containing _, it will also match words that don't; hence part 3.

3) _ -- match a literal '_' character, i.e. this forces the matched
   text to contain at least one underscore.

4) \w+ -- as in (2), this matches the rest of the
   stuff_separated_by_underscores after the underscore we've forced a
   match to [1].

5) \.? -- \. matches a literal '.'  It has to be escaped (with a \)
   because a plain '.' on its own is used as the wildcard that matches
   any character (other than a newline, by default).  The '?' operator
   means optional or, more precisely, either zero or one of those.

Now, the whole command:

   perl -pi.bak -e 's/${re}//g;' filename

scans through the file line_by_line, matching strings_connected_with
underscores on each line.  Björn Andersson noticed that you would need
the 'g' option to the s/// substitution command which means repeat
this substitution more than once, if necessary.  Like in the first
line_of_this_paragraph.

Then I realised that there were situations, like the last line of the
previous paragraph, where there wouldn't be any leading whitespace to
match.

Of course, this all depends on the sequences of words_connected_with_
underscores not wrapping around onto more than one line, as in this
contrived example, where the word 'underscores' on the second line of
this paragraph wouldn't be deleted.  There are several other edge
cases like that, if word-wrap is permitted. But it was never specified
if that was the case or not and I've assumed not because coping with
that sort of thing is a bit trickier.

Cheers,

Matthew

[1] In fact, due to the way regular expressions work, the literal
underscore (3) will actually match at the last underscore out of all
the stuff we're matching, and the stuff matched by chunk (4) won't
contain any underscores.

-- 
Dr Matthew J Seaman MA, D.Phil.   26 The Paddocks
  Savill Way
PGP: http://www.infracaninophile.co.uk/pgpkey Marlow
Tel: +44 1628 476614  Bucks., SL7 1TH UK




Re: perl script question.

2004-01-11 Thread Gary Kline
On Sun, Jan 11, 2004 at 11:52:37AM +, Matthew Seaman wrote:
 On Sat, Jan 10, 2004 at 05:34:34PM -0800, Gary Kline wrote:
  On Sat, Jan 10, 2004 at 11:02:18PM +, Matthew Seaman wrote:
 
   perl -pi.bak -e 's/\s*\w+_\w+\.?//g;' filename
 
  The lines do indeed wrap so this does the job on a test file.
  I do have the regexp book but this one is far over my head.
  What does the \s* mean, and also the \.?/ ?
 

Thanks for your tutorial.  Time to re-read Jeff Friedl's
book.  I'd forgotten some of perl's regex rules, specifically
'\s' and '\w'; I was scratching my head over what symbolized whitespace.
Also did not realize the \w+_ would match one-or-more
underscores.  To me, this is the genius of the expression.

I have a 994 perl script called reflow that does an 
outstanding job of formatting std ASCII|8859-N text. 
I filter any essay thru a program, joinlines, and reflow
before handing it off to OpenOffice.  What reflow doesn't
do is to put two spaces between sentences.  That's on 
my to-hack list:)

have a good one,

gary


-- 
   Gary Kline [EMAIL PROTECTED]   www.thought.org Public service Unix



Re: perl script question.

2004-01-10 Thread Bernard El-Hagin
Gary Kline wrote:
 
   Folks,
 
   Let's see if perl can do this one; it's as obscure a task
   as I've run into.  I have scores of files with:
 
   A regular sentence, or phrase. then_one_containing_underscores_-
   between_each_word  Followed by another regular, space-delimited
   sentence.  Followed_by_another_string_with_underscores.
 
   Is there a perl way to get rid of the
   string_containing_underscores and leave the regular sentences??
 
   Any thoughts very welcome!!


Perhaps this will be enough:


% perl -i.bak -pe 'tr/_/ /' files


-- 
Cheers,
Bernard


Re: perl script question.

2004-01-10 Thread Matthew Seaman
On Sat, Jan 10, 2004 at 02:10:36PM -0800, Gary Kline wrote:
 
   Folks,
 
   Let's see if perl can do this one; it's as obscure a task
   as I've run into.  I have scores of files with:
 
   A regular sentence, or phrase. then_one_containing_underscores_-
   between_each_word  Followed by another regular, space-delimited
   sentence.  Followed_by_another_string_with_underscores.
 
   Is there a perl way to get rid of the
   string_containing_underscores and leave the regular sentences??
 

perl -pi.bak -e 's/\s+\w+_\w+\.?//;' filename

Cheers,

Matthew 

-- 
Dr Matthew J Seaman MA, D.Phil.   26 The Paddocks
  Savill Way
PGP: http://www.infracaninophile.co.uk/pgpkey Marlow
Tel: +44 1628 476614  Bucks., SL7 1TH UK




Re: perl script question.

2004-01-10 Thread Björn Andersson
If this occurs more than once on a line, the command should be:
  perl -pi.bak -e 's/\s+\w+_\w+\.?//g;' filename

Notice the added g. :-)

On Sat, Jan 10, 2004 at 10:33:08PM +, Matthew Seaman wrote:
 On Sat, Jan 10, 2004 at 02:10:36PM -0800, Gary Kline wrote:
  
  Folks,
  
  Let's see if perl can do this one; it's as obscure a task
  as I've run into.  I have scores of files with:
  
  A regular sentence, or phrase. then_one_containing_underscores_-
  between_each_word  Followed by another regular, space-delimited
  sentence.  Followed_by_another_string_with_underscores.
  
  Is there a perl way to get rid of the
  string_containing_underscores and leave the regular sentences??
  
 
 perl -pi.bak -e 's/\s+\w+_\w+\.?//;' filename
 
   Cheers,
 
   Matthew 
 
 -- 
 Dr Matthew J Seaman MA, D.Phil.   26 The Paddocks
   Savill Way
 PGP: http://www.infracaninophile.co.uk/pgpkey Marlow
 Tel: +44 1628 476614  Bucks., SL7 1TH UK




Re: perl script question.

2004-01-10 Thread Matthew Seaman
On Sat, Jan 10, 2004 at 11:39:07PM +0100, Björn Andersson wrote:
 On Sat, Jan 10, 2004 at 10:33:08PM +, Matthew Seaman wrote:
  On Sat, Jan 10, 2004 at 02:10:36PM -0800, Gary Kline wrote:

 Folks,

 Let's see if perl can do this one; it's as obscure a task
 as I've run into.  I have scores of files with:

 A regular sentence, or phrase. then_one_containing_underscores_-
 between_each_word  Followed by another regular, space-delimited
 sentence.  Followed_by_another_string_with_underscores.

 Is there a perl way to get rid of the
 string_containing_underscores and leave the regular sentences??

  perl -pi.bak -e 's/\s+\w+_\w+\.?//;' filename

 If this occurs more than once on a line, the command should be:
   perl -pi.bak -e 's/\s+\w+_\w+\.?//g;' filename

Good point.  Also, if the stuff_separated_by_underscores wraps around
onto more than one line, then there may not be any leading whitespace:

perl -pi.bak -e 's/\s*\w+_\w+\.?//g;' filename

cheers,

Matthew

-- 
Dr Matthew J Seaman MA, D.Phil.   26 The Paddocks
  Savill Way
PGP: http://www.infracaninophile.co.uk/pgpkey Marlow
Tel: +44 1628 476614  Bucks., SL7 1TH UK




Re: perl script question.

2004-01-10 Thread Marty Landman
At 06:02 PM 1/10/2004, Matthew Seaman wrote:
On Sat, Jan 10, 2004 at 11:39:07PM +0100, Björn Andersson wrote:

 If this occurs more than once on a line, the command should be:
   perl -pi.bak -e 's/\s+\w+_\w+\.?//g;' filename

Good point.  Also, if the stuff_separated_by_underscores wraps around
onto more than one line, then there may not be any leading whitespace:

I don't see why the translate sol'tn that Gary Kline gave first isn't adequate.

Marty Landman   Face 2 Interface Inc 845-679-9387
Sign On Required: Web membership software for your site
Make a Website: http://face2interface.com/Home/Demo.shtml


Re: perl script question.

2004-01-10 Thread Matthew Seaman
On Sat, Jan 10, 2004 at 06:26:30PM -0500, Marty Landman wrote:
 At 06:02 PM 1/10/2004, Matthew Seaman wrote:
 On Sat, Jan 10, 2004 at 11:39:07PM +0100, Björn Andersson wrote:
 
  If this occurs more than once on a line, the command should be:
perl -pi.bak -e 's/\s+\w+_\w+\.?//g;' filename
 
 Good point.  Also, if the stuff_separated_by_underscores wraps around
 onto more than one line, then there may not be any leading whitespace:
 
 I don't see why the translate sol'tn that Gary Kline gave first isn't 
 adequate.

Err --- Gary Kline was the OP asking how to do this: I think you mean 
Bernard El-Hagin's solution?

% perl -i.bak -pe 'tr/_/ /' files

That doesn't do the right thing.  It turns:

This is a sample ordinary sentence.  This_is_joined_up_with_underscores.

into:

This is a sample ordinary sentence.  This is joined up with underscores.

but the requirement is to produce:

This is a sample ordinary sentence.

Cheers,

Matthew 


-- 
Dr Matthew J Seaman MA, D.Phil.   26 The Paddocks
  Savill Way
PGP: http://www.infracaninophile.co.uk/pgpkey Marlow
Tel: +44 1628 476614  Bucks., SL7 1TH UK




Re: perl script question.

2004-01-10 Thread Marty Landman
At 06:36 PM 1/10/2004, Matthew Seaman wrote:

Err --- Gary Kline was the OP asking how to do this: I think you mean
Bernard El-Hagin's solution?
% perl -i.bak -pe 'tr/_/ /' files

That doesn't do the right thing.

Woops, not only can't I read the question right, can't read the poster's 
name right either. Maybe it really is time to start thinking about reading 
glasses.

Marty Landman   Face 2 Interface Inc 845-679-9387
Sign On Required: Web membership software for your site
Make a Website: http://face2interface.com/Home/Demo.shtml


Re: perl script question.

2004-01-10 Thread Gary Kline
On Sat, Jan 10, 2004 at 11:36:45PM +, Matthew Seaman wrote:
 On Sat, Jan 10, 2004 at 06:26:30PM -0500, Marty Landman wrote:
  At 06:02 PM 1/10/2004, Matthew Seaman wrote:
  On Sat, Jan 10, 2004 at 11:39:07PM +0100, Björn Andersson wrote:
  
   If this occurs more than once on a line, the command should be:
 perl -pi.bak -e 's/\s+\w+_\w+\.?//g;' filename
  
  Good point.  Also, if the stuff_separated_by_underscores wraps around
  onto more than one line, then there may not be any leading whitespace:
  
  I don't see why the translate sol'tn that Gary Kline gave first isn't 
  adequate.
 
 Err --- Gary Kline was the OP asking how to do this: I think you mean 
 Bernard El-Hagin's solution?
 
 % perl -i.bak -pe 'tr/_/ /' files
 
 That doesn't do the right thing.  It turns:
 
 This is a sample ordinary sentence.  This_is_joined_up_with_underscores.
 
 into:
 
 This is a sample ordinary sentence.  This is joined up with underscores.
 
 but the requirement is to produce:
 
 This is a sample ordinary sentence.
 

Exactly so.  I could easily tr '_' to ' ', but not delete (with //g)
an entire string that contained underscores.  BTW, this
kind of technique would be useful in filtering
^ Subject: lines like get.a.bigger.bustline or other such
garbage.  But then the people who hack the antispam
programs are no doubt expert at this...

gary




-- 
   Gary Kline [EMAIL PROTECTED]   www.thought.org Public service Unix
