Perhaps this is the time for me to ask: does anyone know a way for grep
or awk to extract from a text file any sequence of up to, say, six words
that begins and ends with an initially-capitalised word -- whether or
not it is part of a larger matching sequence?

So if the input text was:

"Sally Lee Jones worked for the United Nations Support Team"

the output would include:

Sally Lee
Lee Jones
Sally Lee Jones
Jones worked for the United
Jones worked for the United Nations
United Nations
Nations Support
Support Team
United Nations Support Team

I don't particularly care if it takes one pass or several, and I can
clean up duplicates afterwards.
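
The behaviour asked for above can be pinned down with a minimal sketch,
written here in Python rather than grep or awk. It is only one possible
reading of the requirement: it assumes a "word" is a run of ASCII letters
(optionally with apostrophes or hyphens), that a sequence needs at least
two words, and that surrounding punctuation such as quotes is ignored.
The function name capitalised_spans and the max_words parameter are
hypothetical, chosen for illustration.

```python
import re


def capitalised_spans(text, max_words=6):
    """Return every run of 2..max_words consecutive words that both
    starts and ends with an initially-capitalised word."""
    # Assumption: a word is a run of ASCII letters, apostrophes or hyphens.
    words = re.findall(r"[A-Za-z][A-Za-z'-]*", text)
    spans = []
    for start, first in enumerate(words):
        if not first[0].isupper():
            continue  # a span must begin with a capitalised word
        # Try every end position that keeps the span within max_words.
        last_end = min(start + max_words, len(words))
        for end in range(start + 1, last_end):
            if words[end][0].isupper():
                spans.append(" ".join(words[start:end + 1]))
    return spans


sample = '"Sally Lee Jones worked for the United Nations Support Team"'
for span in capitalised_spans(sample):
    print(span)
```

Because every start position is tried against every allowed end position,
overlapping matches are reported as well as the longest ones. On the
sample sentence this also yields spans not listed above, such as
"Lee Jones worked for the United" and "United Nations Support".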

This is not a serious problem for me -- it falls into the 'would be
nice to have' category -- but I've been puzzling over it off and on for
a while, and the mention of the word 'greedy' reminded me of it.

Jon.

On 14/07/10 18:06, Nick Andrew wrote:

 (aaa...)&

 Where the stuff inside () is what's being matched. The matched part stops
 at the first & or the end of the string. It's greedy so it matches as long
 a string as possible.
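
 The greedy behaviour described here can be illustrated with a small
 Python sketch. The original pattern is elided in the quote, so this
 assumes a negated character class, [^&]*, as the group contents; the
 function name first_field is hypothetical.

```python
import re


def first_field(text):
    """Capture everything up to the first '&' or the end of the string."""
    # Assumption: [^&]* stands in for the elided pattern. The * is greedy,
    # so it consumes as many non-'&' characters as it can before stopping.
    return re.match(r"([^&]*)", text).group(1)


print(first_field("alpha=1&beta=2"))
print(first_field("alpha=1"))
```

 The group cannot run past an & because the class excludes it, and when
 there is no & it simply runs to the end of the string.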

 Nick.



--
SLUG - Sydney Linux User's Group Mailing List - http://slug.org.au/
Subscription info and FAQs: http://slug.org.au/faq/mailinglists.html
