At the risk of this looking like a cheap tactic for getting someone else to do my work for me, I'll put the disclaimer at the top: what I'm looking for is a fresh perspective on a problem I've spent too much time on.

PROBLEM:
I want to produce short "snippets" for search result pages. The search results are already partially processed (search terms get wrapped in [strong] tags), but I keep hitting dead ends at the snippet step. I've tried a wide range of approaches, spent an embarrassing amount of time, and arrived at no viable solution. Learning regular expressions would probably have been less painful, but there you have it.


REQUIREMENTS:
- The snippet should consist of five words on either side of each search term.
- The end product cannot contain any unclosed or unmatched tags.
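A minimal Python sketch of the windowing requirement, assuming the text has already been split into a list of words in which each highlighted term (tags included) counts as a single word. `build_snippet` and its `radius` parameter are hypothetical names. Overlapping or touching windows are merged so that no word appears twice, and skipped text is marked with "...".

```python
def build_snippet(words, is_hit, radius=5):
    """Keep `radius` words on either side of every hit, merging windows
    that overlap or touch, and mark skipped text with '...'."""
    windows = []  # list of (start, end) word indices, end exclusive
    for i, hit in enumerate(is_hit):
        if not hit:
            continue
        start, end = max(0, i - radius), min(len(words), i + radius + 1)
        if windows and start <= windows[-1][1]:
            # Overlaps or touches the previous window: extend it instead.
            # `end` never shrinks because hits are visited left to right.
            windows[-1] = (windows[-1][0], end)
        else:
            windows.append((start, end))
    if not windows:
        return ''
    snippet = ' ... '.join(' '.join(words[s:e]) for s, e in windows)
    # Add leading/trailing ellipses when the snippet doesn't reach the ends.
    if windows[0][0] > 0:
        snippet = '... ' + snippet
    if windows[-1][1] < len(words):
        snippet += ' ...'
    return snippet
```

Because every term is one list element, a window boundary can only fall between words, never inside a tag.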


SETUP:
- Search terms are (already) marked up in [strong] tags with a class attribute of term1 through term3.
- No tags other than the aforementioned [strong] tags exist in the string.
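Splitting on whitespace breaks the markup, because `<strong class="term3">` itself contains a space. One option, sketched here in Python under the setup's assumption that [strong] elements are never nested, is to tokenize with a regular expression that keeps each whole element (plus any characters glued to it, such as `'s` or `ism`) as one token. `TOKEN_RE` and `tokenize` are hypothetical names; the same pattern should carry over to PHP's `preg_match_all` with the `s` modifier.

```python
import re

# A token is either a whole <strong ...>...</strong> element, together with
# any non-space characters glued to it (e.g. '"', "'s", "ism"), or a plain
# run of non-space characters. Assumes [strong] elements are never nested.
TOKEN_RE = re.compile(
    r'[^\s<]*<strong\b[^>]*>.*?</strong>[^\s<]*'  # tagged term, kept whole
    r'|\S+',                                      # ordinary word
    re.DOTALL,
)


def tokenize(html):
    """Return (token, is_term) pairs; tags are never split across tokens."""
    return [(m.group(0), '<strong' in m.group(0)) for m in TOKEN_RE.finditer(html)]
```

The `is_term` flags mark which tokens a five-word window should be centred on.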


EXAMPLE INPUT:
The journalist/critic Edmund Gosse begins championing <strong class="term3">Ibsen</strong> in English periodicals. His article "<strong class="term3">Ibsen</strong> the Norwegian Satirist" (Fortnightly Review, January 1873) is later expanded in his book Studies in the Literature of Northern Europe (1879). In contrast, the conservative dramatic critic Clement W. Scott begins writing columns for <strong class="term1">London</strong>'s Daily Telegraph; he comes to consider the influence of <strong class="term3">Ibsen</strong> the worst thing that ever happened to English drama. His hysterically negative attack on <strong class="term3">Ibsen's</strong> Ghosts in 1891 will be paraphrased (and reduced to ridicule) by <strong class="term2">Shaw</strong> in the first chapter of The Quintessence of <strong class="term3">Ibsen</strong>ism (1891).


EXAMPLE OUTPUT:
... journalist/critic Edmund Gosse begins championing <strong class="term3">Ibsen</strong> in English periodicals. His article ... Scott begins writing columns for <strong class="term1">London</strong>'s Daily Telegraph; he comes to consider ... (and reduced to ridicule) by <strong class="term2">Shaw</strong> in the first chapter of The Quintessence...
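To confirm that a generated snippet like the one above satisfies the second requirement, a small stack-based check can be run over the result. `tags_balanced` is a hypothetical helper, not a library function, and it assumes the snippet contains no self-closing or void tags, which holds here since only [strong] tags appear.

```python
import re

# Matches an opening or closing tag; group 1 is '/' for closing tags.
TAG_RE = re.compile(r'<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>')


def tags_balanced(html):
    """True if every opening tag has a matching close, in the right order."""
    stack = []
    for m in TAG_RE.finditer(html):
        closing, name = m.group(1), m.group(2).lower()
        if not closing:
            stack.append(name)
        elif not stack or stack.pop() != name:
            # Close without an open, or closing the wrong element.
            return False
    # Anything left on the stack was opened but never closed.
    return not stack
```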


--
PHP General Mailing List (http://www.php.net/)