Philip Brown wrote:
Andrzej Bialecki wrote:
Philip Brown wrote:
Is it possible, on some pages, to crawl only the content between certain
tags, or to have it skip the content between certain tags?
e.g.
<nocrawl>blah blah blah</nocrawl>
<crawlhere>the content only that I want to crawl</crawlhere>
<nocrawl>blah blah blah</nocrawl>
appreciate any input
kind regards
You can modify DOMContentUtils.java (found in the parse-html plugin) to
implement this restriction.
Andrzej,
thanks, I've had a look at the DOMContentUtils.java file and it would take
me a while to figure it out. However, I thought about putting this in
conf/regex-normalizer.xml:
<regex>
<pattern>(<donotcrawl>)(.^$*)(</donotcrawl>)</pattern>
<substitution></substitution>
</regex>
Would I need to write < as &lt; in the patterns?
I've tried this without success so far. Any suggestions?
kind regards,
Phil
Ha, after some time trying with the conf/regex-normalizer.xml file... I
see that it is for URLs.
I would appreciate any pointers on DOMContentUtils.java.
kind regards,
Philip Brown
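
For anyone following the thread, here is a minimal sketch of the kind of restriction suggested above: walk the DOM and refuse to descend into a marked element. It is not the actual DOMContentUtils.java code. The class name CrawlRegionExtractor and the methods collectText and extractText are hypothetical, the tag name nocrawl is taken from the example in the first message, and the JDK's XML parser stands in for whatever HTML parser the parse-html plugin uses, so the input here has to be well-formed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical standalone class, for illustration only.
public class CrawlRegionExtractor {

    // Assumption: the marker tag from the example in this thread.
    private static final String SKIP_TAG = "nocrawl";

    /** Appends the text found under node to sb, skipping every <nocrawl> subtree. */
    static void collectText(Node node, StringBuilder sb) {
        // HTML parsers may change the case of element names, so compare loosely.
        if (node.getNodeType() == Node.ELEMENT_NODE
                && SKIP_TAG.equalsIgnoreCase(node.getNodeName())) {
            return; // do not descend into the excluded region
        }
        if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                if (sb.length() > 0) {
                    sb.append(' '); // separate adjacent text runs
                }
                sb.append(text);
            }
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collectText(children.item(i), sb);
        }
    }

    /** Parses a well-formed document string and returns the text outside <nocrawl>. */
    static String extractText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringBuilder sb = new StringBuilder();
        collectText(doc.getDocumentElement(), sb);
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<page>"
                + "<nocrawl>blah blah blah</nocrawl>"
                + "<crawlhere>the content only that I want to crawl</crawlhere>"
                + "<nocrawl>blah blah blah</nocrawl>"
                + "</page>";
        System.out.println(extractText(page));
    }
}
```

The same early return could presumably be placed wherever the plugin's own text extraction recurses over child nodes, but where exactly that happens in DOMContentUtils.java would need to be checked against the source.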