Thanks for sharing the information; I'll try this. But if I understand correctly, parse-plugins.xml only contains rules for the parser, so undesirable documents will still be fetched and stored in the segments. Is it possible to stop the fetcher from crawling these pages?

> Hello,
>
> I also had a similar problem; my little fix was to
> edit the parse-plugins.xml file. There is the rule:
>
> <mimeType name="*">
>   <plugin id="parse-text" />
> </mimeType>
>
> Just uncomment this wildcard match. You might also check
> the other rules for further unwanted content.
>
> I don't know if this is the best place for such a change,
> but it worked for me.
>
> With best regards,
> Heiko Dietze

--
Best regards,
Eugen
mailto:[EMAIL PROTECTED]
