Hello again,
I keep running into trouble with parsing because there are so many
strange file types around our university network, so I am looking for
an easy way to parse only HTML / text and maybe PDF (although PDF
parsing takes very long).
Can anybody tell me where and how I can configure the parser to work
that way?
Thank you!
BTW: Is there a way to keep unwanted content from being fetched in the
first place? As far as I can see, the only way is to block file names
in regex-urlfilter.txt, am I right?
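
To make clear what I mean by blocking file names, here is a rough Python
sketch of how I understand such suffix rules to behave: the first rule
whose pattern matches decides, "-" rejects the URL and "+" accepts it.
The extension list, the RULES table and the accept() function are made up
for illustration; this is not Nutch's actual code.

```python
import re

# Hypothetical rules in the style of regex-urlfilter.txt:
# a sign ("-" = reject, "+" = accept) followed by a regular expression.
RULES = [
    # Reject URLs ending in file types I do not want to fetch (assumed list).
    ("-", re.compile(r"\.(gif|jpg|png|zip|exe|mp3|avi)$", re.IGNORECASE)),
    # Accept everything else.
    ("+", re.compile(r".")),
]


def accept(url):
    """Return True if the first matching rule is an accept rule."""
    for sign, pattern in RULES:
        if pattern.search(url):
            return sign == "+"
    # No rule matched: treat the URL as rejected.
    return False


if __name__ == "__main__":
    for url in ("http://example.edu/index.html",
                "http://example.edu/paper.pdf",
                "http://example.edu/movie.AVI"):
        print(url, accept(url))
```

The drawback of this approach is that it only looks at the URL, so
unwanted content served without a telling file extension would still be
fetched.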
- Parsing only common file types Marek Bachmann