Hello again,

Since I keep running into trouble with parsing because of the many strange file types around our university network, I am looking for an easy way to parse only HTML / text and maybe PDF (although PDF parsing takes very long).

Can anybody tell me where and how I can configure the parser to work that way?

Thank you!

BTW: Is there a way to stop unwanted content during fetching? As far as I can see, the only way is to block file names in regex-urlfilter.txt. Am I right?
