Parsing only common file types

Marek Bachmann Wed, 31 Aug 2011 03:49:29 -0700

Hello again,

As I keep running into trouble with parsing because there are so many strange file types on our university network, I am looking for an easy way to parse only HTML / text and maybe PDF (though PDF takes very long).

Can anybody tell me where and how I could configure the parser to work that way?


Thank you!

BTW: Is there a possibility to stop unwanted content during fetching? As I see it, the only way is blocking file names in the regex-urlfilter.txt, am I right?
