Hi,
Right now the status in the CrawlDb is set to success for items that have no
parser and throw:
Exception in thread "main" org.apache.nutch.parse.ParseException: parser not
found for contentType=video/x-flv url=
at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:78)
at
It's a good point, Markus. I would imagine that we would wish to do one
of two things:
1) Create a parser to handle the contentType in question (not the aim
of Nutch, but geared more towards a Tika contribution...)
2) As you mention, use a parser implementation which stores this
contentType as