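According to [0], the only video format Tika 1.2 can parse is Flash video. It has no parser for other video formats, which is why you get the "Can't retrieve Tika parser" error for video/x-ms-wmv.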
[0]: http://tika.apache.org/1.2/formats.html#Video_formats

On Mon, Mar 4, 2013 at 1:29 PM, <[email protected]> wrote:
> Hi,
>
> I am using Nutch 1.5.1 and I am trying to crawl and parse video/mp4,
> video/x-ms-wmv. I do not see any mp4 files being fetched or parsed and I
> am getting following error for a wmv file in the logs:
>
> Error parsing: http://www.server-abc.com/Darpa_Video_Final.wmv:
> failed(2,0): Can't retrieve Tika parser for mime-type video/x-ms-wmv
>
> Here is my regex-urlfilter.txt configuration file:
>
> -\(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP|mpg|MPG|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|mov|MOV|exe|EXE|jpeg|JPEG|bmp|BMP|js|JS)$
>
> Parse-plugins.xml has following:
>
> <mimeType name="video/x-ms-wmv">
> <plugin id="parse-tika" />
> </mimeType>
>
> <mimeType name="video/mp4">
> <plugin id="parse-tika" />
> </mimeType>
>
> Is there anything else I need to check or missing? Does the http.accept
> property need to have all the mime types that can be accepted? I am going
> to try and add it next after my current crawl finishes. Any help will be
> greatly appreciated.
>
> Thanks,
> Madhvi
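
To rule out the URL filter as the reason the mp4 files are not fetched, the exclusion rule quoted above can be checked outside Nutch with a rough sketch like the one below. It rests on a few assumptions: that the rule was meant to start with `-\.(` as in the stock regex-urlfilter.txt (the quoted `-\(` would not compile as a regex), that the leading `-` only marks the rule as an exclusion, and that the filter uses "find" semantics. The class name `UrlFilterCheck` and the .mp4 and .mov URLs are made up for illustration.

```java
import java.util.regex.Pattern;

class UrlFilterCheck {
    // Assumption: the rule is restored to "\.(...)$"; the leading "-" from
    // regex-urlfilter.txt is the exclude marker and is not part of the regex.
    static final Pattern EXCLUDED = Pattern.compile(
        "\\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP"
        + "|mpg|MPG|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|mov|MOV|exe|EXE|jpeg|JPEG|bmp|BMP|js|JS)$");

    // True if the exclusion rule matches anywhere in the URL (assumed "find" semantics).
    static boolean isExcluded(String url) {
        return EXCLUDED.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.server-abc.com/Darpa_Video_Final.wmv", // URL from the log above
            "http://www.server-abc.com/sample.mp4",            // hypothetical URL
            "http://www.server-abc.com/sample.mov"             // hypothetical URL
        };
        for (String url : urls) {
            System.out.println(url + " excluded=" + isExcluded(url));
        }
    }
}
```

Under those assumptions the suffix rule does not reject .wmv or .mp4 URLs, since neither extension is in the list (only .mov and .mpg are). This checks only that one rule; other rules in the file, or other filter plugins, could still drop the mp4 URLs before fetching.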

