I am not aware of any Java library that can parse WMV. Nutch currently has its own parser for SWF and delegates most other parsing to Tika. Search engines typically do not crawl the video files themselves; only their metadata is useful.
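
If the video content itself is not needed, one option is to keep the crawler from picking up those files at all. A minimal sketch for regex-urlfilter.txt, assuming the rule is placed above the final catch-all accept rule; the suffix list is only an illustration and should be adjusted to your site:

```
# Skip video files that no available parser can handle (suffix list is an assumption)
-\.(wmv|WMV|mp4|MP4)$
```

Rules in that file are checked top to bottom and the first matching pattern decides, so URLs ending in these suffixes are rejected by the filter.
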
On Wed, Mar 6, 2013 at 7:51 AM, <[email protected]> wrote:

> Thank you so much Tejas. That explains the wmv parsing error. I thought
> that video/mp4 could run an Adobe Flash but I am not sure. I am inquiring
> from our company's media expert. Since Tika only parses flash files is
> there any other plugin available that we can use?
>
> On 3/4/13 11:04 PM, "Tejas Patil" <[email protected]> wrote:
>
> >[0] says that Tika 1.2 can only parse flash videos and no other video file
> >formats.
> >
> >[0] : http://tika.apache.org/1.2/formats.html#Video_formats
> >
> >
> >On Mon, Mar 4, 2013 at 1:29 PM, <[email protected]> wrote:
> >
> >> Hi,
> >>
> >> I am using Nutch 1.5.1 and I am trying to crawl and parse video/mp4,
> >> video/x-ms-wmv. I do not see any mp4 files being fetched or parsed and I
> >> am getting following error for a wmv file in the logs:
> >>
> >> Error parsing: http://www.server-abc.com/Darpa_Video_Final.wmv:
> >> failed(2,0): Can't retrieve Tika parser for mime-type video/x-ms-wmv
> >>
> >> Here is my regex-urlfilter.txt configuration file:
> >>
> >> -\(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP|mpg|MPG|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|mov|MOV|exe|EXE|jpeg|JPEG|bmp|BMP|js|JS)$
> >>
> >> Parse-plugins.xml has following:
> >>
> >> <mimeType name="video/x-ms-wmv">
> >>   <plugin id="parse-tika" />
> >> </mimeType>
> >>
> >> <mimeType name="video/mp4">
> >>   <plugin id="parse-tika" />
> >> </mimeType>
> >>
> >> Is there anything else I need to check or missing? Does the http.accept
> >> property need to have all the mime types that can be accepted? I am going
> >> to try and add it next after my current crawl finishes. Any help will be
> >> greatly appreciated.
> >>
> >> Thanks,
> >> Madhvi

