Thanks Tejas. I am trying with tags in video files.

On 3/6/13 12:27 PM, "Tejas Patil" <[email protected]> wrote:

>I am not aware of any java library which you can use for parsing wmv. Nutch
>currently has parser for swf and mostly delegates parsing to Tika.
>Typically video files are not crawled by search engines. Only their meta
>information is useful.
>
>
>On Wed, Mar 6, 2013 at 7:51 AM, <[email protected]> wrote:
>
>> Thank you so much Tejas. That explains the wmv parsing error. I thought
>> that video/mp4 could run an Adobe Flash but I am not sure. I am inquiring
>> from our company's media expert. Since Tika only parses flash files is
>> there any other plugin available that we can use?
>>
>> On 3/4/13 11:04 PM, "Tejas Patil" <[email protected]> wrote:
>>
>> >[0] says that Tika 1.2 can only parse flash videos and no other video file
>> >formats.
>> >
>> >[0] : http://tika.apache.org/1.2/formats.html#Video_formats
>> >
>> >
>> >On Mon, Mar 4, 2013 at 1:29 PM, <[email protected]> wrote:
>> >
>> >> Hi,
>> >>
>> >> I am using Nutch 1.5.1 and I am trying to crawl and parse video/mp4,
>> >> video/x-ms-wmv. I do not see any mp4 files being fetched or parsed and I
>> >> am getting following error for a wmv file in the logs:
>> >>
>> >> Error parsing: http://www.server-abc.com/Darpa_Video_Final.wmv:
>> >> failed(2,0): Can't retrieve Tika parser for mime-type video/x-ms-wmv
>> >>
>> >> Here is my regex-urlfilter.txt configuration file:
>> >>
>> >> -\(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP|mpg|MPG|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|mov|MOV|exe|EXE|jpeg|JPEG|bmp|BMP|js|JS)$
>> >>
>> >> Parse-plugins.xml has following:
>> >>
>> >> <mimeType name="video/x-ms-wmv">
>> >>   <plugin id="parse-tika" />
>> >> </mimeType>
>> >>
>> >> <mimeType name="video/mp4">
>> >>   <plugin id="parse-tika" />
>> >> </mimeType>
>> >>
>> >> Is there anything else I need to check or missing? Does the http.accept
>> >> property need to have all the mime types that can be accepted? I am going
>> >> to try and add it next after my current crawl finishes. Any help will be
>> >> greatly appreciated.
>> >>
>> >> Thanks,
>> >> Madhvi
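
For reference, a minimal sketch of how the suffix-exclusion rule quoted above would treat the video URLs in question. It assumes the rule is meant to begin with `-\.(` as in the stock regex-urlfilter.txt (the quoted line shows `-\(`, which may be a copy artifact), and it models only that one rule, not the rest of the filter file. The class name and the mp4/mov URLs are hypothetical; only the wmv URL comes from the thread.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: models only the single "-" suffix rule.
class SuffixFilterSketch {
    // Assumption: the rule starts with "\.(" as in the stock regex-urlfilter.txt.
    // The alternatives are copied from the rule quoted in the thread.
    static final Pattern EXCLUDED_SUFFIX = Pattern.compile(
        "\\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP"
        + "|mpg|MPG|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|mov|MOV|exe|EXE|jpeg|JPEG|bmp|BMP"
        + "|js|JS)$");

    // Returns true when the URL ends in one of the excluded suffixes.
    static boolean isExcluded(String url) {
        return EXCLUDED_SUFFIX.matcher(url).find();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.server-abc.com/Darpa_Video_Final.wmv",
            "http://www.server-abc.com/example.mp4",  // hypothetical URL
            "http://www.server-abc.com/example.mov"   // hypothetical URL
        };
        for (String url : urls) {
            System.out.println(url + " excluded=" + isExcluded(url));
        }
    }
}
```

Under that assumption the rule lists mov and mpg but neither wmv nor mp4, so this particular rule would not by itself explain the mp4 files not being fetched.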

