Thanks, Tejas. I am now trying to work with the tags in the video files.

On 3/6/13 12:27 PM, "Tejas Patil" <[email protected]> wrote:

>I am not aware of any Java library that you can use for parsing wmv. Nutch
>currently has a parser for swf and mostly delegates parsing to Tika.
>Typically, video files are not crawled by search engines. Only their meta
>information is useful.
>
>
>On Wed, Mar 6, 2013 at 7:51 AM, <[email protected]> wrote:
>
>> Thank you so much, Tejas. That explains the wmv parsing error. I thought
>> that video/mp4 could be played in Adobe Flash, but I am not sure. I am
>> checking with our company's media expert. Since Tika only parses flash
>> files, is there any other plugin available that we can use?
>>
>> On 3/4/13 11:04 PM, "Tejas Patil" <[email protected]> wrote:
>>
>> >[0] says that Tika 1.2 can only parse flash videos and no other video
>> >file formats.
>> >
>> >[0] : http://tika.apache.org/1.2/formats.html#Video_formats
>> >
>> >
>> >On Mon, Mar 4, 2013 at 1:29 PM, <[email protected]> wrote:
>> >
>> >> Hi,
>> >>
>> >> I am using Nutch 1.5.1 and I am trying to crawl and parse video/mp4 and
>> >> video/x-ms-wmv. I do not see any mp4 files being fetched or parsed, and
>> >> I am getting the following error for a wmv file in the logs:
>> >>
>> >> Error parsing: http://www.server-abc.com/Darpa_Video_Final.wmv:
>> >> failed(2,0): Can't retrieve Tika parser for mime-type video/x-ms-wmv
>> >>
>> >> Here is my regex-urlfilter.txt configuration file:
>> >>
>> >>
>> >> -\(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP|mpg|MPG|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|mov|MOV|exe|EXE|jpeg|JPEG|bmp|BMP|js|JS)$
>> >>
>> >> parse-plugins.xml has the following:
>> >>
>> >> <mimeType name="video/x-ms-wmv">
>> >>    <plugin id="parse-tika" />
>> >> </mimeType>
>> >>
>> >> <mimeType name="video/mp4">
>> >>    <plugin id="parse-tika" />
>> >> </mimeType>
>> >>
>> >> Is there anything else I need to check, or anything I am missing? Does
>> >> the http.accept property need to list all the MIME types that can be
>> >> accepted? I am going to try adding it next, after my current crawl
>> >> finishes. Any help will be greatly appreciated.
>> >>
>> >> Thanks,
>> >> Madhvi
>> >>
>> >>
>> >>
>>
>>
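
One way to check whether the suffix rule quoted above is what keeps the mp4 and wmv URLs out of the crawl is to replay it against a few URLs. The sketch below is only an illustration in Python, not Nutch code. It assumes the rule is the usual suffix exclusion (so the leading `\(` in the quoted text is read as `\.(`), that a leading `-` means "exclude", and that the pattern may match anywhere in the URL. The `clip.mp4` and `clip.mov` URLs and the `is_excluded` helper are hypothetical names introduced for the example.

```python
import re

# Suffix list copied from the quoted regex-urlfilter.txt rule.
SUFFIXES = (
    "gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|CSS|sit|SIT|eps|EPS|wmf|WMF|zip|ZIP|"
    "mpg|MPG|xls|XLS|gz|GZ|rpm|RPM|tgz|TGZ|mov|MOV|exe|EXE|jpeg|JPEG|bmp|BMP|js|JS"
)

# Assumption: the quoted "\(" was meant to be "\.(" (a literal dot, then the
# suffix group). The leading "-" of the rule is not part of the regex itself;
# it is assumed to mark the rule as an exclusion.
EXCLUDE = re.compile(r"\.(" + SUFFIXES + r")$")


def is_excluded(url: str) -> bool:
    """Return True if the exclusion rule would drop this URL (hypothetical helper)."""
    # search() looks for the pattern anywhere in the URL; the "$" in the
    # pattern pins the suffix to the end of the string.
    return EXCLUDE.search(url) is not None


if __name__ == "__main__":
    urls = [
        "http://www.server-abc.com/Darpa_Video_Final.wmv",  # URL from the log line
        "http://www.server-abc.com/clip.mp4",               # hypothetical
        "http://www.server-abc.com/clip.mov",               # hypothetical
    ]
    for url in urls:
        print(url, "excluded" if is_excluded(url) else "kept")
```

Under those assumptions, neither the wmv nor the mp4 suffix is on the exclusion list (mov and mpg are), so this rule on its own would not account for the missing mp4 fetches.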
