>  If you're interested in helping ...
~
 Yes, I can and would offer man/mind hours toward adding parsing (and
eventually processing) of movie media files to Tika.
~
 I am definitely more inclined to use ffmpeg (your third option), but I
think we should consider this carefully and probably use more than one
option. There is already a Java port of parts of the FFmpeg project
(jffmpeg.sourceforge.net) but, as you may already know ;-), its
licensing is messy.
~
 About your second option: all the info is in the containers anyway;
codecs are just encoded data.
~
 the box I am using right now:
~
$ ffmpeg -version
ffmpeg version 0.7.1-4:0.7.1-5, Copyright (c) 2000-2011 the Libav developers
  built on Sep  5 2011 06:18:41 with gcc 4.6.1
ffmpeg 0.7.1-4:0.7.1-5
libavutil    51.  7. 0 / 51.  7. 0
libavcodec   53.  5. 0 / 53.  5. 0
libavformat  53.  2. 0 / 53.  2. 0
libavdevice  53.  0. 0 / 53.  0. 0
libavfilter   2.  4. 0 /  2.  4. 0
libswscale    2.  0. 0 /  2.  0. 0
libpostproc  52.  0. 0 / 52.  0. 0
~
 supports (handling of) subtitles for the following formats:
~
$ ffmpeg -codecs | grep VSD
 DEVSD  ffvhuff         Huffyuv FFmpeg variant
 DEVSD  flv             Flash Video (FLV) / Sorenson Spark / Sorenson H.263
 DEVSDT h263            H.263 / H.263-1996
 D VSD  h263i           Intel H.263
 DEVSD  huffyuv         Huffyuv / HuffYUV
 DEVSDT mpeg1video      MPEG-1 video
 DEVSDT mpeg2video      MPEG-2 video
 DEVSDT mpeg4           MPEG-4 part 2
 D VSDT mpegvideo       MPEG-1 video
 D VSDT mpegvideo_xvmc  MPEG-1/2 video XvMC (X-Video Motion Compensation)
 DEVSD  msmpeg4         MPEG-4 part 2 Microsoft variant version 3
 D VSD  msmpeg4v1       MPEG-4 part 2 Microsoft variant version 1
 DEVSD  msmpeg4v2       MPEG-4 part 2 Microsoft variant version 2
 D VSD  svq3            Sorenson Vector Quantizer 3 / Sorenson Video 3 / SVQ3
 D VSD  theora          Theora
 D VSD  vp3             On2 VP3
 DEVSD  wmv1            Windows Media Video 7
 DEVSD  wmv2            Windows Media Video 8
~
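 As a rough illustration of what a first parsing step could look like,
here is a minimal Java sketch that turns listing lines like the ones
above into name/description pairs. It assumes the fixed-width layout
shown (one leading space, six flag characters, then the codec name),
with the first flag marking decoding, the second encoding and the third
the codec type letter. The names FfmpegCodecLines, Codec, parseLine and
parse are made up for this example; they are not part of Tika or ffmpeg.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: parse rows of an "ffmpeg -codecs" style listing (layout assumed from the output above). */
final class FfmpegCodecLines {

    /** One parsed row; a hypothetical holder type for this example. */
    static final class Codec {
        final String name;
        final String description;
        final boolean decode;   // 'D' in the first flag column
        final boolean encode;   // 'E' in the second flag column
        final char type;        // 'V', 'A' or 'S' in the third flag column

        Codec(String name, String description, boolean decode, boolean encode, char type) {
            this.name = name;
            this.description = description;
            this.decode = decode;
            this.encode = encode;
            this.type = type;
        }
    }

    /** Returns the parsed row, or null if the line does not look like a codec row. */
    static Codec parseLine(String line) {
        if (line == null || line.length() < 9) {
            return null;
        }
        // Assumed layout: column 0 is a space, columns 1..6 hold the flags.
        String flags = line.substring(1, 7);
        char type = flags.charAt(2);
        if ("VAS".indexOf(type) < 0) {
            return null; // skips headers and legend lines such as " D..... = ..."
        }
        String rest = line.substring(7).trim();
        if (rest.isEmpty()) {
            return null;
        }
        // First token is the codec name, everything after it is the description.
        String[] parts = rest.split("\\s+", 2);
        String description = parts.length > 1 ? parts[1] : "";
        return new Codec(parts[0], description,
                flags.charAt(0) == 'D', flags.charAt(1) == 'E', type);
    }

    /** Parses a whole listing, silently dropping lines that are not codec rows. */
    static List<Codec> parse(String output) {
        List<Codec> codecs = new ArrayList<>();
        for (String line : output.split("\\r?\\n")) {
            Codec codec = parseLine(line);
            if (codec != null) {
                codecs.add(codec);
            }
        }
        return codecs;
    }
}
```

 Feeding it the text captured from the command above would give a list
we could then map onto metadata keys; the actual process invocation and
any error handling are deliberately left out.
~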
 Could you give me/us a running list of what you think needs to be done?
~
 I know there are developers extracting the image sequences of the
subtitles and using OCR to convert them to text ... Anyone can see
how useful such a thing could be. Could Tika reach into those deep
waters?
~
 The thing is that virtually anything is offered nowadays in some form of media.
~
 lbrtchx
