> If you're interested in helping ...
~
Yes, I can and would offer man/mind hours toward adding parsing (and
eventually processing) of movie media files to Tika.
~
I am definitely more inclined to use ffmpeg (your third option), but I
think we should weigh the options carefully and probably use more than
one. There is already a Java port of parts of the FFmpeg project
(jffmpeg.sourceforge.net) but, as you may already know ;-), its
licensing is messy.
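~
One way to use ffmpeg without touching its code (or its license) would be
to run the command-line tool and read what it reports about a file. A
minimal sketch in Python, only to show the idea: it assumes the report
contains a line of the form "Duration: HH:MM:SS.xx, ..." as ffmpeg usually
prints for an input file, and the function name is made up for this example.

```python
import re

# Matches the "Duration: HH:MM:SS.xx" part of the line ffmpeg prints
# when it describes an input file (assumed format, see the note above).
DURATION_RE = re.compile(r"Duration:\s*(\d+):(\d{2}):(\d{2}(?:\.\d+)?)")


def parse_duration_seconds(report):
    """Return the duration in seconds, or None if no usable Duration line exists."""
    match = DURATION_RE.search(report)
    if match is None:
        return None
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


# Hypothetical report line, written by hand for illustration.
sample = "  Duration: 00:01:02.50, start: 0.000000, bitrate: 1234 kb/s"
print(parse_duration_seconds(sample))
```

The text to parse would come from running the tool as a separate process,
which keeps the parser itself independent of any particular build.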
~
About your second option: all the info is in the containers anyway;
the codec streams are just encoded data.
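~
To illustrate the point about containers, here is a rough sketch that
walks the top-level boxes of an MP4-style (ISO base media) buffer without
decoding any codec data. It is a sketch under assumptions, not a parser
for Tika: the function name is invented, and it only handles the basic
size/type header (32-bit size, 64-bit size when the size field is 1, and
"to end of data" when it is 0).

```python
import struct


def list_top_level_boxes(data):
    """Return a list of (box_type, size) for each top-level box in `data`."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        # Every box starts with a 4-byte big-endian size and a 4-byte type.
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack(">Q", data[offset + 8:offset + 16])
            header = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the data.
            size = len(data) - offset
        if size < header:
            break  # malformed box; stop rather than loop forever
        boxes.append((box_type.decode("ascii", "replace"), size))
        offset += size
    return boxes


# Hand-built buffer: a 16-byte "ftyp" box followed by an empty "moov" box.
data = struct.pack(">I4s", 16, b"ftyp") + b"isom" + b"\x00\x00\x02\x00"
data += struct.pack(">I4s", 8, b"moov")
print(list_top_level_boxes(data))
```

The metadata of interest (duration, tracks, titles) lives inside boxes
such as "moov", so a real parser would recurse into them rather than stop
at the top level.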
~
The box I am using right now:
~
$ ffmpeg -version
ffmpeg version 0.7.1-4:0.7.1-5, Copyright (c) 2000-2011 the Libav developers
built on Sep 5 2011 06:18:41 with gcc 4.6.1
ffmpeg 0.7.1-4:0.7.1-5
libavutil 51. 7. 0 / 51. 7. 0
libavcodec 53. 5. 0 / 53. 5. 0
libavformat 53. 2. 0 / 53. 2. 0
libavdevice 53. 0. 0 / 53. 0. 0
libavfilter 2. 4. 0 / 2. 4. 0
libswscale 2. 0. 0 / 2. 0. 0
libpostproc 52. 0. 0 / 52. 0. 0
~
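reports the following codecs when I filter the codec table on the VSD
flags. Note that in this layout the third column is the codec type (V =
video) and the S and D after it mean draw_horiz_band and direct rendering
support, so these are video codecs; subtitle codecs would carry an S in
the third column instead: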
~
$ ffmpeg -codecs | grep VSD
DEVSD  ffvhuff         Huffyuv FFmpeg variant
DEVSD  flv             Flash Video (FLV) / Sorenson Spark / Sorenson H.263
DEVSDT h263            H.263 / H.263-1996
D VSD  h263i           Intel H.263
DEVSD  huffyuv         Huffyuv / HuffYUV
DEVSDT mpeg1video      MPEG-1 video
DEVSDT mpeg2video      MPEG-2 video
DEVSDT mpeg4           MPEG-4 part 2
D VSDT mpegvideo       MPEG-1 video
D VSDT mpegvideo_xvmc  MPEG-1/2 video XvMC (X-Video Motion Compensation)
DEVSD  msmpeg4         MPEG-4 part 2 Microsoft variant version 3
D VSD  msmpeg4v1       MPEG-4 part 2 Microsoft variant version 1
DEVSD  msmpeg4v2       MPEG-4 part 2 Microsoft variant version 2
D VSD  svq3            Sorenson Vector Quantizer 3 / Sorenson Video 3 / SVQ3
D VSD  theora          Theora
D VSD  vp3             On2 VP3
DEVSD  wmv1            Windows Media Video 7
DEVSD  wmv2            Windows Media Video 8
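~
To make the flag columns easier to read, here is a small Python sketch
that decodes one line of the listing above. It assumes the six flags sit
in the first six characters of the line, in the order decode, encode,
type (V/A/S), draw_horiz_band, direct rendering, truncated frames, which
is how I read the legend this ffmpeg version prints; raw ffmpeg output
may carry a leading space that would have to be removed first. The names
CodecEntry and parse_codec_line are invented for this example.

```python
from typing import NamedTuple


class CodecEntry(NamedTuple):
    name: str
    description: str
    can_decode: bool
    can_encode: bool
    kind: str  # "video", "audio", "subtitle" or "unknown"
    draw_horiz_band: bool
    direct_rendering: bool
    truncated_frames: bool


# Third flag column: the codec type.
KINDS = {"V": "video", "A": "audio", "S": "subtitle"}


def parse_codec_line(line):
    """Decode one codec line whose first six characters are the flags."""
    flags = line[:6].ljust(6)
    rest = line[6:].split(None, 1)
    if not rest:
        raise ValueError("no codec name in line: %r" % line)
    description = rest[1].strip() if len(rest) > 1 else ""
    return CodecEntry(
        name=rest[0],
        description=description,
        can_decode=flags[0] == "D",
        can_encode=flags[1] == "E",
        kind=KINDS.get(flags[2], "unknown"),
        draw_horiz_band=flags[3] == "S",
        direct_rendering=flags[4] == "D",
        truncated_frames=flags[5] == "T",
    )


entry = parse_codec_line("D VSD  h263i           Intel H.263")
print(entry.name, entry.kind, entry.can_encode)
```

Filtering on `kind == "subtitle"` instead of grepping for VSD would then
pick out the subtitle codecs, if the build has any.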
~
Could you give me/us a running list of what you think needs to be done?
~
I know there are developers who extract the sequences of subtitle
images and use OCR to turn them into text ... Anyone can see how
useful such a thing could be. Could Tika reach out to those deep
waters?
~
The thing is that virtually anything is offered nowadays in some form of media.
~
lbrtchx