David Liontooth wrote:
Carl Karsten wrote:
I have about 1100 .dv files totaling 3 TB. I pretty much have no clue
what is in each file, other than that it was recorded at US PyCon 08.
There are 10 or so people who will help, so it seems the first step is
to identify all the files.
This plan is a little too labor intensive: buy 30 1 TB external drives,
copy the data, ship the drives to people. That is mainly because "copy
data" means "plug in drive, start copy, check back in a few hours, repeat
30 times", which means I would be doing this dance for a week or so.
So I am considering this: encode the hell out of them into something
barely recognizable, put them on the web server on my cable modem, and
let these people pull them down, watch enough to figure out what each
one is, and log it to a database.
I think it would help further to grab only the first 5 or 10 minutes of
each file (some files are 30 minutes long, some are 6 hours).
I might even be able to figure out how to hook up a web page that lets
a person select a file and a time and returns a full-resolution image
of that frame. Most talks start with the talk title projected on the
screen, and being able to read it might help.
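For the "select a file and time, get a full-res frame" page, the back end could be a single ffmpeg call per request. A minimal sketch, assuming ffmpeg is installed on the server; the function names and file names here are hypothetical, and the web part is left out:

```python
import subprocess
from pathlib import Path


def frame_grab_cmd(dv_path, seconds, out_png):
    """Build an ffmpeg argv that saves one full-resolution frame as PNG.

    Putting -ss before -i makes ffmpeg seek in the input before decoding,
    which is much faster than decoding from the start of a 6-hour file.
    """
    return [
        "ffmpeg", "-y",        # -y: overwrite an existing output file
        "-ss", str(seconds),   # seek position in seconds
        "-i", str(dv_path),
        "-frames:v", "1",      # write exactly one video frame
        str(out_png),
    ]


def grab_frame(dv_path, seconds, out_png):
    """Run ffmpeg; raises CalledProcessError if it exits non-zero."""
    subprocess.run(frame_grab_cmd(dv_path, seconds, out_png), check=True)
    return Path(out_png)


if __name__ == "__main__":
    # Hypothetical file name and time, just to show the command line.
    print(" ".join(frame_grab_cmd("talk0001.dv", 95, "talk0001_95s.png")))
```

The web page would only need to pass the chosen file and time to something like `grab_frame` and serve the resulting PNG.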
For the "encode the hell out of them" part, any suggestions on what
format/options I should try?
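One possible answer, as a sketch only: a low-resolution, high-CRF H.264 preview of the first N minutes of each file. This assumes an ffmpeg build with libx264 and the native aac encoder; the CRF, width, and audio bitrate are guesses to tune, not tested settings, and the directory layout is hypothetical:

```python
import subprocess
from pathlib import Path


def preview_cmd(dv_path, out_mp4, minutes=10):
    """Build an ffmpeg argv for a small, heavily compressed preview."""
    return [
        "ffmpeg", "-y",
        "-i", str(dv_path),
        "-t", str(minutes * 60),            # keep only the first N minutes
        "-vf", "yadif,scale=320:-2",        # deinterlace DV, shrink to 320 px wide
        "-c:v", "libx264", "-crf", "35",    # high CRF: low quality, small file
        "-preset", "veryfast",
        "-c:a", "aac", "-b:a", "32k", "-ac", "1",  # mono, low-bitrate speech
        str(out_mp4),
    ]


def preview_all(src_dir, out_dir, minutes=10):
    """Encode a preview for every .dv file directly under src_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for dv in sorted(Path(src_dir).glob("*.dv")):
        cmd = preview_cmd(dv, out / (dv.stem + ".mp4"), minutes)
        subprocess.run(cmd, check=True)
```

Trimming with `-t` also covers the "first 5 or 10 minutes" idea, so a 6-hour file costs no more to encode or download than a 30-minute one.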
How about exporting some frames instead of converting the videos? The
first few frames should catch the talk title.
nice -n 19 transcode -q 0 -o "$FIL.img" -y im -F png -x ffmpeg,null \
-i "$DIR/$FIL.dv" -c \
0:00:00-0:00:00.1,0:00:10-0:00:10.1,0:00:20-0:00:20.1,0:00:30-0:00:30.1,0:00:40-0:00:40.1,0:00:50-0:00:50.1,\
0:01:00-0:01:00.1,0:01:10-0:01:10.1,0:01:20-0:01:20.1,0:01:30-0:01:30.1,0:01:40-0:01:40.1,0:01:50-0:01:50.1,\
0:02:00-0:02:00.1,0:02:10-0:02:10.1,0:02:20-0:02:20.1,0:02:30-0:02:30.1,0:02:40-0:02:40.1,0:02:50-0:02:50.1 \
2> /dev/null
I ran that on a random file, and I now have 17 images of an empty podium and a
blue projector screen (which is what you get with no signal).
Most of the recordings start a minute or two before the session chair gives a
30-120 second verbal announcement, and then the talk might not actually start
for another 30 seconds. So the talk title you are looking for will be somewhere
in the first 5 minutes.
But now that I think about it, the title is probably displayed for 30 seconds,
so if I bump the "every 10 seconds" interval to 30, I should still catch it.
Now to write some .py to generate the -c...
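A sketch of that .py, reusing the transcode options from the command above. The timing works out as: up to 2 minutes of pre-roll, up to 120 s of announcement and about 30 s more gives a worst case near 2 + 2 + 0.5 = 4.5 minutes, so the defaults sample the first 300 s every 30 s (10 frames per file). The function names are hypothetical, and it assumes `transcode` is on the PATH:

```python
import subprocess


def hms(total_seconds):
    """Format whole seconds as H:MM:SS, the style used in the -c list."""
    h, rem = divmod(total_seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}"


def clip_ranges(span_seconds=300, step_seconds=30):
    """Comma-separated 0.1 s ranges, one every step_seconds over span_seconds."""
    return ",".join(
        f"{hms(t)}-{hms(t)}.1" for t in range(0, span_seconds, step_seconds)
    )


def transcode_cmd(dv_path, out_prefix, ranges):
    """The transcode options from the shell command, as an argv list."""
    return [
        "nice", "-n", "19",
        "transcode", "-q", "0", "-o", out_prefix,
        "-y", "im", "-F", "png", "-x", "ffmpeg,null",
        "-i", dv_path, "-c", ranges,
    ]


def export_frames(dv_path, out_prefix, span_seconds=300, step_seconds=30):
    """Run transcode, discarding stderr like 2> /dev/null; return exit code."""
    ranges = clip_ranges(span_seconds, step_seconds)
    cmd = transcode_cmd(dv_path, out_prefix, ranges)
    return subprocess.run(cmd, stderr=subprocess.DEVNULL).returncode


if __name__ == "__main__":
    print(clip_ranges())
```

`clip_ranges(180, 10)` reproduces the 18 ranges in the original command. With a 30 s step and a title shown for about 30 s, exactly one sample should land inside the title window, so a slightly smaller step (say 20 s) would add some margin if the title is on screen for less time.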
Carl K