I searched the archives and did not find this precise issue. I have a vob file extracted from a DVD. Call it 0055743.vob if you like.
vlc plays this vob fine and displays the subtitles as they should be.

I use this transcode-based command to extract the substream:

    tccat -i 0055743.vob | tcextract -x ps1 -t vob -a 0x20 > 0055743.en_0.subtrack

and subtitle2pgm to break it out into images for OCR:

    subtitle2pgm -o 0055743.en_0 -c 255,0,0,255 < 0055743.en_0.subtrack

Then I use various OCR engines etc. to get an srt file.

The problem is that when I follow this process, some of the timings and subs come out wrong. Very often a sub will be repeated where there should be two different subs. This often happens where the endpoint of one is the start of another. Here is an example of this type that my process gives:

    11
    00:01:24,180 --> 00:01:26,819
    30 barrels of rice for land taxes.

    12
    00:01:26,819 --> 00:01:29,510
    30 barrels of rice for land taxes.

When it should give this:

    11
    00:01:24,180 --> 00:01:26,819
    Yoza, it seems you have collected

    12
    00:01:26,819 --> 00:01:29,510
    30 barrels of rice for land taxes.

Obviously the pgms extracted by subtitle2pgm are wrong. Sometimes there are larger errors consisting of a sequence of pgms all displaced by one.

My question: is this a problem with tcextract or with subtitle2pgm? Where should I look first for a fix? Has anybody else seen this, or related problems? I can host the 4 GB vob for anybody to download to test their setup on.

Also, what other simple ways are there to do this process? I extract a lot of subs, so it has to be command-line based and manageable.

Thanks in advance,
Simon.
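One way to see how widespread the repeats are, before any OCR is run, might be to compare each extracted image with the one before it. The sketch below is only that: find_repeated_images is a hypothetical helper name, and the glob in the usage comment assumes that subtitle2pgm wrote its images under the 0055743.en_0 prefix with names that sort in display order, which should be checked against the actual output directory. It relies only on cmp -s, which succeeds when two files are byte-identical.

```shell
#!/bin/sh
# Sketch: list consecutive image files whose contents are byte-identical.
# find_repeated_images is a hypothetical helper, not part of transcode
# or subtitleripper.
#
# Assumed usage (the glob pattern is a guess at subtitle2pgm's naming):
#   find_repeated_images 0055743.en_0*.pgm
find_repeated_images() {
    prev=""
    for f in "$@"; do
        # cmp -s prints nothing and returns 0 only if both files match exactly.
        if [ -n "$prev" ] && cmp -s "$prev" "$f"; then
            printf '%s == %s\n' "$prev" "$f"
        fi
        prev=$f
    done
}
```

Each line of output names a pair of neighbouring images that are exact copies, which can then be checked against what vlc shows at those timings. Images that differ by even one byte will not be reported, and a line of dialogue that really is repeated on the disc will be, so the list is a starting point rather than a verdict.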