Re: training ocropus

squiggly Sun, 16 Nov 2008 06:08:54 -0800

OK, thanks.
I've gone a little further with the docs and by looking at the structure of
data-lines.tar.gz.
I managed to generate some .cseg.png files and trained the bpnet.
Line extraction went well: I have 113 png files from the source and
they fit the text lines perfectly. But I only managed to build 11 cseg.png
files from them.
All the other lines generated costs that were too high. I don't know whether
the language matters at this stage (my source is a French article).
I've tried the web interface on the IUPR website and the results look
better, at first sight.


For your information, here is how I would like to use ocropus:
I work with searchable PDFs in which the OCR cache no longer lines up with
the image because of user reshaping. I convert the PDF pages into images,
re-OCR them with ocropus, and get a new, separate OCR cache (I mean
the hOCR file). From that I plan to estimate the real positions at which to
apply annotations in the original PDF file, for the cases where extracting
the position from the initial PDF cache gives wrong results.
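
To make the hOCR step concrete, here is a minimal sketch of how the line positions could be read back from an hOCR file. It assumes the file marks each text line with class "ocr_line" and carries the box in the title attribute as "bbox x0 y0 x1 y1", as the hOCR format describes. The names HocrLineParser and hocr_lines are made up for this illustration and are not part of ocropus.

```python
import re
from html.parser import HTMLParser

# hOCR stores the box in the title attribute, e.g. title="bbox 10 20 300 50"
BBOX_RE = re.compile(r"bbox\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)")

# Void elements have no closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}


class HocrLineParser(HTMLParser):
    """Hypothetical helper: collect (bbox, text) for each ocr_line element."""

    def __init__(self):
        super().__init__()
        self.lines = []      # list of ((x0, y0, x1, y1), text)
        self._bbox = None    # bbox of the line currently being read
        self._depth = 0      # nesting depth inside the current line
        self._chunks = []    # text fragments of the current line

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._bbox is not None:
            self._depth += 1
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        match = BBOX_RE.search(attrs.get("title") or "")
        if "ocr_line" in classes and match:
            self._bbox = tuple(int(v) for v in match.groups())
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._bbox is None:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace so the text is one clean line.
            text = " ".join("".join(self._chunks).split())
            self.lines.append((self._bbox, text))
            self._bbox = None

    def handle_data(self, data):
        if self._bbox is not None:
            self._chunks.append(data)


def hocr_lines(hocr_text):
    """Return [((x0, y0, x1, y1), text), ...] for every ocr_line found."""
    parser = HocrLineParser()
    parser.feed(hocr_text)
    parser.close()
    return parser.lines
```

The boxes are in pixel coordinates of the rendered page image, so mapping them back onto the original PDF page would still need the scale factor used when the page was converted to an image.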

I can provide the dataset.

Regards



On 16 nov, 13:48, "Thomas Breuel" <[EMAIL PROTECTED]> wrote:
> There's some documentation here:
>
> http://sites.google.com/site/ocropus/documentation/text-line-recognit...
>
> There will be new recognizers and documentation probably in about a month.
>
> Tom
>
> On Sun, Nov 16, 2008 at 12:58, squiggly <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> > I managed to install and perform some tests on Fedora 9 with press
> > articles.
> > I followed the instructions from the wiki to build a training data set
> > to get better results.
> > ocropus extracted about one hundred png corresponding to lines and
> > I've populated a transcriptions text file as described :
> > transcriptions lines look like this
> > p0001_l0105.png Mais il se débrouille toujours pour
>
> > But I can't figure out how to feed ocropus with the data ! Do I have
> > to make a script using transcription file as input to generate all the
> > txt files according to png or can I launch an ocropus script to digest
> > all this ?
> > How to tell ocropus to learn ?
> > Thanks
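
On the question quoted above about generating the txt files from the transcription file, a small script along these lines would probably do. It is only a sketch: it assumes each line of the transcription file is a png name, one space, then the text (as in the example line quoted above), and that one .txt file per png with the same base name is what the training step wants. The function name split_transcriptions is made up for this illustration.

```python
from pathlib import Path


def split_transcriptions(transcription_file, out_dir):
    """Write one .txt file per line of the transcription file.

    Assumed input format (one entry per line):
        p0001_l0105.png Mais il se débrouille toujours pour
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(transcription_file, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            # First token is the png name, the rest is the transcription.
            name, _, text = line.partition(" ")
            target = out_dir / (Path(name).stem + ".txt")
            target.write_text(text.strip() + "\n", encoding="utf-8")
            written.append(target)
    return written
```

For example, split_transcriptions("transcriptions.txt", "lines") would create lines/p0001_l0105.txt next to wherever the png files are kept, if "lines" is that directory.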