Just a hint: there is a fork that tries to output hOCR details in a TSV
format file <https://code.google.com/r/email-hocr-tsv/>[1].
I have not tested it :-), so I have no clue whether it fits the original
request...

[1] https://code.google.com/r/email-hocr-tsv/source/list
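
For illustration only, here is a rough sketch of what an hOCR-to-TSV
conversion could look like. It is not the code from the fork above. It
assumes the hOCR file marks each word with a span of class "ocrx_word" whose
title attribute carries "bbox x0 y0 x1 y1" and "x_wconf N"; the names
HocrWordParser and hocr_to_tsv are made up for this example.

```python
import csv
import io
import re
from html.parser import HTMLParser


class HocrWordParser(HTMLParser):
    """Collect (left, top, right, bottom, conf, text) for each ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._current = None  # (bbox, conf, text parts) while inside a word span
        self._depth = 0       # nesting depth of tags opened inside the word span

    def handle_starttag(self, tag, attrs):
        if self._current is not None:
            # Markup nested inside a word, e.g. <strong> or <em>.
            self._depth += 1
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "span" and "ocrx_word" in classes:
            title = attrs.get("title") or ""
            bbox = re.search(r"bbox (\d+) (\d+) (\d+) (\d+)", title)
            conf = re.search(r"x_wconf (\d+)", title)
            self._current = (
                [int(v) for v in bbox.groups()] if bbox else [""] * 4,
                int(conf.group(1)) if conf else "",
                [],
            )
            self._depth = 0

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if self._depth > 0:
            self._depth -= 1
            return
        # Closing tag of the word span itself: emit one row.
        bbox, conf, parts = self._current
        self.rows.append((*bbox, conf, "".join(parts).strip()))
        self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)


def hocr_to_tsv(hocr_text):
    """Return a tab-separated table (with header) of the words in hocr_text."""
    parser = HocrWordParser()
    parser.feed(hocr_text)
    parser.close()
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["left", "top", "right", "bottom", "conf", "text"])
    writer.writerows(parser.rows)
    return buf.getvalue()
```

This only lists the words with their boxes and confidences. Turning them into
the rows and columns of a table would still need extra logic, for example
grouping words by their vertical position.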

Zdenko

On Tue, Oct 14, 2014 at 5:14 PM, Sven Pedersen <sven.peder...@gmail.com>
wrote:

> Hi Maureen,
> I generally use PSM 4 or 3. Tesseract cannot actually produce a CSV (or
> any other delimited file), but you can get a clean text file and make a CSV
> from that with a little editing. To actually create a CSV in an automated
> fashion you'd have to write custom code and use the API.
>
>
> http://tesseract-ocr.googlecode.com/svn-history/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data
> --Sven
>
> On Mon, Oct 13, 2014 at 7:33 PM, Maureen Kole <maureenk...@gmail.com>
> wrote:
>
>> Sven,
>>
>> I apologize for my delayed response. I just saw your post. Thank you for
>> your response. As I said in my post to Andrew, I am still working on this
>> issue.
>>
>> I investigated the PSM mode prior to posting my question here on the
>> forum and found this website to be useful for describing the PSM options.
>> https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html
>>
>> Have you used any of these options to produce delimited output, CSV or
>> otherwise?
>>
>> Cheers,
>> Maureen
>>
>>
>> On Wednesday, October 8, 2014 6:16:02 AM UTC-6, sventech wrote:
>>>
>>> You should look at the different tesseract page segmentation (PSM)
>>> modes. The data you have is in a table, so you'll need to process it
>>> differently. hOCR format is HTML, so it will not work as CSV format, though
>>> it does supply accuracy info, so if you want to evaluate that and produce
>>> CSV you can.
>>> --Sven
>>>
>>  --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-ocr+unsubscr...@googlegroups.com.
>> To post to this group, send email to tesseract-ocr@googlegroups.com.
>> Visit this group at http://groups.google.com/group/tesseract-ocr.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/7c7b9fa5-470a-4db0-934f-88f1609c8b93%40googlegroups.com
>> <https://groups.google.com/d/msgid/tesseract-ocr/7c7b9fa5-470a-4db0-934f-88f1609c8b93%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
>
> --
> ``All that is gold does not glitter,
>   not all those who wander are lost;
> the old that is strong does not wither,
>   deep roots are not reached by the frost.
> From the ashes a fire shall be woken,
>   a light from the shadows shall spring;
> renewed shall be blade that was broken,
>   the crownless again shall be king.”
>

