Hi Maureen,

I generally use PSM 4 or 3. Tesseract cannot actually produce a CSV (or any other delimited file), but you can get a clean text file and make a CSV from that with a little editing. To actually create a CSV in an automated fashion you'd have to write custom code and use the API.
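As a rough illustration of the "custom code" route, here is a minimal Python sketch. It is not an official Tesseract feature, and it shells out to the command-line tool rather than using the C++ API. It assumes a recent tesseract on your PATH (3.x releases spell the flag -psm instead of --psm), that the preserve_interword_spaces setting is available in your version, and that table columns end up separated by at least two spaces in the text output. The function names are made up for this example.

```python
import csv
import io
import re
import subprocess


def run_tesseract(image_path, psm=4):
    """Return the plain-text OCR output of the tesseract command-line tool.

    Assumption: a recent tesseract is on PATH. Older 3.x releases use
    "-psm" instead of "--psm" and may not know preserve_interword_spaces.
    """
    cmd = ["tesseract", image_path, "stdout", "--psm", str(psm),
           "-c", "preserve_interword_spaces=1"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def text_to_rows(text, min_gap=2):
    """Split each non-empty line into cells on runs of >= min_gap spaces/tabs.

    This is a heuristic: a single space is treated as part of a cell,
    a wider gap is treated as a column boundary.
    """
    splitter = re.compile(r"[ \t]{%d,}" % min_gap)
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(splitter.split(line))
    return rows


def rows_to_csv(rows):
    """Serialize rows as CSV text; cells containing commas get quoted."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

With a hypothetical scan called table.png, usage would look like `print(rows_to_csv(text_to_rows(run_tesseract("table.png"))))`. The column split is only as good as the spacing in the OCR text, so expect to check the result by hand for ragged tables.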
http://tesseract-ocr.googlecode.com/svn-history/trunk/doc/tesseract.1.html#_config_files_and_augmenting_with_user_data

--Sven

On Mon, Oct 13, 2014 at 7:33 PM, Maureen Kole <[email protected]> wrote:
> Sven,
>
> I apologize for my delayed response. I just saw your post. Thank you for
> your response. As I said in my post to Andrew, I am still working on this
> issue.
>
> I investigated the PSM mode prior to posting my question here on the forum
> and found this website to be useful for describing the PSM options.
> https://tesseract-ocr.googlecode.com/svn/trunk/doc/tesseract.1.html
>
> Have you used any of these options to produce delimited output csv or
> other?
>
> Cheers,
> Maureen
>
>
> On Wednesday, October 8, 2014 6:16:02 AM UTC-6, sventech wrote:
>>
>> You should look at the different tesseract page segmentation (PSM) modes.
>> The data you have is in a table and you'll need to process it differently.
>> hOCR format is HTML, so it will not work as CSV format, though it does
>> supply accuracy info, so if you want to evaluate that and produce CSV you
>> can.
>> --Sven
>>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To post to this group, send email to [email protected].
> Visit this group at http://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/7c7b9fa5-470a-4db0-934f-88f1609c8b93%40googlegroups.com
> <https://groups.google.com/d/msgid/tesseract-ocr/7c7b9fa5-470a-4db0-934f-88f1609c8b93%40googlegroups.com?utm_medium=email&utm_source=footer>
> .
>
> For more options, visit https://groups.google.com/d/optout.
>

--
“All that is gold does not glitter,
not all those who wander are lost;
the old that is strong does not wither,
deep roots are not reached by the frost.
From the ashes a fire shall be woken,
a light from the shadows shall spring;
renewed shall be blade that was broken,
the crownless again shall be king.”

