I am writing a program for my final project, and part of it extracts the quantity, item name, and price from a restaurant receipt using Tesseract. I am using Ionic with Angular on the phone and a Rails API on the server: the app sends the photo to the Rails API, which converts the image, runs OCR on it, and passes the extracted information back to be displayed in the Ionic/Angular app.

The issue I'm having is this. When testing with restaurant receipts found online (this is the receipt image I was using: <http://mirror-us-ga1.gallery.hd.org/_exhibits/money/_more2007/_more03/receipt-from-Spanish-tapas-restaurant-La-Tasca-20070303-in-Kingston-London-England-mono-1-DHD.gif>) and cropping the image to contain just the items and the total, it worked fine. But when I print out this receipt image, take a photo of it with my phone, crop it, and pass it to the methods below, the results are basically inconclusive and useless.

Here is the image processing code:

    module Converter
      def tesseract
        # Downscale to 50%, convert to grayscale, then OCR into output.txt
        system("convert #{Bill.last.image.url} -scale 50% receipt.jpg")
        system("convert receipt.jpg -type Grayscale receipt.jpg")
        system("tesseract receipt.jpg output")
        find_total
        create_items
        system("rm output.txt")
        system("rm receipt.jpg")
      end

      private

      # Take the first number on the first line containing "TOTAL"
      def find_total
        a = File.readlines('./output.txt').grep(/TOTAL/)
        b = a.map { |x| x[/\d+(?:[.,]\d+)?/].to_f }[0]
        Bill.last.update(total: "#{b}")
      end

      # Create an Item for every line with at least one word containing a lowercase letter
      def create_items
        File.open './output.txt', 'r' do |file|
          file.each_line do |line|
            if search_for_words(line).length != 0
              Item.create(
                name: search_for_words(line),
                price: search_for_float(line),
                quantity: search_for_integer(line),
                bill_id: Bill.last.id
              )
            end
          end
        end
      end

      def search_for_float(line)
        line.gsub!(',', '.')
        line.scan(/(\d+[,.]\d+)/).flatten[0].to_f
      end

      def search_for_integer(line)
        line.gsub!(',', '.')
        line.scan(/(\d+)/).flatten[0].to_i
      end

      def search_for_words(line)
        line.split(" ").select { |word| word.match(/([a-z])/) }.join(" ")
      end
    end

I had version and compatibility troubles when using the tesseract gem, so I resorted to calling Tesseract from the command line instead. Any insights on whether I should be resizing the image, or doing other preprocessing, would be great.

Thanks in advance
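On the resizing question: a phone photo of a printed receipt usually has uneven lighting, slight rotation, and blur that a clean scanned GIF does not, so preprocessing is a reasonable place to start experimenting. The Tesseract documentation's page on improving output quality discusses rescaling, binarisation, noise removal, and deskewing; whether enlarging or shrinking helps depends on how large the characters end up in the photo, and the 50% downscale in `tesseract` may make small receipt print too small. A minimal sketch, assuming ImageMagick 6's `convert` and Tesseract 4 or later (3.x releases spell the page-segmentation flag `-psm` rather than `--psm`). The helper names `preprocess_args`, `tesseract_args`, and `ocr`, and every flag value (150% resize, 60% threshold, `--psm 6`, which treats the image as a single uniform block of text), are hypothetical starting points, not tuned settings:

```ruby
require "open3"
require "tmpdir"

# Hypothetical helper: ImageMagick arguments for a phone photo of a receipt.
# All values are assumptions to experiment with, not recommended settings.
def preprocess_args(input, output, scale: "150%", threshold: "60%")
  ["convert", input,
   "-colorspace", "Gray",   # drop colour information
   "-resize", scale,        # enlarge (or shrink) so characters are a usable size
   "-normalize",            # stretch contrast to the full range
   "-deskew", "40%",        # straighten slightly rotated text
   "-threshold", threshold, # binarise to pure black and white
   output]
end

# Hypothetical helper: Tesseract 4+ arguments; writes "#{output_base}.txt".
def tesseract_args(image, output_base, psm: 6)
  ["tesseract", image, output_base, "--psm", psm.to_s]
end

# Runs both commands in a private temp dir (removed afterwards) and returns the OCR text.
# Passing argument arrays to Open3.capture3 avoids invoking a shell.
def ocr(input_path)
  Dir.mktmpdir do |dir|
    cleaned = File.join(dir, "receipt.png") # PNG avoids JPEG artefacts after thresholding
    base = File.join(dir, "output")
    [preprocess_args(input_path, cleaned), tesseract_args(cleaned, base)].each do |cmd|
      _out, err, status = Open3.capture3(*cmd)
      raise "#{cmd.first} failed: #{err}" unless status.success?
    end
    File.read("#{base}.txt")
  end
end
```

Working in a temporary directory also means two uploads processed at the same time do not overwrite a shared `receipt.jpg` and `output.txt` in the app's working directory.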
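`find_total` matches `TOTAL` case-sensitively, so an OCR slip such as `Total` or `T0TAL` leaves the bill's total blank, and a `TOTAL` line whose number is unreadable gives `0.0`. A hypothetical, more tolerant `extract_total` is sketched here; it assumes amounts are printed with exactly two decimal places and no thousands separators, accepts the digit `0` in place of the letter `O`, and ignores `SUBTOTAL` lines because of the word boundary before `T`:

```ruby
# Hypothetical helper, not part of the original Converter module.
TOTAL_LINE = /\bT[O0]TAL\b/i    # "TOTAL", "Total", "T0TAL", but not "SUBTOTAL"
AMOUNT     = /\d+[.,]\d{2}\b/   # assumes two decimals, no thousands separators

# Returns the last amount on the first total line as a Float, or nil if none is found.
def extract_total(text)
  line = text.each_line.find { |l| l.match?(TOTAL_LINE) }
  return nil unless line
  amount = line.scan(AMOUNT).last
  amount && amount.tr(",", ".").to_f
end
```

Inside `find_total` it could be called as `extract_total(File.read('./output.txt'))`, with the `nil` case handled explicitly rather than writing an empty total.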
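In `create_items`, `search_for_integer` returns the first integer on the line, so on an item line without a quantity column it picks up the whole-number part of the price (`3` from `3.95`), and `search_for_words` only keeps words containing a lowercase letter, so all-caps lines are skipped entirely. One alternative is to match each line against a single pattern. The sketch assumes a hypothetical layout of an optional quantity (optionally followed by `x`), then the name, then a two-decimal price at the end of the line; `parse_item_line`, the default quantity of 1, and the `NOT_AN_ITEM` word list are illustrative assumptions to adapt to real receipts:

```ruby
# Hypothetical layout: "[qty][x] NAME PRICE", price with two decimals at line end.
ITEM_LINE = /\A\s*(?:(?<qty>\d+)\s*[xX]?\s+)?(?<name>.*?[[:alpha:]].*?)\s+(?<price>\d+[.,]\d{2})\s*\z/
# Hypothetical list of summary lines that should not become items.
NOT_AN_ITEM = /\b(?:sub)?t[o0]tal\b|\bvat\b|\bchange\b/i

# Returns { quantity:, name:, price: } for an item line, or nil otherwise.
def parse_item_line(line)
  m = ITEM_LINE.match(line.chomp)
  return nil unless m
  return nil if m[:name].match?(NOT_AN_ITEM)
  {
    quantity: (m[:qty] || 1).to_i, # assume 1 when no quantity is printed
    name: m[:name].strip,
    price: m[:price].tr(",", ".").to_f
  }
end
```

Because it returns `nil` for lines that don't look like items, `create_items` could do `attrs = parse_item_line(line)` and call `Item.create(attrs.merge(bill_id: Bill.last.id))` only when `attrs` is non-nil.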

