I am writing a program for my final project, and part of it extracts the 
quantity, item name, and price from a restaurant receipt using Tesseract. I 
am using Ionic with Angular and a Rails API: the phone sends the image to 
the Rails API, which converts the image, runs OCR on it, and passes the 
extracted information back via the server to be displayed by the 
Ionic/Angular app.

When I tested with a restaurant receipt found online and cropped the image 
to contain just the items and total, it worked fine.

Receipt image I was using:
<http://mirror-us-ga1.gallery.hd.org/_exhibits/money/_more2007/_more03/receipt-from-Spanish-tapas-restaurant-La-Tasca-20070303-in-Kingston-London-England-mono-1-DHD.gif>

But when I print out this receipt image, take a photo of it with my phone, 
crop it, and pass it to the following methods, the results are basically 
unusable.

Here is the image processing code:


module Converter


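  # Converts the latest bill image, runs Tesseract on it, stores the parsed
  # total and items, then removes the temporary files.
  def tesseract
    # Halve the image size and convert to grayscale in one ImageMagick call.
    # Passing arguments separately keeps the shell from interpreting the URL.
    system('convert', Bill.last.image.url, '-scale', '50%', '-type', 'Grayscale', 'receipt.jpg')
    # Tesseract writes the recognised text to output.txt.
    system('tesseract', 'receipt.jpg', 'output')
    find_total
    create_items
    File.delete('output.txt', 'receipt.jpg')
  end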

  private

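  # Stores the first number found on the first line containing "TOTAL".
  def find_total
    total_line = File.readlines('./output.txt').grep(/TOTAL/).first
    return if total_line.nil?

    # Treat a decimal comma as a decimal point before converting.
    total = total_line[/\d+(?:[.,]\d+)?/].to_s.tr(',', '.').to_f
    Bill.last.update(total: total)
  end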

  # Creates one Item per OCR line that contains at least one word.
  def create_items
    File.foreach('./output.txt') do |line|
      name = search_for_words(line)
      next if name.empty?

      Item.create(
        name: name,
        price: search_for_float(line),
        quantity: search_for_integer(line),
        bill_id: Bill.last.id
      )
    end
  end

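  # First decimal number on the line (decimal comma accepted), or 0.0 if none.
  # Uses tr instead of gsub! so the caller's line is not modified.
  def search_for_float(line)
    line.tr(',', '.')[/\d+\.\d+/].to_f
  end

  # First run of digits on the line, or 0 if none.
  def search_for_integer(line)
    line[/\d+/].to_i
  end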

  def search_for_words(line)
    line.split(" ").select{|word|word.match(/([a-z])/)}.join(" ")
  end
end
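
To show what the parsing helpers above are meant to produce, here is a minimal standalone sketch of the same regex logic without Rails or Tesseract. The names ParsedLine and parse_receipt_line are hypothetical, and the sample lines are made up for illustration, not real OCR output from the receipt. The second sample imitates the kind of misread a phone photo might cause, and shows how one wrong character can shift the quantity.

```ruby
# Hypothetical standalone version of the line parsing (no Rails, no Tesseract).
ParsedLine = Struct.new(:quantity, :name, :price)

def parse_receipt_line(line)
  normalized = line.tr(',', '.')
  # Keep only words containing a lowercase letter, as in search_for_words.
  name = normalized.split.select { |word| word.match?(/[a-z]/) }.join(' ')
  return nil if name.empty?

  price = normalized[/\d+\.\d+/].to_f   # first decimal number, 0.0 if none
  quantity = normalized[/\d+/].to_i     # first run of digits, 0 if none
  ParsedLine.new(quantity, name, price)
end

clean  = parse_receipt_line('2 Patatas Bravas 7,90')  # made-up clean line
garbled = parse_receipt_line('Z Patatas Brauas 7,9O') # made-up misread line

puts clean.to_a.inspect
puts garbled.to_a.inspect
```

In the garbled line the leading "2" was read as "Z", so the first digits on the line come from the price and end up as the quantity.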

I had version and compatibility troubles when using the tesseract gem, so I 
resorted to calling Tesseract via the command line instead. Any insights on 
whether I should be resizing the image, or preprocessing it in some other 
way, would be great.
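
As a concrete starting point for that question, here is a sketch of one preprocessing variant that could be tried on the phone photo. It is an assumption, not a tested fix: the helper name preprocess_command, the file names, and the choice of steps (grayscale, contrast normalisation, deskew, no downscaling by default, PNG output to avoid extra JPEG artifacts) are all illustrative. The helper only builds the ImageMagick argument list, so it can be inspected before anything is run.

```ruby
# Hypothetical helper: builds (but does not run) an ImageMagick command.
# The steps and their order are assumptions to experiment with.
def preprocess_command(input_path, output_path, scale_percent: 100)
  [
    'convert', input_path,
    '-colorspace', 'Gray',            # drop colour information
    '-normalize',                     # stretch contrast
    '-deskew', '40%',                 # straighten slightly rotated text
    '-resize', "#{scale_percent}%",   # 100% keeps the original size
    output_path
  ]
end

command = preprocess_command('photo.jpg', 'receipt.png')
puts command.join(' ')
# To actually run it: system(*command)
```

Whether scaling up, scaling down, or leaving the size alone works best probably depends on how large the text is in the photo, which is why scale_percent is left as a parameter.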

Thanks in advance
