Hi,

The issue still exists in the latest beta version of Tess4J as well.

If I convert the image to 300 DPI, can I still get the coordinates of the text in the original image by running hOCR on the converted 300 DPI image? If so, can you guide me on how to convert the image to 300 DPI using Java?

Note: the old Tess4J (which doesn't support hOCR) was able to extract text from this image. Could you look into making the code handle any image resolution, if possible?

Thanks & Regards,
Harry John Asir

On Apr 25, 6:41 am, Quan Nguyen <[email protected]> wrote:
> Your image resolution is too low; it needs to be rescaled to a higher one, 300 DPI, for instance.
>
> Please try with the latest beta version just uploaded today. Thanks.
>
> On Tuesday, April 24, 2012 1:21:32 AM UTC-5, harry asir wrote:
> > Hi,
> >
> > I have sent the image and log files to your mail id.
> >
> > Regards,
> > Harry John Asir
> >
> > On Apr 24, 7:07 am, Quan Nguyen <[email protected]> wrote:
> > > Execution for .exe and .dll+Java follows different paths: one calling ProcessPage with a Leptonica Pix image and one calling TesseractRect or GetUTF8Text with a raw image. It seems that the Pix image gets thresholded before recognition and thus produces better and more consistent results.
> > >
> > > I'm trying to research what other function calls need to be made to produce, for raw images, results similar to those for a Pix image. I would rather use TessBaseAPI functions to perform this image processing than do it in Java, but it seems that, for Tesseract 3.02, all of the provided image processing functions are geared for the Pix type, not raw images.
> > >
> > > Meanwhile, can you attach your color image for testing?
> > >
> > > On Sunday, April 22, 2012 11:42:19 PM UTC-5, harry asir wrote:
> > > > Hi,
> > > >
> > > > When I run Tesseract.exe (not Tess4J) on my PC, it extracts the characters. That is, Tesseract.exe is working fine.
> > > >
> > > > But the issues mentioned in the above thread happen only in Tess4J.
> > > >
> > > > I have tried JRE versions 1.7, 1.6.0_20 and 1.6.0.14, and the issue mentioned in the above thread happens with all of them. Are these JRE versions OK to use, or do I need to use another JRE version?
> > > >
> > > > Image details:
> > > > 1. Coloured image
> > > > 2. 320 x 480 pixels, 3.5 inches (~165 ppi pixel density)
> > > >
> > > > Please guide me to solve these issues.
> > > >
> > > > Regards,
> > > > Harry John Asir
> > > >
> > > > On Apr 20, 7:30 am, Quan Nguyen <[email protected]> wrote:
> > > > > Code? Pix? Does Tesseract itself work with those images?
> > > > >
> > > > > On Thursday, April 19, 2012 1:25:29 AM UTC-5, harry asir wrote:
> > > > > > Hi,
> > > > > >
> > > > > > Thanks for your reply.
> > > > > >
> > > > > > On Windows 7, a crash happens in the JRE with any image other than the default image present in the tess4j folder (the zipped one). A similar crash happens with any image (including the default image present in the tess4j folder).
> > > > > >
> > > > > > Logs:
> > > > > >
> > > > > > #
> > > > > > # A fatal error has been detected by the Java Runtime Environment:
> > > > > > # EXCEPTION_ACCESS_VIOLATION (0x0000005) at pc=0x72064673, pid=6184, tid=3628
> > > > > > #
> > > > > > # JRE version: 6.0_20-b02
> > > > > > # Java VM: Java HotSpot(TM) Client VM (16.3-b01 mixed mode, sharing windows-x86)
> > > > > > # problematic frame:
> > > > > > # C [MSVCR90.dll+0x24673]
> > > > > > #
> > > > > > # An error report file with more information is saved as:
> > > > > > # C:\Users\pcrmd\workspace\Tess4jNew\hs_err_pid6184.log
> > > > > > #
> > > > > > # If you would like to submit a bug report, please visit:
> > > > > > # http://java.sun.com/webapps/bugreport/crash.jsp
> > > > > > # The crash happened outside the Java Virtual Machine in native code.
> > > > > > # See problematic frame for where to report the bug.
> > > > > > #
> > > > > >
> > > > > > Please help me solve this issue.
> > > > > >
> > > > > > Regards,
> > > > > > Harry John Asir
> > > > > >
> > > > > > On Apr 19, 9:18 am, Quan Nguyen <[email protected]> wrote:
> > > > > > > Tess4J is just a simple wrapper around TessBaseAPI. It can do only what Tesseract can. For color images, you probably will need to apply thresholding to create binary images suitable for OCR operations.
> > > > > > >
> > > > > > > On Wednesday, April 18, 2012 5:45:53 AM UTC-5, harry asir wrote:
> > > > > > > > OCR is always failing for coloured images. With the test images present in the Tess4J folder (the zipped one), OCR is working. Can you help me do OCR on coloured images using Tess4J?
> > > > > > > >
> > > > > > > > I am using a Windows 7 PC.
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > > Harry John Asir
> > > > > > > >
> > > > > > > > On Apr 17, 8:14 am, Quan Nguyen <[email protected]> wrote:
> > > > > > > > > A JNA-based wrapper for Tesseract OCR 3.02 DLL, the library provides optical character recognition (OCR) support for:
> > > > > > > > >
> > > > > > > > > * TIFF, JPEG, GIF, PNG, and BMP image formats
> > > > > > > > > * Multi-page TIFF images
> > > > > > > > > * PDF document format
> > > > > > > > >
> > > > > > > > > This version is still in early beta development; as such, it has rough edges and has not undergone thorough testing. Any feedback/comment/suggestion is welcome.
> > > > > > > > >
> > > > > > > > > http://tess4j.sf.net

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en
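On the 300 DPI question at the top of this thread: below is a minimal sketch, not Tess4J code, of one way to upscale an image in plain Java and map hOCR coordinates back to the original. It assumes the source density is known (about 165 ppi for the 320 x 480 image described above) and that the hOCR boxes use the usual `bbox x0 y0 x1 y1` pixel convention on the rescaled image. The class and method names (`DpiRescale`, `scaleFactor`, `rescale`, `toOriginal`) are made up for illustration. Note that this only resamples the pixels; it does not write any DPI metadata into the file.

```java
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of Tess4J or Tesseract.
final class DpiRescale {

    /** Factor that takes an image from sourceDpi to targetDpi, e.g. 300 / 165. */
    static double scaleFactor(double sourceDpi, double targetDpi) {
        return targetDpi / sourceDpi;
    }

    /** Resample the image by the given factor using bicubic interpolation. */
    static BufferedImage rescale(BufferedImage src, double scale) {
        int w = (int) Math.round(src.getWidth() * scale);
        int h = (int) Math.round(src.getHeight() * scale);
        BufferedImage dst = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = dst.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BICUBIC);
            g.drawImage(src, 0, 0, w, h, null);
        } finally {
            g.dispose();
        }
        return dst;
    }

    /**
     * Map a box found on the rescaled image (x0, y0 = top-left; x1, y1 = bottom-right)
     * back to original pixels. Rounds outward so the box still covers the text.
     */
    static Rectangle toOriginal(int x0, int y0, int x1, int y1, double scale) {
        int ox0 = (int) Math.floor(x0 / scale);
        int oy0 = (int) Math.floor(y0 / scale);
        int ox1 = (int) Math.ceil(x1 / scale);
        int oy1 = (int) Math.ceil(y1 / scale);
        return new Rectangle(ox0, oy0, ox1 - ox0, oy1 - oy0);
    }

    public static void main(String[] args) {
        double scale = scaleFactor(165, 300);
        BufferedImage original = new BufferedImage(320, 480, BufferedImage.TYPE_INT_RGB);
        BufferedImage upscaled = rescale(original, scale);
        System.out.println(upscaled.getWidth() + " x " + upscaled.getHeight());
        System.out.println(toOriginal(182, 91, 364, 182, scale));
    }
}
```

Because the same factor is applied to both axes, dividing each hOCR coordinate by that factor is enough to get back to the original image. Rounding outward can push a box one pixel past the image edge, so clamping to the original width and height may be needed.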

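On the thresholding advice quoted above for colour images: here is a minimal sketch of a fixed-cutoff binarization in plain Java, which could be run on an image before handing it to OCR. This is an assumption about what might help, not what Tesseract or Tess4J does internally. The class name `Binarize`, its methods, and the idea of a single global cutoff are all illustrative; a global cutoff may do poorly on unevenly lit images, where an adaptive method would likely work better.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper, not part of Tess4J or Tesseract.
final class Binarize {

    /** Approximate brightness (0..255) of a packed RGB pixel, using BT.601 luma weights. */
    static int luma(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        return (299 * r + 587 * g + 114 * b) / 1000;
    }

    /** Pixels at or above the cutoff become white, all others black. */
    static BufferedImage threshold(BufferedImage src, int cutoff) {
        BufferedImage dst = new BufferedImage(
                src.getWidth(), src.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
        for (int y = 0; y < src.getHeight(); y++) {
            for (int x = 0; x < src.getWidth(); x++) {
                boolean light = luma(src.getRGB(x, y)) >= cutoff;
                dst.setRGB(x, y, light ? 0xFFFFFFFF : 0xFF000000);
            }
        }
        return dst;
    }

    public static void main(String[] args) {
        BufferedImage colour = new BufferedImage(2, 1, BufferedImage.TYPE_INT_RGB);
        colour.setRGB(0, 0, 0xFFFFAA); // light yellow background
        colour.setRGB(1, 0, 0x000080); // dark blue text
        BufferedImage binary = threshold(colour, 128);
        System.out.println(Integer.toHexString(binary.getRGB(0, 0)));
        System.out.println(Integer.toHexString(binary.getRGB(1, 0)));
    }
}
```

The cutoff of 128 in `main` is only a starting point; for light text on a dark background the comparison would need to be inverted.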
