I tried the tessdata path while writing the wrapper, but it works for some of the files, not all.
The best solution I see is to write a "tesseract Windows service pipeline":
- A pipeline, to allow only one OCR at a time
- A service, to be able to start a new process (so leaked memory is cleaned up when it exits)
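As a rough illustration of those two points (a sketch under assumptions, not tested against tessnet2): the loop below handles job files strictly one after another and starts a fresh worker process for each, so whatever Tesseract leaks goes back to the OS when the worker exits. The class name `OcrQueue`, the inbox/outbox directories and the worker executable are hypothetical names made up for this example; the worker is assumed to be a small console program that takes a job file path and a result file path, runs the OCR, and exits with code 0 on success.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Sketch only: OcrQueue, the directory layout and the worker executable
// are hypothetical; none of them is part of tessnet2.
public static class OcrQueue
{
    // Handles every pending job file in inboxDir, strictly one at a time.
    public static void ProcessPending(string inboxDir, string outboxDir, string workerExe)
    {
        string[] jobs = Directory.GetFiles(inboxDir, "*.xml");
        Array.Sort(jobs, StringComparer.Ordinal); // deterministic order

        foreach (string jobPath in jobs)
        {
            string resultPath = Path.Combine(outboxDir, Path.GetFileName(jobPath));

            ProcessStartInfo psi = new ProcessStartInfo();
            psi.FileName = workerExe;
            psi.Arguments = "\"" + jobPath + "\" \"" + resultPath + "\"";
            psi.UseShellExecute = false;
            psi.CreateNoWindow = true;

            // A fresh process per job: leaked memory is released when it exits.
            using (Process worker = Process.Start(psi))
            {
                worker.WaitForExit(); // blocks, so only one OCR runs at a time
                if (worker.ExitCode != 0)
                    Console.Error.WriteLine("OCR worker failed for " + jobPath);
            }

            // The job file is removed whether or not the worker succeeded,
            // so a bad file cannot block the queue.
            File.Delete(jobPath);
        }
    }
}
```

The service would call `OcrQueue.ProcessPending(...)` each time it is woken up; the web application never loads tessnet2.dll itself.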
The app (here the web app) creates an XML file with all the parameters (bitmap path, lang, zone, etc.).
The app sends a signal to wake up the service.
The service looks at the XML files and processes them one at a time. It saves each result in an XML file in another directory.
Your app periodically checks whether the XML result file is available.

Remi

On Apr 29, 5:36 pm, ZioZione <[email protected]> wrote:
> Hi Remi,
> just a last hint: I found this discussion
> http://social.msdn.microsoft.com/Forums/en-US/windowsazure/thread/8cf...
> where a possible workaround for the dll memory leak seems to have been found (I think...).
> Obviously, the Init() method differs from the standard syntax (it adds
> the "tessdata" path, which you recently allowed to be given through the
> SetRootPath method), but it sounds interesting...
> Unfortunately, at this moment, I don't have a C++ compiler to try
> these modifications myself, but I will try in the next days....
> Best Regards
> ZioZione
>
> On 29 Apr, 11:44, ZioZione <[email protected]> wrote:
>
> > Hi Remi,
> > I tried to do the following:
>
> > [Serializable]
> > public class myOCR
> > {
> >     public tessnet2.Tesseract ocr;
> > }
>
> > myOCR myocr = null;
> > MemoryStream ms = null;
>
> > void InitialiseData()
> > {
> >     myocr = new myOCR();
> >     myocr.ocr = new tessnet2.Tesseract();
> >     ms = new MemoryStream();
> >     tessDataPath = Server.MapPath("../tessdata");
> >     BinarySerialize(ms, tessDataPath, "ita", myocr);
> > }
>
> > public void BinarySerialize(Stream ms, string dataPath, string language, myOCR myocr)
> > {
> >     myocr.ocr.SetRootPath(dataPath, language);
> >     myocr.ocr.Init(language, false);
>
> >     try
> >     {
> >         BinaryFormatter binaryFormatter = new BinaryFormatter();
> >         binaryFormatter.Serialize(ms, myocr);
> >     }
> >     catch (Exception ex)
> >     {
> >         throw ex;
> >     }
> >     finally
> >     {
> >         ms.Close();
> >     }
> > }
>
> > but BinarySerialize throws the following SerializationException:
>
> > Type 'tessnet2.Tesseract' in Assembly 'tessnet2, Version=2.0.3.7, Culture=neutral, PublicKeyToken=null' is not marked as serializable.
>
> > How do you think I should modify my code to bypass this problem?
> > Thank you very much!
> > Best Regards
> > ZioZione
>
> > On 29 Apr, 11:04, ZioZione <[email protected]> wrote:
>
> > > Hi Remi,
> > > thank you for your reply.
> > > I read the interesting article you linked, but, as a real newbie about
> > > serialization, I have some difficulty applying it to tesseract settings.
> > > In particular, what stream should I use for serializing OCR? Could it be
> > > the image itself, or what else?
> > > Thank you very much for your support!
> > > Best Regards
> > > ZioZione
>
> > > On 28 Apr, 21:45, Remi Thomas <[email protected]> wrote:
>
> > > > Search for "serialization" in this group.
>
> > > > Remi
>
> > > > On Apr 28, 2:58 pm, ZioZione <[email protected]> wrote:
>
> > > > > Hi Remi,
> > > > > I was afraid of such a reply...
> > > > > But what do you mean when you talk about "invoke it from another
> > > > > process"? Do you mean: "start a new thread in which you invoke
> > > > > tesseract"? Do you think this could work? Do you have some working
> > > > > samples? Thank you!
> > > > > Best Regards
> > > > > ZioZione
>
> > > > > On 28 Apr, 14:31, Remi Thomas <[email protected]> wrote:
>
> > > > > > I see,
> > > > > > If you have read the forum you know tesseract contains some memory leaks.
> > > > > > It's impossible to reset it correctly. The only solution I know is to
> > > > > > exit the running process and restart it.
> > > > > > With IIS that's impossible; the only solution I see is to invoke it from
> > > > > > another process.
> > > > > > .NET can't unload a loaded DLL.
>
> > > > > > Remi
>
> > > > > > On Apr 28, 1:43 pm, ZioZione <[email protected]> wrote:
>
> > > > > > > Hi Remi, thanks for your quick reply!
> > > > > > > Yes, tessDataPath is the same folder both for Release and Debug.
>
> > > > > > > My Web Application is structured as follows
>
> > > > > > > \bin --> contains tessnet2.dll
> > > > > > > \obj --> contains Debug and Release folders
> > > > > > > \Web --> contains application pages
> > > > > > > \Web\Home\frm_HOM.aspx --> main page
> > > > > > > \Web\Home\frm_HOM.aspc.cs --> main page codebehind
> > > > > > > \Web\Images --> contains all images to be OCRed
> > > > > > > \Web\tessdata --> contains "eng" and "ita" languages tessdata folders
>
> > > > > > > I also tested your latest dll, but, unfortunately, the behavior is the
> > > > > > > same. I will go into more detail.
> > > > > > > Let's call "RunDebug" when you select from VS2005 the menu item Debug |
> > > > > > > Start Debugging, and "RunWODebug" when you select Debug | Start
> > > > > > > Without Debugging. After every run, I close the .NET Development Server
> > > > > > > (WebDev.WebServer.exe) to reset all sessions. Here are all the results:
>
> > > > > > > Debug, RunDebug --> OK
> > > > > > > Debug, RunWODebug --> error
> > > > > > > Release, RunDebug (makes no sense for me, but...) --> OK
> > > > > > > Release, RunWODebug --> error
>
> > > > > > > Also, if I RunWODebug after a RunDebug without resetting the Development
> > > > > > > Server, it works correctly. If I RunDebug without setting breakpoints, it
> > > > > > > works correctly.
>
> > > > > > > Hope this helps in pointing out what could have happened to the dll.
> > > > > > > Please let me know if I have to try some other things. Thank you!
> > > > > > > Best Regards
> > > > > > > ZioZione
>
> > > > > > > On 28 Apr, 11:56, Remi Thomas <[email protected]> wrote:
>
> > > > > > > > Hi,
>
> > > > > > > > Is tessDataPath the same in release and debug mode?
> > > > > > > > I did change a compilation option in tessnet2.dll, please try with the new one.
> > > > > > > > http://www.pixel-technology.com/freeware/tessnet2/bin.zip
>
> > > > > > > > Remi
>
> > > > > > > > On Apr 28, 9:09 am, ZioZione <[email protected]> wrote:
>
> > > > > > > > > Hi everybody,
> > > > > > > > > I am writing a C# Web Application that must perform OCR over various
> > > > > > > > > TIFF files. I am using the latest tessnet2 release (april 21, 2009).
> > > > > > > > > Here is the codebehind of my page:
>
> > > > > > > > > tessnet2.Tesseract ocr = null;
> > > > > > > > > string tessImgPath = "";
> > > > > > > > > string tessDataPath = "";
>
> > > > > > > > > protected void Page_Load(object sender, EventArgs e)
> > > > > > > > > {
> > > > > > > > >     InitialiseData();
>
> > > > > > > > >     tessImgPath = Server.MapPath("../Images/P0028594.tif");
> > > > > > > > >     Bitmap image = new Bitmap(tessImgPath);
>
> > > > > > > > >     List<tessnet2.Word> m_words;
> > > > > > > > >     m_words = ocr.DoOCR(image, Rectangle.Empty);
>
> > > > > > > > >     if (m_words != null)
> > > > > > > > >     {
> > > > > > > > >         int lc = tessnet2.Tesseract.LineCount(m_words);
> > > > > > > > >         for (int i = 0; i < lc; i++)
> > > > > > > > >         {
> > > > > > > > >             Response.Write(tessnet2.Tesseract.GetLineText(m_words, i) + "<br>");
> > > > > > > > >         }
> > > > > > > > >     }
> > > > > > > > > }
>
> > > > > > > > > void InitialiseData()
> > > > > > > > > {
> > > > > > > > >     ocr = new tessnet2.Tesseract();
> > > > > > > > >     tessDataPath = Server.MapPath("../tessdata");
> > > > > > > > >     ocr.SetRootPath(tessDataPath, "ita");
> > > > > > > > >     ocr.Init("ita", false);
> > > > > > > > > }
>
> > > > > > > > > I have the following problem: if the application is running in Debug
> > > > > > > > > mode, it works correctly, while running in Release mode (or also Start
> > > > > > > > > Without Debugging), the row
>
> > > > > > > > > ocr.Init("ita", false);
>
> > > > > > > > > causes the following error:
>
> > > > > > > > > Attempted to read or write protected memory. This is often an
> > > > > > > > > indication that other memory is corrupt.
>
> > > > > > > > > Here is the Stack Trace:
>
> > > > > > > > > [AccessViolationException: Attempted to read or write protected memory. This is often an indication that other memory is corrupt.]
> > > > > > > > >    fgets(SByte* , Int32 , _iobuf* ) +0
> > > > > > > > >    read_word_list(SByte* filename, UInt64* dawg, Int32 max_num_edges, Int32 reserved_edges) +562
> > > > > > > > >    init_permute() +543
> > > > > > > > >    program_editup(SByte* configfile) +335
> > > > > > > > >    tessnet2.Tesseract.Init(String lang, Boolean numericMode) +66
> > > > > > > > >    SDG_MODI.Web.Home2.Home.InitialiseData() in C:\Produzione\PROGETTI_NET\SDG_MODI\SDG_MODI\Web\Home2\frm_HOM2.aspx.cs:110
> > > > > > > > >    SDG_MODI.Web.Home2.Home.Page_Load(Object sender, EventArgs e) in C:\Produzione\PROGETTI_NET\SDG_MODI\SDG_MODI\Web\Home2\frm_HOM2.aspx.cs:28
> > > > > > > > >    System.Web.Util.CalliHelper.EventArgFunctionCaller(IntPtr fp, Object o, Object t, EventArgs e) +15
> > > > > > > > >    System.Web.Util.CalliEventHandlerDelegateProxy.Callback(Object sender, EventArgs e) +33
> > > > > > > > >    System.Web.UI.Control.OnLoad(EventArgs e) +99
> > > > > > > > >    System.Web.UI.Control.LoadRecursive() +47
> > > > > > > > >    System.Web.UI.Page.ProcessRequestMain(Boolean includeStagesBeforeAsyncPoint, Boolean includeStagesAfterAsyncPoint) +1436
>
> > > > > > > > > What do I have to do to avoid this error?
> > > > > > > > > Another question: I need also to show the ...

You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

