Mostafa should seriously try to contact Ray directly. Things may have changed over time.
--
Dmitri

2011/5/19 zdenko podobny:
>
> 2011/5/19 Mostafa:
>>
>> Hi again,
>>
>> It seems nobody knows where it is hiding.
>> Should I contact a CIA agent? lol
>
> If somebody is really interested, she/he can find the answer ;-). Within 1
> minute ;-) ([1] [2] [3]). BTW: there is a developers forum.
>
>>
>> But I am kinda serious about the data.
>
> There have been several requests for the training data (in the forum, in the
> issue tracker). I made one too. There was no official reply to such requests.
> AFAIK Google is not obliged to release them, so I guess they have a reason
> for not providing them.
> On the other hand, this could be an opportunity for the tesseract community
> :-): to create an alternative training set. As Ray mentioned ([3]), they use
> "more automated training process based on rendering text from fonts", so
> training based on "real world" scanned documents could be interesting (but
> more difficult).
>
> Zdenko
>
> [1] http://code.google.com/p/tesseract-ocr/people/list
> [2] http://code.google.com/p/tesseract-ocr/source/list
> [3] http://groups.google.com/group/tesseract-dev/msg/1cdf3ebe8743d935
>
>>
>> Mostafa
>>
>> On May 18, 2:43 am, Илья wrote:
>> > He needs a table that contains all the supported alphabetic characters.
>> > Also, parts of scanned books may not be protected by copyright.
>> >
>> > Can you give any contacts for the "jpn.traineddata" dev team?
>> >
>> > --
>> > Best regards,
>> > Ilia.
>> >
>> > On Tue, 17/05/2011 at 18:24 +0200, zdenko podobny wrote:
>> >
>> > > On Tue, May 17, 2011 at 5:01 PM, Илья wrote:
>> > > IMHO alphabets can't be protected by copyright.
>> >
>> > > Mostafa did not ask for alphabets. He asked for 'all the tif
>> > > files that used for creating...', and the content of the TIFF files
>> > > (e.g. scanned books) could be protected by copyright.
>> >
>> > > --
>> > > Best regards,
>> > > Ilia.
>> >
>> > > On Tue, 17/05/2011 at 09:24 -0400, Dmitri Silaev wrote:
>> >
>> > > > I think copyright issues are preventing the dev team from publishing
>> > > > these source files. However, you can try to contact this forum's
>> > > > moderator directly; he can probably make the decision to share them.
>> >
>> > > > --
>> > > > Dmitri
>> >
>> > > > On Tue, May 17, 2011 at 4:58 AM, Mostafa wrote:
>> > > > > Hi,
>> >
>> > > > > I am interested in getting all the tif files that were used to
>> > > > > create jpn.traineddata.
>> > > > > I just want to see how many characters are supported in that file,
>> > > > > because I have some other Japanese characters that can't be
>> > > > > recognized by Tesseract OCR.
>> >
>> > > > > Does anybody know where those tif files are?
>> >
>> > > > > Thanks
>> >
>> > > > > --
>> > > > > You received this message because you are subscribed to the Google
>> > > > > Groups "tesseract-ocr" group.
>> > > > > For more options, visit this group at
>> > > > > http://groups.google.com/group/tesseract-ocr?hl=en

