Welcome, Tristan. I don't speak Yiddish, but I'm just a bit east in Boston. I can't speak with certainty about RTL languages or Unicode in general, but I know that there are at least two areas that OpenLibrary search struggles with:
- diacritic folding
- normalizing composed vs decomposed forms

It wouldn't surprise me to find that it couldn't deal well with non-Roman scripts.

As you noticed there are repos on GitHub (be sure to choose the right one(s) though -- I think InternetArchive == good, OpenLibrary != good). I have a pull request mostly complete for diacritic normalization. I'd be happy to help with other search fixes, but there's really no predicting when or if they'd ever get accepted. The Internet Archive (owner of OpenLibrary) staff is off working on non-OpenLibrary stuff that's higher priority to the people that run Internet Archive.

Tom

On Mon, Dec 9, 2013 at 2:20 PM, Tristan Chambers <[email protected]> wrote:

> Hello ol-tech,
>
> I just joined the mailing list. My name is Tristan Chambers and I'm a web
> developer at the Yiddish Book Center. We have a collection of digitized
> books in Internet Archive and we are very excited about Open Library and
> what it means for our collections. I hope we can work together to improve
> and build this great tool.
>
> My first question at the moment is: does the OCR text layer search on OL
> support non-roman scripts? I tried searching a word from the Yiddish OCR
> test on archive.org and nothing came up.[1] I figure it could just be
> because it's flagged a certain way because it's a test. However, typing in
> a common Hebrew word doesn't seem to bring back anything either.[2]
>
> If it doesn't work what would it take to get it fixed? We may be able to
> devote some time to improving this tool. I see that you have a git
> repository!
>
> Best regards,
>
> Tristan Chambers
>
> [1] https://openlibrary.org/search/inside?q=פרײדענקער doesn't bring up
> this https://archive.org/details/ocr_test_yiddish
> [2] https://openlibrary.org/search/inside?q=ספר
>
> --
>
> Tristan Chambers
> Junior Web Developer
> Yiddish Book Center
> Amherst, MA - USA
> 413 256-4900 x122
>
> _______________________________________________
> Ol-tech mailing list
> [email protected]
> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
> Archives: http://www.mail-archive.com/[email protected]/
> To unsubscribe from this mailing list, send email to
> [email protected]
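
For anyone following the thread who hasn't run into the two search problems named in the reply (diacritic folding, and composed vs decomposed forms), here is a minimal sketch of what they mean in practice. It uses only Python's standard `unicodedata` module. The function names `normalize_form` and `fold_diacritics` are made up for illustration; this is not OpenLibrary's code and not the pull request mentioned above, just one common way to do it.

```python
import unicodedata


def normalize_form(text: str) -> str:
    """Return text in NFC so composed and decomposed spellings compare equal."""
    return unicodedata.normalize("NFC", text)


def fold_diacritics(text: str) -> str:
    """Crude diacritic folding (hypothetical helper, for illustration only).

    Decompose to NFD, drop every character with a non-zero canonical
    combining class, then recompose whatever is left.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


# The same visible word, stored two different ways.
composed = "caf\u00e9"      # e-acute as a single code point
decomposed = "cafe\u0301"   # plain e followed by a combining acute accent

# A naive string comparison treats them as different words.
print(composed == decomposed)

# After normalization they match.
print(normalize_form(composed) == normalize_form(decomposed))

# Folding lets a query typed without accents match the accented form.
print(fold_diacritics("Dvo\u0159\u00e1k"))
```

One caveat for Hebrew-script text: Hebrew vowel points are combining marks too, so folding this crudely would strip them along with Latin accents. Whether that is the right behaviour for Yiddish search is a separate decision.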
