On Sunday, 15 August 2010, [email protected] wrote:
> On Sat, 14 Aug 2010 21:18:56 +0100
>
> Albert Astals Cid <[email protected]> wrote:
> >On Saturday, 31 July 2010, [email protected] wrote:
> >> Sorry for a silence in a while. Checking the source,
> >> I found following points.
> >>
> >> 1) poppler-qt4 page object issue
> >>
> >> On the other hand, getText() is device specific method,
> >> only in TextOutputDev.cc, so changing getText() is
> >> easier.
> >>
> >> 2) TextOutputDev::getText() issue
> >>
> >> I think, raw-ordered text from MS Office's tricky vertical
> >> text can be applicable for text search, but
> >> physically-layouted text cannot be applicable for text search.
> >
> >WoW, that's a huge mail :D
>
> Sorry, my post was too lengthy to find what is my proposal
> to poppler maintainers.
>
> >So my understanding is that "proper" CJK searching is a lot
> >of work and you advocate for just exposing the raw text to
> >the upper layers (users of poppler-qt4) so they can do the
> >work if they need it?
>
> Yes. I think exposing the raw text to the upper layers would
> be the reasonable starting point for various non-left-to-right
> scripts, because it is script-independent.
>
> # about the insertion of the space (U+0020) between the words,
> # still I've not decided what is good.

I don't think this makes sense: if we are being raw, we should be raw,
and adding a space that is not there is not being raw.

So if you agree on not adding the space, I will commit your patch.

Albert

>
> Also I've written a preliminary patch to modify TextPage::findText()
> in TextOutputDev to support the device created in rawOrder mode
> (if required, I will post here). Now I'm waiting for Cobra's feedback
> to see if it works for his purpose.
>
> Regards,
> mpsuzuki
> _______________________________________________
> poppler mailing list
> [email protected]
> http://lists.freedesktop.org/mailman/listinfo/poppler
