On Friday, 28 October 2011, [email protected] wrote:
> Quoting [email protected]:
> > Quoting Albert Astals Cid <[email protected]>:
> > > On Thursday, 6 October 2011, [email protected] wrote:
> > > > Quoting Albert Astals Cid <[email protected]>:
> > > > > On Wednesday, 5 October 2011, [email protected] wrote:
> > > > > > Dear all,
> > > > >
> > > > > Hi
> > > > >
> > > > > > For some months I have needed a regex find to dig
> > > > > > through huge PDF docs. Please find a patch attached
> > > > > > that implements this feature on top of xpdf-3.03. It
> > > > > > supports ASCII-only, backward and case-sensitive
> > > > > > searches (the word-only check-box has no effect any
> > > > > > more). The xpdf MMI hasn't been modified, so you can
> > > > > > only perform regex searches with this patch! I saw that
> > > > > > xpdf-3.03 is being merged into Poppler. Hope that it
> > > > > > can help with a review :) Let me know if you are
> > > > > > interested in this patch so that I can help to merge it
> > > > > > into Poppler.
> > > > >
> > > > > We still have not merged xpdf-3.03 and it will probably
> > > > > still take a while, but anyway I am not sure ASCII only is
> > > > > a good idea. Why that limitation?
> > > > >
> > > > > Albert
> > > >
> > > > Hi,
> > > >
> > > > In fact, this basic implementation relies on the POSIX regex
> > > > functions regcomp, regexec, regerror and regfree. These
> > > > functions take char strings and not Unicode strings as
> > > > input. Thus, ASCII control chars and ASCII printable chars
> > > > can be matched. Supporting Unicode-compatible regex search
> > > > is much heavier to implement and out of my scope for the
> > > > time being.
> > > > I would like to support much more, but I forecast a huge
> > > > effort to gain Unicode. Moreover, ASCII matches 99% of my
> > > > needs in terms of search in English data-sheets :)
> > >
> > > Sure, it might match your needs, but if you contribute it to
> > > poppler, people will start demanding that it works with
> > > non-ASCII characters and you will probably not be here anymore
> > > and the burden will be on our side.
> > >
> > > Albert
> > >
> > > > I know that this patch has some weaknesses, but I think it
> > > > would be great to get regex search in some applications such
> > > > as Evince, whose GUI is, in my opinion, smarter than xpdf's.
> > > >
> > > > Best regards
> > > > Jerry
> > > >
> > > > PS: Sorry for my poor English and my clumsy proposal :)
> > > >
> > > > > > Best regards
> > > > > > Jerry
> > > > >
> > > > > _______________________________________________
> > > > > poppler mailing list
> > > > > [email protected]
> > > > > http://lists.freedesktop.org/mailman/listinfo/poppler
> >
> > Hi Albert,
> >
> > To be more precise, the patch also supports extended ASCII 0x7F-0xFF
> > as well as control chars 0x01-0x1F and printable chars 0x20-0x7F.
> > This means that on my Ubuntu 10.04 I can input and find ASCII and
> > all ISO Latin-1 chars (ISO-8859-1) such as e acute 'é', a grave 'à'
> > and so on. All other extended ASCII sets are supported according to
> > your computer configuration and keyboard settings.
> >
> > I think that it covers not only my needs but also those of most
> > EMEA users.
> > RegEx search is a well-known, long-established feature of many
> > editors and scripting languages. This patch brings this powerful
> > feature to xpdf, and it would be totally new in Poppler. Supporting
> > only 1-byte charset encodings is more a restriction for APAC users
> > than a bug.
> >
> > For instance, suppose you are searching for a sentence beginning
> > with "The " followed by any word and then by " is": you just have
> > to type the regex "The .* is" in the find dialog box. Only regex
> > offers this possibility, and the combinations are almost infinite.
> >
> > May I push my modified xpdf binary so that you can test it?
> >
> > With best regards,
> > Jerry
>
> Hi All,
>
> I just saw Marc's mail in the Poppler archive,
> "[poppler] whole word search?"
> It seems that he is almost the only one to refer to regex in a PDF
> search engine :(
>
> There is a possibility to support regex over Unicode in Poppler (which
> is quite difficult with xpdf). But I would like to know whether any
> Poppler contributors are interested in it. In that case, spending my
> time implementing a clean patch supporting Unicode would be more
> reasonable. Otherwise I will keep it to myself ...
>

We are interested in code that makes sense from a library point of view.
A search that only works with Latin characters does not make much sense;
one that works on Unicode makes sense.

Albert

> Best regards
> Jerry
