Quoting [email protected]:

> Quoting Albert Astals Cid <[email protected]>:
>
> > On Thursday, 6 October 2011, [email protected] wrote:
> > > Quoting Albert Astals Cid <[email protected]>:
> > > > On Wednesday, 5 October 2011, [email protected] wrote:
> > > > > Dear all,
> > > >
> > > > Hi
> > > >
> > > > > For some months I have needed a regex find to dig into huge
> > > > > PDF docs. Please find attached a patch that implements this feature
> > > > > on top of xpdf-3.03. It supports ASCII only, backward and
> > > > > case-sensitive searches (the word-only check box no longer has any
> > > > > effect). The xpdf user interface has not been modified, so with this
> > > > > patch you can only perform regex searches! I saw that xpdf-3.03 is
> > > > > being merged into Poppler. I hope this helps with the review :)
> > > > > Let me know if you are interested in this patch so that I can help
> > > > > merge it into Poppler.
> > > >
> > > > We still have not merged xpdf-3.03 and it will probably take a
> > > > while yet, but anyway I am not sure ASCII only is a good idea. Why
> > > > that limitation?
> > > >
> > > > Albert
> > >
> > > Hi,
> > >
> > > In fact, this basic implementation relies on the POSIX regex functions
> > > regcomp, regexec, regerror and regfree. These functions take char
> > > strings as input, not Unicode strings. Thus ASCII control chars and
> > > ASCII printable chars can be matched. Unicode-aware regex search is much
> > > heavier to implement and out of my scope for the time being. I would
> > > like to support much more, but I foresee a huge effort to get Unicode.
> > > Moreover, ASCII covers 99% of my needs when searching English
> > > datasheets :)
> >
> > Sure, it might match your needs, but if you contribute it to Poppler,
> > people will start demanding that it work with non-ASCII characters, you
> > will probably not be around anymore, and the burden will be on our side.
> >
> > Albert
> >
> > > I know that this patch has some weaknesses, but I think it would be
> > > great to get regex search into applications such as Evince, whose GUI
> > > is, in my opinion, smarter than xpdf's.
> > >
> > > Best regards
> > > Jerry
> > >
> > > PS: Sorry for my poor English and my clumsy proposal :)
> > >
> > > > > Best regards
> > > > > Jerry
> > > >
> > > > _______________________________________________
> > > > poppler mailing list
> > > > [email protected]
> > > > http://lists.freedesktop.org/mailman/listinfo/poppler
>
> Hi Albert,
>
> To be more precise, the patch also supports extended ASCII (0x80-0xFF) as
> well as control chars (0x01-0x1F) and printable chars (0x20-0x7E). This
> means that on my Ubuntu 10.04 I can type and find all ASCII and ISO Latin-1
> (ISO-8859-1) chars, such as e acute 'é', a grave 'à' and so on. Other
> 8-bit extended ASCII sets are supported depending on your computer's
> configuration and keyboard settings.
>
> I think that it covers not only my needs but also those of most EMEA users.
> Regex search is a long-established feature of many editors and scripting
> languages. This patch brings this powerful feature to xpdf, and it would be
> an entirely new one in Poppler. Supporting only single-byte charset
> encodings is a restriction for APAC users rather than a bug.
>
> For instance, suppose you are searching for a sentence beginning with
> "The ", followed by any text and then by " is": you just have to type the
> regex "The .* is" in the find dialog box. Only regex offers this
> possibility, and the combinations are almost endless.
>
> Should I push my modified xpdf binary so that you can test it?
>
> With best regards,
> Jerry
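The POSIX functions mentioned in the thread (regcomp, regexec, regerror, regfree) can be combined into a find routine roughly as follows. This is a minimal sketch, not the code from the patch. The helper names `compile_pattern`, `find_forward` and `find_backward` are hypothetical, and the use of REG_EXTENDED syntax is an assumption. The page text is treated as a NUL-terminated byte string, so all offsets are byte offsets. Because regexec only scans forward, the backward search here is emulated by keeping the last match that starts before a given offset.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Compile `pattern` as a POSIX extended regex (optionally case-insensitive).
 * On failure, the regerror() message is written to errbuf and the nonzero
 * error code is returned; regfree() must only be called after success. */
int compile_pattern(regex_t *re, const char *pattern, int icase,
                    char *errbuf, size_t errlen)
{
    int cflags = REG_EXTENDED | (icase ? REG_ICASE : 0);
    int rc = regcomp(re, pattern, cflags);
    if (rc != 0)
        regerror(rc, re, errbuf, errlen);
    return rc;
}

/* Find the leftmost match starting at or after byte offset `from`.
 * Returns 1 and stores the match as [*so, *eo) on success, 0 otherwise. */
int find_forward(const regex_t *re, const char *text, size_t from,
                 size_t *so, size_t *eo)
{
    regmatch_t m;
    if (from > strlen(text))
        return 0;
    /* text + from is not the start of the subject string unless from == 0,
     * so tell regexec that '^' must not match there. */
    if (regexec(re, text + from, 1, &m, from > 0 ? REG_NOTBOL : 0) != 0)
        return 0;
    /* regexec reports offsets relative to the string it was given. */
    *so = from + (size_t)m.rm_so;
    *eo = from + (size_t)m.rm_eo;
    return 1;
}

/* Emulate a backward search: try start positions left to right and keep
 * the last match that begins before byte offset `before`.
 * Returns 1 if such a match was found, 0 otherwise. */
int find_backward(const regex_t *re, const char *text, size_t before,
                  size_t *so, size_t *eo)
{
    size_t pos = 0, s, e;
    int found = 0;
    while (find_forward(re, text, pos, &s, &e) && s < before) {
        *so = s;
        *eo = e;
        found = 1;
        pos = s + 1; /* advance one byte so overlapping matches are seen */
    }
    return found;
}
```

With the page text "The quick fox is fast. The dog is lazy.", a forward search for "The .* is" from offset 0 matches bytes [0, 33), that is "The quick fox is fast. The dog is". This happens because `.*` is greedy and POSIX picks the longest match at the leftmost position. A backward search from the end of the text finds "The dog is" at [23, 33).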
Hi All,

I just saw Marc's mail "[poppler] whole word search?" in the Poppler archive. It seems that he is almost the only one to bring up regex for a PDF search engine :(

It is possible to support regex over Unicode in Poppler (which is quite difficult with xpdf). But I would like to know whether any Poppler contributors are interested. If so, spending my time implementing a clean patch with Unicode support would be more worthwhile. Otherwise I will keep it for myself...

Best regards
Jerry
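The thread does not describe how a Unicode version would work, so the following is only an assumed route. Because regcomp/regexec consume `char` strings, page text held as an array of Unicode code points could be converted to UTF-8 and searched under a UTF-8 locale. A table mapping each byte offset back to its code point index would then let match offsets (`rm_so`/`rm_eo`) be translated into text positions. Whether the C library's regex engine treats a multibyte UTF-8 sequence as a single character depends on the platform and on the LC_CTYPE locale. The sketch below covers only the conversion step, and the names `utf8_encode` and `utf8_from_codepoints` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Encode one Unicode code point as UTF-8 into out[0..3].
 * Returns the number of bytes written, or 0 if cp is not a valid
 * Unicode scalar value (a surrogate or above U+10FFFF). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Convert n code points into a newly allocated NUL-terminated UTF-8 string
 * that can be passed to regexec(). If byte_to_cp is non-NULL it receives a
 * newly allocated array where entry b is the index of the code point that
 * byte b belongs to (the entry at the string length is n), so match offsets
 * can be mapped back. NUL and invalid code points become '?'.
 * Returns NULL on allocation failure; the caller frees both arrays. */
char *utf8_from_codepoints(const uint32_t *cps, size_t n, size_t **byte_to_cp)
{
    char *buf = malloc(4 * n + 1);
    size_t *map = byte_to_cp ? malloc((4 * n + 1) * sizeof *map) : NULL;
    size_t len = 0;

    if (buf == NULL || (byte_to_cp != NULL && map == NULL)) {
        free(buf);
        free(map);
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {
        unsigned char tmp[4];
        size_t k = (cps[i] == 0) ? 0 : utf8_encode(cps[i], tmp);
        if (k == 0) { /* keep the string NUL-terminated and valid */
            tmp[0] = '?';
            k = 1;
        }
        for (size_t j = 0; j < k; j++) {
            if (map != NULL)
                map[len] = i;
            buf[len++] = (char)tmp[j];
        }
    }
    if (map != NULL)
        map[len] = n;
    buf[len] = '\0';
    if (byte_to_cp != NULL)
        *byte_to_cp = map;
    return buf;
}
```

For example, the code points U+0054, U+00E9, U+4E2D and U+1F600 become the 10 bytes `54 C3 A9 E4 B8 AD F0 9F 98 80`. The byte map is `0 1 1 2 2 2 3 3 3 3` followed by a final entry of 4, so a match whose end offset is byte 6 ends just before code point 3.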
