Quoting Albert Astals Cid <[email protected]>:

> On Thursday, 6 October 2011, [email protected] wrote:
> > Quoting Albert Astals Cid <[email protected]>:
> > > On Wednesday, 5 October 2011, [email protected] wrote:
> > > > Dear all,
> > >
> > > Hi
> > >
> > > > For some months I have needed a regex find to dig into huge
> > > > pdf docs. Please find a patch attached that implements this feature
> > > > on top of xpdf-3.03. It supports ASCII only, backward and
> > > > case-sensitive searches (the word-only check-box has no effect any
> > > > more). The xpdf MMI hasn't been modified, so you can only
> > > > perform regex searches with this patch! I saw that xpdf-3.03 is
> > > > being merged into Poppler. Hope that it could help to make a review :)
> > > > Let me know if you are interested in this patch so that I can help
> > > > to merge it into Poppler.
> > >
> > > We still have not merged xpdf-3.03 and it will probably still take a
> > > while, but anyway I am not sure ASCII only is a good idea. Why that
> > > limitation?
> > >
> > > Albert
> >
> > Hi,
> >
> > In fact, this basic implementation relies on the POSIX regex functions
> > regcomp, regexec, regerror and regfree. These functions take char strings
> > and not Unicode strings as input. Thus, ASCII control chars and ASCII
> > printable chars can be matched. Supporting Unicode-compatible regex search
> > is too heavy to implement and out of my scope for the time being. I would
> > like to support much more but I forecast a huge effort to gain Unicode.
> > Moreover, ASCII covers 99% of my needs in terms of searching English
> > data-sheets :)
>
> Sure, it might match your needs, but if you contribute it to poppler, people
> will start demanding that it works with non-ASCII characters and you will
> probably not be here anymore and the burden will be on our side.
>
> Albert
>
> > I know that this patch has some weaknesses but I think it would be great
> > to get regex search in some applications such as Evince, whose GUI is,
> > in my opinion, smarter than xpdf's.
> >
> > Best regards
> > Jerry
> >
> > PS: Sorry for my poor English and my clumsy proposal :)
> >
> > > > Best regards
> > > > Jerry
> > > >
> > > _______________________________________________
> > > poppler mailing list
> > > [email protected]
> > > http://lists.freedesktop.org/mailman/listinfo/poppler
>
Hi Albert,

To be more precise, the patch is not limited to 7-bit ASCII. It handles
control chars 0x01-0x1F (and DEL, 0x7F), printable chars 0x20-0x7E, and
the extended range 0x80-0xFF. This means that on my Ubuntu 10.04 I can
input and find ASCII and all ISO Latin-1 (ISO-8859-1) chars such as
e acute 'é', a grave 'à' and so on. Other extended ASCII sets are
supported according to your computer configuration and keyboard settings.
I think that it covers not only my needs but also those of most EMEA
users.

Regex search is a well-known, long-established feature of many editors
and scripting languages. This patch brings this powerful feature to xpdf,
and it would be entirely new to Poppler. Supporting only 1-byte charset
encodings is more a restriction for APAC users than a bug.

For instance, suppose you are searching for a sentence beginning with
"The ", followed by any text and then by " is": you just have to type
the regex "The .* is" in the find dialog box. Only regex offers this
possibility, and the combinations are almost endless.

Shall I send my modified xpdf binary so that you can test it?

With best regards,
Jerry
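As an illustration of the kind of search described above, here is a minimal standalone sketch built on the POSIX calls named earlier in the thread (regcomp, regexec, regerror, regfree). It is not the code from the patch: the helper name regex_find, its parameters and its return convention are assumptions made up for this example. It works on plain char strings only, and how bytes above 0x7F are interpreted depends on the current locale and the C library.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>

/*
 * Hypothetical helper (not taken from the xpdf patch).
 * Searches `text` for the POSIX extended regex `pattern`.
 * Returns  1 and stores the byte offsets of the match in *start / *end,
 *          0 if nothing matches,
 *         -1 if the pattern does not compile or the search fails.
 */
int regex_find(const char *pattern, const char *text, int ignore_case,
               size_t *start, size_t *end)
{
    regex_t re;
    regmatch_t m;
    int flags = REG_EXTENDED | (ignore_case ? REG_ICASE : 0);
    int rc;

    assert(pattern != NULL && text != NULL);
    assert(start != NULL && end != NULL);

    rc = regcomp(&re, pattern, flags);
    if (rc != 0) {
        /* regerror turns the error code into a readable message. */
        char msg[128];
        regerror(rc, &re, msg, sizeof msg);
        fprintf(stderr, "regcomp failed: %s\n", msg);
        return -1;
    }

    /* Ask for one match slot: the span of the whole match. */
    rc = regexec(&re, text, 1, &m, 0);
    regfree(&re);

    if (rc == REG_NOMATCH)
        return 0;
    if (rc != 0)
        return -1;

    *start = (size_t)m.rm_so;
    *end = (size_t)m.rm_eo;
    return 1;
}
```

With this sketch, regex_find("The .* is", "Note: The datasheet is final.", 0, &s, &e) returns 1 with s = 6 and e = 22, i.e. the span "The datasheet is". Passing 1 for ignore_case adds REG_ICASE, so "the .* is" would match the same span.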
