Re: [HACKERS] Future of our regular expression code

2012-03-10 Thread Vik Reykja
On Sat, Feb 18, 2012 at 21:16, Vik Reykja vikrey...@gmail.com wrote: I would be willing to have a go at translating test cases. I do not (yet) have the C knowledge to maintain the regex code, though. I got suddenly swamped and forgot I had signed up for this. I'm still pretty swamped and I

Re: [HACKERS] Future of our regular expression code

2012-03-10 Thread Tom Lane
Vik Reykja vikrey...@gmail.com writes: On Sat, Feb 18, 2012 at 21:16, Vik Reykja vikrey...@gmail.com wrote: I would be willing to have a go at translating test cases. I do not (yet) have the C knowledge to maintain the regex code, though. I got suddenly swamped and forgot I had signed up for

Re: [HACKERS] Future of our regular expression code

2012-02-20 Thread Billy Earney
Jay, Good links, and I've also looked at a few others with benchmarks. I believe most of the benchmarks were done before PCRE implemented JIT. I haven't found a benchmark with JIT enabled, so I'm not sure if it will make a difference. Also, I'm not sure how accurately the benchmarks will show

Re: [HACKERS] Future of our regular expression code

2012-02-20 Thread Tom Lane
Billy Earney billy.ear...@gmail.com writes: Also, would it be possible to set a session variable (let's say PGREGEXTYPE) to ARE (the current algorithm), RE2, or PCRE, so that users could choose which implementation they want (unless we find a single implementation that beats the others in

Re: [HACKERS] Future of our regular expression code

2012-02-20 Thread Billy Earney
Tom, Thanks for your reply. So is the group leaning towards just maintaining the current regex code base, or looking into introducing a new library (RE2, PCRE, etc)? Or is this still open for discussion? Thanks! Billy On Mon, Feb 20, 2012 at 3:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:

Re: [HACKERS] Future of our regular expression code

2012-02-20 Thread Tom Lane
Billy Earney billy.ear...@gmail.com writes: Thanks for your reply. So is the group leaning towards just maintaining the current regex code base, or looking into introducing a new library (RE2, PCRE, etc)? Or is this still open for discussion? Well, introducing a new library would create

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Brendan Jurd
On 19 February 2012 15:49, Tom Lane t...@sss.pgh.pa.us wrote: That sounds great. BTW, if you don't have it already, I'd highly recommend getting a copy of Friedl's Mastering Regular Expressions. It's aimed at users not implementers, but there is a wealth of valuable context information in

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Tom Lane
Brendan Jurd dire...@gmail.com writes: Are you far enough into the backrefs bug that you'd prefer to see it through, or would you like me to pick it up? Actually, what I've been doing today is a brain dump. This code is never going to be maintainable by anybody except its original author

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Brendan Jurd
On 20 February 2012 10:42, Tom Lane t...@sss.pgh.pa.us wrote: I have also got a bunch of text about the colormap management code, which I think is interesting right now because that is what we are going to have to fix if we want decent performance for Unicode \w and related classes (cf the

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Billy Earney
Tom, I did a Google search, and found the following: http://www.arglist.com/regex/ Which states that Tcl uses the same library from Henry. Maybe someone involved with that project would help explain the library? Also, I noticed that the URL above lists a few ports people did from Henry's code. I

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Tom Lane
Billy Earney billy.ear...@gmail.com writes: I did a google search, and found the following: http://www.arglist.com/regex/ Hmm ... might be worth looking at those two pre-existing attempts at making a standalone library from Henry's code, just to see what choices they made. Which states that

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Billy Earney
Thanks Tom. I looked at the code in the libraries I referred to earlier, and it looks like the code in the regex directory is exactly the same as Walter Waldo's version, which has at least one comment from the middle of last decade (~ 2003). Have people thought about migrating to the pcre

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Stephen Frost
Billy, * Billy Earney (billy.ear...@gmail.com) wrote: Thanks Tom. I looked at the code in the libraries I referred to earlier, and it looks like the code in the regex directory is exactly the same as Walter Waldo's version, which has at least one comment from the middle of last decade (~

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Greg Stark
On Sat, Feb 18, 2012 at 6:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: A larger point is that it'd be a real shame for the Spencer regex engine to die off, because it is in fact one of the best pieces of regex technology on the planet. ... Another possible long-term answer is to finish the work

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Stephen Frost
Greg, * Greg Stark (st...@mit.edu) wrote: I can't see how your first claim that the Spencer code is worth keeping around because it's just a superior regex implementation has much force unless we can accomplish the latter. If the library can be split off into a standalone library then it

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Tom Lane
Greg Stark st...@mit.edu writes: ... We need a library that can be used to defend against malicious regexes and I suspect neither Perl's nor Python's library will suffice for this. Yeah. Did you read the Russ Cox papers referenced upthread? One of the things Google wanted was provably

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Greg Smith
On 02/19/2012 10:28 PM, Greg Stark wrote: One thing that concerns me more and more is that most sufficiently powerful regex implementations are susceptible to DoS attacks. There's a list of evil regexes at http://en.wikipedia.org/wiki/ReDoS The Perl community's reaction to Russ Cox's regex

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Jay Levitt
Stephen Frost wrote: Alright, I'll bite.. Which existing regexp implementation that's well written, well maintained, and which is well protected against malicious regexes should we be considering then? FWIW, there's a benchmark here that compares a number of regexp engines, including PCRE,

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Simon Riggs
On Sat, Feb 18, 2012 at 6:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: So I'm feeling that we gotta suck it up and start acting like we are the lead maintainers for this code, not just consumers. By we, I take it you mean you personally? There are many requests I might make for allocations of

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Stephen Frost
* Simon Riggs (si...@2ndquadrant.com) wrote: On Sat, Feb 18, 2012 at 6:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: So I'm feeling that we gotta suck it up and start acting like we are the lead maintainers for this code, not just consumers. By we, I take it you mean you personally? I'm

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes: * Simon Riggs (si...@2ndquadrant.com) wrote: Do we have volunteers that might save Tom from taking on this task? It's not something that requires too much knowledge and experience of PostgreSQL, so is an easier task for a newcomer. Sure, it doesn't

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Simon Riggs
On Sat, Feb 18, 2012 at 7:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: One immediate consequence of deciding that we are lead maintainers and not just consumers is that we should put in some regression tests, instead of taking the attitude that the Tcl guys are in charge of that. I have a head

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Vik Reykja
On Sat, Feb 18, 2012 at 21:04, Simon Riggs si...@2ndquadrant.com wrote: On Sat, Feb 18, 2012 at 7:52 PM, Tom Lane t...@sss.pgh.pa.us wrote: One immediate consequence of deciding that we are lead maintainers and not just consumers is that we should put in some regression tests, instead of

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Andrew Dunstan
On 02/18/2012 02:25 PM, Stephen Frost wrote: Do we have volunteers that might save Tom from taking on this task? It's not something that requires too much knowledge and experience of PostgreSQL, so is an easier task for a newcomer. Sure, it doesn't require knowledge of PG, but I dare say

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Tom Lane
Vik Reykja vikrey...@gmail.com writes: On Sat, Feb 18, 2012 at 21:04, Simon Riggs si...@2ndquadrant.com wrote: Translating the test cases is a great way in for a volunteer, so please leave a few easy things to get people started on the road to maintaining that. I would be willing to have a

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes: Yeah ... if you *don't* know the difference between a DFA and an NFA, you're likely to find yourself in over your head. Having said that, So, here's a paper I found very nice to get started into this subject: http://swtch.com/~rsc/regexp/regexp1.html If

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes: Tom Lane t...@sss.pgh.pa.us writes: Yeah ... if you *don't* know the difference between a DFA and an NFA, you're likely to find yourself in over your head. Having said that, So, here's a paper I found very nice to get started into this subject:

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Marko Kreen
On Sun, Feb 19, 2012 at 1:55 AM, Tom Lane t...@sss.pgh.pa.us wrote: Dimitri Fontaine dimi...@2ndquadrant.fr writes: Tom Lane t...@sss.pgh.pa.us writes: Yeah ... if you *don't* know the difference between a DFA and an NFA, you're likely to find yourself in over your head. Having said that,

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Christopher Browne
On Sat, Feb 18, 2012 at 7:24 PM, Marko Kreen mark...@gmail.com wrote: About our Spencer code - if we don't have resources (not called Tom) Is there anything that would be worth talking about directly with Henry? He's in one of my circles of colleagues; had dinner with a group that included him

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Tom Lane
Christopher Browne cbbro...@gmail.com writes: On Sat, Feb 18, 2012 at 7:24 PM, Marko Kreen mark...@gmail.com wrote: About our Spencer code - if we don't have resources (not called Tom) Is there anything that would be worth talking about directly with Henry? He's in one of my circles of

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Brendan Jurd
On 19 February 2012 06:52, Tom Lane t...@sss.pgh.pa.us wrote: Yeah ... if you *don't* know the difference between a DFA and an NFA, you're likely to find yourself in over your head. Having said that, this is eminently learnable stuff and pretty self-contained, so somebody who had the time and

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Tom Lane
Brendan Jurd dire...@gmail.com writes: On 19 February 2012 06:52, Tom Lane t...@sss.pgh.pa.us wrote: Yeah ... if you *don't* know the difference between a DFA and an NFA, you're likely to find yourself in over your head. Having said that, this is eminently learnable stuff and pretty