Re: [HACKERS] Future of our regular expression code

2012-03-10 Thread Tom Lane
Vik Reykja writes: > On Sat, Feb 18, 2012 at 21:16, Vik Reykja wrote: >> I would be willing to have a go at translating test cases. I do not (yet) >> have the C knowledge to maintain the regex code, though. > I got suddenly swamped and forgot I had signed up for this. I'm still > pretty swampe

Re: [HACKERS] Future of our regular expression code

2012-03-10 Thread Vik Reykja
On Sat, Feb 18, 2012 at 21:16, Vik Reykja wrote: > I would be willing to have a go at translating test cases. I do not (yet) > have the C knowledge to maintain the regex code, though. I got suddenly swamped and forgot I had signed up for this. I'm still pretty swamped and I would like these r

Re: [HACKERS] Future of our regular expression code

2012-02-20 Thread Tom Lane
Billy Earney writes: > Thanks for your reply. So is the group leaning towards just maintaining > the current regex code base, or looking into introducing a new library > (RE2, PCRE, etc)? Or is this still open for discussion? Well, introducing a new library would create compatibility issues th

Re: [HACKERS] Future of our regular expression code

2012-02-20 Thread Billy Earney
Tom, Thanks for your reply. So is the group leaning towards just maintaining the current regex code base, or looking into introducing a new library (RE2, PCRE, etc)? Or is this still open for discussion? Thanks! Billy On Mon, Feb 20, 2012 at 3:35 PM, Tom Lane wrote: > Billy Earney writes:

Re: [HACKERS] Future of our regular expression code

2012-02-20 Thread Tom Lane
Billy Earney writes: > Also would it be possible to set a session variable (let's say PGREGEXTYPE) > and set it to ARE (current alg), RE2, or PCRE, that way users could choose > which implementation they want (unless we find a single implementation that > beats the others in almost all categories)

Re: [HACKERS] Future of our regular expression code

2012-02-20 Thread Billy Earney
Jay, Good links, and I've also looked at a few others with benchmarks. I believe most of the benchmarks were done before PCRE implemented JIT. I haven't found a benchmark with JIT enabled, so I'm not sure if it will make a difference. Also I'm not sure how accurately the benchmarks will show how

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Jay Levitt
Stephen Frost wrote: Alright, I'll bite.. Which existing regexp implementation that's well written, well maintained, and which is well protected against malicious regexes should we be considering then? FWIW, there's a benchmark here that compares a number of regexp engines, including PCRE, TR

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Greg Smith
On 02/19/2012 10:28 PM, Greg Stark wrote: One thing that concerns me more and more is that most sufficiently powerful regex implementations are susceptible to DOS attacks. There's a list of "evil regexes" at http://en.wikipedia.org/wiki/ReDoS The Perl community's reaction to Russ Cox's regex p

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Tom Lane
Greg Stark writes: > ... We need a library that can be used to defend > against malicious regexes and I suspect neither Perl's nor Python's > library will suffice for this. Yeah. Did you read the Russ Cox papers referenced upthread? One of the things Google wanted was provably limited resource

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Stephen Frost
Greg, * Greg Stark (st...@mit.edu) wrote: > I can't see how your first claim that the Spencer code is worth > keeping around because it's just a superior regex implementation has > much force unless we can accomplish the latter. If the library can be > split off into a standalone library then it m

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Greg Stark
On Sat, Feb 18, 2012 at 6:15 PM, Tom Lane wrote: >  A larger point is that it'd be a real shame > for the Spencer regex engine to die off, because it is in fact one of > the best pieces of regex technology on the planet. ... > Another possible long-term answer is to finish the work Henry never did

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Stephen Frost
Billy, * Billy Earney (billy.ear...@gmail.com) wrote: > Thanks Tom. I looked at the code in the libraries I referred to earlier, > and it looks like the code in the regex directory is exactly the same as > Walter Waldo's version, which has at least one comment from the middle of > last decade (~

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Billy Earney
Thanks Tom. I looked at the code in the libraries I referred to earlier, and it looks like the code in the regex directory is exactly the same as Walter Waldo's version, which has at least one comment from the middle of last decade (~ 2003). Have people thought about migrating to the pcre library

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Tom Lane
Billy Earney writes: > I did a google search, and found the following: > http://www.arglist.com/regex/ Hmm ... might be worth looking at those two pre-existing attempts at making a standalone library from Henry's code, just to see what choices they made. > Which states that Tcl uses the same lib

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Billy Earney
Tom, I did a Google search, and found the following: http://www.arglist.com/regex/ Which states that Tcl uses the same library from Henry. Maybe someone involved with that project would help explain the library? Also, I noticed that at the URL above there are a few ports people did from Henry's code. I did

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Brendan Jurd
On 20 February 2012 10:42, Tom Lane wrote: > I have also got > a bunch of text about the colormap management code, which I think > is interesting right now because that is what we are going to have > to fix if we want decent performance for Unicode \w and related > classes (cf the other current -h

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Tom Lane
Brendan Jurd writes: > Are you far enough into the backrefs bug that you'd prefer to see it > through, or would you like me to pick it up? Actually, what I've been doing today is a brain dump. This code is never going to be maintainable by anybody except its original author without some internal

Re: [HACKERS] Future of our regular expression code

2012-02-19 Thread Brendan Jurd
On 19 February 2012 15:49, Tom Lane wrote: > That sounds great. > > BTW, if you don't have it already, I'd highly recommend getting a copy > of Friedl's "Mastering Regular Expressions".  It's aimed at users not > implementers, but there is a wealth of valuable context information in > there, as we

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Tom Lane
Brendan Jurd writes: > On 19 February 2012 06:52, Tom Lane wrote: >> Yeah ... if you *don't* know the difference between a DFA and an NFA, >> you're likely to find yourself in over your head.  Having said that, >> this is eminently learnable stuff and pretty self-contained, so somebody >> who had

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Brendan Jurd
On 19 February 2012 06:52, Tom Lane wrote: > Yeah ... if you *don't* know the difference between a DFA and an NFA, > you're likely to find yourself in over your head.  Having said that, > this is eminently learnable stuff and pretty self-contained, so somebody > who had the time and interest could

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Tom Lane
Christopher Browne writes: > On Sat, Feb 18, 2012 at 7:24 PM, Marko Kreen wrote: >> About our Spencer code - if we don't have resources (not called Tom) > Is there anything that would be worth talking about directly with > Henry? He's in one of my circles of colleagues; had dinner with a > grou

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Christopher Browne
On Sat, Feb 18, 2012 at 7:24 PM, Marko Kreen wrote: > About our Spencer code - if we don't have resources (not called Tom) Is there anything that would be worth talking about directly with Henry? He's in one of my circles of colleagues; had dinner with a group that included him on Thursday. --

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Marko Kreen
On Sun, Feb 19, 2012 at 1:55 AM, Tom Lane wrote: > Dimitri Fontaine writes: >> Tom Lane writes: >>> Yeah ... if you *don't* know the difference between a DFA and an NFA, >>> you're likely to find yourself in over your head.  Having said that, > >> So, here's a paper I found very nice to get star

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Tom Lane
Dimitri Fontaine writes: > Tom Lane writes: >> Yeah ... if you *don't* know the difference between a DFA and an NFA, >> you're likely to find yourself in over your head. Having said that, > So, here's a paper I found very nice to get started into this subject: > http://swtch.com/~rsc/regexp/r

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Dimitri Fontaine
Tom Lane writes: > Yeah ... if you *don't* know the difference between a DFA and an NFA, > you're likely to find yourself in over your head. Having said that, So, here's a paper I found very nice to get started into this subject: http://swtch.com/~rsc/regexp/regexp1.html If anyone's interest

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Tom Lane
Vik Reykja writes: > On Sat, Feb 18, 2012 at 21:04, Simon Riggs wrote: >> Translating the test cases is a great way in for a volunteer, so >> please leave a few easy things to get people started on the road to >> maintaining that. > I would be willing to have a go at translating test cases. I d

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Andrew Dunstan
On 02/18/2012 02:25 PM, Stephen Frost wrote: Do we have volunteers that might save Tom from taking on this task? It's not something that requires too much knowledge and experience of PostgreSQL, so is an easier task for a newcomer. Sure, it doesn't require knowledge of PG, but I dare say there

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Vik Reykja
On Sat, Feb 18, 2012 at 21:04, Simon Riggs wrote: > On Sat, Feb 18, 2012 at 7:52 PM, Tom Lane wrote: > > > One immediate consequence of deciding that we are lead maintainers and > > not just consumers is that we should put in some regression tests, > > instead of taking the attitude that the Tcl

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Simon Riggs
On Sat, Feb 18, 2012 at 7:52 PM, Tom Lane wrote: > One immediate consequence of deciding that we are lead maintainers and > not just consumers is that we should put in some regression tests, > instead of taking the attitude that the Tcl guys are in charge of that. > I have a head cold today and a

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Tom Lane
Stephen Frost writes: > * Simon Riggs (si...@2ndquadrant.com) wrote: >> Do we have volunteers that might save Tom from taking on this task? >> It's not something that requires too much knowledge and experience of >> PostgreSQL, so is an easier task for a newcomer. > Sure, it doesn't require knowl

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Stephen Frost
* Simon Riggs (si...@2ndquadrant.com) wrote: > On Sat, Feb 18, 2012 at 6:15 PM, Tom Lane wrote: > > So I'm feeling that we gotta suck it up and start acting like we are > > the lead maintainers for this code, not just consumers. > > By "we", I take it you mean you personally? I'm pretty sure he

Re: [HACKERS] Future of our regular expression code

2012-02-18 Thread Simon Riggs
On Sat, Feb 18, 2012 at 6:15 PM, Tom Lane wrote: > So I'm feeling that we gotta suck it up and start acting like we are > the lead maintainers for this code, not just consumers. By "we", I take it you mean you personally? There are many requests I might make for allocations of your time and tha