Re: Tomcat 7 regex

2010-12-29 Thread Henri Yandell
May I suggest you emit a warning to the logs when comma is used in the
regexp for a few versions?

Hen

On Fri, Dec 24, 2010 at 10:34 AM, Mark Thomas ma...@apache.org wrote:
 There are a number of configuration properties defined as comma
 separated regular expressions. As someone pointed out at at ApacheCon
 that is a little odd. It stops , being used in an expression and is
 inefficient.

 Having just been bitten by this while setting up the new Jira instance,
 I intend change all properties that take regex in Tomcat 7 to use a
 single regex. This will simplify the code, simplify configuration and
 make the regex processing faster.

 I probably won't get around to actually doing this until the new year.

 Mark

 -
 To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
 For additional commands, e-mail: dev-h...@tomcat.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org



Re: Tomcat 7 regex

2010-12-27 Thread Rainer Jung

On 25.12.2010 20:53, Mark Thomas wrote:

On 25/12/2010 01:49, Tim Funk wrote:

+0.5 - I wonder if in some cases - it may be preferable to use a
property called split which lets the user define the separator which we
can pass to String.split(). [Which OTOH may be more confusing (yet
powerful) since the user is using a regex to split get a list of regex]


I think that just makes it more complicated. There is no need to split
anything up since the regex can just use |. Rather than mix regex plus
our own proprietary splitting mechanism, I think we should just use the
support already provided by regex to do the same thing.


+1

Maybe reminding users about | with a *simple* example helps.

Concerning IP addresses and regexp: The escaping of the contained dots 
is not really necessary, as long as one is matching the whole address 
from the beginning to the end because the dots in the real address can 
only match the (escaped or not) dots in the pattern.


What might be helpful is a general way of expressing networks (network 
address plus netmask or bits). For httpd 2.4 there will be a general 
expression parser which contains an operator called -ipmatch so 
network addresses can be used not only in Allow/Deny (now: Require) but 
also everywhere else.


Regards,

Rainer

-
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org



Re: Tomcat 7 regex

2010-12-27 Thread Christopher Schultz
Mark,

On 12/24/2010 1:34 PM, Mark Thomas wrote:
 There are a number of configuration properties defined as comma
 separated regular expressions. As someone pointed out at at ApacheCon
 that is a little odd. It stops , being used in an expression and is
 inefficient.

A comma can still be used in a regular expression as long as the rules
about how we split the whole value are well-defined (like commas can be
escaped for in-regexp use).

 Having just been bitten by this while setting up the new Jira instance,
 I intend change all properties that take regex in Tomcat 7 to use a
 single regex. This will simplify the code, simplify configuration and
 make the regex processing faster.

So the plan would be to have users convert values like this:

127\.0\.0\.1, 10\.10\.10\.1, 192\.168\.1\.[0-9]+

to this:

(127\.0\.0\.1|10\.10\.10\.1|192\.168\.1\.[0-9]+)

I have some recommendations:

1. If it's not okay to break the configuration interface, you should
change the name(s) of the attribute(s) so that old configurations are
easier to adapt to new environments. Something like allowedIPs might
become allowedIPPattern. I'm not sure if incompatibility is something
we're concerned about, though there have been a number of pre-releases
on the 7.0 branch and this sounds like quite a breaking change.

2. Make it clear /which/ regular expressions will be supported. I hate
it when an API says use a regular expression and then they don't tell
you they're using Jakarta-ORO which doesn't (conveniently) support
Unicode and you have to spend a long time figuring out why your patterns
aren't working. Presumably, we'll be using the JDK's regular expression
classes: please just state that explicitly.

3. Please make it clear, on a per-attribute basis if appropriate,
whether the pattern will implicitly use start-of-input and end-of-input
markers on the ends. I've been bitten several times by the operational
differences between using Matcher.matches (which is implicitly ^...$)
and Matcher.find/Matcher.replaceAll. Presumable, we'll be using
Matcher.matches and therefore ^...$ is not necessary in any values being
provided by the user: please just state that explicitly.

4. Please ensure that the documentation clearly reminds readers (in each
attribute, rather than requiring the reader to go to a unified short
blurb about regular expressions) that a . is anything and not just a
dot. Lots of (otherwise) smart people often write regular expressions
for IP addresses like this: 10.10.1.1.

Thanks!
-chris



signature.asc
Description: OpenPGP digital signature


Re: Tomcat 7 regex

2010-12-27 Thread Christopher Schultz
Tim,

On 12/25/2010 3:34 PM, Tim Funk wrote:
 I am thinking from an admin point of view. While you can combine OR
 conditionals in regex's - when you get something more complicated - you
 may encounter a nasty nesting of () to get all the nested OR's correct.

One can always use more, simpler | expressions rather than using a ton
of () expressions to get the shortest regular expression.

-chris



signature.asc
Description: OpenPGP digital signature


Re: Tomcat 7 regex

2010-12-27 Thread Rainer Jung

On 27.12.2010 21:22, Christopher Schultz wrote:

So the plan would be to have users convert values like this:

127\.0\.0\.1, 10\.10\.10\.1, 192\.168\.1\.[0-9]+

to this:

(127\.0\.0\.1|10\.10\.10\.1|192\.168\.1\.[0-9]+)


which is equivalent to

127\.0\.0\.1|10\.10\.10\.1|192\.168\.1\.[0-9]+

if we do not want to reference the resulting matches.

One way to get rid of the dot escaping would be

^(127.0.0.1|10.10.10.1|192.168.1.[0-9]+)$

because the verbatim dots in the IP addresses can only match the any 
char dot in that expression.



3. Please make it clear, on a per-attribute basis if appropriate,
whether the pattern will implicitly use start-of-input and end-of-input
markers on the ends. I've been bitten several times by the operational
differences between using Matcher.matches (which is implicitly ^...$)
and Matcher.find/Matcher.replaceAll. Presumable, we'll be using
Matcher.matches and therefore ^...$ is not necessary in any values being
provided by the user: please just state that explicitly.


+1


4. Please ensure that the documentation clearly reminds readers (in each
attribute, rather than requiring the reader to go to a unified short
blurb about regular expressions) that a . is anything and not just a
dot. Lots of (otherwise) smart people often write regular expressions
for IP addresses like this: 10.10.1.1.


... which can be OK, see above.

Regards,

Rainer

-
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org



Re: Tomcat 7 regex

2010-12-25 Thread Konstantin Kolinko
1) It it were configurable,  in certain places it makes sense to use
space as a separator (e.g. in IP addresses).
- any whitespace? - \w+ and we end up with using a regex to split a
list of regexes.

2) It might make sense to require regex expressions to be surrounded by '/'.
E.g. /192\.168\.1\.\d{1,3}/ is a regex, but 192.168.1.17 is a literal value.

3) I wonder if it makes sense to manipulate RequestFilterValve though
JMX. E.g. to add/remove some filtering patterns at runtime.


Mark, are there other places than RequestFilterValve and its
subclasses (RemoteAddrValve, RemoteHostValve) where you are planning
this change?

There this feature can be configurable. E.g. if split='' then
splitting is not performed at all. I do not see why we should force
users to use a single regex only.

Having a single regex by default is OK with me, but forcing a single
regex saves too little in performance of
RequestFilterValve.process(..) (removes iterating over an array but
adds a null check).


[OT] Merry X'mas


Best regards,
Konstantin Kolinko

2010/12/25 Tim Funk funk...@apache.org:
 +0.5 - I wonder if in some cases - it may be preferable to use a property
 called split which lets the user define the separator which we can pass to
 String.split(). [Which OTOH may be more confusing (yet powerful) since the
 user is using a regex to split get a list of regex]

 -Tim

 On 12/24/2010 1:34 PM, Mark Thomas wrote:

 There are a number of configuration properties defined as comma
 separated regular expressions. As someone pointed out at at ApacheCon
 that is a little odd. It stops , being used in an expression and is
 inefficient.

 Having just been bitten by this while setting up the new Jira instance,
 I intend change all properties that take regex in Tomcat 7 to use a
 single regex. This will simplify the code, simplify configuration and
 make the regex processing faster.

 I probably won't get around to actually doing this until the new year.

 -
 To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
 For additional commands, e-mail: dev-h...@tomcat.apache.org



-
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org



Re: Tomcat 7 regex

2010-12-25 Thread Mark Thomas
On 25/12/2010 01:49, Tim Funk wrote:
 +0.5 - I wonder if in some cases - it may be preferable to use a
 property called split which lets the user define the separator which we
 can pass to String.split(). [Which OTOH may be more confusing (yet
 powerful) since the user is using a regex to split get a list of regex]

I think that just makes it more complicated. There is no need to split
anything up since the regex can just use |. Rather than mix regex plus
our own proprietary splitting mechanism, I think we should just use the
support already provided by regex to do the same thing.

Mark

 
 -Tim
 
 On 12/24/2010 1:34 PM, Mark Thomas wrote:
 There are a number of configuration properties defined as comma
 separated regular expressions. As someone pointed out at at ApacheCon
 that is a little odd. It stops , being used in an expression and is
 inefficient.

 Having just been bitten by this while setting up the new Jira instance,
 I intend change all properties that take regex in Tomcat 7 to use a
 single regex. This will simplify the code, simplify configuration and
 make the regex processing faster.

 I probably won't get around to actually doing this until the new year.
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
 For additional commands, e-mail: dev-h...@tomcat.apache.org
 


-
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org



Re: Tomcat 7 regex

2010-12-25 Thread Mark Thomas
On 25/12/2010 13:37, Konstantin Kolinko wrote:
 1) It it were configurable,  in certain places it makes sense to use
 space as a separator (e.g. in IP addresses).
 - any whitespace? - \w+ and we end up with using a regex to split a
 list of regexes.

Yes, space could work but I'd rather stick to what folks expect of
standard regex. | achieves the same result but is standard regex.

 2) It might make sense to require regex expressions to be surrounded by '/'.
 E.g. /192\.168\.1\.\d{1,3}/ is a regex, but 192.168.1.17 is a literal 
 value.

I'd rather Tomcat 7 moved towards an existing standard rather than tried
to create a new one.

 3) I wonder if it makes sense to manipulate RequestFilterValve though
 JMX. E.g. to add/remove some filtering patterns at runtime.

It is certainly something I can see would be useful - e.g. reacting to
an attacker. Making that dynamic should be do-able with care.

 Mark, are there other places than RequestFilterValve and its
 subclasses (RemoteAddrValve, RemoteHostValve) where you are planning
 this change?

It was actually RemoteIpValve that got me started on this. You can't
explicitly set the default since it uses ',' in the regex but we also
split using ','. I wanted to fix that and moving to a single regex fixes
that and removes any chance of any similar gotchas in the future.

 There this feature can be configurable. E.g. if split='' then
 splitting is not performed at all. I do not see why we should force
 users to use a single regex only.

All it really does is force users to use the standard regex of '|' where
they currently use ','.

 Having a single regex by default is OK with me, but forcing a single
 regex saves too little in performance of
 RequestFilterValve.process(..) (removes iterating over an array but
 adds a null check).

I don't have any hard numbers but I suspect matching a single regex
using '|' is going to be faster than matching multiple. Probably not by
much. The code simplification is pretty minor too.

 [OT] Merry X'mas
+1 to all.

Mark

-
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org



Re: Tomcat 7 regex

2010-12-25 Thread Tim Funk
I am thinking from an admin point of view. While you can combine OR 
conditionals in regex's - when you get something more complicated - you 
may encounter a nasty nesting of () to get all the nested OR's correct. 
So while one COULD to it in a single regex - most mere mortals might not 
be able to comprehend that regex. This is where combining different 
regexes by a user defined split may come in handy.


But in reality ... the general case a user is doing is probably a 
singular regex. So since the capability exists for more exotic matches, 
it may be easier to simplify the code per your (mark's) suggestion and 
then provide enough coaching to demonstrate how more advanced regexes 
can be applied.


(So I am cool with it. Time permitting - I'll try to provide an example 
or 2 for odd regexes - like RFC1918 matching for the RemoteAddrressFilter)


-Tim

On 12/25/2010 2:53 PM, Mark Thomas wrote:

On 25/12/2010 01:49, Tim Funk wrote:

+0.5 - I wonder if in some cases - it may be preferable to use a
property called split which lets the user define the separator which we
can pass to String.split(). [Which OTOH may be more confusing (yet
powerful) since the user is using a regex to split get a list of regex]


I think that just makes it more complicated. There is no need to split
anything up since the regex can just use |. Rather than mix regex plus
our own proprietary splitting mechanism, I think we should just use the
support already provided by regex to do the same thing.

Mark



-
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org



Re: Tomcat 7 regex

2010-12-25 Thread Daniel Baktiar
+1

i like the approach toward standard regex.
ones that use it most likely will be sysadmins, so they should at least know
how to use regex in their tools like shell, awk, perl, python or powershell.
it will be nice if the standard regex is in place.
---
daniel baktiar
http://savinggaia.tritiumapps.com - saving the planet is everyone's
business!




On 26 December 2010 04:02, Mark Thomas ma...@apache.org wrote:

 On 25/12/2010 13:37, Konstantin Kolinko wrote:
  1) It it were configurable,  in certain places it makes sense to use
  space as a separator (e.g. in IP addresses).
  - any whitespace? - \w+ and we end up with using a regex to split a
  list of regexes.

 Yes, space could work but I'd rather stick to what folks expect of
 standard regex. | achieves the same result but is standard regex.

  2) It might make sense to require regex expressions to be surrounded by
 '/'.
  E.g. /192\.168\.1\.\d{1,3}/ is a regex, but 192.168.1.17 is a literal
 value.

 I'd rather Tomcat 7 moved towards an existing standard rather than tried
 to create a new one.

  3) I wonder if it makes sense to manipulate RequestFilterValve though
  JMX. E.g. to add/remove some filtering patterns at runtime.

 It is certainly something I can see would be useful - e.g. reacting to
 an attacker. Making that dynamic should be do-able with care.

  Mark, are there other places than RequestFilterValve and its
  subclasses (RemoteAddrValve, RemoteHostValve) where you are planning
  this change?

 It was actually RemoteIpValve that got me started on this. You can't
 explicitly set the default since it uses ',' in the regex but we also
 split using ','. I wanted to fix that and moving to a single regex fixes
 that and removes any chance of any similar gotchas in the future.

  There this feature can be configurable. E.g. if split='' then
  splitting is not performed at all. I do not see why we should force
  users to use a single regex only.

 All it really does is force users to use the standard regex of '|' where
 they currently use ','.

  Having a single regex by default is OK with me, but forcing a single
  regex saves too little in performance of
  RequestFilterValve.process(..) (removes iterating over an array but
  adds a null check).

 I don't have any hard numbers but I suspect matching a single regex
 using '|' is going to be faster than matching multiple. Probably not by
 much. The code simplification is pretty minor too.

  [OT] Merry X'mas
 +1 to all.

 Mark

 -
 To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
 For additional commands, e-mail: dev-h...@tomcat.apache.org




Re: Tomcat 7 regex

2010-12-24 Thread Tim Funk
+0.5 - I wonder if in some cases - it may be preferable to use a 
property called split which lets the user define the separator which we 
can pass to String.split(). [Which OTOH may be more confusing (yet 
powerful) since the user is using a regex to split get a list of regex]


-Tim

On 12/24/2010 1:34 PM, Mark Thomas wrote:

There are a number of configuration properties defined as comma
separated regular expressions. As someone pointed out at at ApacheCon
that is a little odd. It stops , being used in an expression and is
inefficient.

Having just been bitten by this while setting up the new Jira instance,
I intend change all properties that take regex in Tomcat 7 to use a
single regex. This will simplify the code, simplify configuration and
make the regex processing faster.

I probably won't get around to actually doing this until the new year.


-
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org