Re: Regex validation, was Re: Programmers with network engineering skills

2012-03-19 Thread Jeroen van Aart

Joel Maslak wrote:

> is not.  But there is value in not passing utter garbage to another
> program (it has a tendency to clog mail queues, if for no other
> reason) - just make sure you do it right.


I fail to see why you wouldn't be able to throttle any abuse of your
webform so it wouldn't clog a mail queue. Besides, it's very hard to clog
or otherwise overload an MTA, since it's purpose-built to handle that
kind of thing.


I also fail to see why it would be so hard to install an MTA listening
on localhost whose sole purpose would be to validate email addresses and
nothing else, and which just dumps any possible outgoing email to
/dev/null.
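
A minimal sketch of what asking such a local MTA could look like, using Python's standard smtplib. The function name mta_accepts, the sender validator@localhost, and the host/port defaults are made up for illustration. It also assumes the MTA is configured to reject bad recipients at RCPT time, which not every setup does. No DATA command is ever issued, so nothing is queued.

```python
import smtplib


def mta_accepts(address, host="localhost", port=25, timeout=5.0,
                smtp_factory=smtplib.SMTP):
    """Ask an MTA whether it would accept `address` as a recipient.

    Only HELO, MAIL FROM and RCPT TO are sent; the session is closed
    before DATA, so no message is created.
    `smtp_factory` is a hypothetical hook so a fake can be injected in tests.
    """
    with smtp_factory(host, port, timeout=timeout) as smtp:
        smtp.helo()
        # Hypothetical envelope sender, used only for the probe.
        code, _ = smtp.mail("validator@localhost")
        if not 200 <= code < 300:
            return False
        code, _ = smtp.rcpt(address)
        # 2xx means the MTA accepted the recipient; anything else is a refusal.
        return 200 <= code < 300
```

The web form handler would call mta_accepts(address) and only hand the real message to the sending MTA when it returns True.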


If you're afraid of clogging the mail queue, then only hand the message
off to the sending MTA after validation has succeeded. But to be honest,
why would you care? MTAs are purpose-made to handle such things.


I can't really think of a scenario where validating an email address 
using a separate service would create such a performance bottleneck. If 
you have robots flooding your web forms 1000s of times a second (still 
peanuts for the average MTA) you need to rethink your security and abuse 
prevention...not your email validation...I would say. :-)


People use a separate database instance for database queries, and the
database server has its own code to validate input. We don't code our
own database server as part of the web form handling code. Why not hand
off email validation the same way?



> Okay, I'll step off the soap box and let the next person holler about
> how I was wrong about all this!


You're mostly right, but I disagree about the email validation part. I
just don't see a point in re-inventing the wheel when there are
perfectly capable free alternatives that can do it for you with no
noticeable performance penalty.


Greetings,
Jeroen

--
Earthquake Magnitude: 4.8
Date: Saturday, March 17, 2012 01:49:29 UTC
Location: Banda Sea
Latitude: -7.0313; Longitude: 123.4175
Depth: 632.60 km



Regex validation, was Re: Programmers with network engineering skills

2012-03-13 Thread Joel Maslak
On Mon, Mar 12, 2012 at 9:18 PM, Mark Andrews ma...@isc.org wrote:

> Only if you don't properly quote/escape the arguments you are passing.

You're using your OS wrong if you are quoting/escaping the arguments.
You do not need a shell to run another program: fork() + exec() +
wait() never involve one (assuming Unix; I also suspect libc has a
nice packaged function for this that is not insecure like system(),
but it's not all that hard to roll your own).  In Perl, use the
multi-argument form of system(), not the single-argument version.
In both cases you should clear the environment as well prior to the
exec()/system() unless you know nobody can play with LD_PRELOAD, IFS,
etc.
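
To make the point concrete, here is a rough Python sketch of the same idea (the post talks about C and Perl; Python is used here only for illustration). The helper name run_without_shell is made up. It passes the arguments as a list so no shell ever parses them, and it gives the child a fresh environment containing only a default PATH, so LD_PRELOAD, IFS and friends are not inherited.

```python
import os
import subprocess
import sys


def run_without_shell(argv):
    """Run argv[0] with the given arguments and return its stdout.

    argv is a list, so each element reaches the child as one argument,
    exactly as given; nothing needs quoting or escaping.
    """
    # Fresh environment: nothing is inherited from the caller.
    clean_env = {"PATH": os.defpath}
    result = subprocess.run(
        argv,
        env=clean_env,
        shell=False,          # the default, spelled out for emphasis
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    hostile = "x; echo INJECTED $(id) `id`"
    # The child simply echoes back its first argument.
    child = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.argv[1])", hostile]
    print(run_without_shell(child))
```

The hostile string comes back unchanged because it was never interpreted by a shell; it was just one element of the child's argv.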

This is one of my pet peeves about programming - programmers calling
out to insecure functions when secure alternatives are available.

The same goes for SQL statements - if you need to quote things to
prevent SQL injection, you're using your SQL database wrong.  Look up
prepared statements.  Generally, it's very bad practice to dynamically
build SQL strings.  It's also very common practice, which is why so many
applications have SQL injection vulnerabilities.  It's the Perl/PHP
equivalent of the buffer overflow that simply wouldn't exist if
developers, instead of trying to figure out how to quote everything,
simply used prepared statements and placeholders.
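
For anyone who has not seen placeholders in action, a small sketch with Python's standard sqlite3 module follows. The table, column, and function names are invented for the example, and other databases use a different placeholder syntax. The SQL text is a constant; the user-supplied value travels separately as a bound parameter.

```python
import sqlite3


def make_demo_db():
    """Create a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, surname TEXT)")
    conn.executemany(
        "INSERT INTO users (surname) VALUES (?)",
        [("O'Brien",), ("Smith",)],
    )
    return conn


def find_user_ids(conn, surname):
    """Look up users by surname with a placeholder, never string building."""
    cur = conn.execute(
        "SELECT id FROM users WHERE surname = ? ORDER BY id",
        (surname,),  # bound as data; it is never parsed as SQL
    )
    return [row[0] for row in cur.fetchall()]


if __name__ == "__main__":
    db = make_demo_db()
    print(find_user_ids(db, "O'Brien"))        # the apostrophe needs no quoting
    print(find_user_ids(db, "x' OR '1'='1"))   # treated as an odd surname
```

The apostrophe in O'Brien and the classic injection string are both handled as plain values, without any quoting code in the application.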

As for checking for bogus email addresses, read the RFC and code it
right.  That's not with a too-simple regex, nor is it with a complex
regex.  You need a parser, which is the right tool for the job.  Regex
is not.  But there is value in not passing utter garbage to another
program (it has a tendency to clog mail queues, if for no other
reason) - just make sure you do it right.
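
One way to get a parser without writing it yourself, sketched in Python: the standard library's email.headerregistry.Address runs its own header grammar parser over an addr-spec and raises when the input does not parse cleanly. This is an illustration, not the only approach; the function name parse_addr_spec is made up, and it checks syntax only, not whether the mailbox exists.

```python
from email import errors
from email.headerregistry import Address


def parse_addr_spec(text):
    """Return (local_part, domain) if `text` parses as an addr-spec, else None."""
    if not text:
        return None
    try:
        parsed = Address(addr_spec=text)
    except (ValueError, IndexError, errors.MessageError):
        # ValueError covers parse defects and trailing junk; IndexError is
        # caught defensively, since parser behavior on truncated input may
        # differ between Python versions.
        return None
    if not parsed.domain:
        return None
    return parsed.username, parsed.domain


if __name__ == "__main__":
    for candidate in ("user@example.com", "no-at-sign", "a@b@c"):
        print(candidate, "->", parse_addr_spec(candidate))
```

Quoted local parts such as "john smith"@example.com are legal under RFC 5322, which is the sort of case that simple regexes tend to get wrong and a grammar-based parser is built to handle.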

I might add that the same goes for names.  People don't just have a
first name and a last name - some people just have one name, some
people have three or four names, some people have surnames with
spaces, hyphens, or apostrophes (remember what I said about SQL?!),
etc.  Yet most systems I see assume people have two names with no
spaces, apostrophes, hyphens, etc.  Big mistake.  And don't get me
started on addresses, which might have one address line, two address
lines, even 5 address lines, to say nothing of the fact that
international addresses may or may not put the street part first.
It's certainly not easily regex-able.

Okay, I'll step off the soap box and let the next person holler about
how I was wrong about all this!