On Fri, Mar 12, 2010 at 01:52:01PM +0000, Martin Gregorie wrote:
> On Fri, 2010-03-12 at 08:15 +0200, Henrik K wrote:
> 
> > Why don't you simply maintain your wordlists in some files and use a script
> > to generate portmanteau.cf? You could use Regexp::Assemble module to
> > optimize also. Who cares what the actual rules look like? The more words
> > (simple alternations) there are in a single RE, the better it performs. If
> > you want clarity in the cf, keep the original words listed in a comment
> > block.
> >
> ....because that didn't occur to me. 
> 
> It's a good idea. Better yet, my rule development & test environment can
> be easily extended to incorporate it. Thanks.
> 
> Your comment about a single regex containing many alternations being
> more efficient than several smaller ones raises two questions:
> 
> - what is the maximum line length for such a rule?

I don't think there is any serious limit on line or RE size; at least I
couldn't find one (I think Perl itself happily compiles megabyte-sized REs).
I'd stay with something reasonable, like 1kB. It's not as if there aren't
hundreds of other REs around, so having a few more is fine. :-)
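A minimal sketch of such a generator, written in Python rather than Perl (so it does not use Regexp::Assemble's optimization). It packs words greedily into `\b(?:word1|word2|...)\b` alternations, starts a new rule whenever a pattern would grow past about 1kB, and keeps the original words in a comment block at the top of the output. The rule prefix `LOCAL_PORTMANTEAU`, the word-list file name `portmanteau.words` and the score of 1.0 are hypothetical placeholders.

```python
import re

MAX_RE_LEN = 1024  # keep each RE to roughly 1kB


def wrap(alternatives):
    """Join escaped words into one word-bounded, non-capturing alternation."""
    return r"\b(?:" + "|".join(alternatives) + r")\b"


def escape(word):
    # re.escape handles regex metacharacters; "/" is escaped as well because
    # it delimits the pattern in a SpamAssassin rule line.
    return re.escape(word).replace("/", r"\/")


def build_patterns(words, max_len=MAX_RE_LEN):
    """Greedily pack words into patterns of at most max_len characters.
    A single word longer than max_len still gets a pattern of its own."""
    patterns, current = [], []
    for word in words:
        candidate = current + [escape(word)]
        if current and len(wrap(candidate)) > max_len:
            patterns.append(wrap(current))
            current = [escape(word)]
        else:
            current = candidate
    if current:
        patterns.append(wrap(current))
    return patterns


def load_words(path):
    """Read one word per line, skipping blank lines and # comments."""
    with open(path, encoding="utf-8") as fh:
        return [ln.strip() for ln in fh
                if ln.strip() and not ln.lstrip().startswith("#")]


def render_cf(words, prefix="LOCAL_PORTMANTEAU", score=1.0):
    """Return .cf text: the word list as comments, then one body rule per pattern."""
    out = ["# Generated from a word list; edit the list, not this file.", "# Words:"]
    out += ["#   " + w for w in words]
    for i, pat in enumerate(build_patterns(words), start=1):
        name = f"{prefix}_{i}"
        out.append(f"body {name} /{pat}/i")
        out.append(f"describe {name} Portmanteau word list, part {i}")
        out.append(f"score {name} {score}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical sample; in practice: words = load_words("portmanteau.words")
    print(render_cf(["foo", "bar", "baz"]), end="")
```

Because the .cf file is regenerated on every run, the word list is the file to edit and keep under version control.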
 
> - does the order of alternations have any effect on performance or
>   is alphabetic order good enough? It would certainly make rule
>   generation simpler.

If you have enough words to need multiple REs, then sorting doesn't hurt:
each RE covers a contiguous alphabetic range, so the set of characters it
can start matching on is kept small.
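The effect of sorting can be illustrated with a small Python sketch (a hypothetical example, not a benchmark). It splits a word list into fixed-size groups, one group per RE, and lists the distinct first characters each group can begin with. The assumption behind it is that the regex engine can skip text positions whose character cannot start any alternative, so a smaller per-RE set of leading characters should mean fewer attempted matches.

```python
WORDS = ["apple", "zebra", "mango", "apricot", "zinc", "melon"]


def chunk(words, size):
    """Split words into consecutive groups of `size`, one group per RE."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def leading_chars(words, size):
    """For each group, the sorted distinct (lower-cased) first characters."""
    return [sorted({w[0].lower() for w in group}) for group in chunk(words, size)]


if __name__ == "__main__":
    print(leading_chars(WORDS, 2))                         # [['a', 'z'], ['a', 'm'], ['m', 'z']]
    print(leading_chars(sorted(WORDS, key=str.lower), 2))  # [['a'], ['m'], ['z']]
```

Each sorted group spans a contiguous slice of the alphabet, so here every RE starts on a single letter, against two letters for each unsorted group.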
