Joel Neely wrote:

>Carl Read wrote:
>
>>Well, you didn't say it was for any old phone numbers in any
>>old text, did you?
>>
>
>No, but I did say "memo"; I suspect most of us would consider
>
>    Albert Jones-Smythe ... on 12-01-2001 ... of 28-Nov-2001
>
>as neither unusual text nor phone numbers.  ;-)
>
>>Will your Perl script get this right for instance...
>>So these are standard NZ phone numbers...
>>
>>    National: 1-2-345 6789
>>    Local: 345 6789
>>    0800: 0800 123 456
>>
>
>No, it wouldn't.  I should have noted my deliberate decision to
>handle only a common form of US phone numbers (but the regular
>expression was certainly clear on that point ;-).
>
>Of course, the big danger of trying to write code that handles
>arbitrary human-written/readable text is that humans can use their
>context-sensitive intelligence to interpret (and even correct)
>a wide range of syntactical variation.  The only way around that
>AFAIK is to impose limits on the amount of variation before the
>program either gives up or raises the case for discussion.  For
>example, the simplest substitution rule that covers all of your
>NZ samples, as well as common variations on US phone numbers
>would be
>
>    s/\b[- .()\d]{8,}\b/####/g
>
>which would replace *any* run (of at least 8 characters) of digits,
>hyphens, spaces, dots, and parentheses with the "blot-out" string.
>
>But, of course, that also would affect strings such as
>
>    31-12-2001
>
>and
>
>    3.1415926535
>
>which a human would likely *not* interpret as phone numbers.
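
To see the over-matching concretely, here is a minimal sketch that transcribes the quoted substitution into Python's `re` module. The names `BROAD`, `blot_broad` and `samples` are made up for illustration; the sample strings are the ones quoted in this thread.

```python
import re

# Python transcription of the quoted rule  s/\b[- .()\d]{8,}\b/####/g
BROAD = re.compile(r"\b[- .()\d]{8,}\b")


def blot_broad(text: str) -> str:
    """Replace each run of 8+ digits, hyphens, spaces, dots or parentheses
    (delimited by word boundaries) with the blot-out string."""
    return BROAD.sub("####", text)


samples = [
    "1-2-345 6789",   # NZ national
    "345 6789",       # NZ local
    "0800 123 456",   # NZ 0800
    "31-12-2001",     # a date, not a phone number
    "3.1415926535",   # digits of pi, not a phone number
]

for s in samples:
    print(f"{s!r:16} -> {blot_broad(s)!r}")
```

Every sample, including the date and the decimal, comes back as `####`. Because the space is part of the character class, a match inside a sentence can also swallow the spaces around the number.
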
>
>IIRC, there was a thread a few months ago about trying to come
>up with a PARSE rule which would recognize phone numbers from
>as many countries as possible with as few errors (false positives
>and false negatives) as possible.  As I recall, the result was
>that the range of variation as one added countries with differing
>conventions rapidly made the task infeasible.
>
>>rebol[]c: charset "0123456789" parse/all f: read %memo.txt[some
>>[a: 1 2[3 c "-"]4 c b:(change/part a "####" b) | skip]]print f
>>
>
>Very nice!  I find it hard to imagine a shorter solution in REBOL.
>
Eh, I can't read it right now - what is 1 2 [3 c "-"] 4 c doing? 
:-) Thanks ... I also found out there is a strange 'opt keyword in 
parse rules :-)
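
If I read the PARSE dialect correctly, two integers in front of a rule give a repeat range and a single integer gives an exact count, so 1 2 [3 c "-"] 4 c would mean: one or two groups of three digits followed by a hyphen, then exactly four digits. A rough Python sketch of that reading follows; the names `PHONE` and `blot` are invented here, and `[0-9]` is assumed to stand in for the charset c.

```python
import re

# Assumed regex equivalent of the PARSE rule  1 2 [3 c "-"] 4 c
# where c is the digit charset: one or two "ddd-" groups, then four digits.
PHONE = re.compile(r"(?:[0-9]{3}-){1,2}[0-9]{4}")


def blot(text: str) -> str:
    """Replace each match with ####, mirroring (change/part a "####" b)."""
    return PHONE.sub("####", text)


memo = "Albert Jones-Smythe called 555-1234 and 800-555-1234 on 12-01-2001."
print(blot(memo))
```

The `| skip` alternative in the REBOL rule advances one character when no number starts at the current position; `re.sub` does that scanning implicitly. The date is left alone because it never has three digits directly before a hyphen.
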

>
>
>>I couldn't have done it without Petr's example though, as what
>>I was trying wasn't working. (:  But I now know a lot more about
>>parsing than I did yesterday...
>>
>
>And *THAT*, to my way of thinking, is the real payoff to such
>parlor games as this -- we improve our grasp of what is (or isn't)
>feasible with one tool or another, and learn to use our tool(s)
>better.
>
Yes! That's exactly it. Language flame wars come up so often that 
I am no longer interested in them :-) On the other hand, I have no time 
to study other languages. Sometimes, though, I call a friend who knows PHP, 
for example, and ask him how he would solve it in the tool he uses. Then I try 
to think about it, map it to what I know about Rebol, and find 
an adequate solution ...

-pekr-

>
>
>-jn-
>