Hi, Gregg,

Gregg Irwin wrote:
> 
...
> 
> Now, you run it and it shows you the results for confirmation,
> just like a new assistant you hire. You give it feedback, which
> it remembers, and over time it builds new rules to account for
> your style. At some point, it may do so well that you tell it
> "Don't ask me to proof your work unless you have some doubt about
> something". If something, like your example, is of a critical
> nature, you can always tell it "run this by me before you send it
> out", or "Add your own 'signature' so, if something is wrong,
> they can blame you." :)
> 

That's an interesting (although severely non-trivial) approach to
the development issue, but I was describing a property of the
problem space itself.  Using myself as a case in point, I made up
a list of the ways I've actually seen US phone numbers written or
typed/typeset:

          551-1211
      800-552-1212
    1-800-553-1213
    1+800-554-1214
      800/555-1215
     (800)556-1216
    (800) 557-1217
      800.558.1218
    1.800.559.1219

Even handling this short list (with nicely commented/named code ;-)
to find phone numbers in a text file required something roughly
like the following:

8<------------------------------------------------------------
phones: make object! [

    defaultarea: "123"

    areadata:  none
    exchdata:  none
    linedata:  none

    digits:    charset "0123456789"
    plusminus: charset "+-"

    ldcode: ["1" plusminus | none]
    optgap: [" " | none]

    area:  [copy areadata 3 digits]
    exch:  [copy exchdata 3 digits]
    line:  [copy linedata 4 digits]

    phonepatterns: [
                                      exch "-" line (areadata: none)
    |   ldcode        area "-"        exch "-" line
    |                 area "/"        exch "-" line
    |             "(" area ")" optgap exch "-" line
    |   ["1." | none] area "."        exch "." line
    ]

    findphones: func [st [string!] /local result] [
        result: clear []
        parse/all st [
            any [
                phonepatterns (
                    append result rejoin [
                        any [areadata defaultarea]
                        "-" exchdata "-" linedata
                    ]
                )
            |   skip
            ]
        ]
        result
    ]

    run: func [fn [file!] /local text] [
        text: read fn 
        print rejoin [{"^/} text {^/"}]
        foreach phone findphones text [
            print [tab phone]
        ]
        print ""
    ]
]

phones/run %phones.txt
8<------------------------------------------------------------

I'm sure someone could tighten it up a bit, but that's not my
main point here.  This quick draft version is still sensitive
to false positives, e.g. a line with a product number in it
resembling

    #AB-1234-56789
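
One cheap guard, sketched here for the seven-digit pattern only
(the names "hits" and "scan" are made up for this illustration and
are not part of the code above), is to swallow an unmatched run of
digits whole instead of one character at a time, so the scanner
never restarts in the middle of a longer number:

8<------------------------------------------------------------
digits: charset "0123456789"
hits:   copy []

; hypothetical helper: collect seven-digit candidates from a string
scan: func [st [string!] /local exch line] [
    clear hits
    parse/all st [
        any [
            copy exch 3 digits "-" copy line 4 digits
            (append hits rejoin [exch "-" line])
        |   some digits     ; unmatched digit run: skip all of it
        |   skip
        ]
    ]
    hits
]

probe scan "#AB-1234-56789"
probe scan "call 551-1211"
8<------------------------------------------------------------

This only protects against extra leading digits; a candidate that
is followed by extra digits (e.g. 551-12119) would still slip
through.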

In addition, when I described this to a colleague, she immediately
asked, "What about extensions?", raising the issue of multi-line
phone systems where the numbers might be written/typed as

    800-555-1212x123
    808-554-1212 x 234
    888-556-1232 ext 456
    889-567-1242 ext. 9876
    898-576-1252/1234

(with the extension in combination with other phone number formats
as appear in the code above).
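
A rule for the extension by itself might look roughly like the
following sketch.  The names extrules, extmark, extension, extdata
and findext are hypothetical, and it assumes only the five spellings
listed above, following a plain dashed ten-digit number:

8<------------------------------------------------------------
extrules: make object! [

    extdata: none

    digits:  charset "0123456789"
    optgap:  [" " | none]

    ; "ext." must come before "ext" so that the period is consumed
    extmark: ["ext." | "ext" | "x" | "/"]

    extension: [
        optgap extmark optgap copy extdata some digits
    |   (extdata: none)
    ]

    ; returns the extension as a string, or none if there is none
    findext: func [st [string!]] [
        extdata: none
        parse/all st [
            3 digits "-" 3 digits "-" 4 digits extension to end
        ]
        extdata
    ]
]

probe extrules/findext "889-567-1242 ext. 9876"
probe extrules/findext "800-555-1212"
8<------------------------------------------------------------

The same rule would then have to follow every alternative in
phonepatterns.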

Whether a human manufactures the rules, or a piece of AI software
attempts to do so (and I suspect the human will do a better job at
this point in history), the problem remains that the size of the
rule set itself undergoes a combinatorial explosion as we try to
take into account the variations in the data.

And we haven't even tackled odd cases like the following:

   You may call me at my office at 1-800-555-1212--I expect to be
   there until 5:00PM--to discuss our presentation.

which ultimately require an actual *understanding* of the text to
achieve high accuracy.

Hence my description of the problem domain itself as being
metaphorically fractal.

-jn-