Hi All,

Here's an experimental command line parsing module. Let me know if you
think it's useful (and worth pursuing further), what you like, and what
you don't.

-- Gregg                         


Command Line Parser Module

=== Read-Me

--- Introduction

This is an *experimental* version of a command line parsing module
for REBOL. It's so experimental that I don't even have a good name
cooked up for it yet. :)

The goal is to make it as easy as possible to define and process
command line interfaces for REBOL scripts and applications. To that
end, there is a dialect and one main function on the front end.

The PARSE-COMMAND-LINE function takes input data--generally something
like system/options/args--and a dialected spec block, very much like
PARSE does, but using its own dialect. All you have to do after that
is read the results.

    args: {--verbose -f %input.txt}     ; normally system/script/args

    options: parse-command-line args [
        --verbose -v  {verbose mode}
        -f input-file {The input file}
    ]

    >> options/verbose
    == true
    >> options/input-file
    == %input.txt

To make your life even easier, standard usage and version options are
built in (though crude right now).

The command line syntax is meant to support standard Unix utility
formats. I'm a Windows guy, but REBOL runs everywhere and I think
this is the best option. Adding support for DOS "/" option syntax,
as an alternative to "-", should be neither hard nor necessary. :)

IMPORTANT NOTE! The input data is converted to a block! before it
is parsed.
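To see what that means in practice, here's a small sketch. It assumes REBOL 2, where TO BLOCK! on a string loads it with the normal REBOL lexer (which is what this module relies on). Each item comes back as a value of its own datatype, so %input.txt is a file! and 5 is an integer!, not strings:

```rebol
REBOL [Title: "Sketch: command line string to block"]

; Assumes REBOL 2 semantics: TO BLOCK! on a string loads it as REBOL data.
cl-args: to block! {--verbose -f %input.txt -C 5}

; Each item keeps its own REBOL datatype.
foreach item cl-args [print [mold item "is a" type? item]]
```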


--- Definitions

:option  - A switch that takes no arguments.
:opt-arg - A switch that takes one or more arguments.
:operand - A positional argument (a value not tied to a switch).


Options come in two varieties: short and long. Short options are
a single character preceded by a single dash (-), and long options
are a word preceded by two dashes (--).

    -q
    --verbose
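Here's a rough sketch of how a token can be classified as a long option, a short option, or an operand. TOKEN-KIND is a hypothetical helper, not part of the module, but it follows the same checks PARSE-CL makes. The module also splits grouped short options like -xyz into -x, -y, and -z, and treats a lone "--" as the end of options:

```rebol
REBOL [Title: "Sketch: classifying command line tokens"]

; Hypothetical helper (not part of the module). Classifies one token,
; given as a string, roughly the way PARSE-CL dispatches on it.
token-kind: func [token [string!]] [
    if token = "--" [return 'end-of-options]      ; the rest are operands
    if "--" = copy/part token 2 [return 'long]    ; --verbose, --file=x
    if all [#"-" = token/1  1 < length? token] [return 'short]  ; -q, -xyz
    'operand                                      ; anything else, even "-"
]

foreach t ["--verbose" "-q" "-xyz" "-" "file-1" "--"] [
    print [mold t "->" token-kind t]
]
```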

Opt-Args consume one or more arguments following the option token.
Right now, the input is converted to a block for easy processing,
so standard REBOL lexical rules apply.

For long options, a single argument must be separated from the
token by a space or an equal sign (=); multiple arguments must be
separate lexical items. If you use the "<opt>=<arg>" syntax, the
argument will be seen as a string, and you'll need to convert it
yourself until datatype validations and coercions are in place.

    --file %input.txt
    --file=%input.txt
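A sketch of the <opt>=<arg> split, assuming REBOL 2's PARSE/ALL string-splitting behavior (which the module itself uses). SPLIT-OPT-ARG is a hypothetical helper; note that the argument part comes back as a string! that you convert yourself:

```rebol
REBOL [Title: "Sketch: splitting <opt>=<arg> tokens"]

; Hypothetical helper (not part of the module). Returns [switch arg],
; where ARG is NONE if the token has no "=" in it. Assumes REBOL 2,
; where PARSE/ALL with a string rule splits on those characters.
split-opt-arg: func [token [string!] /local parts] [
    parts: parse/all token "="
    reduce [first parts pick parts 2]
]

probe split-opt-arg "--file=input.txt"   ; ["--file" "input.txt"]
probe split-opt-arg "--mode"             ; ["--mode" none]

; The arg is a string!, so convert it yourself.
file-arg: to-rebol-file second split-opt-arg "--file=input.txt"
num-arg:  to integer! second split-opt-arg "-C=7"
```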

Short options use the same forms as long options, with one addition:
you can put the argument immediately after the switch token. I
don't care for it, but it's an accepted standard.

    -f %input.txt
    -f=input.txt
    -finput.txt
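The attached form (-finput.txt) works by taking whatever follows the switch character in the full token. ATTACHED-ARG is a hypothetical sketch of that idea, not the module's code; unlike a bare COPY NEXT FIND, it returns NONE rather than an empty string when nothing is attached:

```rebol
REBOL [Title: "Sketch: argument attached to a short switch"]

; Hypothetical helper (not part of the module). Returns whatever
; follows SWITCH-CHAR in FULL-TOKEN, or NONE if nothing is attached
; (e.g. "-f" followed by a space and a separate argument).
attached-arg: func [full-token [string!] switch-char [char!] /local rest] [
    rest: find full-token switch-char
    if none? rest [return none]
    rest: copy next rest
    either empty? rest [none] [rest]
]

probe attached-arg "-finput.txt" #"f"    ; "input.txt"
probe attached-arg "-qfinput.txt" #"f"   ; "input.txt" (-q, then -f)
probe attached-arg "-f" #"f"             ; none
```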

Operands are just regular values that have no corresponding switch
token associated with them. They are just accumulated in a block;
there is no support for operands beyond that at this point.


--- The Dialect

You specify options and opt-args much as they would appear in
documentation for your program, e.g.:

    --quiet   -q {quiet mode}
    --verbose -v {verbose mode}

The rules are simple: Start with one or more words that begin with
a single or double dash, which will be interpreted as option tokens.
Next you can optionally include a word that *doesn't* start with
a dash, which says you're defining an opt-arg instead of just
a plain option. Next, put a string that will be displayed if the
user asks for help on the program. Finally, you can include a spec
block for the internal object that will be used during processing
(See: Objects). The dialect is basically:

    some dash-word!s  opt non-dash-word!  string!  opt block!

Example:

    -C n {
        Shows n lines of context before and after each change. diff
        marks lines removed from path1 with -, lines added to path2
        with + and lines changed in both files with !. This option
        conflicts with the -e and -f options.
    } [
        name:    'num-context-lines  ; overrides 'n
        args:    integer!
        action: [value: to integer! arg]
        default: 3
    ]

\note
You can put short strings on the same line as the tokens, but longer
strings should be formatted over multiple lines as you want them to
appear to the user.
/note
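As a sketch of the grammar, here's a hypothetical SPLIT-SPEC function (not part of the module) that uses PARSE to break a spec into [tokens arg-name desc spec-block] entries, returning NONE if the spec doesn't fit the shape. It doesn't handle an arg name butted up against a short switch (e.g. -e?clude), which the module splits apart separately:

```rebol
REBOL [Title: "Sketch: checking a spec against the dialect grammar"]

; Hypothetical helper (not part of the module). Grammar:
;     some dash-word!s  opt non-dash-word!  string!  opt block!
split-spec: func [
    spec [block!]
    /local entries tokens arg-name desc opts w ok dash-word
][
    entries: copy []
    ; Matches one word starting with "-". For any other word, OK is set
    ; to the always-failing rule [end skip], so the match backtracks.
    dash-word: [
        set w word!
        (ok: either #"-" = first form w [[]] [[end skip]])
        ok
    ]
    either parse spec [
        some [
            (arg-name: opts: none)          ; reset per entry
            copy tokens some dash-word
            opt [set arg-name word!]
            set desc string!
            opt [set opts block!]
            (append/only entries reduce [tokens arg-name desc opts])
        ]
    ] [entries] [none]
]

probe split-spec [
    --verbose -v {verbose mode}
    -f input-file {The input file}
]
```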

If multiple tokens are given for an option, the option's name is
taken from the first one, with its leading dash(es) removed. In the
case of an opt-arg, the name comes from the arg name given after the
option tokens. Either can be overridden with NAME in the spec block.
The name is important because that's how you'll look up the value
later.
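The naming rules can be sketched like this. DERIVE-NAME is hypothetical and simplified; it assumes the arg name (if any) has already been separated from the tokens, and it ignores a NAME override in the spec block:

```rebol
REBOL [Title: "Sketch: deriving an option's name"]

; Hypothetical, simplified helper (not part of the module).
derive-name: func [
    tokens [block!] "Option tokens, e.g. [--verbose -v]"
    arg-name [word! none!] "Opt-arg name, or NONE for a plain option"
    /local str
][
    ; An opt-arg is named after its arg name.
    if arg-name [return arg-name]
    ; Otherwise use the first token, minus its leading dashes.
    str: form first tokens
    while [#"-" = str/1] [str: next str]
    to word! str
]

probe derive-name [--verbose -v] none    ; verbose
probe derive-name [-f] 'input-file       ; input-file
probe derive-name [-q --quiet] none      ; q
```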

--- Objects

When you call PARSE-COMMAND-LINE, giving it your command line spec,
it will return a COMMAND-PARSER object that is filled with values
that were parsed from the command line data. The name of each option
becomes a word in that object, which is how you read out values that
were set during the parsing process. For example:

    args: {--verbose -f %input.txt}

    options: parse-command-line args [
        --verbose -v {verbose mode}
        -f input-file {The input file}
    ]

    >> options/verbose
    == true
    >> options/input-file
    == %input.txt

In addition to the individual word values, you can get a list of all
the custom words added for each option in the NAMES field.

    >> options/names
    == [verbose input-file help version]

\note
HELP and VERSION are built-in options that provide standard functionality
for you at no charge.
/note
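Conceptually, the returned object just has one word per option name, plus NAMES. The sketch below (MAKE-RESULT is hypothetical) builds such an object from name/value pairs. The real module is more indirect: each name word is a DOES function that reads the VALUE of the matching internal option object.

```rebol
REBOL [Title: "Sketch: a result object built from option names"]

; Hypothetical, simplified sketch (not the module's code).
make-result: func [
    pairs [block!] "Reduced block of name/value pairs"
    /local spec names obj
][
    spec: copy []
    names: copy []
    foreach [name value] pairs [
        append spec to set-word! name
        append names name
    ]
    append spec [names: none]   ; every word starts out as NONE
    obj: make object! spec
    foreach [name value] pairs [set in obj name value]
    obj/names: names
    obj
]

results: make-result reduce ['verbose true 'input-file %input.txt]
print mold results/names
print results/input-file
```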

As your command line spec is parsed, internal objects are created
for each option and opt-arg (and, likely, eventually operands). By
providing a spec block for an option you can perform actions, override
the name, provide default values, and tell it how many and what type
of arguments it takes (though validation and type casting are not in
place yet).
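For ARGS, only the count matters right now. This sketch (ARG-COUNT is a hypothetical name) mirrors the module's GET-OPT-ARGS logic: a block means one argument per element, anything else means exactly one. Separately, if an opt-arg has a DEFAULT, the module copies it into VALUE when the spec is processed.

```rebol
REBOL [Title: "Sketch: how many arguments an opt-arg consumes"]

; Mirrors GET-OPT-ARGS: a block in ARGS means one argument per element;
; anything else (a single datatype, or NONE) means exactly one.
; Datatypes are not validated yet.
arg-count: func [args] [either block? args [length? args] [1]]

print arg-count integer!               ; -C n
print arg-count [integer! integer!]    ; --seek start stop
print arg-count [[file! string!]]      ; one arg, two allowed types
print arg-count none                   ; no ARGS given
```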

--- Actions

Actions are defined in the spec block for an option.

    -C n {Number of context lines} [
        action: [value: to integer! arg]
    ]

Actions are just blocks of REBOL code. The current implementation is...
um...not all that great in regard to how these are handled. The thing
you really need to know is that VALUE, ARG, and SPEC are special words
in the context of an action. VALUE means the value of the option
object where the action block is defined, ARG refers to the argument(s)
consumed for the current opt-arg, and SPEC refers to the original
command line spec you provided (it's used for automatic USAGE display).

The default action, if none is given, will set the value of an option
to true and the value of an opt-arg to the argument(s) consumed on its
behalf.
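Here's a sketch of a less kludgey way to run an action. RUN-ACTION is hypothetical, and it is not what the module currently does (the module copies the action and REPLACEs ARG and SPEC in it before DOing it). Here the action is bound to the option object so VALUE means that object's VALUE field, ARG becomes a function argument, and the default action kicks in when ACTION is NONE. SPEC isn't handled:

```rebol
REBOL [Title: "Sketch: running an option's action"]

; Hypothetical alternative to the module's REPLACE-based approach.
run-action: func [
    obj [object!] "Option object with ACTION and VALUE fields"
    opt-arg? [logic!] "True if the option consumed arguments"
    arg "Argument(s) consumed, or NONE"
    /local act
][
    either obj/action [
        ; VALUE binds to the object; FUNC then binds ARG to its argument.
        act: func [arg] bind copy/deep obj/action obj
        act arg
    ][
        ; Default action: TRUE for an option, the arg(s) for an opt-arg.
        obj/value: either opt-arg? [arg] [true]
    ]
]

ctx-lines: make object! [value: 3 action: [value: to integer! arg]]
run-action ctx-lines true "5"
print ctx-lines/value

quiet-opt: make object! [value: none action: none]
run-action quiet-opt false none
print quiet-opt/value
```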



;==============================================================

REBOL [
    Title:  "Command Line Dialect: Experimental Version A"
    File:   %cl-dialect-ex-a.r
    Author: "Gregg Irwin"
    Email:  [EMAIL PROTECTED]
    Date:   6-Oct-2003
    Version: 0.0.1
    Purpose: {
        Provide support for easy, automatic, command line parsing.
        You define the options in a dialect, which is used to
        build internal objects that are used to parse a command line.
    }
    Comment: {
        *****************************
        *** THIS IS EXPERIMENTAL! ***
        *****************************
        I include that caveat because I'm somewhat embarrassed by
        how I hacked it together. I have pages and pages of design
        ideas and notes, along with visions of an elegant PARSE-based
        implementation, but I was spending so much time thinking about
        the different ways it could be done, that I never got around
        to actually *doing* something with it. :\ So, I decided that
        I'd take some of the ideas and just whack something together
        to play with, and that's what this is. No, I'm not sure I like
        it, but it's something we can all use as a starting point, even
        if we only learn what *not* to do from it. :)

        One of the things we should use it for is to iron out the
        input dialect.

        There is a lot of other stuff to do:
            - General design needs to be re-thought to avoid all the
              ugly binding issues I created with this design (i.e.
              option/action contexts).
            - usage info for operands
            - conflict specification and handling
            - data type validation/coercion
            - what to do with unknown tokens
            - clean out and refactor unused idea bits
            - comment and explain things a lot more
            - CHOICE args (i.e. one of a set of options)
            - where best to get program name and version
            - match abbreviated long option names?
            - multiple arg names for opt-args?
            - action handler for operands?
            - named operands?
            - more complete program info dialect?
                program-info: [
                    name: version: synopsis: description: options:
                    operands: examples: environment-variables:
                    diagnostics: messages: limits:
                ]

        I've looked at a number of modules in other languages that
        do this kind of thing, from getopt on up, but the heaviest
        influence was the Python Optik module by Greg Ward
        (http://optik.sourceforge.net/).
    }
]


option!: make object! [
    type: 'option
    name: tokens: action: conflicts-with: desc: value: none
]

opt-arg!: make option! [
    type: 'opt-arg
    args: default: none
]

operand!: make object! [
    type: 'operand
    name: args: default: optional: ordinal: desc: value: none
]


command-parser: make object! [
    ; For internal use and debugging needs only.
    _option:  copy []
    _opt-arg: copy []
    _operand: copy []
    _token-map: copy []
    _spec: none

;     ; Add default options (if we decide not to do it in parse-command-line).
;     append _option reduce [
;         'help make option! [
;             name: 'help  tokens: [--help -h]  action: [show-usage]
;             desc: "Show usage information"
;         ]
;         'version make option! [
;             name: 'version  tokens: [--version]  action: [show-version]
;             desc: "Show version information"
;         ]
;     ]

    names: copy [] ; custom option names added to the object.
    operands: does [:_operand] ; public access to operands.


    clean-token: func [
        "Returns token less any attached arguments that come after an = sign."
        token
    ][
        ; This fails if token is empty or has only spaces before the = sign.
        to word! first parse/all token "="
    ]

    do-action: func [
        "Execute the action associated with the token."
        token arg
        /local obj act
    ][
        ;!! This routine is kludgey, because I put things together
        ;   in such a way that binding/evaluation issues are problematic.
        ;print ["do-action" token arg]
        ;attempt [
            obj: obj-from-token clean-token token
            either obj [
                act: copy any [
                    obj/action
                    ;?? Should we allow default values for options and use
                    ;   "not obj/default" here instead of true?
                    [set in obj 'value either opt-arg? token [arg][true]]
                ]
                ;print mold act
                replace/all act 'arg either word? arg [to lit-word! arg][arg]
                ;!! YAK (yet another kludge). MOLDing to prevent evaluation.
                replace/all act 'spec mold _spec
                ;print ["x" token arg type? attempt [last act] mold act]
                do act
            ][
                print ["Unknown token found:" token]
            ]
        ;]
    ]

    find-in-map: func [token][find _token-map to word! token]

    name-from-token: func [
        "Returns an option name given any token that maps to it."
        token /local pos
    ][
        either pos: find-in-map token [to word! first find pos lit-word!][none]
    ]

    obj-from-token: func [
        "Returns the option object given any token that maps to it."
        token /local pos
    ][
        either pos: find-in-map token [first find pos object!][none]
    ]

    opt-arg?: func [
        "Returns the opt-arg object if the token maps to one; none otherwise."
        token
    ][
        attempt [select _opt-arg name-from-token clean-token token]
    ]


    parse-cl: func [
        {Parses a command line according to the settings (options, etc.)
        in the parent command-parser object. Returns the object filled
        with data from the parse operation.}
        data
        /local
            ;-- funcs
            get-args get-opt-args process-long-opt process-operand
            process-opt process-short-opts
            ;-- vars
            args arg arg-str
    ][

        ;-- Local Functions

        get-args: func [
            {Returns the arguments for the given token.}
            token [string!]
            obj
            /short full-token [string!]
        ][
            ;print [tab "get-arg:" token mold obj/args]
            any [
                ; For short options, with a single arg, it can be butted
                ; right up against them.
                all [
                    short
                    (2 < length? full-token)
                    ; Allow things like "-qfin-file.txt", where
                    ; -q is an option and -f is an opt-arg?
                    (copy next find full-token last token)
                ]
                ; Both short and long args can have their args after an = sign.
                ; e.g. -a=on, --mode=text

                ; Handle opt-args using <opt>=<arg> format.
                pick parse/all token "=" 2

                ; Get next <n> items from ARGS.
                get-opt-args obj
            ]
        ]

        get-opt-args: func [
            {Consumes the number of arguments specified for the given
            opt-arg object from ARGS and returns them.}
            obj /local result num-args
        ][
            ;print [tab tab "get-opt-args:" obj/name mold obj/args]
            num-args: either block? obj/args [length? obj/args][1]
            result: either num-args > 1 [
                copy/part next args num-args
            ][
                first next args
            ]
            args: skip args num-args
            result
        ]

        process-long-opt: func [
            {Consumes any arguments for the option and performs its actions.}
            token [string!] /local obj arg
        ][
            ;print ["Long Opt:" token]
            process-opt token
            ;print [tab "opt-arg?:" either obj [true][false] tab "arg:" arg]
        ]

        process-operand: func [arg] [
            ;print ["Operand:" arg]
            append self/_operand arg
            ;do-action arg ???
        ]

        process-opt: func [
            {Inner option processor, for short opts that have an <opt>=<arg>
            format and all long opts.}
            token /local obj arg
        ][
            if obj: opt-arg? token [arg: get-args token obj]
            ;print [
            ;    tab "process-opt:" token tab "opt-arg?:"
            ;    either obj [true][false] tab "arg:" arg
            ;]
            do-action token arg
        ]

        process-short-opts: func [
            {Consumes any arguments for the options and performs their actions.
            Handles both single and grouped tokens.}
            token [string!] /local obj tok arg
        ][
            ;print ["Short Opts:" token]
            either find token #"=" [
                process-opt token
            ][
                foreach char next token [   ; skip leading "-"
                    arg: none
                    ;print [tab "Short Opt:" join "-" char]
                    either obj: opt-arg? tok: join "-" char [
                        arg: get-args/short tok obj token
                        do-action tok arg
                        ;print [tab "Short opt-arg:" tok tab "arg:" arg]
                        break
                    ][
                        do-action tok arg
                        ;print [tab "Short option:" tok]
                    ]
                ]
            ]
        ]

        ;-- Processing

        args: to block! data
        while [not tail? args] [
            arg-str: form arg: first args
            switch/default true reduce [
                '-- = arg [
                    ;print "--  END OF OPTIONS"
                    append self/_operand next copy args
                    break ; break out of WHILE loop.
                ]

                "--" = copy/part arg-str 2 [process-long-opt arg-str]

                all [#"-" = arg-str/1  1 < length? arg-str] [
                    process-short-opts arg-str
                ]
            ][  ; default case
                process-operand arg
            ]
            args: next args
        ]
        self
    ]


    show-usage: func [spec /local pad emit rules words desc-col] [
        words: copy []  ; Parse variable
        desc-col: 20    ; The column where descriptions start.
        pad: func [string length][
             head insert/dup copy "" " " length - length? string
        ]
        emit: func [words desc /local str][
            print [str: form words  pad str desc-col  desc]
        ]
        rules: [
            some [
                some [set word word! (append words word)]
                set desc string! opt block!
                (emit words desc  clear words)
            ]
        ]

        print [
            "The command line usage is:^/^/"
            tab any [
                attempt [system/script/header/file]
                attempt [app-name]   ; Did they add their own exe/app word?
                "program"
            ] "[options] [operands]^/^/"
            "Options:^/"
        ]
        ;!! Right now we get the spec as a MOLDed block, from do-action,
        ;   which forces us to LOAD it here. Yuck.
        parse load spec rules
        halt
    ]

    show-version: does [
        print [
            any [
                attempt [system/script/header/file]
                "Version:"
            ]
            any [
                attempt [system/script/header/version]
                attempt [version]   ; Did they add their own VERSION word?
                "Unknown"
            ]
        ]
        halt
    ]


    ; Global functions

    set 'cl-layout func [
        "Builds a command-parser object from a command line dialect spec."
        spec
        /local emit rules words result names
    ][
        result: make command-parser [_spec: spec]
        words: copy []
        names: copy []

        emit: func [
            words desc spec
            /local tokens word arg-name type obj el
        ][
            ; If the last word starts with -, it's an option, otherwise
            ; it's an opt-arg.
            set [tokens arg-name] either #"-" <> first word: form last words [
                ; Last item is the arg name
                reduce [(head remove back tail words) (to word! word)]
            ][
                ; If the last item starts with a single -, and is longer
                ; than two characters, the part after the first two
                ; chars is the arg name.
                either all [word/2 <> #"-"  2 < length? word] [
                    reduce [
                        head change back tail words reduce [
                            to word! copy/part word 2
                        ]
                        to word! at word 3
                    ]
                ][
                    reduce [words none]
                ]
            ]
            ; If there's no arg-name, it's an option.
            obj: make get to word! type: either arg-name [
                ; If there are no switch tokens left after removing the
                ; arg-name, it's an operand; otherwise it's an opt-arg.
                either empty? tokens ['operand!]['opt-arg!]
            ][
                'option!
            ] either spec [spec][[]]
            obj/desc: desc
            if none? obj/name [
                obj/name: any [
                    arg-name
                    to word! find/last/tail/part form first tokens "-" 3
                ]
            ]
            ; Set the default value for operands and opt-args.
            ; (options don't have default values)
            if all [(find [operand! opt-arg!] type) obj/default][
                obj/value: obj/default
            ]
            ; Add tokens for options and opt-args to the token-map.
            if find [option! opt-arg!] type [
                append result/_token-map obj/tokens: copy tokens
                append result/_token-map reduce [to lit-word! obj/name obj]
            ]
            ; Add the object to our result, into the correct element by type
            ; (option, opt-arg, or operand).
            el: select [
                option! _option opt-arg! _opt-arg operand! _operand
            ] type
            ;probe obj
            append names reduce [
                to set-word! obj/name 'does reduce [
                    to path! reduce [el obj/name 'value]
                ]
            ]
            append result/names obj/name
            append result/:el reduce [obj/name obj]
        ]

        rules: [
            some [
                some [set word word! (append words word)]
                set desc string!
                set spec opt block!
                (
                    emit words desc spec
                    clear words
                )
            ]
        ]

        parse spec rules
        result/names: unique result/names
        make result names
    ]


    set 'parse-command-line func [
        input   "Input series to parse; usually system/options/args."
        rules   "Rules to parse by; in command line spec dialect."
    ][
        ; Add default options to a copy of the caller's spec, so the
        ; caller's block isn't modified. If the spec has same-named
        ; items, they will override (shadow) these.
        rules: append copy rules [
            --help -h {Show usage information}   [action: [show-usage spec]]
            --version {Show version information} [action: [show-version]]
        ]
        do get in cl-layout rules 'parse-cl input
    ]

]


;--------------------------------------------------------------

; You can define your own functions to go with --help and --version
; switches you define to override the built-ins.
; show-usage:   does [print "USAGE"]
; show-version: does [print "Version 1.0"]


cl-spec: [
    ;-- Simple options
    --quiet   -q {quiet mode}   ; Default action will set value to true
    --verbose -v {verbose mode} ; "

; You can define your own HELP and VERSION switches to override
; the built-ins.
;     --help -h {Show usage information} [
;         action: [show-usage]
;     ]
;
;     --version {Show version information} [
;         action: [show-version]
;     ]

    ;-- This is an opt-arg that takes one argument, which it converts
    ;   to an integer, and uses a default value for it if none is given.
    -C n {
        Shows n lines of context before and after each change. diff
        marks lines removed from path1 with -, lines added to path2
        with + and lines changed in both files with !. This option
        conflicts with the -e and -f options.
    } [
        name:    'num-context-lines  ; overrides 'n
        args:    integer!
        action: [value: to integer! arg]
        default: 3
    ]

    ; The name comes from the arg-name, not the token. The args datatype
    ; values aren't used yet; only the number of args specified is
    ; important right now.
    -D ifname {
        Displays output that is the appropriate input to the
        C preprocessor to generate the contents of path2 when
        ifname is defined, and the contents of path1 when
        ifname is not defined.
    } [
        ;name:    'ifname           ; 'ifname by default
        args:    [[file! string!]]  ; check arg types against 'default type?
        action:  [value: to-rebol-file arg]
    ]

    ; Here the name comes from NAME in the spec block, not the arg-name.
    --outfile file {The output file} [
        name:    'outfile
        args:    [[file! string!]]
        action:  [value: to-rebol-file arg]
        default: %out-file.txt
    ]

    --err-file err-file {The error file} [
        args:    [[file! string!]]
        action:  [value: to-rebol-file arg]
        default: %err-file.txt
    ]

    ; This opt-arg takes two arguments
    --seek start stop {seek spec} [
        name: 'seek
        args: [integer! integer!]
    ]

    --mode op-mode {The operating mode} [args: [word!]]

    ; I think butting the arg name right up against the switch is
    ; not as nice, but it's legal (at least for now).
    -e?clude {Exclude item} []

    --dum-dum
    
    ; You can't include operands yet, the current USAGE implementation
    ; doesn't play well with them (they show up in-line with options
    ; and before the built-in options) and they muck up the parsing. :)
    ;input-file {The input file}

]


;==============================================================
;-- Example usage
;==============================================================

do-example: true ;false

if do-example [
    ; Dummy test command line
    print s: {
        -a -xyz -e Exclude -e=Include -epreclude
        -c5 %log-file.dat
        -qc7
        --verbose
        -q
        --mode binary --mode=text
        --seek 3 15
        -d"c:\test\text.txt"
        --outfile %output-data.txt
        --
        file-1 %file-2
    }
;        --version --help
;        --err-file %error-file.log

    print "------------------------------------------------"

    options: parse-command-line s cl-spec

    print "------------------------------------------------"

    foreach word options/names [print [word tab mold options/:word]]
    print ["operands:" mold options/operands]

    ;print "------------------------------------------------"

    ;print ""
    ;options/show-usage options/_spec
    ;print ""
    ;options/show-version

    print "================================================"
]
