Hi Maxim,

On Wednesday, October 8, 2003, 6:29:03 PM, you wrote:

MOA> can you give a short example of a grammar that would extract the text from

MOA> <tag! tag content <subtag! its content?> <p> paragraph info</p>content end>

MOA> and returns a block such as:
[...]

Well, nested tags are not valid HTML so this does not handle them,
but  maybe  it could be of some inspiration. (Sorry for Joel-style
indentation. ;-)

tag-rule:
[   "<" m1:
    [   "/" word thru ">" (end-tag to word! word-res) 
      | "!--" thru "-->" m2: (add-contents to tag! copy/part m1 back m2)
      | "!DOCTYPE" thru ">" m2: (add-contents to tag! copy/part m1 back m2)
      | "?xml" thru "?>" m2: (add-contents to tag! copy/part m1 back m2)
      | word any space (clear attributes) any attribute ["/" (content: no) | none 
(content: yes)] ">"
        (open-tag to word! word-res attributes content)
]   ]

chars: complement charset {<>"'= ^/^-/}
value-chars: union chars charset "/"
word: [copy word-res some chars]
space: charset { ^/^-}
attributes: [ ]
attribute:
[   (wrs: word-res) word any space 
    [   "=" any space
        [   {"} copy value any dquoted-chars {"}
          | {'} copy value any squoted-chars {'}
          | copy value any value-chars
        ]   any space
      | (value: yes)
    ]   (insert insert tail attributes to word! word-res any [value copy ""] word-res: 
wrs)
]
dquoted-chars: complement charset {"}
squoted-chars: complement charset {'}

document-rule:
[   some 
    [   copy contents to "<" (add-contents contents) tag-rule 
      | copy contents to end (add-contents contents) break
]   ]

stack: [ ]
parsed: none

no-content-tags:
[   basefont br area link img param hr input col frame base meta]

open-tag:
    func [tagname attributes content? /local tag]
    [   if find no-content-tags tagname [content?: no]
        either content?
        [   tag: compose/deep [[(tagname) (attributes)]]
            insert/only tail last stack tag
            insert/only tail stack tag
        ]
        [   tag: compose [(tagname) (attributes)]
            insert/only tail last stack tag
    ]   ]
end-tag:
    func [tagname]
    [   stack: back tail stack
        if head? stack [exit] ; unmatched close tag
        while [tagname <> tagname-of stack/1]
        [   stack: back stack
            if head? stack [exit] ; unmatched close tag
        ]
        stack: head clear stack
    ]
add-contents:
    func [contents]
    [   if contents
        [   insert tail last stack contents
    ]   ]

parse-document:
    func [document]
    [   stack: clear head stack
        insert/only stack parsed: make block! 10
        parse/all document document-rule
        parsed
    ]


This is extracted from other code so it is possible that something
is missing. Example:

>> parse-document "<html><head><title>Title</title></head><body>This is 
>> a<br>test</body></html>"
== [[[html] [[head] [[title] "Title"]] [[body] "This is a" [br] "test"]]]
>> parse-document read http://www.rebol.com
== [[[HTML] "^/" [[HEAD] "^/" [META HTTP-EQUIV "Content-Type" CONTENT 
"text/html;CHARSET=iso-8859-1"] "^/" [META NAME "KEYWORDS" CO...

Regards,
   Gabriele.
-- 
Gabriele Santilli <[EMAIL PROTECTED]>  --  REBOL Programmer
Amiga Group Italia sez. L'Aquila  ---   SOON: http://www.rebol.it/

-- 
To unsubscribe from this list, just send an email to
[EMAIL PROTECTED] with unsubscribe as the subject.

Reply via email to