Walter-
Well, I just woke up from another migraine-induced "coma". Thanks for
the tip. After re-examining the HTML files, your analysis seems
logical and a lot simpler than what I would have done. Thanks for
setting me on the right track.
-Scott
On Aug 18, 2006, at 5:29 PM, realbasic-nug-[EMAIL PROTECTED] wrote:
Message: 1
Subject: RE: RB Language Reference parsing?
From: "Walter Purvis" <[EMAIL PROTECTED]>
Date: Fri, 18 Aug 2006 15:16:27 -0400
-----Original Message-----
To determine whether or not a symbol in a REALbasic program
has been defined in the REALbasic frameworks and determine
its type (or return type if it's a method), I need to have
information about all symbols in the frameworks. Without
direct access to framework symbol information or class
introspection to get at least some of this information, my
only other option is to parse the Language Reference HTML
pages and try to build a table of framework symbols from it.
So far, my only idea is to start with the LR's
topiclist.html file and follow each of its links to get
documentation on classes, modules, directives, and
constants. However, this would be complicated by the
extensive cross-linking of the HTML files and the
possibility of circular references.
I'd appreciate any ideas or strategies anybody can offer for
performing this parsing!!
Here are my thoughts after a brief glance at the LR structure.
I don't think you need to bother with following hyperlinks.
I would start by loading all of the individual topic files into a
dictionary, then go through and sort them by what type of page each
one is.
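A rough sketch of that first step, written in Python rather than REALbasic purely for illustration. It assumes every topic page is a separate .html file in one folder and carries its name in a <title> element; the function names, the folder layout, and the latin-1 encoding are all assumptions, not something confirmed from the LR files.

```python
import re
from pathlib import Path

# Assumption: each topic page names itself in a <title> element.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the page title with whitespace collapsed, or None if absent."""
    match = TITLE_RE.search(html)
    if match is None:
        return None
    return " ".join(match.group(1).split())


def load_topics(directory):
    """Map each page title to its raw HTML for every *.html file in directory."""
    topics = {}
    for path in sorted(Path(directory).glob("*.html")):
        # Assumed encoding; latin-1 can decode any byte sequence without error.
        html = path.read_text(encoding="latin-1")
        title = extract_title(html)
        if title is not None:
            topics[title] = html
    return topics
```

Keying the dictionary by title means no hyperlinks are followed at all, so cross-links and circular references never come into play.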
You can tell a lot just by looking at the titles of the pages. For
example, the main page for each class has a title like
"NameOfTheClass Class", so you could easily find the subset that
represents all the classes.
Events, methods, and properties all have a title of the form
"NameOfTheClass.EventName Event" (and similarly for methods and
properties).
You can link all of the events, methods, and properties back to their
class just by looking at the NameOfTheClass before the dot.
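The dot-splitting idea could look something like this in Python (again only a sketch). The regular expression encodes the assumed title shape "ClassName.MemberName Kind"; the helper name group_members and the sample titles are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed title shape: "ClassName.MemberName Kind",
# where Kind is Event, Method, or Property.
MEMBER_RE = re.compile(r"^(\w+)\.(\w+)\s+(Event|Method|Property)$")


def group_members(titles):
    """Return {class_name: {kind: [member names]}} for member-page titles."""
    by_class = defaultdict(lambda: defaultdict(list))
    for title in titles:
        match = MEMBER_RE.match(title.strip())
        if match:
            class_name, member, kind = match.groups()
            by_class[class_name][kind].append(member)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {cls: dict(kinds) for cls, kinds in by_class.items()}


if __name__ == "__main__":
    sample = ["Dictionary Class", "Dictionary.Value Method", "Timer.Action Event"]
    print(group_members(sample))
```

Titles without a dot, such as the class page itself, simply do not match and are skipped.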
Then you just have to parse those event/method/property pages for
types, parameters, and return values; that shouldn't be too hard,
since the HTML always has the same structure, as far as I can tell.
If a page is not a class, and is not an Event/Property/Method of a
class, then it's either a function or a method, and the title is
either "FunctionName Function" or "MethodName Method".
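Put together, the title rules above amount to a small decision procedure. Here is one way it might be sketched in Python; the labels it returns ("class", "member", "global", "other") are invented for this example, and the "other" bucket is an addition for any title that fits none of the patterns.

```python
def classify_title(title):
    """Return 'class', 'member', 'global', or 'other' for a page title."""
    words = title.split()
    if len(words) != 2:
        return "other"  # titles that are not "Name Kind" are left unclassified
    name, kind = words
    has_dot = "." in name
    if kind == "Class" and not has_dot:
        return "class"  # e.g. "NameOfTheClass Class"
    if has_dot and kind in ("Event", "Method", "Property"):
        return "member"  # e.g. "NameOfTheClass.EventName Event"
    if not has_dot and kind in ("Function", "Method"):
        return "global"  # e.g. "FunctionName Function", "MethodName Method"
    return "other"


if __name__ == "__main__":
    for sample in ("Dictionary Class", "Timer.Action Event", "Len Function"):
        print(sample, "->", classify_title(sample))
```

Checking for the dot is what separates a class's method page from a standalone method page, since both titles end in "Method".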
Hope that helps!
Dr. Scott Steinman
Brought to you by a grant from the Steinman Foundation (Thanks, Mom
and Dad!)
Recommended by Major University Studies Over the Leading Brand
steinman at midsouth dot rr dot com
I hope I die peacefully in my sleep like my grandfather. . .not
screaming in terror like his passengers. -- "Deep Thoughts", Jack Handey