Walter-

Well, I just woke up from another migraine-induced "coma". Thanks for the tip. After re-examining the HTML files, your analysis seems logical and a lot simpler than what I would have done. Thanks for setting me on the right track.

-Scott

On Aug 18, 2006, at 5:29 PM, realbasic-nug- [EMAIL PROTECTED] wrote:

Message: 1
Subject: RE: RB Language Reference parsing?
From: "Walter Purvis" <[EMAIL PROTECTED]>
Date: Fri, 18 Aug 2006 15:16:27 -0400

 -----Original Message-----
 To determine whether or not a symbol in a REALbasic program
 has been defined in the REALbasic frameworks and determine
 its type (or return type if it's a method), I need to have
 information about all symbols in the frameworks.  Without
 direct access to framework symbol information or class
 introspection to get at least some of this information, my
 only other option is to parse the Language Reference HTML
 pages and try to build a table of framework symbols from it.

 So far, my only idea is to start with the LR's
 topiclist.html file and follow each of its links to get
 documentation on classes, modules, directives, and
 constants.  However, this would be complicated by the
 extensive cross-linking of the HTML files and the
 possibility of circular references.

 I'd appreciate any ideas or strategies anybody can offer for
 performing this parsing!!

Here are my thoughts after taking a brief glance at the LR structure.

I don't think you need to bother with following hyperlinks.

I would start by loading all of the individual topic files into a
dictionary, then go through and sort them out by what type of pages they
are.
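
A rough sketch of that first step in Python, in case it helps. It assumes the topic files sit in one folder with an .html extension and that each has an ordinary <title> element; the folder layout and the names extract_title and load_topics are made up for illustration, not anything from the LR itself.

```python
import re
from pathlib import Path

# Matches the first <title>...</title> pair, ignoring case and line breaks.
TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the text of the first <title> element, or "" if there is none."""
    match = TITLE_RE.search(html)
    if not match:
        return ""
    # Collapse runs of whitespace so titles wrapped across lines compare cleanly.
    return " ".join(match.group(1).split())


def load_topics(folder):
    """Map each topic file name to a (title, html) pair.

    Assumption: every topic page is a *.html file directly inside `folder`.
    """
    topics = {}
    for path in sorted(Path(folder).glob("*.html")):
        html = path.read_text(encoding="utf-8", errors="replace")
        topics[path.name] = (extract_title(html), html)
    return topics
```

Because every file is loaded up front, nothing depends on following links, so the cross-linking and circular references never come into play.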

You can tell a lot just by looking at the titles of the pages. E.g., the
pages that are the main page for a class all have a title like
"NameOfTheClass Class" -- so you could easily find the subset that
represents all the classes.

Then events, methods, and properties all have a title in the form of
"NameOfTheClass.EventName Event" (and similarly for methods and
properties). You can link all of the events, methods, and properties
back to their class just by looking at the NameOfTheClass before the dot.

Then you just have to parse those e/m/p pages for types, parameters, and
return values; that shouldn't be too hard, since the HTML is always the
same, as far as I can tell.

If a page is not a class, and is not an Event/Property/Method of a class,
then it's either a standalone function or a standalone method, and the
title is either "FunctionName Function" or "MethodName Method".
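
The title rules above could be sketched in Python along these lines. The patterns only encode the title shapes described in this message, so real LR titles may need adjustments; classify and group_by_class are hypothetical helper names, and the sample titles in the usage note are illustrative only.

```python
import re

# Title shapes as described above; member pages are checked first so that
# "Class.Name Method" is not mistaken for a standalone "Name Method".
MEMBER_RE = re.compile(r"^(\w+)\.(\w+) (Event|Method|Property)$")
CLASS_RE = re.compile(r"^(\w+) Class$")
GLOBAL_RE = re.compile(r"^(\w+) (Function|Method)$")


def classify(title):
    """Return (kind, owner, name) for a page title, or None if unrecognized."""
    m = MEMBER_RE.match(title)
    if m:
        owner, name, kind = m.groups()
        return (kind.lower(), owner, name)
    m = CLASS_RE.match(title)
    if m:
        return ("class", None, m.group(1))
    m = GLOBAL_RE.match(title)
    if m:
        return (m.group(2).lower(), None, m.group(1))
    return None


def group_by_class(titles):
    """Link each event/method/property back to the class named before the dot."""
    classes = {}
    for title in titles:
        result = classify(title)
        if result is None:
            continue  # not a page type covered by the rules above
        kind, owner, name = result
        if kind == "class":
            classes.setdefault(name, [])
        elif owner is not None:
            classes.setdefault(owner, []).append((kind, name))
    return classes
```

For example, classify("Window.Open Event") gives ("event", "Window", "Open"), and a title matching none of the patterns gives None, which makes it easy to list the leftover pages and see what the rules miss.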

Hope that helps!

Dr. Scott Steinman
Brought to you by a grant from the Steinman Foundation (Thanks, Mom and Dad!)
Recommended by Major University Studies Over the Leading Brand
steinman at midsouth dot rr dot com

I hope I die peacefully in my sleep like my grandfather. . .not screaming in terror like his passengers. -- "Deep Thoughts", Jack Handey

