Thank you for this relevant & precise review, Albert. I will first answer specific topics, then give an overall introduction to the problem(s) addressed by this project, which may clarify some of them.

<this is quite a long post>

A.T.Hofkamp wrote:

> However, by moving the 'type' information to a separate object, your classes become less cluttered, and the problem (and thus the solution) becomes easier to understand/program.
> You have a Factory class + factory object (usually 1) that represents the real factory, and you have a Product class + many product objects for the products it makes.
>
> Each product object has properties and functions (ie its color and how you can use it).
> Meta information, like what kind of products exist and how to make/transform them however is *not* part of the product object but belongs in the factory object.
> (Look at real life, you can use a car to get from A to B and back, but a car cannot tell you what kinds of other cars exist, or how to install eg a radio.)
> In your application, that would mean that the things you call 'instance' above are part of eg the 'format' object, and things you name 'type' above, are part of the 'formatfactory' (or 'formattype') object.
>
> The difference between your approach and the factory pattern imho is that the 'type' information of an object is not stored in its class, but in another object (of class Factory).
[...]

I think I get it better now.
FactoryClass ==> factory
                   |
                "models"
                   |
                   v
              ObjectClass ==> objects

Couldn't the factory actually be the same object as ObjectClass?
Isn't this in a way similar to the use of meta-classes (which I have never used, nor even explored)? I think I understand the point of this pattern, mainly as a means of clarification. In the case of my project, it could be used to implement the flexibility of the types/classes, which depend on config parameters that may change at runtime. However, this seems to me like overkill: too much structure and too much abstraction. I don't make heavy use of type/class attributes, and they appear rather clearly as what they are/mean.
(See also below for further clarification.)
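To check my understanding, here is a minimal sketch of the pattern as I read it. All names (FormatFactory, make, to_code) and the sample codes are hypothetical, not taken from my actual code: the factory holds the 'type' data, and each product only keeps a reference to it.

```python
class Format:
    """A product: holds only per-instance data."""

    def __init__(self, name, factory):
        self.name = name
        self.factory = factory  # back-reference to the object holding the 'type' data

    def to_code(self):
        # Looked up at call time, so a config change is seen immediately.
        return self.factory.to_codes[self.name]


class FormatFactory:
    """The factory: holds the 'type' information and creates products."""

    def __init__(self, to_codes):
        self.to_codes = dict(to_codes)

    def make(self, name):
        if name not in self.to_codes:
            raise ValueError("unknown format: %r" % (name,))
        return Format(name, self)


# Hypothetical config data, for illustration only.
factory = FormatFactory({"list_item": "*", "table_cell": "|"})
item = factory.make("list_item")
print(item.to_code())
factory.to_codes["list_item"] = "-"  # the config changes at runtime
print(item.to_code())
```

The point of the sketch is only that a runtime config change happens in one place (the factory) and every existing product sees it.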

>> class Format(Symbol):
>>   ''' block formats symbol '''
>>   # import relevant config data for this symbol type
>>   from_codes = config.formats.from_codes
>>   to_codes = config.formats.to_codes
>>   names = config.formats.names
>>   # open_format to be checked with closing tag (xhtml only)
>>   open_format = None
>>   def __init__(self, source, mark, section_level, posV, posH, close = False,
>>                list_type = None, list_level=0):
>>       # posV & posH ~ line & element numbers for error output
>>       # 'mark' can be a wiki code or an xhtml tag
>>       self.section_level = section_level        # = current title level
>>       self.indent = section_level * TAB         # for nicer xhtml output
>
> As a side-note:
> self.indent can be computed from self.section_level. In general, it is better in such cases not to store the computable attribute, but instead to compute whenever you need it.
> This strategy prevents that you get inconsistencies in your data (ie suppose that self.section_level == 1 and self.indent == TAB + TAB at some point in the program, which of both is then the true value?)

Right! I have dropped self.indent. In my case, such an unexpected change could not happen, but I agree that this is the better practice.
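One way to keep the convenience of writing self.indent without storing it is a property. This is only a sketch: the value of TAB is an assumption, and the class is cut down to the one relevant attribute.

```python
TAB = "   "  # assumed value; the real TAB constant is defined elsewhere


class Format:
    def __init__(self, section_level):
        self.section_level = section_level

    @property
    def indent(self):
        # Computed on demand, so it can never disagree with section_level.
        return self.section_level * TAB


fmt = Format(1)
fmt.section_level = 2
print(repr(fmt.indent))
```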

>
>>       self.close = close                        # for automatic closing (wiki only)
>>       self.list_level = list_level              # nesting level
>>       self.list_type = list_type                # bullet or number
>>       self.error = False                        # flag used for no output
>
> As a side-note:
> you may want to consider using doc strings for documenting your class and instance variables, see eg pydoc or epydoc.

Right again! I will move the argument and attribute descriptions to doc strings.
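Something along these lines, I suppose. The layout of the doc string is just one possible convention, and the argument list is shortened for the example.

```python
class BlockFormat:
    """Block format symbol (table cell, list item, paragraph...).

    Attributes:
        section_level: current title level.
        close: True for a closing mark (automatic closing is wiki only).
        list_type: bullet or number, None outside of lists.
        list_level: nesting level of a list item.
        error: flag used for no output.
    """

    def __init__(self, section_level, close=False, list_type=None, list_level=0):
        self.section_level = section_level
        self.close = close
        self.list_type = list_type
        self.list_level = list_level
        self.error = False


# The description is now available to help() and pydoc.
print(BlockFormat.__doc__)
```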

> A variable like "section_level" leads me to believe that Format class is a part of a page.

This is true. All (symbol) types, of which Format is one example, represent document elements/language features. There is a type for plain text content, one for (block) format, one for additional 'aspect' (e.g. strong or a custom span class), one for links, etc. See below. section_level is used when writing to wiki, because wiki codes for headers are usually of the form "===", where the number of characters gives the section level. Additionally, it lets me write xhtml docs in a nicer form, where indentation represents the logical structure, similar to python code:
<h1> section 1 </h1>
some text
   <h2> section 1.1 </h2>
   some text
   <h2> section 1.2</h2>
   some text
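
As a rough sketch of how section_level can drive both outputs: the function names and the TAB width are assumptions made for the example, and the indentation is taken as (level - 1) tabs so that it matches the layout shown above.

```python
TAB = "   "  # assumed indentation unit


def wiki_header(title, level):
    # The number of '=' characters gives the section level.
    return "%s %s" % ("=" * level, title)


def xhtml_header(title, level):
    # Top-level sections are not indented; each deeper level adds one TAB.
    indent = TAB * (level - 1)
    return "%s<h%d> %s </h%d>" % (indent, level, title, level)


print(wiki_header("section 1.1", 2))
print(xhtml_header("section 1.1", 2))
```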

By the way, to answer your question about the meaning of "Format": after your remark, I renamed it back to BlockFormat. A BlockFormat instance represents a block formatting mark. For instance:
table_cell  : xhtml <td> <--> wiki |
list_item   : xhtml <li> <--> wiki * or #
paragraph   : xhtml <p>  <--> wiki (implicit)
I first considered making a symbol type for each kind of block format. But I finally merged them all into BlockFormat, because the only major difference is their class (=CSS class), that becomes an instance attribute.

>
>>       # read & record symbol data
>>       if source == WIKI:
>>           self.from_wiki(mark, posV, posH)
>>           return
>>       if source == XHTML:
>>           self.from_xhtml(mark, posV, posH)
>
> This looks like a (factory!) class hierarchy you currently don't have. In OOP I would expect something like
>
> self.load(...) # 'self.from()' is not possible, since 'from' is a reserved word in Python
>
> and the underlying object type of 'self' would decide the conversion you actually perform.

In a sense, yes. I first considered creating a version of each type for each language, e.g. WikiBlockFormat & HTMLBlockFormat. Then I merged them because they actually represent the same thing: a common language feature. The type holds r/w methods for both languages. I actually find this clearer and more consistent than having specific types for each language, all having the same meaning ("block format") & holding the same data (class, open/close status, additional sub-kind & nesting level for list items). Right? Or have I misunderstood?
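To make the merged version concrete, here is a simplified sketch of one symbol type that reads and writes both languages. The two lookup tables stand in for the config data, and the method names are chosen for the example only.

```python
# Stand-ins for the config data (hypothetical values).
XHTML_TAGS = {"table_cell": "td", "list_item": "li", "paragraph": "p"}
WIKI_CODES = {"table_cell": "|", "list_item": "*", "paragraph": ""}


class BlockFormat:
    """One language-independent symbol with r/w methods for each language."""

    def __init__(self, kind, close=False):
        self.kind = kind
        self.close = close

    @classmethod
    def from_xhtml(cls, tag):
        # Accepts marks such as '<td>' or '</td>'.
        name = tag.strip("<>")
        close = name.startswith("/")
        name = name.lstrip("/")
        for kind, known in XHTML_TAGS.items():
            if known == name:
                return cls(kind, close)
        raise ValueError("unknown tag: %r" % (tag,))

    def to_xhtml(self):
        return "<%s%s>" % ("/" if self.close else "", XHTML_TAGS[self.kind])

    def to_wiki(self):
        # Wiki block codes are not closed explicitly.
        return "" if self.close else WIKI_CODES[self.kind]


cell = BlockFormat.from_xhtml("<td>")
print(cell.to_wiki(), cell.to_xhtml())
```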

> In the factory pattern, you'd have a formattype object (of class FormatType) that holds the global (config?) information, and each Format object may hold a reference to formattype (if you want).

Yes. The reference would mainly be used to access config data, in order to check the source text validity and to write (back) to a specific language.

>> Now, imagine the source is a wiki text, and the user wishes to output into a different wiki language. At some point between reading & writing, the config will be rebuilt and, as a consequence, so will the data held by each Symbol (sub-)type. As this is an action of the type, I wish it to be performed by the type. So:
>
> That's one option. Another option may be to make the page object language-agnostic (ie you have a 'list', a 'picture' and many more page elements, but not attached to a specific language.)

Exactly. Here you reach the core of the model. The result of parsing is fully "language agnostic". An object I called tortue (turtle) ;-) parses the source document; it's a kind of state machine, as it needs to 'know' its position (e.g. start of a block) and 'remember' things like open tags. The result is simply a list of symbols that builds an abstract representation of the parsed text. These symbols are independent of the input language -- that is actually *the* point -- and are able to further 'express' themselves in any known language. Presently the turtle can only parse wiki and a third format (table, see below), not xhtml; but xhtml is simpler to parse (more explicit & regular) because it is less human-oriented.
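
A much reduced sketch of the turtle idea follows. It is not my actual code: the block codes, the single '**' aspect and the tuple representation of symbols are simplifications chosen for the example.

```python
BLOCK_CODES = {"*": "list_item", "|": "table_cell"}  # assumed wiki config


class Turtle:
    """Walks a wiki text and records language-independent symbols."""

    def __init__(self):
        self.symbols = []
        self.aspect_open = False  # remembered state: is a '**' still open?

    def symbolise(self, wiki_doc):
        for line in wiki_doc.splitlines():
            if not line:
                continue
            # At the start of a block, the first character may be a block code.
            kind = BLOCK_CODES.get(line[0])
            if kind is None:
                kind, text = "paragraph", line
            else:
                text = line[1:]
            self.symbols.append(("BlockFormat", kind, "open"))
            self._read_segments(text)
            if self.aspect_open:
                self._toggle_aspect()  # automatic closing at the end of the block
            self.symbols.append(("BlockFormat", kind, "close"))
        return self.symbols

    def _toggle_aspect(self):
        self.aspect_open = not self.aspect_open
        status = "open" if self.aspect_open else "close"
        self.symbols.append(("SegmentAspect", "important", status))

    def _read_segments(self, text):
        # Every '**' met in the text toggles the 'important' aspect.
        for index, part in enumerate(text.split("**")):
            if index > 0:
                self._toggle_aspect()
            if part:
                self.symbols.append(("Text", part))


symbols = Turtle().symbolise("|I'm a **pride** cell")
for symbol in symbols:
    print(symbol)
```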

[Side note: Each symbol type is implemented as a sub-type of a super-type called Symbol. Symbol is presently of nearly no use, but who knows? Now, thanks to your explanations, I tend to see it as a symbol type factory! It could even hold the whole configuration, instead of each symbol type holding its relevant part.]

> Also, you have a number of language objects that know how to convert the page to their language.
> A few class definitions to make it a bit clearer (hopefully):
>
> class Language(object):
>     """
>     Base class containing symbols/tags of any language used in your application.
>     It also states what operations you can do, and has common code.
>     """
>     def __init__(self):
>         # Setup common attributes (valid for all languages)
>         pass
>
>     def load_page(self, ....):
>         """ Load a page """
>         raise NotImplementedError("Implement me in a derived class")
>
> Language class is not really needed, but very nice to make clear what operations you can do with it.
>
> class WikiLanguage(Language):
>     """
>     Properties and symbols/tags that exist in the wiki language
>     """
>     def __init__(self):
>         Language.__init__(self)
>         self.list_tag = '*'
>
>     def load_page(self, ....):
>         # load page in wiki format, and return a Page object
>
> class XHtmlLanguage(Language):
>     """
>     Properties and symbols/tags that exist in the xhtml language
>     """
>     def __init__(self):
>         Language.__init__(self)
>         self.list_tag = 'li'
>
>     def load_page(self, ....):
>         # load page in xhtml format, and return a Page object
>
> For each language class that you have, you make a object (probably 1 for each language that you have). It describes what the language contains and how it performs its functions.
> A Language object also knows how to load/save a page in its language, how to render it, etc.

Well, I understand your point of view. However, this is where I rather disagree. Additional information is probably needed for a constructive exchange, now. You will find some below.

> class Page(object):
>     """
>     A page of text
>     """
>     def __init__(self, lang):
>         self.symbols = [] # Contents of the page, list or tree of Element's

Exactly. Actually, you could replace 'Page' with Turtle (I see it as a dynamic thing) and add these methods:
   def symbolise(self, wiki_doc):
       <parse & save into self.symbols>
   def symbolise(self, html_doc):
       <to be done>

> class Element(object):
>     """
>     A concept that exists at a page (list, text, paragraph, picture, etc)
>     """

= Symbol -- except that symbol is not language specific

> class ListElement(Element):
>     ....
>
> class TextElement(Element):
>     .....

= Symbol sub-types -- ditto

> A page is simply a tree or a list of Elements. Since a page here is language-agnostic, it doesn't even need to know its language.
> (don't know whether this would work for you).

Perfectly well. I first started to build a tree-like model of a page. Then I switched to a simple list (which better matches both the wiki and the xhtml expression: actually a series of tokens, with the block as the highest level of structure, the rest being implicit). Maybe I will go back to a tree model later, just as an additional tool. Might it also be used for semantic parsing?
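
Going from the flat list to a tree later should not be hard, since the open/close symbols already carry the nesting. A sketch with a stack, assuming symbols are plain tuples like ("BlockFormat", "table_cell", "open") and ("Text", "..."), which is a simplification of my real objects:

```python
def build_tree(symbols):
    """Turn a flat list of open/close/text symbols into nested nodes."""
    root = {"name": "page", "children": []}
    stack = [root]
    for symbol in symbols:
        if symbol[0] == "Text":
            stack[-1]["children"].append(symbol[1])
        elif symbol[2] == "open":
            node = {"name": symbol[1], "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        else:
            # A close must match the most recently opened node.
            if len(stack) == 1 or stack[-1]["name"] != symbol[1]:
                raise ValueError("unbalanced close: %r" % (symbol[1],))
            stack.pop()
    if len(stack) != 1:
        raise ValueError("unclosed: %r" % (stack[-1]["name"],))
    return root


flat = [
    ("BlockFormat", "table_cell", "open"),
    ("Text", "I'm a "),
    ("SegmentAspect", "important", "open"),
    ("Text", "pride"),
    ("SegmentAspect", "important", "close"),
    ("Text", " cell"),
    ("BlockFormat", "table_cell", "close"),
]
tree = build_tree(flat)
print(tree)
```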

> Hope it makes some sense,

I'm rather impressed by how clearly you're able to dive into a (for me rather complex) problem, without even knowing what kind of need it is supposed to meet. So, if you would like to know a bit more, I will start from the beginning.

You may have a look at www.creole.org. Creole is an attempt to create an interwiki standard language, in order to allow users of several wiki engines (& languages) to contribute to 'foreign' wiki sites that use another format.

I have long had the idea that it is indeed possible to allow (programming) language customization. Implemented through an editor configuration layer, this allows both respect for a common standard for code sharing and comfortable personal use (Gemütlichkeit). This is similar to syntax highlighting or indent preferences. The /saving/ form is not touched (nor, obviously, are the semantics). [Note that 95% of programmers seem not to reach this point, as they argue that this would launch millions of weird versions of their beloved language into the wild.] [Note: I plan to extend a python editor to allow this. I long for the day when I can get rid of the ':' at the end of headers, use ':' for assignment, and finally use '=' for "equals" -- among lots of other (so important for me) details.]

This applies to wikis too, of course. What I propose: a subset of xhtml, matching the common wiki language features, can be used as the saving/standard/exchange format. The present project, an amateur work basically done for pleasure, was first an attempt to build a demonstration of how this may actually work. Now, it has become a bit more.

A list of requirements:
* use of several wiki language configurations
* further customization (i.e. definition of the differences only)
* internal abstract representation (currently being refactored)
* r/w to/from the presently configured wiki language
* use of an xhtml subset as saving format, thus
* r/w to/from xhtml (write ok, read planned)
* r/w to/from table format (ok)
The last format is used for debugging; but it's also a kind of template or bridge for DB r/w.
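
For the "differences only" customization, one possible approach (not what the project currently does) is to layer a small user dictionary over a base config with collections.ChainMap. The codes below are illustrative values only.

```python
from collections import ChainMap

# Base wiki language config (illustrative values).
BASE_CONFIG = {"important": "**", "list_item": "*", "table_cell": "|", "title": "="}

# The user only states what differs from the base.
MY_DIFFERENCES = {"important": "!!"}

# Lookups try the user layer first, then fall back to the base.
config = ChainMap(MY_DIFFERENCES, BASE_CONFIG)

print(config["important"])
print(config["list_item"])
```

The base config is never modified, so switching back to the standard language just means dropping the user layer.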

I had never thought of implementing whole language specifications as you propose above. This is actually an option. But first, here is how it works up to now. Each symbol type started as a kind of description of a common wiki language feature, for instance a token that introduces a section title:

type "title", level 3: creole ===  <--> xhtml <h3>...</h3>
According to this pattern, both of the following source text snippets
|I'm a **pride** cell
<td>I'm a <strong>pride</strong> cell</td>
will be represented internally as the same list of symbols. Its output in table format is:
BlockFormat table_cell  open
Text    I'm a
SegmentAspect   important   open
Text    pride
SegmentAspect   important   close
Text     cell
BlockFormat table_cell  close
Conversely, this symbol list can be output in either form.
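
In a very reduced form, the writing side could look like the sketch below. The symbols are plain tuples here and the two lookup tables stand in for the config; the real symbol types carry more data.

```python
SYMBOLS = [
    ("BlockFormat", "table_cell", "open"),
    ("Text", "I'm a "),
    ("SegmentAspect", "important", "open"),
    ("Text", "pride"),
    ("SegmentAspect", "important", "close"),
    ("Text", " cell"),
    ("BlockFormat", "table_cell", "close"),
]

# Stand-ins for the config data (hypothetical values).
XHTML = {"table_cell": "td", "important": "strong"}
WIKI = {"table_cell": "|", "important": "**"}


def to_xhtml(symbols):
    out = []
    for symbol in symbols:
        if symbol[0] == "Text":
            out.append(symbol[1])
        else:
            slash = "/" if symbol[2] == "close" else ""
            out.append("<%s%s>" % (slash, XHTML[symbol[1]]))
    return "".join(out)


def to_wiki(symbols):
    out = []
    for symbol in symbols:
        if symbol[0] == "Text":
            out.append(symbol[1])
        elif symbol[0] == "SegmentAspect":
            out.append(WIKI[symbol[1]])  # the same code opens and closes
        elif symbol[2] == "open":
            out.append(WIKI[symbol[1]])  # block codes are not closed in wiki
    return "".join(out)


def to_table(symbols):
    # One symbol per line, fields separated by tabs.
    return "\n".join("\t".join(symbol) for symbol in symbols)


print(to_wiki(SYMBOLS))
print(to_xhtml(SYMBOLS))
print(to_table(SYMBOLS))
```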

Now, as you clearly explain above, in order to implement language "super_objects", I will still need to build classes for each type of symbol, for instance a Link type inside the XHTML_language object. But then I need nearly the same object, with the same semantics and the same held data, inside the wiki_language object, and also inside the table_format object. Correct? So why not just add r/w methods for each language inside a single, common, symbol type?

Now, why the idea of a language object didn't jump into my brain is probably because it is too hard for me! Anyway, I see several uses/advantages for it:
* for wiki, hold the current config
* hold language-specific r/w rules (e.g. the <...>; presently a tool function does it)
* specify syntactic rules
This last one is the hard part for me! For instance, the wiki config is presently almost only about the lexicon (i.e. the choice of codes); only some syntactic details can be set, e.g. whether an aspect code must be closed. The real syntax rules are hidden, actually implicit in the r/w methods of each language and in the turtle's symbolisation method. Samples of such rules:
* wiki: block format codes are single characters, lie at the start of a block, and are not closed
* html: segment aspects take either a <xxx>...</xxx> or a <span class="xxx"> </span> form
* both list nesting and list mixing are available
Now, if I could specify such rules as a set of parameters, then I could write a general parsing (symbolisation) method for the turtle that takes this rule set as a parameter. The same goes for each symbol type's read & write methods. But this seems much too difficult for my programming talent: the way I see it, it tends towards a parser generator.
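
To show what I mean by a rule set given as a parameter, here is a toy version that only covers the "block code at the start of a block" rule. The rule dictionaries, their keys and the second dialect are invented for the example; a real rule set would need much more.

```python
# Two hypothetical rule sets: same features, different block codes.
CREOLE_RULES = {
    "block_codes": {"*": "list_item", "|": "table_cell"},
    "default_block": "paragraph",
}
OTHER_RULES = {
    "block_codes": {"-": "list_item", "!": "table_cell"},
    "default_block": "paragraph",
}


def symbolise(text, rules):
    """General block parser: the language is described by 'rules' only."""
    symbols = []
    for line in text.splitlines():
        if not line:
            continue
        kind = rules["block_codes"].get(line[0])
        if kind is None:
            kind, content = rules["default_block"], line
        else:
            content = line[1:]
        symbols.append(("BlockFormat", kind, "open"))
        symbols.append(("Text", content.strip()))
        symbols.append(("BlockFormat", kind, "close"))
    return symbols


creole_result = symbolise("* one\nplain text", CREOLE_RULES)
other_result = symbolise("- one\nplain text", OTHER_RULES)
print(creole_result == other_result)
```

Both dialects end up as the same symbol list, which is the property I am after; the hard part remains expressing the less trivial rules (nesting, closing, aspects) as data.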

> Albert

Denis


_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
