functions]

spir Sat, 01 Nov 2008 09:22:48 -0700

Thank you for this relevant & precise review, Albert. I will answer specific topics first, then give an overall introduction to the problem(s) addressed by this project, which may clarify some points.


<this is quite a long post>


A.T.Hofkamp wrote:

> However, by moving the 'type' information to a separate object, your classes become less cluttered, and the problem (and thus the solution) becomes easier to understand/program.
> You have a Factory class + factory object (usually 1) that represents the real factory, and you have a Product class + many product objects for the products it makes.

> Each product object has properties and functions (ie its color and how you can use it).
> Meta information, like what kind of products exist and how to make/transform them however is *not* part of the product object but belongs in the factory object.
> (Look at real life, you can use a car to get from A to B and back, but a car cannot tell you what kinds of other cars exist, or how to install eg a radio.)
> In your application, that would mean that the things you call 'instance' above are part of eg the 'format' object, and things you name 'type' above, are part of the 'formatfactory' (or 'formattype') object.

> The difference between your approach and the factory pattern imho is that the 'type' information of an object is not stored in its class, but in another object (of class Factory).

[...]

I think I get it better now.

FactoryClass ==> factory
                   |
                "models"
                   |
                   v
              ObjectClass ==> objects

Couldn't the factory actually be the same object as ObjectClass?

Isn't this in a way similar to the use of meta-classes (which I have never used, nor even explored)?

I think I understand the point of this pattern, mainly for clarification. In the case of my project, it could be used to implement the flexibility of the types/classes, which depend on config parameters, and these parameters may change at runtime. However, this seems to me to be overload / over-structuring / over-abstraction. I don't make heavy use of type/class attributes, and they appear rather clearly as what they are/mean.

(See also below for further clarification.)

>> class Format(Symbol):
>>   ''' block formats symbol '''
>>   # import relevant config data for this symbol type
>>   from_codes = config.formats.from_codes
>>   to_codes = config.formats.to_codes
>>   names = config.formats.names
>>   # open_format to be checked with closing tag (xhtml only)
>>   open_format = None

>>   def __init__(self, source, mark, section_level, posV, posH, close=False,
>>                   list_type = None, list_level=0):
>>       # posV & posH ~ line & element numbers for error output
>>       # 'mark' can be a wiki code or an xhtml tag
>>       self.section_level = section_level        # = current title level
>>       self.indent = section_level * TAB         # for nicer xhtml output
>
> As a side-note:

> self.indent can be computed from self.section_level. In general, it is better in such cases not to store the computable attribute, but instead to compute whenever you need it.
> This strategy prevents that you get inconsistencies in your data (ie suppose that self.section_level == 1 and self.indent == TAB + TAB at some point in the program, which of both is then the true value?)

Right! I dropped self.indent. In my case, such an unexpected change could not happen, but I agree that it is the better practice.
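A minimal sketch of the computed-attribute approach, using a stripped-down stand-in for the Format class. The value of TAB is an assumption (the real constant is defined elsewhere in the project), and the constructor is reduced to the one argument that matters here:

```python
TAB = "   "  # assumed value, for illustration only


class Format(object):
    """Stripped-down stand-in for the real class."""

    def __init__(self, section_level):
        self.section_level = section_level

    @property
    def indent(self):
        # Computed on each access, so it can never disagree
        # with section_level.
        return self.section_level * TAB


fmt = Format(2)
fmt.section_level = 1   # fmt.indent follows automatically
```

Callers still write fmt.indent as before; only the storage goes away.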

>>       self.close = close                        # for automatic closing (wiki only)
>>       self.list_level = list_level              # nesting level
>>       self.list_type = list_type                # bullet or number
>>       self.error = False                        # flag used for no output

>
> As a side-note:

> you may want to consider using doc strings for documenting your class and instance variables, see eg pydoc or epydoc.


Right again! I will move argument and attribute description to doc strings.
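Roughly, the inline comments would move into doc strings like this. This is only a sketch: the attribute descriptions are taken from the comments in the quoted code, while the reduced argument list and the layout of the doc string are my own choices.

```python
class Format(object):
    """Block format symbol (sketch).

    Instance attributes:
        section_level -- current title level
        close         -- True for automatic closing (wiki only)
        list_type     -- bullet or number (None outside lists)
        list_level    -- nesting level
        error         -- flag used for no output
    """

    def __init__(self, section_level, close=False,
                 list_type=None, list_level=0):
        """Record the symbol data; see the class doc string for details."""
        self.section_level = section_level
        self.close = close
        self.list_type = list_type
        self.list_level = list_level
        self.error = False
```

The text then shows up in help(Format) and in pydoc output, instead of being visible only in the source.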

> A variable like "section_level" leads me to believe that Format class is a part of a page.

This is true. All (symbol) types, of which Format is one example, represent document elements/language features. There is a type for plain text content, one for (block) format, one for additional 'aspect' (e.g. strong or a custom span class), one for links, etc. See below.

Section_level is used for writing to wiki, because wiki codes for headers are usually of the form "===", where the number of chars gives the section level. Additionally, it lets me write xhtml docs in a nicer form, where indentation represents the logical structure, similar to python code:

<h1> section 1 </h1>
some text
   <h2> section 1.1 </h2>
   some text
   <h2> section 1.2</h2>
   some text

By the way, to answer your question about the meaning of "Format": after your remark, I renamed it back to BlockFormat. A BlockFormat instance represents a block formatting mark. For instance:

table_cell  : xhtml <td> <--> wiki |
list_item   : xhtml <li> <--> wiki * or #
paragraph   : xhtml <p>  <--> wiki (implicit)

I first considered making a symbol type for each kind of block format. But I finally merged them all into BlockFormat, because the only major difference is their class (= CSS class), which becomes an instance attribute.


>
>>       # read & record symbol data
>>       if source == WIKI:
>>           self.from_wiki(mark, posV, posH)
>>           return
>>       if source == XHTML:
>>           self.from_xhtml(mark, posV, posH)
>

> This looks like a (factory!) class hierarchy you currently don't have. In OOP I would expect something like

> self.load(...) # 'self.from()' is not possible, since 'from' is a reserved word in Python

> and the underlying object type of 'self' would decide the conversion you actually perform.

In a sense, yes. I first considered creating a version of each type for each language, e.g. WikiBlockFormat & HTMLBlockFormat. Then I merged them because they actually represent the same thing: a common language feature. The type holds r/w methods for both languages. I actually find this clearer and more consistent than having specific types for each language, all having the same meaning ("block format") & holding the same data (class, open/close status, additional sub-kind & nesting level for list items). Right? Or have I misunderstood?

> In the factory pattern, you'd have a formattype object (of class FormatType) that holds the global (config?) information, and each Format object may hold a reference to formattype (if you want).

Yes. The reference would mainly be used to access config data, in order to check the source text validity and to write (back) to a specific language.
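To check my understanding, here is a sketch of how that could look. FormatType and Format are the names used above; the methods make(), reconfigure() and write(), and the sample codes, are hypothetical and only serve the illustration.

```python
class FormatType(object):
    """Holds the type-level (config) data shared by all Format objects."""

    def __init__(self, from_codes, to_codes):
        self.from_codes = from_codes   # source code -> format name
        self.to_codes = to_codes       # format name -> output code

    def make(self, mark):
        """Create a Format from a source mark, checking its validity."""
        if mark not in self.from_codes:
            raise ValueError("unknown format mark: %r" % (mark,))
        return Format(self, self.from_codes[mark])

    def reconfigure(self, from_codes, to_codes):
        """Swap the config at runtime; existing Formats see the change."""
        self.from_codes = from_codes
        self.to_codes = to_codes


class Format(object):
    def __init__(self, format_type, name):
        self.format_type = format_type   # reference to the shared config
        self.name = name

    def write(self):
        return self.format_type.to_codes[self.name]


# Hypothetical codes: read with one wiki dialect, write with another.
formattype = FormatType({"*": "list_item"}, {"list_item": "*"})
item = formattype.make("*")
formattype.reconfigure({"-": "list_item"}, {"list_item": "-"})
```

After the reconfigure() call, item.write() gives "-", because the object looks the code up through its reference instead of holding a copy.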

>> Now, imagine the source is a wiki text, and the user wishes to output into a different wiki language. At some point between reading & writing the config will be rebuilt and, as a consequence, so will the data held by each Symbol (sub-)type. As this is an action of the type, I wish it to be performed by the type. So:

> That's one option. Another option may be to make the page object language-agnostic (ie you have a 'list', a 'picture' and many more page elements, but not attached to a specific language.)

Exactly. Here you reach the core of the model. The result of parsing is fully "language agnostic". An object I called tortue (turtle) ;-) parses the source document; it's a kind of state machine, as it needs to 'know' its position (e.g. start of a block) and 'remember' things like open tags. The result is simply a list of symbols that builds an abstract representation of the parsed text. Right? These symbols are independent of the input language (that is actually *the* point) and are able to 'express' themselves further in any known language.

Presently the turtle can only parse wiki and a third format (table, see below), not xhtml; but xhtml is simpler to parse (more explicit & regular) because it is less human-oriented.

[Side note: Each symbol type is implemented as a sub-type of a super-type called Symbol. Symbol is presently of nearly no use, but who knows? Now, thanks to your explanations, I tend to see it as a symbol type factory! It could even hold the whole configuration, instead of each symbol holding its relevant part.]

> Also, you have a number of language objects that know how to convert the page to their language.

> A few class definitions to make it a bit clearer (hopefully):
>
> class Language(object):
>     """
>     Base class containing symbols/tags of any language used in your
>     application. It also states what operations you can do, and has
>     common code.
>     """
>     def __init__(self):
>         # Setup common attributes (valid for all languages)
>
>     def load_page(self, ....):
>         """ Load a page """
>         raise NotImplementedError("Implement me in a derived class")
>

> Language class is not really needed, but very nice to make clear what operations you can do with it.

>
> class WikiLanguage(Language):
>     """
>     Properties and symbols/tags that exist in the wiki language
>     """
>     def __init__(self):
>         Language.__init__(self)
>         self.list_tag = '*'
>
>     def load_page(self, ....):
>         # load page in wiki format, and return a Page object
>
> class XHtmlLanguage(Language):
>     """
>     Properties and symbols/tags that exist in the xhtml language
>     """
>     def __init__(self):
>         Language.__init__(self)
>         self.list_tag = 'li'
>
>     def load_page(self, ....):
>         # load page in xhtml format, and return a Page object
>

> For each language class that you have, you make an object (probably 1 for each language that you have). It describes what the language contains and how it performs its functions.
> A Language object also knows how to load/save a page in its language, how to render it, etc.

Well, I understand your point of view. However, this is where I rather disagree. Additional information is probably needed for a constructive exchange now. You will find some below.


> class Page(object):
>     """
>     A page of text
>     """
>     def __init__(self, lang):
>         self.symbols = [] # Contents of the page, list or tree of Element's

Exactly. Actually, you could replace 'Page' with Turtle (I see it as a dynamic thing) and add these methods:

   def symbolise(self, wiki_doc):
       <parse & save into self.symbols>
   def symbolise(self, html_doc):
       <to be done>
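Since Python does not pick a method by argument type (a second def with the same name simply replaces the first), the two variants need one entry point that dispatches on the source language. A possible sketch follows; the values of WIKI and XHTML and the private method names are assumptions, and the wiki parser is a placeholder, not the real state machine.

```python
WIKI, XHTML = "wiki", "xhtml"   # assumed values for the source tags


class Turtle(object):
    def __init__(self):
        self.symbols = []

    def symbolise(self, doc, source):
        """Dispatch to the parser for the given source language."""
        parsers = {WIKI: self._symbolise_wiki,
                   XHTML: self._symbolise_xhtml}
        if source not in parsers:
            raise ValueError("unknown source language: %r" % (source,))
        parsers[source](doc)
        return self.symbols

    def _symbolise_wiki(self, doc):
        # Placeholder only: each line becomes one text symbol.
        self.symbols = [("Text", line) for line in doc.splitlines()]

    def _symbolise_xhtml(self, doc):
        raise NotImplementedError("xhtml reading is planned, not written yet")
```

Adding a new input language then means adding one method and one dictionary entry.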

> class Element(object):
>     """
>     A concept that exists at a page (list, text, paragraph, picture, etc)
>     """

= Symbol -- except that symbol is not language specific

> class ListElement(Element):
>     ....
>
> class TextElement(Element):
>     .....

= Symbol sub-types -- ditto

> A page is simply a tree or a list of Elements. Since a page here is language-agnostic, it doesn't even need to know its language.

> (don't know whether this would work for you).

Perfectly well. I first started building a tree-like model of a page. Then I switched to a simple list (which better matches both wiki and xhtml expression: actually a series of tokens, with the block as the highest level of structure, the rest being implicit). Maybe I will go back to a tree model later, just as an additional tool. It may also be useful for semantic parsing?


> Hope it makes some sense,

I'm rather impressed by how clearly you're able to dive into a (for me rather complex) problem, without even knowing what kind of need it is supposed to meet. So if you'd like to know a bit more, I will start here from the beginning.

You may have a look at www.creole.org. Creole is an attempt to create an interwiki standard language, in order to allow users of several wiki engines (& languages) to contribute on 'foreign' wiki sites that use another format.

I have had for a long time the idea that it is indeed possible to allow (programming) language customization. Implemented through an editor configuration layer, this allows both respect of a common standard for code sharing, and comfortable personal use (Gemütlichkeit). Right? This is similar to syntax highlighting or indent preferences. The /saving/ form is not touched (nor, obviously, are the semantics).

[Note that 95% of programmers seem not to reach this point, as they argue that this would launch millions of weird versions of their beloved language into the wild.]

[Note: I plan to extend a python editor to allow this. I long for the day when I can get rid of the ':' at the end of headers, use ':' for assignment, and finally use '=' for "equals", among lots of other details (so important to me).]

This applies to wiki of course. What I propose: a subset of xhtml, matching common wiki language features, can be used as the saving/standard/exchange format. The present project, an amateur work basically done for pleasure, was at first an attempt to build a demonstration of how this may actually work. Now it has become a bit more.


A list of requirements:
* use of several wiki lang configurations
* further customization (i.e. def of differences only)
* inner abstract representation (currently being refactored)
* r/w to/from presently configured wiki lang
* use an xhtml subset as saving format, thus
* r/w to/from xhtml (write ok, read planned)
* r/w to/from table format (ok)

The last format is used for debugging; but it's also a kind of template or bridge for DB r/w.

I had never thought of implementing whole language specifications as you propose above. This is actually an option. But first, here is how it works up to now:

Each symbol type started as a kind of description of a common wiki language feature, for instance a token that introduces a section title:


type "title", level 3: creole ===  <--> xhtml <h3>...</h3>

According to this pattern, both of the following source text snippets

|I'm a **pride** cell
<td>I'm a <strong>pride</strong> cell</td>

will be represented internally as the same list of symbols. Its output in table format is:

BlockFormat table_cell  open
Text    I'm a
SegmentAspect   important   open
Text    pride
SegmentAspect   important   close
Text     cell
BlockFormat table_cell  close
Conversely, this symbol list can be output in either form.
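As an illustration of that last point, here is a sketch that writes the symbol list above in both forms. The tuple layout, the two code tables and the write() function are invented for this example (the real symbols are objects with their own r/w methods), and I assume the xhtml cell tag is <td>, as in the table_cell mapping given earlier.

```python
# Each symbol is (type, value, state); state is None for plain text.
symbols = [
    ("BlockFormat", "table_cell", "open"),
    ("Text", "I'm a ", None),
    ("SegmentAspect", "important", "open"),
    ("Text", "pride", None),
    ("SegmentAspect", "important", "close"),
    ("Text", " cell", None),
    ("BlockFormat", "table_cell", "close"),
]

# Hypothetical per-language code tables: (name, state) -> output string.
WIKI_CODES = {
    ("table_cell", "open"): "|",
    ("table_cell", "close"): "",      # wiki block formats are not closed
    ("important", "open"): "**",
    ("important", "close"): "**",
}
XHTML_CODES = {
    ("table_cell", "open"): "<td>",
    ("table_cell", "close"): "</td>",
    ("important", "open"): "<strong>",
    ("important", "close"): "</strong>",
}


def write(symbols, codes):
    """Render a symbol list using the given code table."""
    parts = []
    for kind, value, state in symbols:
        if kind == "Text":
            parts.append(value)
        else:
            parts.append(codes[(value, state)])
    return "".join(parts)
```

Here write(symbols, WIKI_CODES) gives back the wiki snippet and write(symbols, XHTML_CODES) the xhtml one; the symbol list itself never mentions either language.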

Now, as you clearly explain above, in order to implement language "super_objects", I will still need to build classes for each type of symbol, for instance a Link type inside the XHTML_language object. Now, I need nearly the same object, with the same semantics and the same data, inside the wiki_language object, and also inside the table_format object. Correct? So why not just add r/w methods for each language inside a single, common symbol type?

Now, the reason the idea of a language object didn't jump into my brain is probably that it is too hard for me! Anyway, I see several uses/advantages for it:

* for wiki, hold the current config

* hold language specific r/w rules (e.g. the <...>, presently a tool function does it)

* specify syntactic rules

This is the hard part for me! For instance, the wiki config is presently almost only about lexis (i.e. choice of codes); only some syntactic details can be set, e.g. whether an aspect code must be closed. The real syntax rules are hidden, actually implicit inside the r/w methods of each language and the turtle's symbolisation method. Samples of such rules:

* wiki: block format codes are single characters, lie at the start of the block, and are not closed
* html: segment aspects take either a <xxx>...</xxx> or a <span class="xxx"> </span> form
* both list nesting and list mixing are available

Now, if I could specify such rules as a set of parameters, then I could write a general parsing (symbolisation) method for the turtle that takes this rule set as a parameter. Likewise for each symbol type's read & write methods. But this seems much too difficult for my programming talent: the way I see it, it tends towards a parser generator.
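For a single rule, the idea is not that hard to sketch. The function below covers only the wiki rule "a block format code is one character at the start of the block and is not closed"; the rule-set keys, the function name and the sample codes are all invented here, and this is very far from a parser generator.

```python
def symbolise_blocks(text, rules):
    """Split text into (block_name, content) pairs using a rule set."""
    block_codes = rules["block_codes"]    # leading char -> block name
    default = rules["default_block"]      # name for unmarked blocks
    symbols = []
    for line in text.splitlines():
        if not line:
            continue                      # skip empty lines
        name = block_codes.get(line[0])
        if name is None:
            symbols.append((default, line))
        else:
            symbols.append((name, line[1:].strip()))
    return symbols


# Hypothetical rule set for one wiki dialect.
wiki_rules = {
    "block_codes": {"*": "list_item", "#": "list_item", "|": "table_cell"},
    "default_block": "paragraph",
}
```

Switching to another wiki dialect would then only mean passing another dictionary. The hard part remains the rules that do not fit such a flat table, like nesting or closing tags.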


> Albert

Denis


_______________________________________________
Tutor maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/tutor
