> Good ideas though. I think we should make it easy to add a class for
> each type of doc/mimetype and then wrap a set of parser rules around that
> class.
>
Good call. I've been working on this very thing. While writing an XML parser, I realized that I was duplicating a lot of the methods, etc. from the other parsers. I'm working on a patch which would create a GenericParser() class that each content-specific parser would inherit from, to eliminate a lot of potential code overlap and be a little more 'plugin-friendly'.

Here's what I'm thinking:

1. Rename generic_parser() in Parser.py to handle_mimetype() or similar, as this is more descriptive of what it does. Change its if-statement logic to a lookup in a dictionary of mimetypes and their parsers.

2. Create a GenericParser class which has all methods/code common to all parsers: get_plucker_doc, get_images, get_anchors, setting self._doc = TextDocBuilder(), etc.

3. Create a subdirectory in PyPlucker for the content-specific parser classes, each in its own file.

This would greatly simplify adding support for additional mime-types. To add a new content parser, one would only have to create a new file with their class and put an entry in a dictionary.

My XML parser code is written in a similar manner: to support a new type of XML file, one only has to create the ContentHandler and specify the root tag in a dictionary, and the XML parser class will find and use it.

If the above dictionaries were populated from an ini file or something, it would give the end user a lot of freedom to support whatever they want without modifying the actual plucker code in any way. It would also allow the user to explicitly disable a parser for a particular mime-type if they choose.

What do ya think?

PS -- Hope I'm not offending, the current codebase is well done. I'm just excited about the possibilities! ;)

-- Dave <[EMAIL PROTECTED]>

_______________________________________________
plucker-dev mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-dev
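For illustration, here is a rough sketch of how the three steps proposed above might fit together. It is not the actual patch. Only GenericParser, handle_mimetype, get_plucker_doc, get_images, get_anchors and self._doc come from the proposal; the constructor arguments, the parse() hook, PlainTextParser, the PARSERS dictionary, and the plain list standing in for TextDocBuilder() are all assumptions made up for the example.

```python
# Hypothetical sketch of the proposed layout; not the real PyPlucker code.


class GenericParser:
    """Holds the code that every content-specific parser would share."""

    def __init__(self, url, data):
        self._url = url
        self._doc = []       # assumed stand-in for TextDocBuilder()
        self._images = []
        self._anchors = {}
        self.parse(data)     # assumed hook, implemented by each subclass

    def parse(self, data):
        raise NotImplementedError("content-specific parsers override this")

    def get_plucker_doc(self):
        return self._doc

    def get_images(self):
        return self._images

    def get_anchors(self):
        return self._anchors


class PlainTextParser(GenericParser):
    """Example content-specific parser; would live in its own file."""

    def parse(self, data):
        self._doc.extend(data.splitlines())


# Mimetype -> parser class. Adding a parser means adding one entry here;
# removing an entry (or mapping it to None) disables that mimetype.
PARSERS = {
    "text/plain": PlainTextParser,
}


def handle_mimetype(mimetype, url, data, parsers=PARSERS):
    """Dictionary lookup replacing the chain of if statements."""
    parser_class = parsers.get(mimetype)
    if parser_class is None:
        return None          # no parser registered, or parser disabled
    return parser_class(url, data)
```

With this shape, the dictionary is the only place that knows which parsers exist, so filling it from an ini file instead of a literal would not require touching handle_mimetype() or any parser class.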
