Parser TO-DO

Bill Janssen Wed, 03 Apr 2002 15:41:17 -0800

>               c.) Parallelizing the spider

I think this is an excellent idea.  I've worked out a state diagram
for a better retriever, and it looks fairly easy to implement.
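
As a rough illustration of what parallel retrieval could look like (this is
not the state diagram mentioned above, just a minimal sketch using the thread
pool from a modern Python standard library), the fetches can be handed to a
pool of workers. The names fetch_all, fetch, and max_workers are hypothetical;
fetch stands for whatever callable actually retrieves one URL.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_all(urls, fetch, max_workers=4):
    """Call fetch(url) for every URL concurrently.

    Returns a dict mapping each URL to its result, or to the exception
    raised while fetching it. `fetch` is a hypothetical callable supplied
    by the caller; nothing here is Plucker's actual retriever API.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit everything up front and remember which future is which URL.
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # One failed page should not stop the rest of the spider.
                results[url] = exc
    return results
```

Keeping the per-URL fetch as a parameter means the same loop works with any
retriever, and a failed fetch is recorded rather than aborting the whole run.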


Here's my list of parser TO-DOs (in priority order, but not
necessarily implementation order :-):

1)  Stylesheet (CSS) support in HTML pages.

2)  XHTML/OEBPS support -- basically, XML support.

3)  Pure Java version of the parser -- all that's needed is some code
    for JIU to generate Palm image format files, which I don't feel
    like writing, but which I'd be happy to describe to any interested volunteer.

4)  An improved text format record type that will
    a) support seamless merging of text records into large pages
    b) support searching without first decompressing the text record

5)  An improved image format that will support arbitrarily large
    images, captions, etc.

6)  Better retriever code.  There are two issues here: (a) whether to
    move to Python 2.*, which already contains better retriever code
    that we could just use, and (b) that any better retriever code we
    write should really be donated to the Python project as part of the
    Python standard library, instead of being released under GPL with
    Plucker.

7)  Support for the OBJECT tag in HTML/XHTML -- requires extensive
    restructuring of the parser control flow to allow recursion.
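
To show why item 7 implies recursion: in HTML, the content of an OBJECT
element is its fallback, and that fallback may itself be another OBJECT.
Below is a simplified sketch, assuming each OBJECT has already been reduced
to a small dict; the function resolve_object, the dict keys, and the
can_render callback are all hypothetical and are not part of the existing
parser.

```python
def resolve_object(node, can_render):
    """Pick what to show for a (possibly nested) OBJECT element.

    `node` is a hypothetical dict with keys "type", "data", and an optional
    "fallback", which is either another such dict (a nested OBJECT) or plain
    fallback text. `can_render` is a caller-supplied predicate on MIME types.
    """
    if can_render(node["type"]):
        return node["data"]
    fallback = node.get("fallback")
    if isinstance(fallback, dict):
        # Nested OBJECT: the same decision has to be made one level down.
        return resolve_object(fallback, can_render)
    # Plain fallback content, or None if there is nothing left to try.
    return fallback


nested = {
    "type": "video/mpeg",
    "data": "movie.mpg",
    "fallback": {
        "type": "image/gif",
        "data": "still.gif",
        "fallback": "A text description",
    },
}
```

A viewer that handles only images would end up with the inner GIF here, and
one that handles neither type would fall through to the text.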

Bill
