I don't know whether this helps, but I always work from the HTML version and break on any H1 or H2. That isn't perfect either, but it is easier to do.
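For what it's worth, here is a rough sketch of what breaking on H1/H2 could look like. It uses only Python's standard `html.parser` module; the names `HeadingSplitter` and `split_on_headings` are made up for illustration and are not part of PyPlucker. It assumes headings are not nested inside each other and ignores everything except text and the two heading tags.

```python
from html.parser import HTMLParser


class HeadingSplitter(HTMLParser):
    """Collect text into sections, starting a new one at each <h1> or <h2>."""

    BREAK_TAGS = {"h1", "h2"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        # The first section holds anything that appears before the first heading.
        self.sections = [{"title": "", "text": []}]
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <H2> and <h2> both match.
        if tag in self.BREAK_TAGS:
            self.sections.append({"title": "", "text": []})
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in self.BREAK_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        current = self.sections[-1]
        if self._in_heading:
            current["title"] += data
        else:
            current["text"].append(data)


def split_on_headings(html):
    """Return a list of (title, body) pairs, one per H1/H2 section."""
    parser = HeadingSplitter()
    parser.feed(html)
    parser.close()
    result = []
    for section in parser.sections:
        # Collapse runs of whitespace so the output is easy to compare.
        title = " ".join(section["title"].split())
        body = " ".join("".join(section["text"]).split())
        if title or body:
            result.append((title, body))
    return result
```

Calling `split_on_headings(open(path).read())` on an HTML edition would give one (title, body) pair per heading, which could then be written out as separate pages. Texts that mark chapters with H3 or with plain bold paragraphs will not split, which is the "not perfect" part.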

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Marcello Perathoner
Sent: Wednesday, November 02, 2005 10:10 AM
To: [email protected]
Subject: Re: Plucker server on Project Gutenberg

David A. Desrosiers wrote:

>> I'm going to replace the text/plain parser with a custom one that
>> will (try to) parse chapter heads, italics etc. out of the plain text.
>
> I'd be interested to see how you solve the context issue that has
> been brought up on the pg lists over the last year or so. Its a very
> complicated issue, and to date, nobody has solved it without trying to
> reinvent the base PG text format into something different.

I have the option of doing:

  pgtext > filter | PyPlucker > pdb

or to write a custom parser for PyPlucker.

The PG format has changed a lot over 30+ years. None of the 3rd-party
tools I know is able to correctly parse all PG texts.

The custom text/plain parser I'm writing will plug into PyPlucker and do
a very simple analysis of the text. I'm not aiming at a 100% or even 99%
solution. I'm just trying to make the average PG text look good enough
for distribution.

--
Marcello Perathoner
[EMAIL PROTECTED]

_______________________________________________
plucker-dev mailing list
[email protected]
http://lists.rubberchicken.org/mailman/listinfo/plucker-dev
