I don't know if this would help or not, but I always work from the HTML
version and break on any H1 or H2.  That isn't perfect either, but it is
easier to do.
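
For what it's worth, here is a minimal sketch of that "break on any H1 or
H2" idea, using only Python's standard html.parser module. The names
HeadingSplitter and split_on_headings are made up for illustration, and
this is not code from Plucker or PG, just one way it could be done.

```python
from html.parser import HTMLParser


class HeadingSplitter(HTMLParser):
    """Collect text into sections, starting a new one at every <h1> or <h2>."""

    BREAK_TAGS = {"h1", "h2"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        # The first section holds anything that appears before the first heading.
        self.sections = [{"title": "", "text": []}]
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        if tag in self.BREAK_TAGS:
            self.sections.append({"title": "", "text": []})
            self._in_heading = True

    def handle_endtag(self, tag):
        if tag in self.BREAK_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        current = self.sections[-1]
        if self._in_heading:
            current["title"] += data
        else:
            current["text"].append(data)


def split_on_headings(html_text):
    """Return a list of (title, body) tuples, one per H1/H2 section."""
    parser = HeadingSplitter()
    parser.feed(html_text)
    parser.close()
    result = []
    for section in parser.sections:
        # Collapse runs of whitespace so wrapped source lines read cleanly.
        title = " ".join(section["title"].split())
        body = " ".join("".join(section["text"]).split())
        if title or body:
            result.append((title, body))
    return result
```

It is not perfect: books that mark chapters with H3, or with plain bold
paragraphs, will come out as one big section.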

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Marcello Perathoner
Sent: Wednesday, November 02, 2005 10:10 AM
To: [email protected]
Subject: Re: Plucker server on Project Gutenberg

David A. Desrosiers wrote:

>> I'm going to replace the text/plain parser with a custom one that
>> will (try to) parse chapter heads, italics etc. out of the plain
>> text.
> 
>     I'd be interested to see how you solve the context issue that has
> been brought up on the pg lists over the last year or so. It's a very
> complicated issue, and to date, nobody has solved it without trying to
> reinvent the base PG text format into something different.

I have two options: either run the text through a filter first,

   pgtext > filter | PyPlucker > pdb

or write a custom parser for PyPlucker.


The PG format has changed a lot over 30+ years. None of the 3rd-party
tools I know is able to correctly parse all PG texts.

The custom text/plain parser I'm writing will plug into PyPlucker and do
a very simple analysis of the text. I'm not aiming at a 100% or even 99%
solution. I'm just trying to make the average PG text look good enough
for distribution.
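
To illustrate what a "very simple analysis" of this kind might look like,
here is a minimal sketch. It is not the parser described above, and the
rules in it are assumptions: a one-line paragraph starting with CHAPTER,
BOOK or PART followed by a number is treated as a heading, and _word_ is
treated as italics. The function name pgtext_to_html is hypothetical, and
the output is plain HTML rather than anything PyPlucker-specific.

```python
import html
import re

# Assumed heuristics for illustration only.
HEADING_RE = re.compile(
    r"^(?:CHAPTER|Chapter|BOOK|Book|PART|Part)\s+(?:[IVXLC]+|\d+)\b"
)
ITALIC_RE = re.compile(r"_([^_\n]+)_")


def pgtext_to_html(text):
    """Turn blank-line-separated plain text into <h2> and <p> elements."""
    out = []
    # Paragraphs are assumed to be separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        # Escape &, < and > before adding any markup of our own.
        joined = html.escape(" ".join(lines), quote=False)
        if len(lines) == 1 and HEADING_RE.match(lines[0]):
            out.append("<h2>%s</h2>" % joined)
        else:
            out.append("<p>%s</p>" % ITALIC_RE.sub(r"<i>\1</i>", joined))
    return "\n".join(out)
```

A rule set this small will misfire on some texts (a one-line paragraph
such as "Book I read last year" would become a heading), which fits the
stated goal of "good enough" rather than 99%.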




--
Marcello Perathoner
[EMAIL PROTECTED]

_______________________________________________
plucker-dev mailing list
[email protected]
http://lists.rubberchicken.org/mailman/listinfo/plucker-dev





