>On Behalf Of Marcello Perathoner
>Sent: Wednesday, November 02, 2005 1:37 PM
>To: plucker-dev@rubberchicken.org
>Subject: Re: Plucker server on Project Gutenberg
>
>Lambert, Mark wrote:
>
>> I don't know if this would help or not, but I always go off the HTML
>> version and break on any H1 or H2.  That isn't perfect either, but is
>> easier to do.
>
>Not all PG ebooks have an HTML version.

True, and for those I have to fall back on regexes to break things up,
and each book is different...
But splitting on headings is low-hanging fruit that would make things
simpler for the books that do have an HTML version.

<h[12][^>]*>(.*?)</h[12]>    (matched case-insensitively)
is much easier than
^(CHAPTER .*|BOOK .*|PART .*|PROLOGUE|EPILOGUE|ABOUT THE AUTHOR|GLOSSARY|DRAMATIS PERSONAE?|CHARACTERS)$
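
For what it's worth, here is a rough Python sketch of both approaches side
by side. It is only an illustration, not what I actually run: the
split_sections helper, the sample strings, and the regex flags are all
assumptions, and the heading pattern is a non-greedy variation so that two
headings on one line are not swallowed as a single match.

```python
import re

# Assumed flags: IGNORECASE so <H1> and </h1> both match, DOTALL so a
# heading may span lines. [^>]* and (.*?) are non-greedy adjustments.
HTML_HEADING = re.compile(r"<h[12][^>]*>(.*?)</h[12]>", re.IGNORECASE | re.DOTALL)

# Plain-text fallback: a heading must occupy a whole line (MULTILINE).
TEXT_HEADING = re.compile(
    r"^(CHAPTER .*|BOOK .*|PART .*|PROLOGUE|EPILOGUE|ABOUT THE AUTHOR"
    r"|GLOSSARY|DRAMATIS PERSONAE?|CHARACTERS)$",
    re.MULTILINE,
)


def split_sections(text, pattern):
    """Return (title, body) pairs; text before the first heading is dropped."""
    matches = list(pattern.finditer(text))
    sections = []
    for i, m in enumerate(matches):
        # A section's body runs up to the next heading, or to the end of text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        sections.append((m.group(1).strip(), text[m.end():end].strip()))
    return sections


# Hypothetical sample inputs, made up for this example.
html = ("<H1>Chapter One</H1><p>Call me Ishmael.</p>"
        "<h2 class='x'>Chapter Two</h2><p>More.</p>")
plain = "CHAPTER I\n\nCall me Ishmael.\n\nEPILOGUE\n\nThe end.\n"

html_sections = split_sections(html, HTML_HEADING)
plain_sections = split_sections(plain, TEXT_HEADING)
```

The splitting code is identical in both cases. The difference is that the
HTML pattern should carry over from book to book, while the plain-text
alternation has to be extended for every new heading style a book uses.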

Mark






