>On Behalf Of Marcello Perathoner
>Sent: Wednesday, November 02, 2005 1:37 PM
>To: plucker-dev@rubberchicken.org
>Subject: Re: Plucker server on Project Gutenberg
>
>Lambert, Mark wrote:
>
>> I don't know if this would help or not, but I always go off the HTML
>> version and break on any H1 or H2. That isn't perfect either, but is
>> easier to do.
>
>Not all PG ebooks have an HTML version.

True, and then I have to use regex to break things up and each book is
different... But it is low-hanging fruit that would make it simpler for
those that have HTML.

<H[12].*>(.*)</h[12]>

is much easier than

^(CHAPTER .*|BOOK .*|PART .*|PROLOGUE|EPILOGUE|ABOUT THE AUTHOR|GLOSSARY|DRAMATIS PERSONA|CHARACTERS)$

Mark
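
For anyone who wants to try the two approaches side by side, here is a rough Python sketch. The function names and sample strings are made up for illustration. The HTML pattern is a tightened variant of the one above (non-greedy, case-insensitive, closing tag tied to the opening level), on the assumption that the original was run with a case-insensitive flag. The plain-text pattern is used exactly as written, one heading per line.

```python
import re

# Variant of the HTML pattern from the message (an assumption, not the
# original): non-greedy, case-insensitive, and the closing tag must have
# the same level as the opening tag.
HTML_HEADING = re.compile(
    r"<h([12])\b[^>]*>(.*?)</h\1>",
    re.IGNORECASE | re.DOTALL,
)

# The plain-text pattern from the message, matched against each line.
TEXT_HEADING = re.compile(
    r"^(CHAPTER .*|BOOK .*|PART .*|PROLOGUE|EPILOGUE|ABOUT THE AUTHOR|"
    r"GLOSSARY|DRAMATIS PERSONA|CHARACTERS)$",
    re.MULTILINE,
)


def html_headings(html):
    """Return the text of every H1/H2 heading, in document order."""
    return [m.group(2).strip() for m in HTML_HEADING.finditer(html)]


def text_headings(text):
    """Return every line that looks like a section heading."""
    return [m.group(1) for m in TEXT_HEADING.finditer(text)]


if __name__ == "__main__":
    # Hypothetical samples, not taken from any real PG ebook.
    sample_html = "<H1>Book Title</H1><p>...</p><h2 class='c'>CHAPTER I</h2>"
    sample_text = "PROLOGUE\nSome text.\n\nCHAPTER I\nMore text.\n"
    print(html_headings(sample_html))
    print(text_headings(sample_text))
```

The HTML version needs no per-book tuning as long as the markup uses H1/H2 for section breaks, while the plain-text list of keywords has to grow with every book that names its sections differently.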