Plucker server on Project Gutenberg
I'm the webmaster of Project Gutenberg and I'm about to install the Plucker distiller on the PG website. The idea is to let people download a ready-made Plucker pdb instead of requiring them to run the distiller on the appropriate ebook file.

I'm going to replace the text/plain parser with a custom one that will (try to) parse chapter heads, italics etc. out of the plain text. I ran into a couple of problems doing that, and I'll send some more mails to describe them.

-- Marcello Perathoner [EMAIL PROTECTED]
___ plucker-dev mailing list plucker-dev@rubberchicken.org http://lists.rubberchicken.org/mailman/listinfo/plucker-dev
Why are bookmarks sorted?
I'm writing a custom text/plain parser. I'm parsing a text file and as I go along I add all chapter heads to the bookmark list using:

PluckerDocs.PluckerTextDocument.add_bookmark

When I look at the Plucker database, all bookmarks are sorted by title:

Appendix A
Appendix B
Chapter 1
Chapter 10
Chapter 2
...

I'm wondering why bookmarks have to be sorted. Is there any reason to do that? The code that does this is in:

PluckerDocs.PluckerBookmarkDocument.dump_record

-- Marcello Perathoner
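The ordering shown above is what plain string comparison produces. A minimal sketch in plain Python, independent of PyPlucker, that reproduces it; the `titles` list is made up for illustration and is not taken from a real ebook:

```python
# Chapter heads in the order a parser would meet them in the text.
# (Illustrative titles only.)
titles = ["Chapter 1", "Chapter 2", "Chapter 10", "Appendix A", "Appendix B"]

# Sorting strings compares them character by character, so "Chapter 10"
# lands before "Chapter 2" and the appendices move to the front.
sorted_titles = sorted(titles)

for title in sorted_titles:
    print(title)
```

Keeping the list in the order the bookmarks were added would preserve the reading order of the book.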
Re: Plucker server on Project Gutenberg
That's a wonderful idea. Are you going to be caching the pdbs, or will it be fast enough to generate on demand?

Sorry, don't know about sorting of bookmarks. I myself added sorting of all records by URL to the parser, though, to keep chapters and the like in the right order. Maybe the bookmark sorting is a side-effect of that.

Are you going to be making the docs split into 32K pages, or will you use the continuation flag to make each doc look like a single page (this requires Plucker viewer 1.6)?

Best wishes,
Alex

--
Dr. Alexander R. Pruss
Department of Philosophy
Georgetown University
Washington, DC 20057-1133 U.S.A.
e-mail: [EMAIL PROTECTED]
online papers and home page: www.georgetown.edu/faculty/ap85
--
Philosophiam discimus non ut tantum sciamus, sed ut boni efficiamur. - Paul of Worczyn (1424)
Re: Plucker server on Project Gutenberg
Alexander R. Pruss wrote:

> That's a wonderful idea. Are you going to be caching the pdbs, or will it be fast enough to generate on demand?

I'll have to cache them.

> Are you going to be making the docs split into 32K pages, or will you use the continuation flag to make each doc look like a single page (this requires Plucker viewer 1.6)?

I use seamless and zlib. But if users request alternate formats, I may consider adding them, like images / no images, etc.

-- Marcello Perathoner
Re: Plucker server on Project Gutenberg
David A. Desrosiers wrote:

>> I'm going to replace the text/plain parser with a custom one that will (try to) parse chapter heads, italics etc. out of the plain text.
>
> I'd be interested to see how you solve the context issue that has been brought up on the PG lists over the last year or so. It's a very complicated issue, and to date, nobody has solved it without trying to reinvent the base PG text format into something different.

I have the option of doing:

pgtext filter | PyPlucker pdb

or of writing a custom parser for PyPlucker.

The PG format has changed a lot over 30+ years. None of the third-party tools I know of is able to correctly parse all PG texts.

The custom text/plain parser I'm writing will plug into PyPlucker and do a very simple analysis of the text. I'm not aiming at a 100% or even 99% solution. I'm just trying to make the average PG text look good enough for distribution.

-- Marcello Perathoner
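As an illustration of what "a very simple analysis" could mean for italics, here is a minimal Python sketch. It assumes the text marks italics with underscores, _like this_, which is common in PG plain texts but by no means universal. The function name `split_italics` and the sample sentence are hypothetical, and nothing here uses the PyPlucker API.

```python
import re

# Assumption: italics are written as _like this_ and do not span lines.
ITALIC_RE = re.compile(r"_([^_\n]+)_")


def split_italics(line):
    """Split one line into (text, is_italic) runs."""
    runs = []
    pos = 0
    for match in ITALIC_RE.finditer(line):
        if match.start() > pos:
            # Plain text between the previous match and this one.
            runs.append((line[pos:match.start()], False))
        runs.append((match.group(1), True))
        pos = match.end()
    if pos < len(line):
        # Trailing plain text after the last match.
        runs.append((line[pos:], False))
    return runs


runs = split_italics("He read _Moby Dick_ twice.")
print(runs)
```

A parser could then hand each run to whatever routine it uses to emit plain or italic text. Lines with an unmatched underscore simply come back as plain text, in keeping with the "good enough" goal.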
RE: Plucker server on Project Gutenberg
I don't know if this would help or not, but I always go off the HTML version and break on any H1 or H2. That isn't perfect either, but is easier to do.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Marcello Perathoner
Sent: Wednesday, November 02, 2005 10:10 AM
To: plucker-dev@rubberchicken.org
Subject: Re: Plucker server on Project Gutenberg

[...]
Re: Plucker server on Project Gutenberg
Lambert, Mark wrote:

> I don't know if this would help or not, but I always go off the HTML version and break on any H1 or H2. That isn't perfect either, but is easier to do.

Not all PG ebooks have an HTML version.

-- Marcello Perathoner
RE: Plucker server on Project Gutenberg
On Behalf Of Marcello Perathoner
Sent: Wednesday, November 02, 2005 1:37 PM
To: plucker-dev@rubberchicken.org
Subject: Re: Plucker server on Project Gutenberg

> Lambert, Mark wrote:
>
>> I don't know if this would help or not, but I always go off the HTML version and break on any H1 or H2. That isn't perfect either, but is easier to do.
>
> Not all PG ebooks have an HTML version.

True, and then I have to use regex to break things up, and each book is different... But it is low-hanging fruit that would make it simpler for those that have HTML.

H[12].*(.*)/h[12]

is much easier than

^(CHAPTER .*|BOOK .*|PART .*|PROLOGUE|EPILOGUE|ABOUT THE AUTHOR|GLOSSARY|DRAMATIS PERSONA|CHARACTERS)$

Mark
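The plain-text pattern quoted above can be tried out directly in Python. This is a minimal sketch: the pattern is copied from the message as written, while the function name `find_headings` and the sample text are made up for illustration. It only shows how such a line-by-line match might find chapter heads, and is not the parser either poster actually uses.

```python
import re

# Pattern copied from the message above; it is case-sensitive and
# must match the whole line.
HEADING_RE = re.compile(
    r"^(CHAPTER .*|BOOK .*|PART .*|PROLOGUE|EPILOGUE|ABOUT THE AUTHOR"
    r"|GLOSSARY|DRAMATIS PERSONA|CHARACTERS)$"
)


def find_headings(text):
    """Return (line_number, heading) pairs for lines matching HEADING_RE."""
    headings = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if HEADING_RE.match(stripped):
            headings.append((number, stripped))
    return headings


# Hypothetical sample text, not from a real PG ebook.
sample = """PROLOGUE

It was a dark night.

CHAPTER I. The Arrival

Part of the story continues here.
"""

headings = find_headings(sample)
print(headings)
```

Because the match is case-sensitive and anchored, the ordinary sentence starting with "Part" is not picked up, but a book that writes its heads as "Chapter 1" would be missed entirely, which is the "each book is different" problem.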
Re: Plucker server on Project Gutenberg
Lambert, Mark wrote:

> But it is low-hanging fruit that would make it simpler for those that have HTML.

If they have HTML, of course I use HTML. But more than half of them don't.

-- Marcello Perathoner
Re: Why are bookmarks sorted?
---Reply to mail from Marcello Perathoner about Why are bookmarks sorted?

> I'm writing a custom text/plain parser. I'm parsing a text file and as I go along I add all chapter heads to the bookmark list using: PluckerDocs.PluckerTextDocument.add_bookmark
> [...]
> I'm wondering why bookmarks have to be sorted? Is there any reason to do that?

I wrote the bookmark code in PluckerDocs.py. They are sorted because with 'regular' bookmarks (Conclusion, Other Links, etc.) they are easier to find. Maybe sorting should be (yet) another option. In the short term you can comment out the line that calls the sort ( the_keys.sort() ).

---End reply

Christopher R. Hawks
HAWKSoft
-
Any research done on how to efficiently use computers has been long lost in the mad rush to upgrade systems to do things that aren't needed by people who don't understand what they are really supposed to do with them. -- Graham Reed, in a.s.r.
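The "yet another option" idea might look roughly like the sketch below. This is not the code in PluckerDocs.py. The function `ordered_bookmark_titles`, its `sort_titles` flag, and the sample targets are hypothetical; only the name `the_keys` and the `.sort()` call come from the message above.

```python
def ordered_bookmark_titles(bookmarks, sort_titles=True):
    """Return bookmark titles in the order they should be written out.

    `bookmarks` is a list of (title, target) pairs in the order they were
    added. With sort_titles=False that order is kept as-is.
    """
    the_keys = [title for title, _target in bookmarks]
    if sort_titles:
        # Alphabetical order: handy for 'regular' bookmarks such as
        # "Conclusion" or "Other Links", less so for chapter heads.
        the_keys.sort()
    return the_keys


# Hypothetical bookmarks as a chapter-head parser might add them.
bookmarks = [("Chapter 1", "#c1"), ("Chapter 2", "#c2"), ("Chapter 10", "#c10")]

in_document_order = ordered_bookmark_titles(bookmarks, sort_titles=False)
alphabetical = ordered_bookmark_titles(bookmarks)

print(in_document_order)
print(alphabetical)
```

Defaulting the flag to sorting would keep the existing behaviour for ordinary web pages, while a chapter-head parser could switch it off.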