Re: Plucker server on Project Gutenberg

2005-11-05 Thread Marcello Perathoner


The first experimental PG plucker server is up.

Find the no. of the ebook you want and then call this url:

  http://www.gutenberg.org/cache/plucker/17000.plucker

Replace 17000 with your ebook number.
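
For scripted downloads, the URL pattern above can be built from the ebook number. This is a minimal sketch: the function name `plucker_url` is made up for illustration, and only the URL layout comes from the message itself.

```python
def plucker_url(ebook_no: int) -> str:
    """Return the experimental Plucker cache URL for a PG ebook number."""
    if ebook_no <= 0:
        raise ValueError("ebook number must be positive")
    # URL layout as given in the announcement; the server is experimental.
    return f"http://www.gutenberg.org/cache/plucker/{ebook_no}.plucker"


print(plucker_url(17000))
```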


This will build the file if it does not exist yet. That may take some
time when the servers are busy. The second download should come much
faster, straight out of the cache.
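
The build-on-first-request behaviour can be pictured roughly as follows. This is only a sketch of the general idea, not the code running on the PG server; `fetch_plucker`, `cache_dir` and the `build` callback are hypothetical names.

```python
from pathlib import Path
from typing import Callable


def fetch_plucker(ebook_no: int, cache_dir: Path,
                  build: Callable[[int], bytes]) -> bytes:
    """Return the Plucker file for ebook_no, building it only if not cached."""
    cached = cache_dir / f"{ebook_no}.plucker"
    if cached.exists():
        # Cache hit: no build needed, so this is the fast path.
        return cached.read_bytes()
    # Cache miss: run the (slow) build once and keep the result.
    data = build(ebook_no)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached.write_bytes(data)
    return data
```

The first call for a given number pays for the build; later calls only read the stored file.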


If something awful happens, like no suitable source file was found, you 
should get an error page.


The source file will be HTML if available, else TXT. TXT files are 
parsed by a custom gutenberg parser, which works well enough on modern 
files but gets worse when applied to older ones.
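
The source choice described above amounts to a simple preference order. A sketch, assuming the formats available for an ebook are known as a set of labels; the labels `"html"` and `"txt"` and the name `pick_source` are illustrative and not taken from the server code.

```python
def pick_source(available: set) -> str:
    """Prefer HTML, fall back to TXT, otherwise signal the error case."""
    for fmt in ("html", "txt"):
        if fmt in available:
            return fmt
    # Corresponds to the "no suitable source file was found" error page.
    raise LookupError("no suitable source file found")
```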




BTW, what is the preferred MIME type for Plucker?



--
Marcello Perathoner
[EMAIL PROTECTED]

___
plucker-dev mailing list
plucker-dev@rubberchicken.org
http://lists.rubberchicken.org/mailman/listinfo/plucker-dev

RE: Plucker server on Project Gutenberg

2005-11-03 Thread Lambert, Mark

> From: Marcello Perathoner
> Lambert, Mark wrote:
> 
> > But it is low-hanging fruit that would make it simpler for 
> those that 
> > have HTML.
> 
> If they have HTML, of course I use HTML. But more than half 
> of them don't.
> 

No worries.  I wasn't sure whether you were or not, so I thought I'd
mention it.  I read 2+ books a week on my Palm and always have my eye
out for ways to make it easier.  For example, I never work off TXT; I
always run txt2html on it first.  I always replace ellipses with …
(sometimes thousands in a book) to make the file a little smaller (?)
and because I like the look better.  I break books up by chapter
(this significantly speeds up display on my Clie), usually with
htmlsplitter (rekenwonder.com), etc.
This week I have read Knife of Dreams (Library), The Penultimate
Peril (Own), Glory Road (Library), Plague Ship (Gutenberg), and Old
Nathan (Baen Free Library) on my Clie.

Mark

E-Mail messages may contain viruses, worms, or other malicious code. By reading 
the message and opening any attachments, the recipient accepts full 
responsibility for taking protective action against such code. Sender is not 
liable for any loss or damage arising from this message.

The information in this e-mail is confidential and may be legally privileged. 
It is intended solely for the addressee(s). Access to this e-mail by anyone 
else is unauthorized.


Re: Plucker server on Project Gutenberg

2005-11-02 Thread Marcello Perathoner


Lambert, Mark wrote:

> But it is low-hanging fruit that would make it simpler for those that
> have HTML.

If they have HTML, of course I use HTML. But more than half of them don't.



--
Marcello Perathoner
[EMAIL PROTECTED]


RE: Plucker server on Project Gutenberg

2005-11-02 Thread Lambert, Mark

>On Behalf Of Marcello Perathoner
>Sent: Wednesday, November 02, 2005 1:37 PM
>To: plucker-dev@rubberchicken.org
>Subject: Re: Plucker server on Project Gutenberg
>
>Lambert, Mark wrote:
>
>> I don't know if this would help or not, but I always go off the HTML 
>> version and break on any H1 or H2.  That isn't perfect either, but is
>> easier to do.
>
>Not all PG ebooks have an HTML version.

True, and then I have to use regex to break things up and each book is
different... 
But it is low-hanging fruit that would make it simpler for those that
have HTML.

(.*)
is much easier than
^(CHAPTER .*|BOOK .*|PART .*|PROLOGUE|EPILOGUE|ABOUT THE AUTHOR|GLOSSARY|DRAMATIS PERSONA|CHARACTERS)$
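
As an illustration of the plain-text case, the heading pattern above can drive a chapter splitter in Python. This is a sketch only: `split_chapters` is a made-up helper, and the pattern is the one from this message, which will not fit every book. Any text before the first heading is dropped.

```python
import re

# Heading pattern from the message above; each book may need its own tweaks.
HEADING = re.compile(
    r"^(CHAPTER .*|BOOK .*|PART .*|PROLOGUE|EPILOGUE|ABOUT THE AUTHOR"
    r"|GLOSSARY|DRAMATIS PERSONA|CHARACTERS)$",
    re.MULTILINE,
)


def split_chapters(text: str) -> list[tuple[str, str]]:
    """Split text into (heading, body) pairs at each matching heading line."""
    matches = list(HEADING.finditer(text))
    chapters = []
    for i, m in enumerate(matches):
        # A chapter body runs up to the next heading, or to the end of text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters.append((m.group(1), text[m.end():end].strip()))
    return chapters


sample = "PROLOGUE\nIt began.\n\nCHAPTER I\nIt went on.\n"
print(split_chapters(sample))
```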

Mark


Re: Plucker server on Project Gutenberg

2005-11-02 Thread Marcello Perathoner


Lambert, Mark wrote:

> I don't know if this would help or not, but I always go off the HTML
> version and break on any H1 or H2.  That isn't perfect either, but is
> easier to do.


Not all PG ebooks have an HTML version.


--
Marcello Perathoner
[EMAIL PROTECTED]


RE: Plucker server on Project Gutenberg

2005-11-02 Thread Lambert, Mark

I don't know if this would help or not, but I always go off the HTML
version and break on any H1 or H2.  That isn't perfect either, but is
easier to do. 
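
A rough sketch of the H1/H2 approach in Python, using a regular expression rather than a real HTML parser (so, as noted, not perfect). The helper name `split_on_headings` is invented for this example.

```python
import re

# Opening <h1> or <h2> tag, in any letter case, with or without attributes.
HEADING_TAG = re.compile(r"<h[12][\s>]", re.IGNORECASE)


def split_on_headings(html: str) -> list[str]:
    """Cut html into pieces, each starting at an <h1> or <h2> tag."""
    starts = [m.start() for m in HEADING_TAG.finditer(html)]
    bounds = [0] + starts + [len(html)]
    pieces = [html[a:b] for a, b in zip(bounds, bounds[1:])]
    # Drop empty pieces, e.g. when the document starts with a heading.
    return [p for p in pieces if p.strip()]
```

Anything before the first heading (title page, license text) ends up in a piece of its own.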

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Marcello
Perathoner
Sent: Wednesday, November 02, 2005 10:10 AM
To: plucker-dev@rubberchicken.org
Subject: Re: Plucker server on Project Gutenberg

David A. Desrosiers wrote:

>> I'm going to replace the text/plain parser with a custom one that 
>> will (try to) parse chapter heads, italics etc. out of the plain text.
> 
> I'd be interested to see how you solve the context issue that has 
> been brought up on the pg lists over the last year or so. It's a very 
> complicated issue, and to date, nobody has solved it without trying to
> reinvent the base PG text format into something different.

I have the option of doing:

   pgtext > filter | PyPlucker > pdb

or

   to write a custom parser for PyPlucker.

The PG format has changed a lot over 30+ years. None of the 3rd-party
tools I know is able to correctly parse all PG texts.

The custom text/plain parser I'm writing will plug into PyPlucker and do
a very simple analysis of the text. I'm not aiming at a 100% or even 99%
solution. I'm just trying to make the average PG text look good enough
for distribution.

--
Marcello Perathoner
[EMAIL PROTECTED]


Re: Plucker server on Project Gutenberg

2005-11-02 Thread Marcello Perathoner


David A. Desrosiers wrote:

>> I'm going to replace the text/plain parser with a custom one that will 
>> (try to) parse chapter heads, italics etc. out of the plain text.
>
> I'd be interested to see how you solve the context issue that has 
> been brought up on the pg lists over the last year or so. It's a very 
> complicated issue, and to date, nobody has solved it without trying to 
> reinvent the base PG text format into something different.


I have the option of doing:

  pgtext > filter | PyPlucker > pdb

or

  to write a custom parser for PyPlucker.


The PG format has changed a lot over 30+ years. None of the 3rd-party 
tools I know is able to correctly parse all PG texts.


The custom text/plain parser I'm writing will plug into PyPlucker and do 
a very simple analysis of the text. I'm not aiming at a 100% or even 99% 
solution. I'm just trying to make the average PG text look good enough 
for distribution.
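
To make "a very simple analysis of the text" concrete, here is a sketch of what such a pass might look like. It is not the parser described in this message. It assumes two conventions that hold for many, but by no means all, PG texts: italics written as `_word_`, and chapter heads as paragraphs starting with CHAPTER, BOOK or PART in capitals. All names are hypothetical.

```python
import html
import re

ITALIC = re.compile(r"_([^_\n]+)_")            # assumed _italic_ markup
CHAPTER = re.compile(r"^(CHAPTER|BOOK|PART)\b.*$")  # assumed heading style


def paragraph_to_html(paragraph: str) -> str:
    """Turn one plain-text paragraph into a heading or a <p> element."""
    # Re-join hard-wrapped lines, then escape &, < and > for HTML.
    text = html.escape(" ".join(paragraph.split()))
    if CHAPTER.match(text):
        return f"<h2>{text}</h2>"
    return "<p>" + ITALIC.sub(r"<i>\1</i>", text) + "</p>"


def text_to_html(text: str) -> str:
    """Split on blank lines and convert each paragraph."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return "\n".join(paragraph_to_html(p) for p in paragraphs)
```

A pass like this will misread some texts (a paragraph that happens to start with "PART", underscores used for other purposes), which fits the stated goal of "good enough" rather than 100%.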





--
Marcello Perathoner
[EMAIL PROTECTED]


Re: Plucker server on Project Gutenberg

2005-11-02 Thread Marcello Perathoner


Alexander R. Pruss wrote:

> That's a wonderful idea.  Are you going to be caching the pdbs, or will 
> it be fast enough to generate on demand?


I'll have to cache them.


> Are you going to be making the docs split into 32K pages, or will you 
> use the continuation flag to make each doc look like a single page (this 
> requires Plucker viewer 1.6)?


I use seamless and zlib. But if users request alternate formats, I may 
consider adding them, like images / no images, etc.
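
To give a feel for the options being discussed, here is a sketch that cuts a document into pieces of at most 32K and compresses each with zlib. It does not reproduce the real Plucker record layout or the continuation flag; `PAGE_LIMIT`, `split_and_compress` and `join_pages` are names invented here.

```python
import zlib

PAGE_LIMIT = 32 * 1024  # the "32K pages" mentioned above, in bytes


def split_and_compress(doc: bytes, limit: int = PAGE_LIMIT) -> list[bytes]:
    """Cut doc into chunks of at most `limit` bytes and zlib-compress each."""
    chunks = [doc[i:i + limit] for i in range(0, len(doc), limit)]
    return [zlib.compress(chunk) for chunk in chunks]


def join_pages(pages: list[bytes]) -> bytes:
    """Decompress the pages and reassemble the original bytes."""
    return b"".join(zlib.decompress(p) for p in pages)
```

For example, a 70,000-byte document yields three compressed pages, and `join_pages` returns the original bytes unchanged.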




--
Marcello Perathoner
[EMAIL PROTECTED]


Re: Plucker server on Project Gutenberg

2005-11-02 Thread David A. Desrosiers



> I'm the webmaster of Project Gutenberg and I'm about to install the 
> plucker distiller on the PG website. The idea is to have people 
> download a ready-made plucker pdb instead of requiring them to run 
> the distiller on the appropriate ebook file.


	There are a LOT of tools out there that do this, with varying 
degrees of success. Some are public, some are not. I've been mirroring 
the entire Gutenberg archive for a while, and have been doing automated 
conversions of the texts to Plucker, using various tools I've thrown 
together, but mostly for my own use and some private redistribution.


> I'm going to replace the text/plain parser with a custom one that 
> will (try to) parse chapter heads, italics etc. out of the plain 
> text.


	I'd be interested to see how you solve the context issue that 
has been brought up on the pg lists over the last year or so. It's a 
very complicated issue, and to date, nobody has solved it without 
trying to reinvent the base PG text format into something different.



David A. Desrosiers
[EMAIL PROTECTED]
http://gnu-designs.com


Re: Plucker server on Project Gutenberg

2005-11-02 Thread Alexander R. Pruss

That's a wonderful idea.  Are you going to be caching the pdbs, or will 
it be fast enough to generate on demand?


Sorry, don't know about sorting of bookmarks.  I myself added sorting of 
all records by URL to the parser, though, to keep chapters and the like 
in the right order.  Maybe the bookmark sorting is a side-effect of that.


Are you going to be making the docs split into 32K pages, or will you 
use the continuation flag to make each doc look like a single page (this 
requires Plucker viewer 1.6)?


Best wishes,

Alex

--
Dr. Alexander R. Pruss
Department of Philosophy
Georgetown University
Washington, DC 20057-1133  U.S.A.
e-mail: [EMAIL PROTECTED]
online papers and home page: www.georgetown.edu/faculty/ap85
--
"Philosophiam discimus non ut tantum sciamus, sed ut boni efficiamur."
- Paul of Worczyn (1424)

Plucker server on Project Gutenberg

2005-11-02 Thread Marcello Perathoner

I'm the webmaster of Project Gutenberg and I'm about to install the 
plucker distiller on the PG website. The idea is to have people download 
a ready-made plucker pdb instead of requiring them to run the distiller 
on the appropriate ebook file.


I'm going to replace the text/plain parser with a custom one that will 
(try to) parse chapter heads, italics etc. out of the plain text.


I encountered a couple of problems doing that. I'll send some more mails 
to describe them.



--
Marcello Perathoner
[EMAIL PROTECTED]
