I just did a brief spot check on their files. Most of them seem to use the convention that one carriage return is a soft break and two mark a new paragraph. Are there any options in Plucker to replace a single CR with a space and leave double CRs alone (or mark a new paragraph)? I know Jpluck doesn't have anything like that.
This would greatly improve readability. I could write a filter to do this in Perl in probably 5 minutes, but that would only add another piece of required software on top of Python.
I'd be willing to write something, but as mentioned previously, the ebooks come in many formats. I looked at a book of the Bible and Shakespeare's _The Comedy of Errors_ in .txt format. Either would be ruined by simply replacing <cr><space> with <space>. I assume that the HTML-formatted files need no modifications?
FWIW, WordPerfect wrote something like this quite some time ago to import .txt files into WP format. The resulting document almost always needed cleaning up. Microsoft hasn't done much better. I doubt the value of writing the XSL, but tell me more precisely what you want (with the names of the sample documents). It could turn out to be easier to preprocess the files with something else before plucking. Regardless, document fidelity would require a HUMAN to review the conversion.
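For reference, the preprocessing step being discussed (single line break becomes a space, a blank line stays a paragraph break) could be sketched in Python roughly as follows, which would avoid adding Perl as a dependency. This is only a minimal sketch under that one assumption about the file layout; the function name `unwrap_paragraphs` is hypothetical and not part of Plucker or Jpluck. It will flatten verse, plays, and anything else where the line breaks carry meaning, so the result still needs a human to look it over.

```python
import re


def unwrap_paragraphs(text: str) -> str:
    """Join soft-wrapped lines; keep blank-line paragraph breaks.

    Assumption: a single line break is a soft wrap and one or more
    blank lines mark a new paragraph. Verse and plays will be ruined.
    """
    # Normalize CRLF and bare CR line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    if not text.strip():
        return ""
    # Split on runs of blank lines (lines that are empty or whitespace only).
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Inside each paragraph, replace the remaining line breaks with spaces.
    joined = [
        " ".join(line.strip() for line in para.split("\n") if line.strip())
        for para in paragraphs
    ]
    # Separate paragraphs with exactly one blank line.
    return "\n\n".join(joined) + "\n"
```

Run over a file's contents before plucking, this turns "one<cr>two<cr><cr>three" into two paragraphs, "one two" and "three".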
Ed