Re: LyX to Kindle, points to remember

2011-12-07 Thread Guenter Milde
On 2011-12-06, Steve Litt wrote:
> On Tuesday, December 06, 2011 04:57:07 AM Guenter Milde wrote:
>> On 2011-12-04, Steve Litt wrote:

>> > So far I've discovered that LyX pagebreaks don't translate to
>> > Kindle page breaks (start at top of reader). 

Both LyXHTML and eLyXer should transform "hard" (manual) page breaks
into a <div class="pagebreak"> or similar. (-> Test and report to the
authors if otherwise.)

This could then be combined with a rule like

  .pagebreak {page-break-after: always;}

in the CSS style file.

>> > Therefore, to every <h1> should be added the property
>> > style="page-break-before: always;". That way every chapter starts at
>> > the top of the reader.

>> I think this could/should be done better in a global CSS rule.

> how do I write a CSS rule that makes all <h1> pagefeed before 
> printing?

My "CSS pocket reference" says::

  h1 {page-break-before: always;}
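
If the rules should end up inside the generated HTML file rather than in
a separate style file, a post-processor could inject them once into the
document head. The following is only a sketch under assumptions: the
function name inject_css and the constant CSS_RULES are made up for
illustration, and whether the Kindle converter honours the rules still
has to be tested.

```python
import re

# The two rules discussed in this thread.
CSS_RULES = """\
h1 {page-break-before: always;}
.pagebreak {page-break-after: always;}
"""


def inject_css(html_text, css=CSS_RULES):
    """Return html_text with a <style> block placed just before </head>.

    Hypothetical helper: if no </head> is found, the text is returned
    unchanged rather than guessing where the rules belong.
    """
    style = '<style type="text/css">\n%s</style>\n' % css
    match = re.search(r"</head\s*>", html_text, re.IGNORECASE)
    if match is None:
        return html_text
    return html_text[:match.start()] + style + html_text[match.start():]
```

The author can then change the page-break behaviour by editing the CSS
text alone.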

Günter



Re: LyX to Kindle, points to remember

2011-12-06 Thread Steve Litt
On Tuesday, December 06, 2011 04:57:07 AM Guenter Milde wrote:
> On 2011-12-04, Steve Litt wrote:
> > Hi all,
> > 
> > I'm making this thread in hopes that everyone who knows something
> > on the subject will add something, and at the end we'll all know
> > how to go from LyX to Kindle.
> > 
> > I'm quickly coming to the conclusion that the best way to do it
> > is to write a post-processor for Alex's HTML. Simple, modular,
> > and I can do it myself (and obviously make it free software).
> > The post-processor might need to read the LyX file itself to get
> > a few bits of information not in Alex's HTML file.
> > 
> > So far I've discovered that LyX pagebreaks don't translate to
> > Kindle page breaks (start at top of reader). Therefore, to every
> > <h1> should be added the property style="page-break-before:
> > always;". That way every chapter starts at the top of the
> > reader.
> 
> I think this could/should be done better in a global CSS rule.

Very possibly. I know so little about CSS it didn't even occur to me. 
I'm a parse and change kinda guy, so that's just what I did. For 
later, how do I write a CSS rule that makes all <h1> pagefeed before 
printing? The more I think about it, putting this in CSS would be 
better because the author can change it without changing Python code.

> 
> > I assume the official interpreter of the LyX project is Python,
> > and if that assumption's true I'll make the postprocessor in
> > Python. This post processing will be a heck of a lot easier if
> > someone can point me to a good XML or HTML parser for Python, so
> > I can look at nodes and attributes instead of trying to parse
> > tags. I've noticed Alex's HTML appears very standard, with
> > matching start and end tags and the like. This should
> > theoretically make my job easier. So anyone have any suggestions
> > for HTML or XML parsers for Python?
> 
> beautifulsoup http://pypi.python.org/pypi/BeautifulSoup/3.2.0
> and the standard library xml.* submodules:
> 
> xml.dom
> xml.dom.minidom
> xml.dom.pulldom
> xml.etree.ElementTree
> xml.parsers.expat
> xml.sax
> xml.sax.handler
> xml.sax.saxutils
> xml.sax.xmlreader

I've chosen HTMLParser because:

1) Most ubiquitous documentation
2) Seems to be native Python
3) Event driven, no need to build dom tree
4) Relatively easy
5) It works

#3 was important to me: if somebody has a 400K word book, with lots of 
little paragraphs and lots of character styles in each paragraph, that 
whole albatross won't need to be in RAM at one time. Of course such a 
book would take a long time to parse, but probably someone with a book 
that size is used to things taking a long time.

From what I hear, lxml is by far the fastest, but my understanding is 
it parses to a dom tree. From what I hear 
(http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/), 
HTMLParser is one of the faster ones, though nowhere near as fast as 
lxml, and it's fairly easy on RAM because it outputs events, not a dom 
tree.

My postprocessor is a proof of concept -- its design decisions can be 
changed later.
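
The event-driven approach can be sketched roughly as follows. This is a 
minimal illustration, not the actual postprocessor: the names 
PageBreakAdder and add_page_breaks are made up for the example, and it 
uses the Python 3 module name html.parser (in Python 2 the same class 
lives in the HTMLParser module). Every event is written straight back 
out, and only the h1 start tags are changed.

```python
import io
from html import escape
from html.parser import HTMLParser  # Python 2: "from HTMLParser import HTMLParser"

PAGE_BREAK = "page-break-before: always;"


class PageBreakAdder(HTMLParser):
    """Copy HTML to `write`, adding a page-break style to every <h1>."""

    def __init__(self, write):
        # convert_charrefs=False keeps &amp; and friends as separate
        # events, so they can be copied through unchanged.
        super().__init__(convert_charrefs=False)
        self.write = write

    def _tag(self, tag, attrs, end=">"):
        if tag == "h1":
            # Keep any existing style and append the page break to it.
            old = dict(attrs).get("style") or ""
            attrs = [(k, v) for k, v in attrs if k != "style"]
            attrs.append(("style", (old + " " + PAGE_BREAK).strip()))
        text = "".join(
            " %s" % k if v is None else ' %s="%s"' % (k, escape(v, quote=True))
            for k, v in attrs
        )
        self.write("<%s%s%s" % (tag, text, end))

    def handle_starttag(self, tag, attrs):
        self._tag(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._tag(tag, attrs, end=" />")

    def handle_endtag(self, tag):
        self.write("</%s>" % tag)

    def handle_data(self, data):
        self.write(data)

    def handle_entityref(self, name):
        self.write("&%s;" % name)

    def handle_charref(self, name):
        self.write("&#%s;" % name)

    def handle_comment(self, data):
        self.write("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.write("<!%s>" % decl)

    def handle_pi(self, data):
        self.write("<?%s>" % data)


def add_page_breaks(html_text):
    """Convenience wrapper for small inputs; returns the rewritten HTML."""
    out = io.StringIO()
    parser = PageBreakAdder(out.write)
    parser.feed(html_text)
    parser.close()
    return out.getvalue()
```

For a big book, the parser can be given the write method of an output 
file and fed the input a chunk at a time, so neither a dom tree nor the 
whole text has to sit in RAM.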

[clip]
 
> Keep up the work.

Thank you!

SteveT
 
Steve Litt
Author: The Key to Everyday Excellence
http://www.troubleshooters.com/bookstore/key_excellence.htm
Twitter: http://www.twitter.com/stevelitt



Re: LyX to Kindle, points to remember

2011-12-06 Thread Guenter Milde
On 2011-12-04, Steve Litt wrote:
> Hi all,

> I'm making this thread in hopes that everyone who knows something on 
> the subject will add something, and at the end we'll all know how to 
> go from LyX to Kindle.

> I'm quickly coming to the conclusion that the best way to do it is to 
> write a post-processor for Alex's HTML. Simple, modular, and I can do 
> it myself (and obviously make it free software). The post-processor 
> might need to read the LyX file itself to get a few bits of information 
> not in Alex's HTML file.

> So far I've discovered that LyX pagebreaks don't translate to Kindle 
> page breaks (start at top of reader). Therefore, to every <h1> should 
> be added the property style="page-break-before: always;". That way 
> every chapter starts at the top of the reader.

I think this could/should be done better in a global CSS rule.

> I assume the official interpreter of the LyX project is Python, and if 
> that assumption's true I'll make the postprocessor in Python. This 
> post processing will be a heck of a lot easier if someone can point me 
> to a good XML or HTML parser for Python, so I can look at nodes and 
> attributes instead of trying to parse tags. I've noticed Alex's HTML 
> appears very standard, with matching start and end tags and the like. 
> This should theoretically make my job easier. So anyone have any 
> suggestions for HTML or XML parsers for Python?

beautifulsoup http://pypi.python.org/pypi/BeautifulSoup/3.2.0
and the standard library xml.* submodules:

xml.dom
xml.dom.minidom
xml.dom.pulldom
xml.etree.ElementTree
xml.parsers.expat
xml.sax
xml.sax.handler
xml.sax.saxutils
xml.sax.xmlreader

> Other things I've noticed:

> 1) The LyX table of contents, when translated to HTML and then to 
> Kindle format, crashes the Kindle previewer, so it must be removed 
> from the HTML file and used to create an NCX TOC.

> 2) With the Kindle you probably don't want the title page, so there 
> should be a post-processor option to capture the author, title and 
> date, and then remove everything from the Title or Author 
> environmented text to the next <h1>.

> Hopefully this thread will serve as an accumulation of knowledge 
> resulting in a post-processor. The post processor may simply serve as 
> a stepping stone to a "real conversion". I've found that often the 
> best specification for the right solution is obtained by implementing 
> and evaluating a quick and dirty solution.

Keep up the work.

Günter



LyX to Kindle, points to remember

2011-12-03 Thread Steve Litt
Hi all,

I'm making this thread in hopes that everyone who knows something on 
the subject will add something, and at the end we'll all know how to 
go from LyX to Kindle.

I'm quickly coming to the conclusion that the best way to do it is to 
write a post-processor for Alex's HTML. Simple, modular, and I can do 
it myself (and obviously make it free software). The post-processor 
might need to read the LyX file itself to get a few bits of information 
not in Alex's HTML file.

So far I've discovered that LyX pagebreaks don't translate to Kindle 
page breaks (start at top of reader). Therefore, to every <h1> should 
be added the property style="page-break-before: always;". That way 
every chapter starts at the top of the reader.

I assume the official interpreter of the LyX project is Python, and if 
that assumption's true I'll make the postprocessor in Python. This 
post processing will be a heck of a lot easier if someone can point me 
to a good XML or HTML parser for Python, so I can look at nodes and 
attributes instead of trying to parse tags. I've noticed Alex's HTML 
appears very standard, with matching start and end tags and the like. 
This should theoretically make my job easier. So anyone have any 
suggestions for HTML or XML parsers for Python?

Other things I've noticed:

1) The LyX table of contents, when translated to HTML and then to 
Kindle format, crashes the Kindle previewer, so it must be removed 
from the HTML file and used to create an NCX TOC.

2) With the Kindle you probably don't want the title page, so there 
should be a post-processor option to capture the author, title and 
date, and then remove everything from the Title or Author 
environmented text to the next <h1>.
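
As a rough idea of how point 1 might look, here is a sketch that 
collects the h1 headings from the HTML and writes them out as NCX 
navPoint entries. It rests on several assumptions: every chapter heading 
is an h1 carrying an id attribute, the HTML file is called book.html, 
and the names HeadingCollector and build_ncx are invented for the 
example. It does not remove the TOC from the HTML, and a real NCX file 
typically needs more metadata in its head element than is shown here.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser  # Python 2: "from HTMLParser import HTMLParser"

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"


class HeadingCollector(HTMLParser):
    """Collect (id, text) for every <h1> in the document."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._id = None
        self._text = None  # a list while inside an <h1>, else None

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._id = dict(attrs).get("id")
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "h1" and self._text is not None:
            self.headings.append((self._id, "".join(self._text).strip()))
            self._text = None


def build_ncx(html_text, title, html_name="book.html"):
    """Return an NCX document (as a string) with one navPoint per <h1>."""
    collector = HeadingCollector()
    collector.feed(html_text)
    collector.close()

    ncx = ET.Element("ncx", {"xmlns": NCX_NS, "version": "2005-1"})
    ET.SubElement(ncx, "head")  # metadata left out of this sketch
    doc_title = ET.SubElement(ncx, "docTitle")
    ET.SubElement(doc_title, "text").text = title
    nav_map = ET.SubElement(ncx, "navMap")
    for order, (anchor, text) in enumerate(collector.headings, start=1):
        point = ET.SubElement(
            nav_map, "navPoint",
            {"id": "navpoint-%d" % order, "playOrder": str(order)},
        )
        label = ET.SubElement(point, "navLabel")
        ET.SubElement(label, "text").text = text
        src = html_name + ("#" + anchor if anchor else "")
        ET.SubElement(point, "content", {"src": src})
    return ET.tostring(ncx, encoding="unicode")
```

The same collector could be extended to capture the title, author and 
date for point 2, once it is known how those environments are marked up 
in the HTML.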

Hopefully this thread will serve as an accumulation of knowledge 
resulting in a post-processor. The post processor may simply serve as 
a stepping stone to a "real conversion". I've found that often the 
best specification for the right solution is obtained by implementing 
and evaluating a quick and dirty solution.

Thanks

SteveT
 
Steve Litt
Author: The Key to Everyday Excellence
http://www.troubleshooters.com/bookstore/key_excellence.htm
Twitter: http://www.twitter.com/stevelitt