Re: LyX to Kindle, points to remember
On 2011-12-06, Steve Litt wrote:
> On Tuesday, December 06, 2011 04:57:07 AM Guenter Milde wrote:
>> On 2011-12-04, Steve Litt wrote:
>> > So far I've discovered that LyX pagebreaks don't translate to
>> > Kindle page breaks (start at top of reader).

Both LyXHTML and eLyXer should transform "hard" (manual) page breaks into an element with class "pagebreak" or similar. (-> Test and report to the authors if otherwise.) This could then be combined with a rule like

  .pagebreak {page-break-after: always;}

in the CSS style file.

>> > Therefore, to every <h1> should be added the property
>> > style="page-break-before: always;". That way every chapter starts at
>> > the top of the reader.
>> I think this could/should be done better in a global CSS rule.
> how do I write a CSS rule that makes all <h1> pagefeed before
> printing?

My "CSS pocket reference" says::

  h1 {page-break-before: always;}

Günter
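Günter's global-rule suggestion can be applied by the post-processor itself, by injecting the two rules from this thread into the <head> of the exported HTML. A minimal sketch, assuming the file contains a </head> tag; the names inject_css and PAGEBREAK_CSS are hypothetical, and whether a particular Kindle renderer honours page-break-before/page-break-after should be tested in the previewer or on a device.

```python
import re

# Global rules from this thread: every <h1> (chapter) starts on a new page,
# and elements with class "pagebreak" force a break after themselves.
PAGEBREAK_CSS = (
    "h1 {page-break-before: always;}\n"
    ".pagebreak {page-break-after: always;}\n"
)


def inject_css(html_text: str, css: str = PAGEBREAK_CSS) -> str:
    """Return html_text with a <style> block inserted before the first </head>.

    Hypothetical helper. If there is no </head> tag, the text is returned unchanged.
    """
    style = '<style type="text/css">\n' + css + "</style>\n"
    # (?i) makes the match case-insensitive; count=1 touches only the first </head>.
    # A function replacement stops re from interpreting backslashes in the CSS.
    return re.sub(r"(?i)</head>", lambda m: style + m.group(0), html_text, count=1)


if __name__ == "__main__":
    page = "<html><head><title>Book</title></head><body><h1>One</h1></body></html>"
    print(inject_css(page))
```

Writing the rules to a separate stylesheet referenced by a <link> element would work just as well, and lets the author adjust them without touching any Python code.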
Re: LyX to Kindle, points to remember
On Tuesday, December 06, 2011 04:57:07 AM Guenter Milde wrote:
> On 2011-12-04, Steve Litt wrote:
> > Hi all,
> >
> > I'm making this thread in hopes that everyone who knows something
> > on the subject will add something, and at the end we'll all know
> > how to go from LyX to Kindle.
> >
> > I'm quickly coming to the conclusion that the best way to do it
> > is to write a post-processor for Alex's HTML. Simple, modular,
> > and I can do it myself (and obviously make it free software).
> > The post-processor might need to read the LyX file itself to get
> > a few bits of information not in Alex's HTML file.
> >
> > So far I've discovered that LyX pagebreaks don't translate to
> > Kindle page breaks (start at top of reader). Therefore, to every
> > <h1> should be added the property style="page-break-before:
> > always;". That way every chapter starts at the top of the
> > reader.
>
> I think this could/should be done better in a global CSS rule.

Very possibly. I know so little about CSS that it didn't even occur to me. I'm a parse-and-change kinda guy, so that's just what I did. For later, how do I write a CSS rule that makes all <h1> pagefeed before printing? The more I think about it, the better putting this in CSS looks, because the author can then change it without changing Python code.

>
> > I assume the official interpreter of the LyX project is Python,
> > and if that assumption's true I'll make the postprocessor in
> > Python. This post processing will be a heck of a lot easier if
> > someone can point me to a good XML or HTML parser for Python, so
> > I can look at nodes and attributes instead of trying to parse
> > tags. I've noticed Alex's HTML appears very standard, with
> > matching start and end tags and the like. This should
> > theoretically make my job easier. So anyone have any suggestions
> > for HTML or XML parsers for Python?
>
> beautifulsoup http://pypi.python.org/pypi/BeautifulSoup/3.2.0
> and the standard library xml.* submodules:
>
> xml.dom
> xml.dom.minidom
> xml.dom.pulldom
> xml.etree.ElementTree
> xml.parsers.expat
> xml.sax
> xml.sax.handler
> xml.sax.saxutils
> xml.sax.xmlreader

I've chosen HTMLParser because:

1) Most ubiquitous documentation
2) Seems to be native Python
3) Event driven, no need to build a DOM tree
4) Relatively easy
5) It works

#3 was important to me because, if somebody has a 400K-word book with lots of little paragraphs and lots of character styles in each paragraph, that whole albatross won't need to be in RAM at one time. Of course such a book would take a long time to parse, but someone with a book that size is probably used to things taking a long time.

From what I hear, lxml is by far the fastest, but my understanding is that it parses to a DOM tree. According to http://blog.ianbicking.org/2008/03/30/python-html-parser-performance/, HTMLParser is one of the faster ones, though nowhere near as fast as lxml, and it's fairly easy on RAM because it outputs events, not a DOM tree.

My postprocessor is a proof of concept -- its design decisions can be changed later.

[clip]

> Keep up the work.

Thank you!

SteveT

Steve Litt
Author: The Key to Everyday Excellence
http://www.troubleshooters.com/bookstore/key_excellence.htm
Twitter: http://www.twitter.com/stevelitt
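Steve's event-driven approach can be sketched with the standard library's html.parser.HTMLParser (the Python 3 home of the HTMLParser module): every parser event is written straight back out, and only <h1> start tags are rebuilt with the page-break style, so no tree is ever held in memory. This is a minimal sketch, not the post-processor discussed in this thread; H1PageBreaker and add_h1_pagebreaks are hypothetical names, and end tags are re-emitted in lowercase.

```python
import html
import io
from html.parser import HTMLParser

PAGEBREAK = "page-break-before: always;"


class H1PageBreaker(HTMLParser):
    """Copy HTML to `out` event by event, adding the page-break style to each <h1>.

    No DOM tree is built, so memory use does not grow with the size of the book.
    """

    def __init__(self, out):
        # convert_charrefs=False delivers &amp; and &#8212; as separate events,
        # so they can be written back exactly as they appeared.
        super().__init__(convert_charrefs=False)
        self.out = out

    def handle_starttag(self, tag, attrs):
        if tag != "h1":
            self.out.write(self.get_starttag_text())  # original text, untouched
            return
        attrs = dict(attrs)
        style = (attrs.get("style") or "").strip()
        if style and not style.endswith(";"):
            style += ";"
        attrs["style"] = (style + " " + PAGEBREAK).strip()  # merge with any existing style
        parts = [tag]
        for name, value in attrs.items():
            # attribute values arrive unescaped, so escape them again on output
            parts.append(name if value is None else f'{name}="{html.escape(value)}"')
        self.out.write("<" + " ".join(parts) + ">")

    def handle_startendtag(self, tag, attrs):
        self.out.write(self.get_starttag_text())  # e.g. <br/>, left as is

    def handle_endtag(self, tag):
        self.out.write(f"</{tag}>")

    def handle_data(self, data):
        self.out.write(data)

    def handle_entityref(self, name):
        self.out.write(f"&{name};")

    def handle_charref(self, name):
        self.out.write(f"&#{name};")

    def handle_comment(self, data):
        self.out.write(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.write(f"<!{decl}>")  # e.g. <!DOCTYPE html>

    def handle_pi(self, data):
        self.out.write(f"<?{data}>")  # e.g. <?xml version="1.0"?>


def add_h1_pagebreaks(src, dst, chunk_size=64 * 1024):
    """Stream src (a text file object) through the parser into dst, chunk by chunk."""
    parser = H1PageBreaker(dst)
    while chunk := src.read(chunk_size):
        parser.feed(chunk)
    parser.close()  # flush anything still buffered


if __name__ == "__main__":
    out = io.StringIO()
    add_h1_pagebreaks(io.StringIO('<h1 class="chapter">Intro &amp; Setup</h1>'), out)
    print(out.getvalue())
```

For that sample input the output is <h1 class="chapter" style="page-break-before: always;">Intro &amp; Setup</h1>. Because the input is read in chunks, even a 400K-word book never has to be in RAM as a parsed tree.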
Re: LyX to Kindle, points to remember
On 2011-12-04, Steve Litt wrote:
> Hi all,
> I'm making this thread in hopes that everyone who knows something on
> the subject will add something, and at the end we'll all know how to
> go from LyX to Kindle.
> I'm quickly coming to the conclusion that the best way to do it is to
> write a post-processor for Alex's HTML. Simple, modular, and I can do
> it myself (and obviously make it free software). The post-processor
> might need to read the LyX file itself to get a few bits of information
> not in Alex's HTML file.
> So far I've discovered that LyX pagebreaks don't translate to Kindle
> page breaks (start at top of reader). Therefore, to every <h1> should
> be added the property style="page-break-before: always;". That way
> every chapter starts at the top of the reader.

I think this could/should be done better in a global CSS rule.

> I assume the official interpreter of the LyX project is Python, and if
> that assumption's true I'll make the postprocessor in Python. This
> post processing will be a heck of a lot easier if someone can point me
> to a good XML or HTML parser for Python, so I can look at nodes and
> attributes instead of trying to parse tags. I've noticed Alex's HTML
> appears very standard, with matching start and end tags and the like.
> This should theoretically make my job easier. So anyone have any
> suggestions for HTML or XML parsers for Python?

beautifulsoup http://pypi.python.org/pypi/BeautifulSoup/3.2.0
and the standard library xml.* submodules:

xml.dom
xml.dom.minidom
xml.dom.pulldom
xml.etree.ElementTree
xml.parsers.expat
xml.sax
xml.sax.handler
xml.sax.saxutils
xml.sax.xmlreader

> Other things I've noticed:
> 1) The LyX table of contents, when translated to HTML and then to
> Kindle format, crashes the Kindle previewer, so it must be removed
> from the HTML file and used to create an NCX TOC.
> 2) With the Kindle you probably don't want the title page, so there
> should be a post-processor option to capture the author, title and
> date, and then remove everything from the Title or Author
> environmented text to the next .
> Hopefully this thread will serve as an accumulation of knowledge
> resulting in a post-processor. The post processor may simply serve as
> a stepping stone to a "real conversion". I've found that often the
> best specification for the right solution is obtained by implementing
> and evaluating a quick and dirty solution.

Keep up the work.

Günter
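For point 1, once the LyX table of contents has been removed, an NCX navigation map can be generated from the chapter and section headings themselves. A minimal sketch using html.parser and xml.etree.ElementTree, assuming that <h1>/<h2> headings carry id attributes (headings without one are skipped) and that the book is a single HTML file; HeadingCollector, build_ncx, the file name and the uid value are hypothetical. Removing the TOC block from the HTML is not shown, and the exact NCX metadata a given Kindle toolchain expects should be checked against its documentation.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"


class HeadingCollector(HTMLParser):
    """Record (level, id, text) for every <h1>/<h2> that has an id attribute."""

    def __init__(self):
        super().__init__()  # default convert_charrefs=True: text arrives decoded
        self.headings = []
        self._current = None  # [level, id, text parts] while inside a heading

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2") and self._current is None:
            anchor = dict(attrs).get("id")
            if anchor:
                self._current = [int(tag[1]), anchor, []]

    def handle_data(self, data):
        if self._current is not None:
            self._current[2].append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == f"h{self._current[0]}":
            level, anchor, parts = self._current
            label = " ".join("".join(parts).split())  # collapse whitespace
            self.headings.append((level, anchor, label))
            self._current = None


def build_ncx(html_text, src_file, title, uid):
    """Return an NCX document (as a string) whose navMap mirrors the headings.

    <h2> entries are nested under the preceding <h1>; playOrder follows
    document order.
    """
    collector = HeadingCollector()
    collector.feed(html_text)
    collector.close()

    # The xmlns attribute is written out literally, so plain tag names suffice.
    ncx = ET.Element("ncx", xmlns=NCX_NS, version="2005-1")
    head = ET.SubElement(ncx, "head")
    ET.SubElement(head, "meta", name="dtb:uid", content=uid)
    ET.SubElement(head, "meta", name="dtb:depth", content="2")
    ET.SubElement(ET.SubElement(ncx, "docTitle"), "text").text = title
    nav_map = ET.SubElement(ncx, "navMap")

    last_h1 = None
    for order, (level, anchor, label) in enumerate(collector.headings, start=1):
        parent = last_h1 if level == 2 and last_h1 is not None else nav_map
        point = ET.SubElement(
            parent, "navPoint", id=f"navpoint-{order}", playOrder=str(order)
        )
        ET.SubElement(ET.SubElement(point, "navLabel"), "text").text = label
        ET.SubElement(point, "content", src=f"{src_file}#{anchor}")
        if level == 1:
            last_h1 = point
    return ET.tostring(ncx, encoding="unicode")


if __name__ == "__main__":
    sample = '<h1 id="ch1">Intro</h1><h2 id="s1">Part A</h2><h1 id="ch2">Next</h1>'
    print(build_ncx(sample, "book.html", "My Book", "urn:uuid:hypothetical"))
```

This collector builds a small list of headings rather than a tree of the whole book, so it stays in the same low-memory spirit as the event-driven post-processor.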
LyX to Kindle, points to remember
Hi all,

I'm making this thread in hopes that everyone who knows something on the subject will add something, and at the end we'll all know how to go from LyX to Kindle.

I'm quickly coming to the conclusion that the best way to do it is to write a post-processor for Alex's HTML. Simple, modular, and I can do it myself (and obviously make it free software). The post-processor might need to read the LyX file itself to get a few bits of information not in Alex's HTML file.

So far I've discovered that LyX pagebreaks don't translate to Kindle page breaks (start at top of reader). Therefore, to every <h1> should be added the property style="page-break-before: always;". That way every chapter starts at the top of the reader.

I assume the official interpreter of the LyX project is Python, and if that assumption's true I'll make the postprocessor in Python. This post processing will be a heck of a lot easier if someone can point me to a good XML or HTML parser for Python, so I can look at nodes and attributes instead of trying to parse tags. I've noticed Alex's HTML appears very standard, with matching start and end tags and the like. This should theoretically make my job easier. So anyone have any suggestions for HTML or XML parsers for Python?

Other things I've noticed:

1) The LyX table of contents, when translated to HTML and then to Kindle format, crashes the Kindle previewer, so it must be removed from the HTML file and used to create an NCX TOC.

2) With the Kindle you probably don't want the title page, so there should be a post-processor option to capture the author, title and date, and then remove everything from the Title or Author environmented text to the next .

Hopefully this thread will serve as an accumulation of knowledge resulting in a post-processor. The post processor may simply serve as a stepping stone to a "real conversion".
I've found that often the best specification for the right solution is obtained by implementing and evaluating a quick and dirty solution.

Thanks,

SteveT
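Point 2 can be handled with the same streaming pattern: copy the HTML through, but capture the text of the title, author and date elements and drop everything from the first of them up to a chosen stop tag. A minimal sketch, assuming those elements are marked with class="title", class="author" and class="date" (hypothetical class names; check the actual LyXHTML or eLyXer output), with the stop tag left as a parameter that defaults to "h1" purely as an assumption. TitlePageStripper and strip_title_page are hypothetical names.

```python
import html
import io
from html.parser import HTMLParser

# Hypothetical class names for LyX's Title/Author/Date layouts in the exported
# HTML; check the real LyXHTML or eLyXer output before relying on them.
META_CLASSES = ("title", "author", "date")


class TitlePageStripper(HTMLParser):
    """Stream HTML to `out`, capturing title/author/date text into self.meta
    and dropping everything from the first such element up to the next
    `stop_tag` start tag (which itself is kept)."""

    def __init__(self, out, stop_tag="h1"):
        super().__init__(convert_charrefs=False)  # keep entities verbatim
        self.out = out
        self.stop_tag = stop_tag  # assumption: the tag that ends the title block
        self.meta = {}
        self.skipping = False
        self.done = False  # True once the title block has been removed
        self._key = None   # metadata field currently being captured
        self._tag = None
        self._depth = 0
        self._parts = []

    def _emit(self, text):
        if not self.skipping:
            self.out.write(text)

    def _raw(self, text):
        if self._key is not None:
            self._parts.append(text)  # raw text, unescaped when the field closes
        self._emit(text)

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        key = next((c for c in META_CLASSES if c in classes), None)
        if key and not self.done and self._key is None:
            # start (or continue) skipping and capture this field's text
            self.skipping = True
            self._key, self._tag, self._depth, self._parts = key, tag, 1, []
            return
        if self.skipping and self._key is None and tag == self.stop_tag:
            self.skipping, self.done = False, True
        elif self._key is not None and tag == self._tag:
            self._depth += 1  # same tag nested inside the captured element
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._key is not None and tag == self._tag:
            self._depth -= 1
            if self._depth == 0:
                text = html.unescape("".join(self._parts))
                self.meta[self._key] = " ".join(text.split())
                self._key = None
        self._emit(f"</{tag}>")

    def handle_startendtag(self, tag, attrs):
        self._emit(self.get_starttag_text())

    def handle_data(self, data):
        self._raw(data)

    def handle_entityref(self, name):
        self._raw(f"&{name};")

    def handle_charref(self, name):
        self._raw(f"&#{name};")

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")

    def handle_pi(self, data):
        self._emit(f"<?{data}>")


def strip_title_page(html_text, stop_tag="h1"):
    """Return (HTML without the title block, dict of captured metadata)."""
    out = io.StringIO()
    parser = TitlePageStripper(out, stop_tag)
    parser.feed(html_text)
    parser.close()
    return out.getvalue(), parser.meta
```

The captured dict (for example with "title", "author" and "date" keys) can then be handed to whatever step writes the Kindle metadata, while the returned HTML starts directly at the first chapter.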