Re: We need ePub/Mobi conversion: was: Book Frontmatter

2014-02-02 Thread Liviu Andronic
Dear Steve and Alex,

On Mon, Feb 3, 2014 at 2:48 AM, Steve Litt  wrote:
> On Fri, 31 Jan 2014 22:33:02 +0100
> Alex Fernandez  wrote:
>> > Ladies and gentlemen, if the preceding paragraph doesn't convince
>> > us we need a good, solid, LyX to ePub and LyX to Mobi conversion
>
>
Last year we actually had a GSoC project dealing specifically with
ePub. Josh and Richard made progress on this front, and the code is
simply waiting for someone with the motivation and the skills to finish
the job. The almost-finished feature is available in several Git
branches here: http://git.lyx.org/?p=gsoc.git;a=summary .

Regards,
Liviu



2014-02-02 Thread Steve Litt
On Fri, 31 Jan 2014 22:33:02 +0100
Alex Fernandez  wrote:

> Hi Steve,
> 
> On Fri, Jan 31, 2014 at 5:47 PM, Steve Litt
> wrote:
> 
> > Ladies and gentlemen, if the preceding paragraph doesn't convince
> > us we need a good, solid, LyX to ePub and LyX to Mobi conversion


[clipped shame, commiseration, and eLyXer's not doing the bad stuff I
spoke of]

Hi Alex...

> > The shame is, in theory, LyX to ePub is simple. Every environment
> > becomes , every character style becomes
> > . Leave  out of it except for very
> > special cases. Even lyx-code should become ,
> > not .
> >
> 
> It should not be difficult to change eLyXer output to be as you
> desire; just take a look at base.cfg
>   https://github.com/alexfernandez/elyxer/blob/master/src/conf/base.cfg

:-)

I've looked at the main eLyXer Python program before. It was almost 10K
lines of code, so I'd imagine changing it would be difficult. Later in
your email you estimated it would take me one day to understand
eLyXer's internals. That either greatly overstates my abilities, or
understates eLyXer's complexity (complexity that is necessary for the
problem it solves).

> where most elements of the output can be configured. There are many
> special cases and a few ugly-ish tricks (such as h? to denote h1-h6),
> but it is mostly there. Or should be.
> 
[clip]
> 
> > I briefly considered writing Yet Another LyX to HTML Exporter, but
> > found out that in spite of LyX's native format being Non Human
> > Friendly XML, it's not *well formed* XML, so I can't use Python's
> > lxml.etree, let alone Python's xml.etree.ElementTree, to parse it.
> > Perhaps if LyX offered an export to well-formed XML, hopefully with
> > a DTD, I could parse that to produce ePub-friendly Xhtml, but as
> > far as I know that doesn't exist either.
> >
> 
> With eLyXer I have already done all the heavy work of converting
> LyX documents to an in-memory structure of containers and
> insets. In theory you might just tweak the configuration file
> base.cfg and generate a completely different document structure such
> as ePub. In practice, and as far as I know, it works: I was able to
> make the transition from LyX 1.x to 2.0 just by adding a few insets
> and containers to the configuration file.

Alex, I can't justify wading through 10K lines just to, basically, pass
environment and character style names, with their applied text, into
Xhtml. eLyXer was designed to do a vastly larger superset of what I
need.
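For simple documents, "pass environment and character style names, with their applied text, into Xhtml" can be sketched directly on the native format. This is only a sketch under several assumptions: it follows LyX 2.x conventions (paragraphs wrapped in `\begin_layout NAME` ... `\end_layout`, character styles as `\begin_inset Flex NAME` ... `\end_inset` holding a `Plain Layout` paragraph), and it maps each environment to `<div class="NAME">` and each character style to `<span class="NAME">`, which is one plausible reading of the markup described earlier in the thread. Font switches, other inset parameters and escapes such as `\backslash` are simply skipped, and that long tail of special cases is where the real work lies.

```python
import xml.etree.ElementTree as ET


def _current(stack):
    """Innermost real output element (None entries are transparent wrappers)."""
    for elem in reversed(stack):
        if elem is not None:
            return elem
    raise ValueError("unbalanced LyX input")


def _append_text(elem, text):
    """Add text at the end of elem, honouring ElementTree's text/tail model."""
    if len(elem):
        elem[-1].tail = (elem[-1].tail or "") + text
    else:
        elem.text = (elem.text or "") + text


def _css_class(name):
    # Assumption: spaces in style names become underscores in the class attribute.
    return name.replace(" ", "_")


def lyx_to_xhtml_body(lyx_source):
    """Sketch: turn the \\begin_body section of a LyX 2.x file into an XHTML <body>."""
    body = ET.Element("body")
    stack = [body]
    in_body = False
    in_inset_header = False  # between \begin_inset and its first \begin_layout
    for line in lyx_source.splitlines():
        if line == "\\begin_body":
            in_body = True
        elif line == "\\end_body":
            break
        elif not in_body or not line:
            continue
        elif line.startswith("\\begin_layout "):
            in_inset_header = False
            name = line[len("\\begin_layout "):]
            parent = _current(stack)
            if name == "Plain Layout" and parent.tag == "span":
                stack.append(None)  # fold a character style's inner paragraph into its span
            else:
                stack.append(ET.SubElement(parent, "div", {"class": _css_class(name)}))
        elif line.startswith("\\begin_inset Flex "):
            in_inset_header = True
            name = line[len("\\begin_inset Flex "):]
            stack.append(ET.SubElement(_current(stack), "span", {"class": _css_class(name)}))
        elif line.startswith("\\begin_inset"):
            in_inset_header = True
            stack.append(None)  # other insets: keep their paragraphs, emit no wrapper
        elif line in ("\\end_layout", "\\end_inset"):
            in_inset_header = False
            stack.pop()
        elif in_inset_header or line.startswith("\\"):
            continue  # inset parameters and font switches (\emph, \family, ...)
        else:
            _append_text(_current(stack), line)
    return ET.tostring(body, encoding="unicode")
```

Fed a `Standard` paragraph containing a `Flex Code` inset around `ls -l`, this yields `<div class="Standard">Run the <span class="Code">ls -l</span> command.</div>` inside the body.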

Of course, the right way would be to capture semantic styles plus
text with Python's lxml.etree and (easily) convert them to Xhtml. If
native LyX ever becomes well-formed XML, I can easily have my way with
it using Python's lxml.etree package. If native LyX were still the
all-text-all-the-time format of 2005, I could have text-parsed it and
converted it to Xhtml. But with this neither-here-nor-there native
format, what should be a very easy task becomes a riot of detours.
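LyX has no well-formed XML export, so the element names below (`document`, `layout`, `charstyle` and a `name` attribute) are invented purely for illustration. Under that assumption, the "easy" conversion is a single tree walk that renames semantic elements into classed `div`/`span` elements. The sketch uses the standard library's `xml.etree.ElementTree`; `lxml.etree` offers a largely compatible API.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for a well-formed LyX export; LyX offers no such export.
TAG_MAP = {"layout": "div", "charstyle": "span"}


def semantic_xml_to_xhtml(xml_text):
    """Rename hypothetical <layout>/<charstyle> elements to classed <div>/<span>."""
    root = ET.fromstring(xml_text)  # raises ET.ParseError if not well-formed
    for elem in root.iter():
        new_tag = TAG_MAP.get(elem.tag)
        if new_tag is not None:
            style = elem.attrib.pop("name", "")
            elem.tag = new_tag
            elem.set("class", style.replace(" ", "_"))
    root.tag = "body"  # the hypothetical <document> root becomes the XHTML body
    root.attrib.clear()
    return ET.tostring(root, encoding="unicode")
```

Because the text and tail strings stay attached to the renamed elements, the applied text passes through untouched; only the element names change.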

[clip]


> If you are already conversant in Python, know ePub and are willing to
> do the pretty boring task of translating between LyX and ePub, then
> you can take eLyXer as the starting point to do the job. It can
> probably be done in a few days:
> - one to understand eLyXer internals,
> - one to fix any stupid design errors I may have made that
> make ePub support hard, such as making it use a different
> base.cfg file,
> - and one or two to recode all commands to output ePub.

You greatly overestimate my coding abilities.

> 
> I would encourage you to take eLyXer and run from there, but you
> will very probably find strong opposition to integrating the
> resulting converter into eLyXer, which means your effort will
> mostly be useful to you alone.

Assuming I'm the one who eventually makes the converter, I'm not at all
concerned whether it gets integrated into the official LyX project. As
a matter of fact, I'd feel better about it if it were just an add-on
people would download from Troubleshooters.Com. I'm a big fan of
modularity, and the less the conversion and LyX need to know about each
other, the better I like it. Also, my philosophy differs from the LyX
developers' in that I believe that if a user doesn't know how to write
a shell script, batch file, PowerShell script, or whatever it's called
on the various platforms, and doesn't want to learn this simple task,
he/she isn't a good candidate for Free Software.


> So, my advice would be to keep away
> from it. If you are still interested I can of course give you any
> support you need with the source code.

I agree, but for different reasons. I keep coming back to the fact that
it's a huge superset of what I need, and it's just short of 10K lines
of code. IMHO there's *got* to be a better way.

That being said, I'd like to congratulate you on eLyXer being one of
the very few converters of any kind that studiously retain semantic
content and pass it to the output.

Thanks,

SteveT

Steve Litt*  http://www.troubleshooter


2014-02-02 Thread Alan L Tyree


>> And instead of using the LyX
>> exporters, try tex4ht to make the html file. 
>
> I think I tried lyx->latex->tex4ht once before, and something went
> wrong, though I no longer remember what.
>
>> And process the html file
>> through tidy before using Pandoc.
>
> That's a good idea, although Python's lxml.etree can read well formed
> XML in any form, and has its own pretty print.
>> 
>> I haven't tried this, so please don't waste your time on it if
>> inconvenient. Pandoc seems at its strongest when starting with a
>> Markdown file.
>
> LOL, if it leads to LyX to ePub, it will be anything but a waste of
> time. I'm going to try that tex4ht again and see what happens.
>
Hi Steve,
According to my notes from 2009, I ran tex4ht with the following
commands:

   - htlatex file.tex "xhtml,mathml" "-cunihtf" "-cvalidate"
 
   - tidy -m -asxhtml name.html

I was just trying to make presentable XHTML files, so I don't know how
'good' they are for your purposes.

When the htlatex command runs, it stops every so often and waits for
input. Again, my notes say to use 'R '. I just ran it on a
reasonably sized file and needed to use the 'R' command about 3 times.

I should mention that I am on Debian Wheezy.
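Alan's two commands can be wrapped in a small Python script. This is a sketch under stated assumptions: htlatex is run from the directory holding the .tex file and writes `file.html` there (the notes name the tidy input `name.html`; here it is taken to be that same output file), and tidy follows its documented exit convention of 0 for clean, 1 for warnings only, 2 for errors. `check=True` stops the pipeline on any non-zero htlatex exit, which may be stricter than needed. The function names are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path


def build_commands(tex_file):
    """Return the htlatex and tidy command lines from the notes above."""
    tex = Path(tex_file)
    # Assumption: htlatex writes <stem>.html into the directory it runs in.
    html_name = tex.stem + ".html"
    htlatex = ["htlatex", tex.name, "xhtml,mathml", "-cunihtf", "-cvalidate"]
    tidy = ["tidy", "-m", "-asxhtml", html_name]  # -m: modify the file in place
    return [htlatex, tidy]


def run_pipeline(tex_file):
    """Run both commands in the .tex file's directory."""
    workdir = Path(tex_file).parent
    htlatex, tidy = build_commands(tex_file)
    print("running:", shlex.join(htlatex))
    # stdin is inherited, so LaTeX's occasional prompts can still be answered by hand.
    subprocess.run(htlatex, cwd=workdir, check=True)
    print("running:", shlex.join(tidy))
    result = subprocess.run(tidy, cwd=workdir)
    # tidy exits 0 when clean, 1 for warnings only, 2 for errors.
    if result.returncode > 1:
        raise RuntimeError(f"tidy reported errors (exit status {result.returncode})")
```

Calling `run_pipeline("book/file.tex")` runs both steps inside `book/`; separating command construction from execution keeps the argument lists easy to inspect before anything is run.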

HTH,
Alan


> Thanks,
>
> SteveT
>
> Steve Litt*  http://www.troubleshooters.com/
> Troubleshooting Training  *  Human Performance


-- 
Alan L Tyree   http://www2.austlii.edu.au/~alan
Tel:  04 2748 6206 sip:172...@iptel.org



2014-02-02 Thread Steve Litt
On Sun, 02 Feb 2014 05:53:59 +1100
Alan L Tyree  wrote:

> Well, that's a bit disappointing. Thanks for the full report. I've
> only used Pandoc for simple conversions, so haven't looked deeply at
> the configuration options that might help with the problems that you
> identify below.
> 
> One last approach might be interesting to try: What about LyX ->
> (X)html, then process through Pandoc. 

Once I have good, solid, semantic Xhtml, Pandoc isn't needed. I already
have most of the code to ePubize a single Xhtml file with lots of
chapters. The tough part is getting your content out of LyX or LaTeX
without dropping the styles.
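The ePubizing code mentioned here isn't shown, so the following is only a guess at one of its steps: splitting a single well-formed Xhtml file into per-chapter documents at each `<h1>`, with anything before the first `<h1>` kept as an untitled front-matter chunk. It uses only `xml.etree.ElementTree`; the function names are hypothetical, and packaging the pieces into an .epub (manifest, navigation, zip container) is not shown. Namespaced input still serializes, though with `ns0:` prefixes, which is legal but ugly.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def _local(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def split_chapters(xhtml_text):
    """Return [(title, [elements]), ...], starting a new chunk at every <h1>."""
    root = ET.fromstring(xhtml_text)
    body = next((e for e in root.iter() if _local(e.tag) == "body"), None)
    if body is None:
        raise ValueError("no <body> element")
    chunks = []
    for child in body:
        is_h1 = _local(child.tag) == "h1"
        if is_h1 or not chunks:
            # Content before the first <h1> becomes an untitled front-matter chunk.
            title = "".join(child.itertext()).strip() if is_h1 else None
            chunks.append((title, []))
        chunks[-1][1].append(child)
    return chunks


def chunk_to_xhtml(title, elements):
    """Wrap one chunk in a minimal standalone XHTML document."""
    html = ET.Element("html", {"xmlns": XHTML_NS})
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = title or "Front matter"
    body = ET.SubElement(html, "body")
    body.extend(elements)
    return ET.tostring(html, encoding="unicode")
```

Each `(title, elements)` pair can then be written out with `chunk_to_xhtml(title, elements)` as one file per chapter.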

> And instead of using the LyX
> exporters, try tex4ht to make the html file. 

I think I tried lyx->latex->tex4ht once before, and something went
wrong, though I no longer remember what.

> And process the html file
> through tidy before using Pandoc.

That's a good idea, although Python's lxml.etree can read well formed
XML in any form, and has its own pretty print.
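The parser-side alternative to tidy can be sketched with the standard library instead of lxml (lxml's own equivalent is `etree.tostring(tree, pretty_print=True)`). The same parse doubles as a well-formedness check: anything that isn't well-formed XML, native LyX included, is rejected. `ET.indent` needs Python 3.9 or later, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def pretty_xml(text):
    """Parse well-formed XML and return it re-indented; raise ValueError otherwise."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as err:
        raise ValueError(f"not well-formed XML: {err}") from err
    ET.indent(root, space="  ")  # replaces whitespace-only text/tails with indentation
    return ET.tostring(root, encoding="unicode")
```

Unlike tidy, this never repairs anything: malformed input fails loudly instead of being guessed at.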
> 
> I haven't tried this, so please don't waste your time on it if
> inconvenient. Pandoc seems at its strongest when starting with a
> Markdown file.

LOL, if it leads to LyX to ePub, it will be anything but a waste of
time. I'm going to try that tex4ht again and see what happens.

Thanks,

SteveT

Steve Litt*  http://www.troubleshooters.com/
Troubleshooting Training  *  Human Performance