Re: eLyXer for Document Parsing

2012-02-09 Thread Alex Fernandez
On 2/9/12, Rob Oakes wrote:
> On 2/9/2012 11:42 AM, Alex Fernandez wrote:
>> Ah, OK. Always hated DOM. eLyXer's in-memory representation is for the
>> LyX document, not of the resulting HTML document. Much tighter this
>> way, IMHO.
>
> Is there an example of how I might be able to access the in-m

Re: eLyXer for Document Parsing

2012-02-09 Thread Rob Oakes
On 2/9/2012 11:42 AM, Alex Fernandez wrote:
> Ah, OK. Always hated DOM. eLyXer's in-memory representation is for the
> LyX document, not of the resulting HTML document. Much tighter this
> way, IMHO.

Is there an example of how I might be able to access the in-memory representation for the LyX docu

Re: eLyXer for Document Parsing

2012-02-09 Thread Alex Fernandez
On 2/9/12, Steve Litt wrote:
>> I am not sure what you mean by "document model". For the record,
>> eLyXer creates an in-memory representation of the complete LyX
>> document since version 0.36 (released back in 2009):
> I'm pretty sure he meant "Document Object Model", or DOM, the object
> hiera

Re: eLyXer for Document Parsing

2012-02-09 Thread Steve Litt
On Thu, 9 Feb 2012 15:13:48 +0100 Alex Fernandez wrote:
> Hi Steve,
>
> On 2/5/12, Rob Oakes wrote:
> > Extremely good point, I'm also more comfortable with the HTML export
> > available in LyX. I initially was interested in eLyXer because I
> > thought I might be able to use it to help with an

Re: eLyXer for Document Parsing

2012-02-09 Thread Alex Fernandez
Hi Steve,

On 2/5/12, Rob Oakes wrote:
> Extremely good point, I'm also more comfortable with the HTML export
> available in LyX. I initially was interested in eLyXer because I thought I
> might be able to use it to help with an import filter as well. I'm not sure
> that it can, though. As you not

Re: eLyXer for Document Parsing

2012-02-05 Thread Abdelrazak Younes
On 05/02/2012 17:48, Rob Oakes wrote:
My current script is about 50 lines long, and can be used with either native XHTML or eLyXer. To add new features, you add additional cases describing how to translate the XHTML. Which brings us to an important point: there's already a pretty good LyX ->

Re: eLyXer for Document Parsing

2012-02-05 Thread Rob Oakes
On Feb 5, 2012, at 2:04 AM, Abdelrazak Younes wrote:
> Strong suggestion: use LyX proper. I am quite sure you already know that
> because I saw some patches from you in this area but I'll explain anyway:
> LyX's html own export is so good and fast because it effectively knows the
> in-memory r

Re: eLyXer for Document Parsing

2012-02-05 Thread Alex Fernandez
Hi all, I am currently travelling so excuse my android top-posting. Actually building a reusable in-memory representation for Python scripting of LyX documents was a requisite for eLyXer. You should not have trouble with large documents as my puny netbook eats 1000 page documents for lunch. Look a

Re: eLyXer for Document Parsing

2012-02-05 Thread Abdelrazak Younes
On 04/02/2012 18:03, Rob Oakes wrote:
Dear eLyXer Users and Developers,

I'm still at work on the import/export module for Microsoft Word documents. I'm making pretty good progress. I've got a rough prototype that works pretty well and I'm now starting to refine it.

My approach up to now has b

Re: eLyXer for Document Parsing

2012-02-05 Thread Abdelrazak Younes
On 04/02/2012 19:07, slitt wrote:
One more question: You sure you want to go in-memory? What happens if a guy has a 1200 page book with 100 chapters each containing 10 sections, each containing 10 subsections, and tries to parse it on a machine with 512 MB RAM? You in a heap of trouble son. I

Re: eLyXer for Document Parsing

2012-02-04 Thread slitt
On Sat, 4 Feb 2012 14:00:24 -0700 Rob Oakes wrote:
> Hi Steve,

[clip]

> > One more question: You sure you want to go in-memory? What happens
> > if a guy has a 1200 page book with 100 chapters each containing 10
> > sections, each containing 10 subsections, and tries to parse it on
> > a machine

Re: eLyXer for Document Parsing

2012-02-04 Thread Rob Oakes
Hi Steve,

> Not only possible but easy if you do things the Steve Litt way. eLyXer
> quickly punches out HTML that's clean enough to read with an XML
> parser, I think. So, eLyXer converts to HTML, and then your program's
> DOMbuilder module converts that HTML to in-memory DOM. No muss, no
> fuss,
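
[Editor's sketch] The HTML-to-DOM step described in the quoted text above could look roughly like the following, assuming the converter's output is well-formed XHTML. The sample markup, the class names, and the helper names (build_dom, text_of, paragraphs) are hypothetical and are not taken from eLyXer or from anyone's script in this thread; only the standard-library xml.dom.minidom calls are real. Output containing HTML-only entities may need cleanup before a strict XML parser will accept it.

```python
from xml.dom import minidom

# Hypothetical stand-in for converter output; real eLyXer markup will differ.
XHTML = """<html><body>
<h1 class="title">Sample</h1>
<div class="Standard">First paragraph.</div>
<div class="Standard">Second paragraph.</div>
</body></html>"""


def build_dom(xhtml):
    """Parse a well-formed XHTML string into an in-memory DOM document."""
    return minidom.parseString(xhtml)


def text_of(node):
    """Concatenate every text node found beneath a DOM node."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            parts.append(child.data)
        else:
            parts.append(text_of(child))
    return "".join(parts)


def paragraphs(dom):
    """Return the text of each <div> element, in document order."""
    return [text_of(div) for div in dom.getElementsByTagName("div")]


if __name__ == "__main__":
    print(paragraphs(build_dom(XHTML)))
```

Note that minidom holds the whole tree in memory, so this only illustrates the approach; it does not address the concern raised elsewhere in the thread about very large documents.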

Re: eLyXer for Document Parsing

2012-02-04 Thread slitt
On Sat, 4 Feb 2012 10:03:00 -0700 Rob Oakes wrote:
> Dear eLyXer Users and Developers,
>
> I'm still at work on the import/export module for Microsoft Word
> documents. I'm making pretty good progress. I've got a rough
> prototype that works pretty well and I'm now starting to refine it.
>
> My