Re: eLyXer for Document Parsing

2012-02-09 Thread Alex Fernandez
Hi Steve, On 2/5/12, Rob Oakes lyx-de...@oak-tree.us wrote: Extremely good point, I'm also more comfortable with the HTML export available in LyX. I initially was interested in eLyXer because I thought I might be able to use it to help with an import filter as well. I'm not sure that it can,

Re: eLyXer for Document Parsing

2012-02-09 Thread Steve Litt
On Thu, 9 Feb 2012 15:13:48 +0100 Alex Fernandez alejandro...@gmail.com wrote: Hi Steve, On 2/5/12, Rob Oakes lyx-de...@oak-tree.us wrote: Extremely good point, I'm also more comfortable with the HTML export available in LyX. I initially was interested in eLyXer because I thought I

Re: eLyXer for Document Parsing

2012-02-09 Thread Alex Fernandez
On 2/9/12, Steve Litt sl...@troubleshooters.com wrote: >> I am not sure what you mean by "document model". For the record, eLyXer creates an in-memory representation of the complete LyX document since version 0.36 (released back in 2009): > I'm pretty sure he meant "Document Object Model", or DOM, the

Re: eLyXer for Document Parsing

2012-02-09 Thread Rob Oakes
On 2/9/2012 11:42 AM, Alex Fernandez wrote: Ah, OK. Always hated DOM. eLyXer's in-memory representation is for the LyX document, not of the resulting HTML document. Much tighter this way, IMHO. Is there an example of how I might be able to access the in-memory representation for the LyX

Re: eLyXer for Document Parsing

2012-02-09 Thread Alex Fernandez
On 2/9/12, Rob Oakes lyx-de...@oak-tree.us wrote: > On 2/9/2012 11:42 AM, Alex Fernandez wrote: >> Ah, OK. Always hated DOM. eLyXer's in-memory representation is for the LyX document, not of the resulting HTML document. Much tighter this way, IMHO. > Is there an example of how I might be able to

Re: eLyXer for Document Parsing

2012-02-05 Thread Abdelrazak Younes
On 04/02/2012 19:07, slitt wrote: One more question: You sure you want to go in-memory? What happens if a guy has a 1200 page book with 100 chapters each containing 10 sections, each containing 10 subsections, and tries to parse it on a machine with 512 MB RAM? You in a heap of trouble son. I

Re: eLyXer for Document Parsing

2012-02-05 Thread Abdelrazak Younes
On 04/02/2012 18:03, Rob Oakes wrote: Dear eLyXer Users and Developers, I'm still at work on the import/export module for Microsoft Word documents. I'm making pretty good progress. I've got a rough prototype that works pretty well and I'm now starting to refine it. My approach up to now has

Re: eLyXer for Document Parsing

2012-02-05 Thread Alex Fernandez
Hi all, I am currently travelling so excuse my android top-posting. Actually building a reusable in-memory representation for Python scripting of LyX documents was a requisite for eLyXer. You should not have trouble with large documents as my puny netbook eats 1000 page documents for lunch. Look

Re: eLyXer for Document Parsing

2012-02-05 Thread Rob Oakes
On Feb 5, 2012, at 2:04 AM, Abdelrazak Younes wrote: Strong suggestion: use LyX proper. I am quite sure you already know that because I saw some patches from you in this area but I'll explain anyway: LyX's html own export is so good and fast because it effectively knows the in-memory

Re: eLyXer for Document Parsing

2012-02-05 Thread Abdelrazak Younes
On 05/02/2012 17:48, Rob Oakes wrote: My current script is about 50 lines long, and can be used with either native XHTML or eLyXer. To add new features, you add additional cases describing how to translate the XHTML. Which brings us to an important point: there's already a pretty good LyX -
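
The "additional cases" approach quoted above can be pictured as a dispatch table keyed by tag name. The following is only a minimal sketch under stated assumptions: it is not Rob's script, the handler names and the output format (plain text lines) are hypothetical, and it uses Python's standard `xml.etree.ElementTree` rather than anything from eLyXer or LyX.

```python
import xml.etree.ElementTree as ET


def heading(element):
    # Hypothetical handler: render a heading as a "# "-prefixed line.
    return "# " + "".join(element.itertext()).strip()


def paragraph(element):
    # Hypothetical handler: flatten a paragraph to its text content.
    return "".join(element.itertext()).strip()


# Hypothetical dispatch table: supporting a new tag means adding one entry.
HANDLERS = {"h1": heading, "p": paragraph}


def translate(xhtml_text):
    """Translate each recognised element into one output line; skip the rest."""
    lines = []
    for element in ET.fromstring(xhtml_text).iter():
        handler = HANDLERS.get(element.tag)
        if handler is not None:
            lines.append(handler(element))
    return lines


print(translate("<body><h1>Title</h1><p>Some <b>bold</b> text.</p></body>"))
```

The sample input carries no XML namespace; real XHTML usually declares one, in which case the keys of the table would need the namespace prefix.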

eLyXer for Document Parsing

2012-02-04 Thread Rob Oakes
Dear eLyXer Users and Developers, I'm still at work on the import/export module for Microsoft Word documents. I'm making pretty good progress. I've got a rough prototype that works pretty well and I'm now starting to refine it. My approach up to now has been to use regular expressions to match

Re: eLyXer for Document Parsing

2012-02-04 Thread slitt
On Sat, 4 Feb 2012 10:03:00 -0700 Rob Oakes lyx-de...@oak-tree.us wrote: Dear eLyXer Users and Developers, I'm still at work on the import/export module for Microsoft Word documents. I'm making pretty good progress. I've got a rough prototype that works pretty well and I'm now starting to

Re: eLyXer for Document Parsing

2012-02-04 Thread Rob Oakes
Hi Steve, > Not only possible but easy if you do things the Steve Litt way. eLyXer quickly punches out HTML that's clean enough to read with an XML parser, I think. So, eLyXer converts to HTML, and then your program's DOMbuilder module converts that HTML to in-memory DOM. No muss, no fuss, no
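
The HTML-then-parse route described in the quoted text might look roughly like the sketch below. It assumes the converter's output is well-formed XHTML; the sample string is a hypothetical stand-in rather than real eLyXer output, and `build_tree` and `outline` are illustrative names, not part of eLyXer. Only Python's standard `xml.etree.ElementTree` is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample standing in for converter output; real output may differ.
SAMPLE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<h1>Chapter One</h1>
<p>First paragraph.</p>
<h2>A Section</h2>
<p>Second <i>paragraph</i>.</p>
</body>
</html>"""

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def build_tree(xhtml_text):
    """Parse well-formed XHTML into an in-memory element tree."""
    return ET.fromstring(xhtml_text)


def outline(root):
    """Return (level, title) pairs for every h1-h6 heading, in document order."""
    result = []
    for element in root.iter():
        tag = element.tag
        if not isinstance(tag, str):
            continue  # skip non-element nodes such as comments
        if tag.startswith(XHTML_NS):
            tag = tag[len(XHTML_NS):]  # strip the namespace prefix
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            result.append((int(tag[1]), "".join(element.itertext()).strip()))
    return result


root = build_tree(SAMPLE_XHTML)
print(outline(root))
```

One caveat: a strict XML parser rejects input that is not well-formed and raises an error on undefined HTML entities such as `&nbsp;`, so whether this works depends on how clean the generated HTML actually is.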

Re: eLyXer for Document Parsing

2012-02-04 Thread slitt
On Sat, 4 Feb 2012 14:00:24 -0700 Rob Oakes lyx-de...@oak-tree.us wrote: > Hi Steve, [clip] > > One more question: You sure you want to go in-memory? What happens if a guy has a 1200 page book with 100 chapters each containing 10 sections, each containing 10 subsections, and tries to parse it on
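
If memory on very large documents is the worry, one alternative to building the whole tree is incremental parsing. The sketch below is an assumption-laden illustration, not something proposed in the thread: it uses `xml.etree.ElementTree.iterparse` from the Python standard library, and the function name and sample input are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"


def count_paragraphs(source):
    """Count <p> elements, releasing each one's content once it has been seen."""
    count = 0
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == XHTML_NS + "p":
            count += 1
            # Release the paragraph's text and children; the emptied element
            # itself stays attached to its parent.
            element.clear()
    return count


# Hypothetical in-memory sample; a real run would pass a file name or file object.
sample = io.BytesIO(
    b'<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    b"<p>one</p><p>two</p><p>three</p></body></html>"
)
print(count_paragraphs(sample))
```

Because each paragraph is cleared as soon as its end tag is seen, the bulk of the text is never held in memory at once, at the cost of losing random access to the document.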
