Re: Progress on the MS Word to LyX conversion (xml)

2008-07-31 Thread Abdelrazak Younes

Michael Wojcik wrote:

I don't expect the
switch to XML to cause me any problems, and to be honest I'm a bit
puzzled by all the worrying.


/me too :-)

Abdel.



Re: Progress on the MS Word to LyX conversion (xml)

2008-07-31 Thread Michael Wojcik

Steve Litt wrote:


Trouble is, replacing \begin..\end with <>... is a hack. LyX developers 
have defined the LyX native format so that \begin always starts a 
line. There's no such requirement in XML, and if we require it, that's a 
hack. If we don't require it, LyX-XML parsing becomes a whole new level of 
difficulty.


It's not hard at all, with an XML parser. Actually, putting all XML 
elements on their own lines, with or without leading whitespace, can 
be done with a DFA (or anything equivalent, such as a regular 
expression); you don't even need a full-strength parser. If you want 
elements all on their own lines, pre-processing with a quick sed 
script would do that for you.
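
As a rough illustration of the regular-expression approach described above, here is a minimal sketch in Python (chosen instead of sed only because LyX already requires Python). The element names `layout` and `inset` are made up for the example and are not an agreed LyX XML vocabulary. The pattern also assumes that no `>` appears inside attribute values, comments, or CDATA sections, and that no tag is split across lines.

```python
import re

# One XML tag: "<", then anything up to the next ">".
# Assumption: ">" never occurs inside attribute values, comments or CDATA.
TAG = re.compile(r"<[^>]+>")

def one_tag_per_line(xml_text: str) -> str:
    """Put every tag on its own line and drop the blank lines this creates."""
    spread = TAG.sub(lambda m: "\n" + m.group(0) + "\n", xml_text)
    lines = (line.strip() for line in spread.splitlines())
    return "\n".join(line for line in lines if line)

# Hypothetical input with everything on a single line.
sample = '<layout type="Standard">this.</layout><inset options="plain"/>'
print(one_tag_per_line(sample))
```

Because it trims whitespace around text, this suits inspection with line-oriented tools rather than lossless round trips.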


I'm a toolsmith myself, and I write lots of tools, in lots of 
languages, for pre- and post-processing various file formats. I don't 
expect the switch to XML to cause me any problems, and to be honest 
I'm a bit puzzled by all the worrying.


--
Michael Wojcik
Micro Focus
Rhetoric & Writing, Michigan State University



Re: Progress on the MS Word to LyX conversion (xml)

2008-07-31 Thread Michael Wojcik

Manveru wrote:

Have you ever merged XML? I tried - it is horrible work.


It depends entirely on how the XML document is formatted. There's 
nothing that prevents XML with sensible line breaks, for example.


I keep lots of XHTML documents in CVS. They're well-formatted, so 
merging works just fine.
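
To make the point about line breaks concrete, here is a small sketch using Python's standard `difflib`. The two document revisions are hypothetical; the helper name `changed_lines` is likewise just for illustration. It compares a line-based diff of a document laid out one element boundary per line with the same content squeezed onto a single line.

```python
import difflib

def changed_lines(old: str, new: str) -> list:
    """Return only the removed ('- ') and added ('+ ') lines of a line-based diff."""
    diff = difflib.ndiff(old.splitlines(), new.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]

# Hypothetical well-formatted document and a revision of it.
old = "<p>\nfirst\n</p>\n<p>\nsecond\n</p>"
new = old.replace("second", "second, revised")

# Line-friendly layout: only the edited line is reported.
print(changed_lines(old, new))

# Same content on one line: the whole document is reported as changed.
print(changed_lines(old.replace("\n", ""), new.replace("\n", "")))
```

Line-based merge tools see the document the same way, which is why the layout matters more than the fact that the format is XML.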


--
Michael Wojcik
Micro Focus
Rhetoric & Writing, Michigan State University



Re: Progress on the MS Word to LyX conversion (xml)

2008-07-28 Thread Abdelrazak Younes

John McCabe-Dansted wrote:

On Fri, Jul 25, 2008 at 4:43 PM, Manveru <[EMAIL PROTECTED]> wrote:

To the discussion about data format preference:

I am reading all your comments about XML, YAML and other suggested data
formats. This discussion reminds me of something about XML that almost
nobody remembers. How many LyX users are working on large team
projects? How often do they have to merge text files from different branches?
Have you ever merged XML? I tried - it is horrible work.


I don't see why it would be harder if we "just replace \begin...\end
with <>...".


I think LyX cannot exist with an XML data format without built-in document
merge functionality.


This would be nice in any case.


Shameless plug:

http://www.lyx.org/Donate#sponsorship

Abdel.



Re: Progress on the MS Word to LyX conversion (xml)

2008-07-28 Thread Abdelrazak Younes

G. Milde wrote:

On 28.07.08, Steve Litt wrote:

On Monday 28 July 2008 01:10, John McCabe-Dansted wrote:

On Fri, Jul 25, 2008 at 4:43 PM, Manveru <[EMAIL PROTECTED]> wrote:

To the discussion about data format preference:

... Have you ever merged XML? I tried - it is horrible work.

I don't see why it would be harder if we "just replace \begin...\end
with <>...".



Trouble is, replacing \begin..\end with <>... is a hack.

...

There's no such requirement in XML, and if we require it, that's a
hack.


I'd call it a layout convention.

IMO it is perfectly legal to define the lyx file format as

... uses XML ...
... is laid out in a manner to facilitate processing by tools that
operate on a line basis (grep, merge, sed, awk, ...)
...


Right, but LyX should not depend on this human-friendly layout. IOW LyX 
will be able to parse .lyx files that are not nicely formatted, but will 
always output nicely formatted .lyx files.


We could add an option to lyx2lyx so that badly formatted LyX files 
generated by some external tool would be transformed into a nicely 
formatted .lyx file. See? I don't forecast any parsing problem :-)
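
A minimal sketch of that "accept anything, always write it nicely" idea, using only Python's standard library (`ET.indent` needs Python 3.9 or later). The function name `normalize_layout` and the element names `doc` and `layout` are placeholders, since no XML format has been defined yet, and this is not how lyx2lyx works today.

```python
import xml.etree.ElementTree as ET

def normalize_layout(xml_text: str) -> str:
    """Parse XML in any layout and re-emit it with one element per line."""
    root = ET.fromstring(xml_text)
    ET.indent(root, space="  ")  # rewrites whitespace-only text between elements
    return ET.tostring(root, encoding="unicode")

# Two hypothetical inputs with identical content but different layout.
cramped = '<doc><layout>this.</layout><layout>that.</layout></doc>'
ragged = '<doc>\n\n<layout>this.</layout>   <layout>that.</layout>\n</doc>'

print(normalize_layout(cramped))
print(normalize_layout(cramped) == normalize_layout(ragged))
```

Both inputs come out identical, so line-based tools downstream only ever see one layout.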


Abdel.



Re: Progress on the MS Word to LyX conversion (xml)

2008-07-28 Thread G. Milde
On 28.07.08, Steve Litt wrote:
> On Monday 28 July 2008 01:10, John McCabe-Dansted wrote:
> > On Fri, Jul 25, 2008 at 4:43 PM, Manveru <[EMAIL PROTECTED]> wrote:
> > > To the discussion about data format preference:
> > >
> > > ... Have you ever merged XML? I tried - it is horrible work.
> >
> > I don't see why it would be harder if we "just replace \begin...\end
> > with <>...".

> Trouble is, replacing \begin..\end with <>... is a hack. 
...
> There's no such requirement in XML, and if we require it, that's a 
> hack. 

I'd call it a layout convention.

IMO it is perfectly legal to define the lyx file format as

... uses XML ...
... is laid out in a manner to facilitate processing by tools that
operate on a line basis (grep, merge, sed, awk, ...) 
...

Günter




Re: Progress on the MS Word to LyX conversion (xml)

2008-07-28 Thread Steve Litt
On Monday 28 July 2008 01:10, John McCabe-Dansted wrote:
> On Fri, Jul 25, 2008 at 4:43 PM, Manveru <[EMAIL PROTECTED]> wrote:
> > To the discussion about data format preference:
> >
> > I am reading all your comments about XML, YAML and other suggested data
> > formats. This discussion reminds me of something about XML that almost
> > nobody remembers. How many LyX users are working on large team
> > projects? How often do they have to merge text files from different
> > branches? Have you ever merged XML? I tried - it is horrible work.
>
> I don't see why it would be harder if we "just replace \begin...\end
> with <>...".

Trouble is, replacing \begin..\end with <>... is a hack. LyX developers 
have defined the LyX native format so that \begin always starts a 
line. There's no such requirement in XML, and if we require it, that's a 
hack. If we don't require it, LyX-XML parsing becomes a whole new level of 
difficulty.

Like I said, nothing that XML->YAML and YAML->XML can't solve, but those would 
be required. Incidentally, I just heard there are already standalone programs 
that do those conversions, so before writing code myself, I'll investigate.

SteveT
 
Steve Litt
Recession Relief Package
http://www.recession-relief.US



Re: Progress on the MS Word to LyX conversion (xml)

2008-07-27 Thread John McCabe-Dansted
On Fri, Jul 25, 2008 at 4:43 PM, Manveru <[EMAIL PROTECTED]> wrote:
> To the discussion about data format preference:
>
> I am reading all your comments about XML, YAML and other suggested data
> formats. This discussion reminds me of something about XML that almost
> nobody remembers. How many LyX users are working on large team
> projects? How often do they have to merge text files from different branches?
> Have you ever merged XML? I tried - it is horrible work.

I don't see why it would be harder if we "just replace \begin...\end
with <>...".

> I think LyX cannot exist with an XML data format without built-in document
> merge functionality.

This would be nice in any case.

-- 
John C. McCabe-Dansted
PhD Student
University of Western Australia


Re: Progress on the MS Word to LyX conversion (xml)

2008-07-27 Thread Manveru
To the discussion about data format preference:

I am reading all your comments about XML, YAML and other suggested data
formats. This discussion reminds me of something about XML that almost
nobody remembers. How many LyX users are working on large team
projects? How often do they have to merge text files from different branches?
Have you ever merged XML? I tried - it is horrible work.

I think LyX cannot exist with an XML data format without built-in document
merge functionality, at least if anyone is thinking about professional usage
of LyX. I saw some discussions about it, but I do not know whether it is in
LyX or not. I do not need this feature yet.

YAML is an interesting idea; I saw it used in one of the Python frameworks (I
don't remember which one). But it remains a niche format. I don't see
libraries for YAML under active development right now.

-- 
Manveru
jabber: [EMAIL PROTECTED]
gg: 1624001
http://www.manveru.pl


Re: Progress on the MS Word to LyX conversion (xml)

2008-07-27 Thread Pavel Sanda
> On Thursday 24 July 2008 13:07:19 Pavel Sanda wrote:
> > frankly - these are nice dreams, but there is not manpower to do it.
> > my feeling is that the xml-branch commit activity perfectly shows what will
> > happen after the worst bugs will be repaired in xml merged trunk.
> >
> > or you have some particular developer in mind? :))
> 
> Last time that I remember lyx2lyx was also a nice dream. :-)

you wanted to say docbook ? :))

> > pavel


Re: Progress on the MS Word to LyX conversion (xml)

2008-07-27 Thread José Matos
On Thursday 24 July 2008 13:07:19 Pavel Sanda wrote:
> frankly - these are nice dreams, but there is not manpower to do it.
> my feeling is that the xml-branch commit activity perfectly shows what will
> happen after the worst bugs will be repaired in xml merged trunk.
>
> or you have some particular developer in mind? :))

Last time that I remember lyx2lyx was also a nice dream. :-)

> pavel

-- 
José Abílio


Re: Progress on the MS Word to LyX conversion

2008-07-24 Thread Manveru
I understand the simplicity of DTD... but it is showing its age these days. XML
Schema is more expressive and, being XML itself, can be processed by XSLT.

2008/7/23 John <[EMAIL PROTECTED]>:

> On Wednesday 23 July 2008 08:04:59 am Steve Litt wrote:
> > On Tuesday 22 July 2008 11:32, rgheck wrote:
> > > Steve Litt wrote:
> > > > I don't know how it will be after LyX goes XML, but right now at 1.5.3,
> > > > converting my LyX code to something else by parsing the LyX native code
> > > > would be trivial.
> This is probably teaching Grandma to suck eggs - but
> there is a very good set of XML utilities available in Linux which allow you
> to easily parse and transform .xml files into almost anything you want (using
> xslt, sax, and friends). In openSUSE it is called xmlstarlet and comes with
> the installation CDs or DVD.
> These should make it easy to translate to and from LyX (when it finally goes
> fully XML).
>
> John O'Gorman
> > >
> > > My understanding is that, whatever happens with the LyX file format, we
> > > want it to remain possible to do the sort of simple scripting we all
> > > like to be able to do. The XML business is really just a matter of
> > > replacing things like this:
> > >
> > > \begin_layout Standard
> > > this.
> > > \end_layout
> > >
> > > \begin_layout Standard
> > > \begin_inset CommandInset bibtex
> > > LatexCommand bibtex
> > > bibfiles "/tmp/bib"
> > > options "plain"
> > >
> > > \end_inset
> > >
> > >
> > > \end_layout
> > >
> > > with things like this:
> > >
> > > 
> > > this.
> > > 
> > >
> > > 
> > >  options="plain"
> > > /> 
> > >
> > > Just as easy to parse, I hope. Maybe even easier.
> > >
> > > That's not anything actually agreed or implemented
> >
> > It's not as easy to parse, but it's reasonable. If that's the extent of the
> > XMLization of LyX, it should still be somewhat tweakable with Vim, Perl,
> > etc.
> >
> > The real problems come in when they do things in XML that would be
> > denormalization in a database. Store the paragraphs one place, and then
> > store the *number of paragraphs* somewhere else, so if you add a paragraph
> > and forget to increment the number, your doc no longer opens.
> >
> > Or treating the XML file like a relational database, where you have a list
> > of styles with numbered IDs one place, and then have those numbers applied
> > to paragraphs somewhere else. This is an excellent programming technique,
> > but for the guy just trying to casually go in and tweak something, or
> > casually trying to programmatically generate LyX data, it can be daunting
> > indeed. Personally, I love having my style defs in the layout file and
> > using the style names as their identifiers.
> >
> > Then there's this habit of people like OpenOffice, where the native format
> > is a Zip file unzipping to different directories, each containing XML files
> > and other types of files. Yeah, I just dare anyone to generate OpenOffice
> > on the fly.
> >
> > I suggest that whatever you decide, you document the XML structure. I don't
> > mean document as in "it's open source, read the code". I mean document as
> > in "Here is the data hierarchy, here is the high level data design, here
> > are our reasons for doing it this way, here are the data interdependencies,
> > here are some tips for building LyX files programmatically and tweaking
> > them either programmatically or with an editor. And here is a tutorial on
> > building and tweaking LyX files without the LyX front end."
> >
> > I'm busy these days, but if you keep me in the loop I'll do at least a good
> > chunk of that documentation.
> >
> > One more thing -- if you're going XML and don't want to reinvent the wheel,
> > you'll be using someone else's XML parser. Please, please, PLEASE, don't
> > make it some parser with tons of dependencies so that the guy with a 2 year
> > old distro can't compile LyX because of the XML parser. We already have
> > enough problems with Qt dependencies.
> >
> > Thanks
> >
> > SteveT
> >
> > Steve Litt
> > Recession Relief Package
> > http://www.recession-relief.US
>
>
>


-- 
Manveru
jabber: [EMAIL PROTECTED]
gg: 1624001
http://www.manveru.pl


Re: Progress on the MS Word to LyX conversion (xml)

2008-07-24 Thread Pavel Sanda
> what I claim is that we need better 
> script tools to handle lyx documents. Those tools should be stable across lyx 
> versions and should not depend on any particular file format.

frankly - these are nice dreams, but there is not manpower to do it.
my feeling is that the xml-branch commit activity perfectly shows what will
happen after the worst bugs will be repaired in xml merged trunk.

or you have some particular developer in mind? :))

pavel


Re: Progress on the MS Word to LyX conversion (xml)

2008-07-24 Thread Christian Ridderström

On Wed, 23 Jul 2008, Steve Litt wrote:


As a sed/awk/perl/ruby parser, I appreciate that very much.

The more I think about it, the more I think I should make the XML->YAML 
and YAML->XML converters. That way, if future generations of LyX project 
programmers forget why it's important to space their XML "just so", it 
won't matter. Also, I have a feeling that YAML will be much easier to 
parse than either 1.5.x or XML.


At first I'll do them in Ruby because Ruby has all that stuff built in 
and easy to do.


Did you see José's post about how the lyx2lyx stuff is really inside a 
Python lib (module)?  You'd probably only need a different kind of wrapper 
that calls this module, instead of reinventing everything in Ruby.


/Christian

--
Christian Ridderström, +46-8-768 39 44 http://www.md.kth.se/~chr

Re: Progress on the MS Word to LyX conversion (xml)

2008-07-24 Thread José Matos
On Wednesday 23 July 2008 19:24:16 Pavel Sanda wrote:
> this depends on what you master. i'm used to the bunch of small unix
> utilities so i gave that sed example. if you know python you will do it in
> python. my point was not to propose the best tools but to groan and moan about
> xml :)

FWIW this chunk is from one of my shell scripts:

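# For w = 080, 090, ..., 400: run ./dfa on $1.dat, post-process the result,
# and join its first two columns onto the accumulated table in dfa.dat.
echo "$1"
for i in {8..40}
do
echo -n '.'
w=$(printf "%.2d0" "$i")   # two digits plus a trailing 0, e.g. 8 -> 080
f="dfa-$1-$w.dat"
./dfa -s -w "$w" < "$1.dat" | ./join-lag.py -l "$w" -r "$1.dates" > "$f"
cut -f1,2 "$f" | join -a1 dfa.dat - > tmp.dat
mv tmp.dat dfa.dat
done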

So as you can see I know more than python. :-)
And yes I know this only works with bash, and that is OK with me. :-)

My point is that it is alright to use the small tools of the trade but we can 
do better because lyx documents are richer than just pure text.

I am not saying that your usage is wrong; what I claim is that we need better 
script tools to handle lyx documents. Those tools should be stable across lyx 
versions and should not depend on any particular file format.

> pavel

-- 
José Abílio


Re: Progress on the MS Word to LyX conversion (xml)

2008-07-23 Thread Richard heck

Steve Litt wrote:
At first I'll do them in Ruby because Ruby has all that stuff built in and 
easy to do. Later, depending on performance and the percent of people who 
have Ruby installed, I can convert them to C. There's a C implementation of 
the same YAML parser/emitter that Ruby uses -- Syck. I'm pretty sure there 
are also C or C++ implementations of XML Parsers, although I don't know how 
well they do things like DTD/schema.


  
At present, it's LyX policy that included things should be in Python, 
since we require it anyway.


rh



Re: Progress on the MS Word to LyX conversion (xml)

2008-07-23 Thread Richard heck

José Matos wrote:
That is also the reason why lyx2lyx is nowadays mostly a python library 
(LyX.py) and the script lyx2lyx is just a wrapper around the library.


  
And let me add that anyone who wants to process LyX files on a regular 
basis using external scripts would be well served to learn the basics of 
this library. The interface is really very simple once you get the hang 
of it.


rh



Re: Progress on the MS Word to LyX conversion (xml)

2008-07-23 Thread Richard heck

Steve Litt wrote:
Perhaps our best hope of continuing tweakability of native LyX is to create 
1.5.x to XML and XML to 1.5.x converters. Then all the parsing/tweaking can 
continue to be done in the 1.5.x format.


  
As always, LyX will have such converters, so old formats can be 
imported/exported.


rh



Re: Progress on the MS Word to LyX conversion (xml)

2008-07-23 Thread Richard heck

Steve Litt wrote:

On Wednesday 23 July 2008 07:00, José Matos wrote:
  

XML will not change the current status.

grep '

Re: Progress on the MS Word to LyX conversion (xml)

2008-07-23 Thread Andre Poenitz
On Wed, Jul 23, 2008 at 10:33:16AM -0400, Steve Litt wrote:
> On Wednesday 23 July 2008 07:00, José Matos wrote:
> 
> > XML will not change the current status.
> >
> > grep '

Re: Progress on the MS Word to LyX conversion (xml)

2008-07-23 Thread Pavel Sanda
> The next question is why do we need to manipulate lyx files with awk and 
> friends? Is this not something that can or should be done by lyx?

search and replace is one of the weak parts of lyx, and even if we get Tommaso
one day to put his stuff in, there are so many places where it's of no help.
just look at things like notes-mutate or graphics settings synchronization;
other nonimplemented things come to my mind too.

> I have generated lyx files with scripts that have been used in my PhD thesis 
> (almost 40 pages were generated like this) so I can recognize advantages in 
> manipulating lyx files with scripts, but in that case there are better tools 
> than awk and sed.

this depends on what you master. i'm used to the bunch of small unix utilities
so i gave that sed example. if you know python you will do it in python. my point
was not to propose the best tools but to groan and moan about xml :)

pavel


Re: Progress on the MS Word to LyX conversion (xml)

2008-07-23 Thread Pavel Sanda
> Perhaps our best hope of continuing tweakability of native LyX is to create 
> 1.5.x to XML and XML to 1.5.x converters. Then all the parsing/tweaking can 
> continue to be done in the 1.5.x format.

as others have written, 1.6 is still ok. for assembling lyx files you can still
do what you want in 1.6 format and lyx2lyx will convert it for you to 1.7 etc.

next possibility is to stick with 1.6 as long as possible :)

> The only thing you and I would have to do is the XML to 1.5.x converter. I'm 

this will be part of the file format transition in lyx itself. moreover xml 
is not my religion, so i will try to keep myself as far as possible from any
xml related coding :D

pavel


Re: Progress on the MS Word to LyX conversion (xml)

2008-07-23 Thread Steve Litt
On Wednesday 23 July 2008 11:05, José Matos wrote:
> On Wednesday 23 July 2008 15:33:16 Steve Litt wrote:
> > The trouble is, XML tags can be anywhere -- spacing and linefeeds are
> > immaterial. That means you can no longer parse based on position, such
> > as:
> >
> > /^begin_layout/
> >
> > because technically the whole XML file could be in a single line. Or a
> > single tag could be split between lines.
>
> Since we control the format I am (almost) sure that we will choose a reader
> friendly output. There is no reason to do otherwise. In terms of size a
> blank or a newline are equivalent, so... :-)
>
> That is why it will be business as usual. :-)
> Not much will change in this regard.

Thanks José,

As a sed/awk/perl/ruby parser, I appreciate that very much.

The more I think about it, the more I think I should make the XML->YAML and 
YAML->XML converters. That way, if future generations of LyX project 
programmers forget why it's important to space their XML "just so", it won't 
matter. Also, I have a feeling that YAML will be much easier to parse than 
either 1.5.x or XML.

The way I envision it, these two converters will be simple standalone commands 
implemented as filters (convert stdin to stdout), very few dependencies. They 
will comply with the Unix Philosophy (little apps that do one thing and do it 
well). Trivial to install. They will be simple enough to be maintained by one 
person. 

They will be encapsulated. They won't need to know about LyX other than its 
XML format, and LyX won't need to know about them. They can be included in 
the LyX distribution, or not.
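
As a purely illustrative sketch of such a filter (in Python rather than Ruby, since LyX already depends on Python), the following turns XML into an indented, YAML-like outline using only the standard library. The names `to_outline`, `convert` and `main` and the sample element names are hypothetical. The output is not guaranteed to be valid YAML: a real converter would need a proper YAML emitter plus handling for quoting, repeated keys and mixed content, and the reverse direction is not shown.

```python
import sys
import xml.etree.ElementTree as ET

def to_outline(elem, depth=0):
    """Yield indented 'name: value' lines for an element, its attributes, its children."""
    pad = "  " * depth
    text = (elem.text or "").strip()
    yield pad + elem.tag + ":" + (" " + text if text else "")
    for name, value in elem.attrib.items():
        yield pad + "  " + name + ": " + value
    for child in elem:
        yield from to_outline(child, depth + 1)

def convert(xml_text: str) -> str:
    """Convert an XML document to the outline form, one item per line."""
    return "\n".join(to_outline(ET.fromstring(xml_text))) + "\n"

def main():
    # Filter behaviour: read XML on stdin, write the outline to stdout.
    sys.stdout.write(convert(sys.stdin.read()))

# Hypothetical input; the element names are not an actual LyX XML format.
print(convert('<doc><layout type="Standard">this.</layout></doc>'), end="")
```

Calling `main()` from a small wrapper script would give the stdin-to-stdout behaviour described above, with no dependencies beyond Python itself.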

At first I'll do them in Ruby because Ruby has all that stuff built in and 
easy to do. Later, depending on performance and the percent of people who 
have Ruby installed, I can convert them to C. There's a C implementation of 
the same YAML parser/emitter that Ruby uses -- Syck. I'm pretty sure there 
are also C or C++ implementations of XML Parsers, although I don't know how 
well they do things like DTD/schema.

Thanks

SteveT

Steve Litt
Recession Relief Package
http://www.recession-relief.US



Re: Progress on the MS Word to LyX conversion (xml)

2008-07-23 Thread Steve Litt
On Wednesday 23 July 2008 11:21, José Matos wrote:

> > There may be things wrong with awking, seding and perling data into
> > submission, but the age of these tools is not one of them.
>
> If you add there the coreutils, like tail, cut, paste, merge and so on we
> can do things that spreadsheet programs can only dream of like processing
> Gigs of data with thousands of lines and columns. :-)

:-)  :-)  :-)

Check this out:

http://www.troubleshooters.cxm/lpm/200801/200801.htm

http://www.troubleshooters.cxm/lpm/200802/200802.htm


But seriously -- it's obvious that for the LyX application itself, XML is by 
far the best way to go, and I would never suggest rewriting LyX in awk :-). 
My interest is in quick writes/tweaks of LyX native format files in order to 
do things that LyX isn't equipped to do, like my VimOutliner to LyX script.

STeveT

Steve Litt
Recession Relief Package
http://www.recession-relief.US



Re: Progress on the MS Word to LyX conversion (xml)

2008-07-23 Thread José Matos
On Wednesday 23 July 2008 14:49:12 Manveru wrote:
> Guys,
>
> Have you even looked at TinyXML?

Thanks for the link. :-)
-- 
José Abílio


Re: Progress on the MS Word to LyX conversion (xml)

2008-07-23 Thread José Matos
On Wednesday 23 July 2008 15:58:56 Steve Litt wrote:
> Hi Pavel,
>
> Perhaps our best hope of continuing tweakability of native LyX is to create
> 1.5.x to XML and XML to 1.5.x converters. Then all the parsing/tweaking can
> continue to be done in the 1.5.x format.

I will advise against such practice. I hope to explain why in the paragraphs 
below.

> I'm presuming that the LyX developers will create the 1.5.x to XML
> converter so users can upgrade their old docs, and hopefully they would
> keep that converter updated for each new LyX version, so that you and I
> wouldn't need to worry about coding the 1.5.x to XML.

Note that the conversion to xml will only happen after 1.6. I know that your 
argument remains unchanged with this shift; I just want to correct this before 
continuing.

With this said, lyx2lyx will be able to convert from pre-xml to xml and vice 
versa.

Our previous experience suggests, however, that while the forward translation 
is complete, the backward translation sometimes results in truncation or in 
lots of ERT being added to preserve the same structure.

For several reasons a transformation from X to X+1 and back again is not 
guaranteed to give the same document bit by bit. Note also that this is not an 
easy task in any way.

The next question is why do we need to manipulate lyx files with awk and 
friends? Is this not something that can or should be done by lyx?

I have generated lyx files with scripts that have been used in my PhD thesis 
(almost 40 pages were generated like this) so I can recognize advantages in 
manipulating lyx files with scripts, but in that case there are better tools 
than awk and sed.

That is also the reason why lyx2lyx is nowadays mostly a python library 
(LyX.py) and the script lyx2lyx is just a wrapper around the library.

> The only thing you and I would have to do is the XML to 1.5.x converter.
> I'm pretty darned good with C, and if necessary I can do C++ (but with a C
> accent). If we pick an XML parser with full schema/dtd capability, that
> doesn't have many dependencies, then if you know how to write 1.5.x, I can
> feed you whatever data is needed to write the 1.5.x.
>
> There's another possibility that I think might be better. Using Ruby with
> REXML, I could convert the XML to YAML (http://en.wikipedia.org/wiki/Yaml)
> if you could help me just a little bit with the return trip (YAML to XML).
> I think this would be EVEN BETTER than 1.5.x, because YAML was made for
> exactly what you and I want to do -- parsing with awk/sed/perl/grep/cut. It
> would also remove our responsibility to support 1.5.x syntax in the 22nd
> century.
>
> Using YAML for tweaking, I think there may come a time when you and I would
> say "remember when we had to parse that nasty 1.5.x?"
>
> I can begin this project as soon as the developers give me an XML def and
> an XML file. That way, once they actually specify what they're going to do,
> we'll have the technology for the XML->YAML->XML round trip, and only the
> details will require coding.
>
> What do you think?
>
> StevET

You are welcome both to tell us your requirements around the future xml file 
format and to help us so that in the end we all have a better lyx. Really, all 
help is welcome.

> Steve Litt
> Recession Relief Package
> http://www.recession-relief.US

-- 
José Abílio


Re: Progress on the MS Word to LyX conversion (xml)

2008-07-23 Thread José Matos
On Wednesday 23 July 2008 15:20:59 Steve Litt wrote:
> When the discussion reverts to "your thingamabob is from another
> decade/century so it must not be good by today's standards", you know that
> thingamabob is pretty darn good, or else there would have been a more
> powerful argument against it.

Pavel is a developer just as I am. In this thread we have been teasing each 
other over this issue. In such cases this is an acceptable argument (IMO). ;-)

> First of all, I understand *exactly* why an XML native format is an
> improvement for the LyX application. I'm limiting my point to the concept
> that something old has to be something bad.

That is fair. :-)

> Modern things are usually improvements, but often are not improvements in
> quality or usefulness. They can be improvements to profit margin (e.g. most
> MS Windows "improvements"), or marketing improvements (all the silly little
> expensive features thrown into basic family cars today), or improvements in
> restricting use (DRM), or improvements in price (crummy bicycles from
> Walmart). Sometimes older stuff has more quality or usefulness.

All that is true, but in this case the lyx file format, and indirectly the lyx 
parser, had not been changed for a long time until 2002, not because they were 
perfect but because most developers were afraid to touch and break them. The 
format had been evolving over time and it was a mess, with places where 
whitespaces were significant and others where they were for no good reason.

> In 1969 and the early 1970's, Ken Thompson and the gang made Unix with the
> philosophy of little executables that do one thing and do it right. Stdin,
> stdout and pipes were the glue language with which these little executables
> could be cascaded to produce a substantial result. This enabled
> logical-thinking non-developers, and also developers, to produce those
> substantial results in an hour, with perhaps the greatest encapsulation
> that's ever been achieved in the computer world. Each little executable has
> one input and one output, each being a measurable test point. For batch
> processes this "programming" technique is every bit as productive as it was
> 39 years ago.

lyx2lyx, which lyx uses to convert between the different file formats, works 
on this principle: it acts as a filter, reading from stdin and writing the 
transformation to stdout.

Yet so far there is no good way for an external program (script) other than 
lyx to check the validity of a lyx file. For me, at least, this is 
a strong shortcoming of our file format.
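
To show the kind of external check meant here, below is a deliberately small Python sketch that only verifies that `\begin_...` and `\end_...` lines are properly nested in the current line-based format. It is nowhere near a real validator, since it knows nothing about which layouts, insets or parameters are legal, and the function name `check_nesting` is made up for the example.

```python
def check_nesting(lines):
    """Return (line_number, message) problems; an empty list means balanced."""
    problems = []
    open_blocks = []  # stack of (kind, line_number) for blocks not yet closed
    for number, line in enumerate(lines, start=1):
        words = line.split()
        token = words[0] if words else ""
        if token.startswith("\\begin_"):
            open_blocks.append((token[len("\\begin_"):], number))
        elif token.startswith("\\end_"):
            kind = token[len("\\end_"):]
            if not open_blocks:
                problems.append((number, "\\end_%s without a matching \\begin_%s" % (kind, kind)))
                continue
            opened_kind, opened_at = open_blocks.pop()
            if opened_kind != kind:
                problems.append((number, "\\end_%s closes \\begin_%s from line %d"
                                 % (kind, opened_kind, opened_at)))
    for kind, opened_at in open_blocks:
        problems.append((opened_at, "\\begin_%s is never closed" % kind))
    return problems

sample = r"""\begin_layout Standard
\begin_inset CommandInset bibtex
LatexCommand bibtex
\end_inset
\end_layout""".splitlines()

print(check_nesting(sample))       # balanced input
print(check_nesting(sample[:-1]))  # final \end_layout removed
```

A check like this could run as one stage in a pipeline of small tools, reporting problems before a file is ever opened in lyx.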

> There may be things wrong with awking, seding and perling data into
> submission, but the age of these tools is not one of them.

If you add there the coreutils, like tail, cut, paste, merge and so on we can 
do things that spreadsheet programs can only dream of like processing Gigs of 
data with thousands of lines and columns. :-)

> SteveT

-- 
José Abílio


Re: Progress on the MS Word to LyX conversion (xml)

2008-07-23 Thread José Matos
On Wednesday 23 July 2008 15:33:16 Steve Litt wrote:
> The trouble is, XML tags can be anywhere -- spacing and linefeeds are
> immaterial. That means you can no longer parse based on position, such as:
>
> /^begin_layout/
>
> because technically the whole XML file could be in a single line. Or a
> single tag could be split between lines.

Since we control the format I am (almost) sure that we will choose a reader 
friendly output. There is no reason to do otherwise. In terms of size a blank 
or a newline are equivalent, so... :-)

That is why it will be business as usual. :-)
Not much will change in this regard.
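
A small Python sketch of what is at stake, with made-up element names (`doc`, `layout`) standing in for a format that has not been defined yet. A line-anchored pattern, in the spirit of `/^begin_layout/`, works when the writer keeps one tag per line and silently misses everything when the same document sits on a single line, whereas a parser does not care.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical document, written one tag per line ...
pretty = ('<doc>\n<layout type="Standard">\nthis.\n</layout>\n'
          '<layout type="Standard">\nthat.\n</layout>\n</doc>')
# ... and the same content with every line break removed.
one_line = pretty.replace("\n", "")

def count_by_line(text: str) -> int:
    """Count lines that start with a <layout tag, as grep or sed would."""
    return len(re.findall(r"^<layout\b", text, flags=re.MULTILINE))

def count_by_parser(text: str) -> int:
    """Count <layout> children of the root using a real XML parser."""
    return len(ET.fromstring(text).findall("layout"))

print(count_by_line(pretty), count_by_line(one_line))
print(count_by_parser(pretty), count_by_parser(one_line))
```

So line-based scripts stay usable exactly as long as the program writing the file keeps to a one-tag-per-line convention.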

-- 
José Abílio


Re: Progress on the MS Word to LyX conversion (xml)

2008-07-23 Thread Abdelrazak Younes

Steve Litt wrote:

Perhaps our best hope of continuing tweakability of native LyX is to create
1.5.x to XML and XML to 1.5.x converters. Then all the parsing/tweaking can
continue to be done in the 1.5.x format.

I'm presuming that the LyX developers will create the 1.5.x to XML converter
so users can upgrade their old docs, and hopefully they would keep that
converter updated for each new LyX version, so that you and I wouldn't need
to worry about coding the 1.5.x to XML.


Yes, switching to XML doesn't mean abandoning lyx2lyx. The difference is 
that we will be able to use simpler XSL templates for the conversion. 
The advantage being that the XSL templates will be available to all, not 
being specific to python or lyx2lyx.


By the way, the switch to XML is not going to happen with 1.6 but with 
1.7, that is at least one year from now ;-)




The only thing you and I would have to do is the XML to 1.5.x converter.


This will be provided by lyx2lyx too. 1.7-XML will export to all 1.x 
formats with x <= 6.



I'm
pretty darned good with C, and if necessary I can do C++ (but with a C
accent). If we pick an XML parser with full schema/dtd capability, that
doesn't have many dependencies, then if you know how to write 1.5.x, I can
feed you whatever data is needed to write the 1.5.x.


As I said above, this 1.7 to 1.6 will be supported via a simple XSL 
stylesheet. It's really the other direction 1.6 to 1.7 that will be 
difficult to implement.
But hey, all help is welcome, the development of 1.7 is going to begin 
in a couple of months so if you want to have a say in the new XML 
format, come along on the devel list ;-)


Abdel.



Re: Progress on the MS Word to LyX conversion (xml)

2008-07-23 Thread Steve Litt
On Tuesday 22 July 2008 19:24, Pavel Sanda wrote:
> > Pavel Sanda wrote:
> > Moreover, if you're editing by hand, you can use
> > something that recognizes XML.
>
> of course it will work, but it will take x-times more time.
> quite difference to write sed one-liner or start doing some
> xslt templating.
>
> pavel

Hi Pavel,

Perhaps our best hope of continuing tweakability of native LyX is to create 
1.5.x to XML and XML to 1.5.x converters. Then all the parsing/tweaking can 
continue to be done in the 1.5.x format.

I'm presuming that the LyX developers will create the 1.5.x to XML converter 
so users can upgrade their old docs, and hopefully they would keep that 
converter updated for each new LyX version, so that you and I wouldn't need 
to worry about coding the 1.5.x to XML.

The only thing you and I would have to do is the XML to 1.5.x converter. I'm 
pretty darned good with C, and if necessary I can do C++ (but with a C 
accent). If we pick an XML parser with full schema/dtd capability, that 
doesn't have many dependencies, then if you know how to write 1.5.x, I can 
feed you whatever data is needed to write the 1.5.x.

There's another possibility that I think might be better. Using Ruby with 
REXML, I could convert the XML to YAML (http://en.wikipedia.org/wiki/Yaml) if 
you could help me just a little bit with the return trip (YAML to XML). I 
think this would be EVEN BETTER than 1.5.x, because YAML was made for exactly 
what you and I want to do -- parsing with awk/sed/perl/grep/cut. It would 
also remove our responsibility to support 1.5.x syntax in the 22nd century.

Using YAML for tweaking, I think there may come a time when you and I would 
say "remember when we had to parse that nasty 1.5.x?"

I can begin this project as soon as the developers give me an XML def and an 
XML file. That way, once they actually specify what they're going to do, 
we'll have the technology for the XML->YAML->XML round trip, and only the 
details will require coding.

What do you think?

SteveT

Steve Litt
Recession Relief Package
http://www.recession-relief.US


Re: Progress on the MS Word to LyX conversion (xml)

2008-07-23 Thread Steve Litt
On Wednesday 23 July 2008 07:00, José Matos wrote:

> XML will not change the current status.
>
> grep '

Re: Progress on the MS Word to LyX conversion (xml)

2008-07-23 Thread Steve Litt
On Tuesday 22 July 2008 18:21, José Matos wrote:

> Clearly you did not had to deal with the lyx file format like I did. :-)
> If your idea of a parser is a set of regexp's that is so 80's. ;-)
[clip]
> It is funny to see all this nostalgia around something that is/was a
> nightmare. If the syntax was so clear you would not have the problem of
> crashing lyx with a bad formed file (a file modified by scripts).

When the discussion reverts to "your thingamabob is from another 
decade/century so it must not be good by today's standards", you know that 
thingamabob is pretty darn good, or else there would have been a more 
powerful argument against it.

First of all, I understand *exactly* why an XML native format is an 
improvement for the LyX application. I'm limiting my point to the concept 
that something old has to be something bad.

Modern things are usually improvements, but often are not improvements in 
quality or usefulness. They can be improvements to profit margin (e.g. most 
MS Windows "improvements"), or marketing improvements (all the silly little 
expensive features thrown into basic family cars today), or improvements in 
restricting use (DRM), or improvements in price (crummy bicycles from 
Walmart). Sometimes older stuff has more quality or usefulness.

In 1969 and the early 1970's, Ken Thompson and the gang made Unix with the 
philosophy of little executables that do one thing and do it right. Stdin, 
stdout and pipes were the glue language with which these little executables 
could be cascaded to produce a substantial result. This enabled 
logical-thinking non-developers, and also developers, to produce those 
substantial results in an hour, with perhaps the greatest encapsulation 
that's ever been achieved in the computer world. Each little executable has 
one input and one output, each being a measurable test point. For batch 
processes this "programming" technique is every bit as productive as it was 
39 years ago.

There may be things wrong with awking, seding and perling data into 
submission, but the age of these tools is not one of them.

SteveT

Steve Litt
Recession Relief Package
http://www.recession-relief.US



Re: Progress on the MS Word to LyX conversion (xml)

2008-07-23 Thread Manveru
Guys,

Have you even looked at TinyXML?

I once had a project where we used XML as a message-passing protocol, and we
used XSLT as a C++ code generator for the classes that handled the XML and
converted it into data structures holding all the data we needed. This freed us
from portability problems (little-endian, big-endian), which is not the case here.
For an application like LyX a binary structure may be better to handle -
certainly there is much work to do. In our project we hadn't found any known DOM
useful for our purpose.

Cheers!
M.

2008/7/23 José Matos <[EMAIL PROTECTED]>:

> On Wednesday 23 July 2008 12:19:16 Pavel Sanda wrote:
> > i've done incorrect file, it's my fault if lyx crashes. i take my
> > responsibility, no problem.
> > trial method is the fastest if you want something quickly.
>
> If LyX crashes that is a bug. LyX should not ever crash, it can refused to
> load a file because it is invalid, or to truncate it but it should not ever
> crash.
>
> In the whole picture our parser is one of our weak links so we should do
> something about it. Replace it in this case.
>
> > > First make it correct and then make it fast.
> >
> > i have exactly oposite view as far as the tweaking i was talking about
> > is concerned; i just need quickly output of something, may be i will
> throw
> > it away after few days.
> >
> > or take Steve's example - if he takes your 'First make it correct and
> then
> > make it fast' it would take some two weaks to invent some beast to be
> > correct in your sense. but then the whole point is lost, since after this
> > time he could do it manually.
> >
> > i guess we can't agree on this, since i'm not talking about lyx
> internals,
> > while your job is to make lyx format conversions on lyx level... but this
> > is users list, not the the devel one, so i feel free to speak this way :)
>
> Yes, I know but I can pretend otherwise. ;-)
>
> > pavel
>
> --
> José Abílio
>



-- 
Manveru
jabber: [EMAIL PROTECTED]
gg: 1624001
http://www.manveru.pl


Re: Progress on the MS Word to LyX conversion (xml)

2008-07-23 Thread José Matos
On Wednesday 23 July 2008 12:19:16 Pavel Sanda wrote:
> i've done incorrect file, it's my fault if lyx crashes. i take my
> responsibility, no problem.
> trial method is the fastest if you want something quickly.

If LyX crashes, that is a bug. LyX should never crash; it can refuse to 
load a file because it is invalid, or truncate it, but it should never 
crash.

In the whole picture our parser is one of our weak links so we should do 
something about it. Replace it in this case.

> > First make it correct and then make it fast.
>
> i have exactly oposite view as far as the tweaking i was talking about
> is concerned; i just need quickly output of something, may be i will throw
> it away after few days.
>
> or take Steve's example - if he takes your 'First make it correct and then
> make it fast' it would take some two weaks to invent some beast to be
> correct in your sense. but then the whole point is lost, since after this
> time he could do it manually.
>
> i guess we can't agree on this, since i'm not talking about lyx internals,
> while your job is to make lyx format conversions on lyx level... but this
> is users list, not the the devel one, so i feel free to speak this way :)

Yes, I know but I can pretend otherwise. ;-)

> pavel

-- 
José Abílio


Re: Progress on the MS Word to LyX conversion (xml)

2008-07-23 Thread Pavel Sanda
> On Wednesday 23 July 2008 00:19:09 Pavel Sanda wrote:
> > while you are right that xml could be better technology for internal
> > lyx parsing (and i can understand your viewpoint as lyx2lyx fan:)
> > this was not my mail about.
> >
> > > It is funny to see all this nostalgia around something that is/was a
> > > nightmare.
> >
> > it has nothing to do with nostalgia, but speed of hacking around.
> 
> Not when the resulting file crashes lyx, something that should not ever 
> happen 
> but that it does now.

i made an incorrect file, so it's my fault if lyx crashes. i take
responsibility, no problem.
trial and error is the fastest method if you want something quickly.

> First make it correct and then make it fast.

i have exactly the opposite view as far as the tweaking i was talking about
is concerned; i just need some output quickly, and maybe i will throw
it away after a few days.

or take Steve's example - if he follows your 'First make it correct and then
make it fast' it would take some two weeks to invent some beast that is
correct in your sense. but then the whole point is lost, since in that
time he could have done it manually.

i guess we can't agree on this, since i'm not talking about lyx internals,
while your job is to make lyx format conversions on lyx level... but this
is the users list, not the devel one, so i feel free to speak this way :)

pavel


Re: Progress on the MS Word to LyX conversion (xml)

2008-07-23 Thread José Matos
On Wednesday 23 July 2008 00:19:09 Pavel Sanda wrote:
> by 'outside' i mean tweakings which i regularly do and watching users list
> power users do that too _and_ are happy about the current simplicity of
> format.
>
> tweaks like assembling of the whole file for various datasets, global
> changes of things (cf notes-mutate lfun i introduced lately), conversions
> and so on.

This works well for simple things but breaks badly when you try something a 
bit more complex.

> while you are right that xml could be better technology for internal
> lyx parsing (and i can understand your viewpoint as lyx2lyx fan:)
> this was not my mail about.
>
> > It is funny to see all this nostalgia around something that is/was a
> > nightmare.
>
> it has nothing to do with nostalgia, but speed of hacking around.

Not when the resulting file crashes lyx, something that should never happen 
but does now. First make it correct and then make it fast.

XML will not change the current status.

grep '

Re: Progress on the MS Word to LyX conversion (xml)

2008-07-22 Thread Steve Litt
On Tuesday 22 July 2008 19:24, Pavel Sanda wrote:
> > Pavel Sanda wrote:
> > Moreover, if you're editing by hand, you can use
> > something that recognizes XML.
>
> of course it will work, but it will take x-times more time.
> quite difference to write sed one-liner or start doing some
> xslt templating.
>
> pavel

Yeah, I think this was the point I was trying to get across. With the current 
format, you can do a lot with Vim. Or you can run through a series of small 
filters that do just one thing.

XML's a different animal. Without a parser, it's almost impossible to handle. 
With a parser, you're forced to work only within the language of that parser, 
and you're forced to make a monolithic solution that can't take advantage of 
Unix pipes and small executables that do one thing and do it well. You also 
forgo the ability to have a series of intermediate files, each serving as a 
test point to make sure things are still going well.

Also, an XML parser, especially a DOM one, makes READING XML very easy, but it 
does nothing for WRITING.

Pavel -- you and I and others like us need to start identifying parsing tools 
to at least partially compensate for the loss of our Unix based pipes with 
small filter executables. Theoretically, if one could read the XML into a DOM 
tree, tweak it in memory, and then write it back out, that would be at least 
somewhat doable, though nothing like the Awk and Perl techniques I'm used to.
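
The read, tweak and write-back cycle described above might look roughly like the following. This is only a sketch: the `document` and `layout` element names and the `name` attribute are invented for illustration, since no LyX XML format had been agreed at the time, and Python's standard-library ElementTree stands in for whichever parser is eventually chosen.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are assumptions,
# not an actual LyX XML format.
SAMPLE = """<document>
<layout name="Standard">this.</layout>
<layout name="Standard">that.</layout>
</document>"""


def rename_layout(xml_text, old, new):
    """Parse the text, rename every layout called `old`, and serialize again."""
    root = ET.fromstring(xml_text)
    for node in root.iter("layout"):
        if node.get("name") == old:
            node.set("name", new)
    # encoding="unicode" makes tostring() return str instead of bytes
    return ET.tostring(root, encoding="unicode")
```

Because the function takes text and returns text, it can still sit in the middle of a pipe, with its output saved as an intermediate file for checking.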

And once again, we need COMPLETE documentation on the XML dialect, and like I 
said, I'm willing to help with that documentation.

SteveT

Steve Litt
Recession Relief Package
http://www.recession-relief.US



Re: Progress on the MS Word to LyX conversion (xml)

2008-07-22 Thread Pavel Sanda
> Pavel Sanda wrote:
> Moreover, if you're editing by hand, you can use 
> something that recognizes XML.

of course it will work, but it will take x times longer.
there's quite a difference between writing a sed one-liner and starting
on some xslt templating.

pavel


Re: Progress on the MS Word to LyX conversion (xml)

2008-07-22 Thread Pavel Sanda
> On Tuesday 22 July 2008 22:54:14 Pavel Sanda wrote:
> >
> > now you are joking right? :) i just see all the bugs just because '>' is
> > redirection. and imho manually generate \begin_layout Standard is more
> > simpler
> > then typing .
> 
> You are welcome to reimplement lyx in shell, good luck. :-)
> 
> > now imagine those regexps where you need to escape all those \"
> >
> > in conclusion xml will be pain for people trying to use .lyx files
> > directly with scripts etc.
> 
> Clearly you did not had to deal with the lyx file format like I did. :-) 
> If your idea of a parser is a set of regexp's that is so 80's. ;-)

clearly you haven't understood my point. i was not talking at all about lyx
internal parsing, but about 'outside' usage.

by 'outside' i mean the tweaks which i regularly do; watching the users list,
power users do that too _and_ are happy about the current simplicity of the format.

tweaks like assembling of the whole file for various datasets, global changes
of things (cf notes-mutate lfun i introduced lately), conversions and so on.

while you are right that xml could be a better technology for internal
lyx parsing (and i can understand your viewpoint as a lyx2lyx fan :)
this was not what my mail was about.

> It is funny to see all this nostalgia around something that is/was a 
> nightmare.

it has nothing to do with nostalgia, but speed of hacking around.

pavel


Re: Progress on the MS Word to LyX conversion

2008-07-22 Thread John
On Wednesday 23 July 2008 08:04:59 am Steve Litt wrote:
> On Tuesday 22 July 2008 11:32, rgheck wrote:
> > Steve Litt wrote:
> > > I don't know how it will be after LyX goes XML, but right now at 1.5.3,
> > > converting my LyX code to something else by parsing the LyX native code
> > > would be trivial.
This is probably teaching Grandma to suck eggs, but 
there is a very good set of XML utilities available in Linux which allow you to 
easily parse and transform .xml files into almost anything you want (using 
xslt, sax, and friends). In openSUSE it is called xmlstarlet and comes with 
the installation CDs or DVD.
These should make it easy to translate to and from LyX (when it finally goes 
fully XML). 

John O'Gorman
> >
> > My understanding is that, whatever happens with the LyX file format, we
> > want it to remain possible to do the sort of simple scripting we all
> > like to be able to do. The XML business is really just a matter of
> > replacing things like this:
> >
> > \begin_layout Standard
> > this.
> > \end_layout
> >
> > \begin_layout Standard
> > \begin_inset CommandInset bibtex
> > LatexCommand bibtex
> > bibfiles "/tmp/bib"
> > options "plain"
> >
> > \end_inset
> >
> >
> > \end_layout
> >
> > with things like this:
> >
> > 
> > this.
> > 
> >
> > 
> >  > /> 
> >
> > Just as easy to parse, I hope. Maybe even easier.
> >
> > That's not anything actually agreed or implemented
>
> It's not as easy to parse, but it's reasonable. If that's the extent of the
> XMLization of LyX, it should still be somewhat tweakable with Vim, Perl,
> etc.
>
> The real problems come in when they do things in XML that would be
> denormalization in a database. Store the paragraphs one place, and then
> store the *number of paragraphs* somewhere else, so if you add a paragraph
> and forget to increment the number, your doc no longer opens.
>
> Or treating the XML file like a relational database, where you have a list
> of styles with numbered IDs one place, and then have those numbers applied
> to paragraphs somewhere else. This is an excellent programming technique,
> but for the guy just trying to casually go in and tweak something, or
> casually trying to programmatically generate LyX data, it can be daunting
> indeed. Personally, I love having my style defs in the layout file and
> using the style names as their identifiers.
>
> Then there's this habit of people like OpenOffice, where the native format
> is a Zip file unzipping to different directories, each containing XML files
> and other types of files. Yeah, I just dare anyone to generate OpenOffice
> on the fly.
>
> I suggest that whatever you decide, you document the XML structure. I don't
> mean document as in "it's open source, read the code". I mean document as
> in "Here is the data hierarchy, here is the high level data design, here
> are our reasons for doing it this way, here are the data interdependencies,
> here are some tips for building LyX files programmatically and tweaking
> them either programmatically or with an editor. And here is a tutorial on
> building and tweaking LyX files without the LyX front end.
>
> I'm busy these days, but if you keep me in the loop I'll do at least a good
> chunk of that documentation.
>
> One more thing -- if you're going XML and don't want to reinvent the wheel,
> you'll be using someone else's XML parser. Please, please, PLEASE, don't
> make it some parser with tons of dependency so that the guy with a 2 year
> old distro can't compile LyX because of the XML parser. We already have
> enough problems with Qt dependencies.
>
> Thanks
>
> SteveT
>
> Steve Litt
> Recession Relief Package
> http://www.recession-relief.US




Re: Progress on the MS Word to LyX conversion (xml)

2008-07-22 Thread rgheck

José Matos wrote:

now imagine those regexps where you need to escape all those \"

in conclusion xml will be pain for people trying to use .lyx files
directly with scripts etc.



Clearly you did not had to deal with the lyx file format like I did. :-) 
If your idea of a parser is a set of regexp's that is so 80's. ;-)


  
In fairness, I think he was talking about little hacked scripts to do 
the kind of search-and-replace that isn't possible yet in LyX itself. So 
you don't really have a parser in that case. Just a very long string. ;-)


This seems to me like the debate between strong and bold. I want to parse the 
lyx file on a content based stream, not just a set of lines.


After the change to xml the regularity will still be there with the added 
bonus that finally it will be consistent. We took 6 years to clean the lyx 
format to a reasonable state and we are still not there yet.


  
So, Jose, are we ever actually going to do this? If so, then it seems to 
me we ought to decide to do it, halt other development for the few weeks 
it would take, and do it. I don't think it would really be that hard to 
have it working. The existing parser could be tweaked for the short 
term. It's already capable of dealing with tabulars, and those are 
written as XML already. Longer term, we'd prefer libxml2 or 
something---SAX, I assume, rather than DOM---, but that could be done 
after the format had stabilized.


Yeah, I know, wrong list.

rh



Re: Progress on the MS Word to LyX conversion (xml)

2008-07-22 Thread rgheck

Pavel Sanda wrote:

Steve Litt wrote:

this.






Just as easy to parse, I hope. Maybe even easier.



now you are joking right? :) i just see all the bugs just because '>' is 
redirection.

  

Only in the shell, right?


now imagine those regexps where you need to escape all those \"

  
There's lots of that in LyX now. But it's easy to deal with in Python, 
at least, via the r'' quoter. And in Perl, you have qr//. So the quotes 
aren't really a problem. Moreover, if you're editing by hand, you can 
use something that recognizes XML.
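
As a small illustration of the r'' point, here is a sketch that pulls the layout names out of a 1.5.x-style file body. The function and variable names are made up for this example; only `\begin_layout` comes from the format discussed in this thread.

```python
import re

# In a raw string Python leaves the backslashes alone, so "\\" reaches the
# regex engine as the escape for one literal backslash.
BEGIN_LAYOUT = re.compile(r"^\\begin_layout\s+(\S+)", re.MULTILINE)

# A raw triple-quoted string keeps \b and \e literal as well.
SAMPLE = r"""\begin_layout Standard
this.
\end_layout

\begin_layout Quote
that.
\end_layout
"""


def layouts_used(lyx_text):
    """Return the layout names in the order they appear."""
    return BEGIN_LAYOUT.findall(lyx_text)
```

Without the raw-string prefix the same pattern would need four backslashes in the source, which is the escaping pain being discussed.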


But, well, XML isn't exactly around the corner, anyway, so far as I can 
tell.


rh



Re: Progress on the MS Word to LyX conversion (xml)

2008-07-22 Thread José Matos
On Tuesday 22 July 2008 22:54:14 Pavel Sanda wrote:
>
> now you are joking right? :) i just see all the bugs just because '>' is
> redirection. and imho manually generate \begin_layout Standard is more
> simpler
> then typing .

You are welcome to reimplement lyx in shell, good luck. :-)

> now imagine those regexps where you need to escape all those \"
>
> in conclusion xml will be pain for people trying to use .lyx files
> directly with scripts etc.

Clearly you have not had to deal with the lyx file format like I did. :-) 
If your idea of a parser is a set of regexps, that is so 80's. ;-)

This seems to me like the debate between strong and bold. I want to parse the 
lyx file on a content based stream, not just a set of lines.

After the change to xml the regularity will still be there with the added 
bonus that finally it will be consistent. We took 6 years to clean the lyx 
format to a reasonable state and we are still not there yet.

It is funny to see all this nostalgia around something that is/was a 
nightmare. If the syntax was so clear you would not have the problem of 
crashing lyx with a badly formed file (a file modified by scripts).

> pavel

-- 
José Abílio


Re: Progress on the MS Word to LyX conversion

2008-07-22 Thread José Matos
On Tuesday 22 July 2008 21:04:59 Steve Litt wrote:
> One more thing -- if you're going XML and don't want to reinvent the wheel,
> you'll be using someone else's XML parser. Please, please, PLEASE, don't
> make it some parser with tons of dependency so that the guy with a 2 year
> old distro can't compile LyX because of the XML parser. We already have
> enough problems with Qt dependencies.

The idea is to have a DTD to describe the XML and to use a standard parser 
like libxml2. This should meet both criteria. :-)

> Thanks
>
> SteveT

-- 
José Abílio


Re: Progress on the MS Word to LyX conversion (xml)

2008-07-22 Thread Pavel Sanda
> Steve Litt wrote:
> 
> this.
> 
>
> 
> 
> 
>
> Just as easy to parse, I hope. Maybe even easier.

now you are joking right? :) i just see all the bugs just because '>' is 
redirection.
and imho manually generating \begin_layout Standard is simpler
than typing . 

now imagine those regexps where you need to escape all those \"

in conclusion xml will be a pain for people trying to use .lyx files
directly with scripts etc.
pavel


Re: Progress on the MS Word to LyX conversion

2008-07-22 Thread Steve Litt
On Tuesday 22 July 2008 11:32, rgheck wrote:
> Steve Litt wrote:
> > I don't know how it will be after LyX goes XML, but right now at 1.5.3,
> > converting my LyX code to something else by parsing the LyX native code
> > would be trivial.
>
> My understanding is that, whatever happens with the LyX file format, we
> want it to remain possible to do the sort of simple scripting we all
> like to be able to do. The XML business is really just a matter of
> replacing things like this:
>
> \begin_layout Standard
> this.
> \end_layout
>
> \begin_layout Standard
> \begin_inset CommandInset bibtex
> LatexCommand bibtex
> bibfiles "/tmp/bib"
> options "plain"
>
> \end_inset
>
>
> \end_layout
>
> with things like this:
>
> 
> this.
> 
>
> 
> 
> 
>
> Just as easy to parse, I hope. Maybe even easier.
>
> That's not anything actually agreed or implemented

It's not as easy to parse, but it's reasonable. If that's the extent of the 
XMLization of LyX, it should still be somewhat tweakable with Vim, Perl, etc.

The real problems come in when they do things in XML that would be 
denormalization in a database. Store the paragraphs one place, and then store 
the *number of paragraphs* somewhere else, so if you add a paragraph and 
forget to increment the number, your doc no longer opens.
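
To make the stored-count problem concrete, here is a sketch of a check that catches it. The `body` element, the `paragraphs` attribute and the `layout` children are all hypothetical; nothing in this thread says a LyX XML format would actually store a count like this.

```python
import xml.etree.ElementTree as ET

# Hypothetical format: a body element that stores its own paragraph count.
GOOD = '<body paragraphs="2"><layout>a</layout><layout>b</layout></body>'
# A hand edit that adds a paragraph but forgets to update the counter:
STALE = ('<body paragraphs="2"><layout>a</layout><layout>b</layout>'
         '<layout>c</layout></body>')


def paragraph_count_ok(xml_text):
    """Compare the stored paragraph count with the paragraphs actually present."""
    root = ET.fromstring(xml_text)
    stored = int(root.get("paragraphs", "0"))
    actual = len(root.findall("layout"))
    return stored == actual
```

Here `paragraph_count_ok(STALE)` is false even though the file is well-formed XML, which is the kind of hidden dependency a casual editor would trip over.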

Or treating the XML file like a relational database, where you have a list of 
styles with numbered IDs one place, and then have those numbers applied to 
paragraphs somewhere else. This is an excellent programming technique, but 
for the guy just trying to casually go in and tweak something, or casually 
trying to programmatically generate LyX data, it can be daunting indeed. 
Personally, I love having my style defs in the layout file and using the 
style names as their identifiers.

Then there's this habit of people like OpenOffice, where the native format is 
a Zip file unzipping to different directories, each containing XML files and 
other types of files. Yeah, I just dare anyone to generate OpenOffice on the 
fly.

I suggest that whatever you decide, you document the XML structure. I don't 
mean document as in "it's open source, read the code". I mean document as 
in "Here is the data hierarchy, here is the high level data design, here are 
our reasons for doing it this way, here are the data interdependencies, here 
are some tips for building LyX files programmatically and tweaking them 
either programmatically or with an editor. And here is a tutorial on building 
and tweaking LyX files without the LyX front end."

I'm busy these days, but if you keep me in the loop I'll do at least a good 
chunk of that documentation.

One more thing -- if you're going XML and don't want to reinvent the wheel, 
you'll be using someone else's XML parser. Please, please, PLEASE, don't make 
it some parser with tons of dependencies so that the guy with a 2 year old 
distro can't compile LyX because of the XML parser. We already have enough 
problems with Qt dependencies.

Thanks

SteveT

Steve Litt
Recession Relief Package
http://www.recession-relief.US



Re: Progress on the MS Word to LyX conversion

2008-07-22 Thread rgheck

Steve Litt wrote:
I don't know how it will be after LyX goes XML, but right now at 1.5.3, 
converting my LyX code to something else by parsing the LyX native code would 
be trivial.


  
My understanding is that, whatever happens with the LyX file format, we 
want it to remain possible to do the sort of simple scripting we all 
like to be able to do. The XML business is really just a matter of 
replacing things like this:


\begin_layout Standard
this.
\end_layout

\begin_layout Standard
\begin_inset CommandInset bibtex
LatexCommand bibtex
bibfiles "/tmp/bib"
options "plain"

\end_inset


\end_layout

with things like this:


this.






Just as easy to parse, I hope. Maybe even easier.

That's not anything actually agreed or implemented

rh



Re: Progress on the MS Word to LyX conversion

2008-07-22 Thread Steve Litt
On Tuesday 22 July 2008 06:32, Christian Ridderström wrote:
> On Mon, 21 Jul 2008, Steve Litt wrote:
> > This morning I got an acceptably tagged text file out of MS Word. From
> > that moment on, things got much easier.
>
> Congratulations!
>
> I put a reference to your post on a wiki page, giving others that need to
> do this a starting point. (If you want to summarize how you did it and
> post the relevant scripts on the wiki, I can help you with it). Here's the
> page:
>   http://wiki.lyx.org/Tools/Word2LyXConversionProcess
>
> While doing this, I found this page:


Thanks Christian!

One use for the new page is showing people how to convert word to LyX while 
preserving all styles. Perhaps an even greater use for this page is showing 
people the mess they'll get themselves into by using MS Word to write a book.

I don't know how it will be after LyX goes XML, but right now at 1.5.3, 
converting my LyX code to something else by parsing the LyX native code would 
be trivial.

Thanks

SteveT

Steve Litt
Recession Relief Package
http://www.recession-relief.US



Re: Progress on the MS Word to LyX conversion

2008-07-22 Thread Christian Ridderström

On Mon, 21 Jul 2008, Steve Litt wrote:


This morning I got an acceptably tagged text file out of MS Word. From that
moment on, things got much easier.


Congratulations!

I put a reference to your post on a wiki page, giving others that need to 
do this a starting point. (If you want to summarize how you did it and 
post the relevant scripts on the wiki, I can help you with it). Here's the 
page:

http://wiki.lyx.org/Tools/Word2LyXConversionProcess

While doing this, I found this page:

http://wiki.lyx.org/Tools/Word2LyXMacro

Maybe it can help you with the tables if nothing else?

/Christian

--
Christian Ridderström, +46-8-768 39 44    http://www.md.kth.se/~chr

Progress on the MS Word to LyX conversion

2008-07-21 Thread Steve Litt
This morning I got an acceptably tagged text file out of MS Word. From that 
moment on, things got much easier.

I made a perl script to remove end tags, and instead put start tags on all 
lines between a start and end. It also made sure there were no interlinking 
tag sets. It also put all the start tags in the same format and easily 
parsable. I hadn't thought to do that when converting out of MS Word -- I had 
bigger fish to fry at the time.
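
For readers who want to try something similar, the tag-propagation step could be sketched as below. The original was a perl script that is not shown here, so this is a Python approximation under assumed conventions: start tags look like `b_...:::`, end tags like `e_...:::`, and each sits on a line of its own. Only the `b_pstyle_normal:::` shape appears in this post; the rest is guesswork.

```python
SEP = ":::"  # assumed tag terminator, taken from the b_pstyle_normal::: example


def propagate_start_tags(lines):
    """Drop end-tag lines and prefix every enclosed line with its start tag."""
    current = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("b_") and line.endswith(SEP):
            if current is not None:
                # a start tag inside an open tag set means the sets overlap
                raise ValueError("overlapping tag sets at: " + line)
            current = line
        elif line.startswith("e_") and line.endswith(SEP):
            current = None
        elif current is not None:
            yield current + line
        else:
            yield line  # untagged lines pass through unchanged
```

After this step every tagged paragraph carries its own start tag, so later filters can work one line at a time.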

I hadn't marked Normal paragraphs, so my program had to deduce which lines 
weren't marked already, and put a b_pstyle_normal::: start tag on them.

Armed with proper start tags on every line (which is actually a paragraph), it 
was pretty easy to pipe that through something that added the \begin_layout 
Whatever and \end_layout commands. At this point I have NOT removed the start 
or end tags -- I want some redundancy for checking. I also added a little C 
program to get rid of the '\015' characters that DOS put in.
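
A rough Python equivalent of that filter, for illustration only: it assumes each line starts with a `b_pstyle_<name>:::` tag, and it assumes the layout name is simply the capitalized tag name, which is a guess about the dummy layout. Unlike the pipeline described here, this sketch drops the tag from the output rather than keeping it for checking.

```python
import re

# Assumed tag shape, modeled on the b_pstyle_normal::: example above.
TAG = re.compile(r"^b_pstyle_(\w+):::(.*)$")


def to_layout(lines):
    """Yield LyX 1.5.x paragraph markup for each tagged input line."""
    for raw in lines:
        line = raw.rstrip("\r\n")  # also drops the '\015' (CR) that DOS adds
        m = TAG.match(line)
        if m is None:
            continue  # untagged lines are skipped in this sketch
        style = m.group(1).capitalize()  # assumed mapping: normal -> Normal
        yield "\\begin_layout " + style
        yield m.group(2).strip()
        yield "\\end_layout"
        yield ""
```

Feeding it `sys.stdin` and printing each result would let it sit in a pipe like the other filters.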

I made a layout with dummy styles for each style I used (sort -u came in very 
handy for this).

Anyway, my program can make the body of a LyX file, and all the 
Part/Chapter/Section etc works perfectly, and it seems like all the other 
paragraph styles are working. It's basically a pipeline of little filters 
creating a LyX file from the text file, and I can do it over and over to my 
heart's content. 

I imagine tomorrow I'll add the code to handle character styles, and start 
making my layout file create effects that look how they're supposed to. That 
will help in looking at the produced PDF (it already produces a PDF, so the 
basic code is correct).

Bottom line, I now have a text file with tags representing all my document's 
original style, and I've created perl, awk, sed and C code to convert it to a 
LyX document with my styles preserved.

Anyway, thanks for all the help.

SteveT

Steve Litt
Recession Relief Package
http://www.recession-relief.US