[Jprogramming] parsing nif.xml, part 2 of n

Raul Miller Wed, 12 Feb 2014 10:51:44 -0800

Previously, in
http://jsoftware.com/pipermail/programming/2014-February/034926.html I went
through a basic illustration of sax/xml. There are more examples at
http://www.jsoftware.com/jwiki/Addons/xml/sax


And, if you deploy something based on J in a corporate context, it would be
a good idea to remind management that, if need be, they can hire the people
at ISI to support these kinds of applications. Good managers tend to like
redundancy - they are dealing with people and it's important to have
somewhere to turn in the event of problems. (Not that people ever have
problems or anything...) This suggests that error handling, error reporting,
and strategies and tactics for isolating and coping with errors might be a
useful subject of study.

Anyways, in "part 1" I hard coded rules expressing what I wanted to extract
from the xml file. The code would only extract that one thing. This can be
useful in some cases, but in other cases we might need something more data
driven. In the case of nif.xml we have a single huge xml file and we might
want to extract different elements from it.

Worse, the file is something of an ad-hoc collection of rules. In one
sense, it is documentation on file formats for a large collection of games.
In another sense, it's an extension of several different programs used by
people modifying those games. One code base is in python (
https://github.com/amorilia/niflib) and another code base is in C++ (
http://niftools.sourceforge.net/wiki/NifSkope) and I guess I am starting
here, building a code base in J (though I currently have no intention of
supporting all of those games - I've other plans), and there could easily
be additional uses of the file.

What this means is that I would like to parse this file a number of times,
each time extracting a bit of information which is relevant to whatever I
happen to be focusing on. I may never be interested in the file as a whole
- that's someone else's "job".

Now I could of course edit the code from part 1, and then run it, each time
I want to extract something different. That could be a very good coding
exercise also. But there's another part of the xml/sax addon that I could
also use: x2j.

So let's invent some requirements, so we can write some code.

I'm going to define a verb 'extract' whose left argument must match the
name= attribute of a <compound> element in nif.xml (the right argument is
the name of the xml file). Furthermore, I am going to define a noun
'Version' and I am going to strip from the result any <add> elements whose
ver1= attribute is a later version than Version or whose ver2= attribute is
an earlier version.

(Caution: there are multiple versions of the nif.xml file itself - for
example, consider
https://github.com/throttlekitty/nifxml/commits/master/nif.xml - but there
are also versions of the .nif files which get documented by the underlying
nif.xml. I am actually focusing on the .nif version. Distinctions which
seem subtle in theory are rather blatant in practice.)

Nif versions are described using the same format as ip addresses. In other
words, version 20.6.5.0 (Epic Mickey) will be represented in a .nif file as:
   256 #. 20 6 5 0
335938816
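
To make the encoding and the version window concrete, here is a small
stand-alone sketch. The names enc, dec and inWindow are hypothetical (they
are not part of the script in this message), and versions are given as lists
of four numbers rather than as dotted strings, to keep the example minimal:

```j
NB. Sketch only: encode, decode and compare .nif versions as plain numbers.
enc=: 256&#.               NB. hypothetical: four parts -> one integer
dec=: 256 256 256 256&#:   NB. hypothetical: one integer -> four parts

enc 20 6 5 0               NB. 335938816
dec 335938816              NB. 20 6 5 0
(enc 4 0 0 2) = 16b04000002   NB. 1 (the hex form 0x04000002)

NB. x: current version (integer)
NB. y: lo;hi bounds, each a four-part list, or _1 meaning "no bound"
inWindow=: 4 : 0
  'lo hi'=. y
  ok=. 1
  if. -. lo -: _1 do. ok=. ok * x >: enc lo end.
  if. -. hi -: _1 do. ok=. ok * x <: enc hi end.
  ok
)

(enc 4 0 0 2) inWindow 3 3 0 13 ; _1    NB. 1: lower bound is satisfied
(enc 4 0 0 2) inWindow 10 0 1 0 ; _1    NB. 0: element is for later versions
```

Because the encoding is base 256, ordinary numeric comparison orders
versions the same way as comparing them part by part from the left.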

So.. code:

require 'xml/sax/x2j'

x2jclass 'nifxml'

extract=:4 :0
  Name=: x
  process fread y
)

NB. nif versioning:
vton=: 256 #. '.' 0&".;._1@, ]
Version=: vton '4.0.0.2' NB. Morrowind

'Items' x2jDefn NB. dispatch xml event handlers
  /        := Result : Result=: ''
  compound := cEnd y : x cStart y
  add      := aEnd y : x aStart y
  add      := aChr y
)

NB. --------- "low level" implementation --------
Interesting=: 0

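NB. ver1/ver2: given a version number, 1 if the current element applies to
NB. it. A missing attribute (detected here as atr giving _1) always passes.
ver1=: (>: vton)`1:@.(-:&_1@]) atr bind 'ver1'
ver2=: (<: vton)`1:@.(-:&_1@]) atr bind 'ver2'
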
showStart=:4 :0
  if. Interesting do.
    atrs=. ([,'="',],'"'"_)&.>/"1 }.attributes x
    Result=: Result,'<',(;:inv (<y),atrs),'>'
  end.
)

showEnd=:3 :0
  if. Interesting do.
    Result=: Result,'</',y,'>',LF
  end.
)

showCharacters=: 3 :0
  if. Interesting do.
    Result=: Result,y
  end.
)

NB. ------- "glue" ------------------------------
cStart=:4 :0 NB. event handlers for <compound>
  if. Name -: atr 'name' do.
    Interesting=: 1
  end.
  x showStart y
)
cEnd=:3 :0
  showEnd y
  Interesting=: 0
)

aStart=:4 :0  NB. event handlers for <add>
  WasInteresting=: Interesting
  Interesting=: Interesting * (ver1 * ver2) Version
  x showStart y
)
aEnd=:3 :0
  showEnd y
  Interesting=: WasInteresting
)
aChr=: showCharacters
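
The attribute formatting inside showStart can be tried on its own. This is a
sketch only: fmtAttr is a hypothetical name for the tacit expression used
there, and the name/value table is made up by hand (what 'attributes'
actually returns in the addon is not shown here):

```j
NB. Sketch only: the attribute formatter from showStart, in isolation.
fmtAttr=: ([,'="',],'"'"_)&.>/"1   NB. name;value row -> boxed name="value"

NB. hand-made stand-in for a table of attribute names and values
tbl=: 2 2 $ 'type';'FileVersion';'ver1';'3.3.0.13'

atrs=: fmtAttr tbl
;:inv (<'add'),atrs   NB. add type="FileVersion" ver1="3.3.0.13"
```

Here ;:inv joins the boxed pieces with single spaces, which is how the tag
name and its attributes end up on one line in the result.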

-------------------------------------------------------------------------------------

Example use:
   'Header' extract_nifxml_ 'c:\users\rdmiller\desktop\furniture\nif.xml'
<compound><add type="HeaderString">'NetImmerse File Format x.x.x.x'
(versions <= 10.0.1.2) or 'Gamebryo File Format x.x.x.x' (versions >=
10.1.0.0), with x.x.x.x the version written out. Ends with a newline
character (0x0A).</add>
<add type="FileVersion" default="0x04000002" ver1="3.3.0.13">The NIF
version, in hexadecimal notation: 0x04000002, 0x0401000C, 0x04020002,
0x04020100, 0x04020200, 0x0A000100, 0x0A010000, 0x0A020000, 0x14000004,
...</add>
<add type="ulittle32" ver1="3.3.0.13">Number of file objects.</add>
</compound>

There are several things worth noting here:

First, the result here is much simpler than the version I posted in "part
1". This file represents documentation for a lot of games, and a good part
of that information is not relevant to this particular game.

Second, this style of xml processing is slow. It's side effect driven and
results are recomputed frequently, which means there's a perceptible lag
between starting the parse and the results showing up. (Almost a second, for
me.) I imagine that I could do a lot better, but sometimes getting results
is more important than computational speed.  Interestingly, about 20% of the
time is spent on this line:
     Interesting=: Interesting * (ver1 * ver2) Version

Given how inefficient that is (it's extracting version attributes from
every <add> element in the file, even if they are irrelevant), I am
slightly surprised that it's not a bigger chunk of time.
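
One possible variation, as an untimed sketch: only look at the version
attributes when the enclosing <compound> is already interesting. This
assumes the names from the script above (ver1, ver2, Version, showStart)
are defined, and that the version test yields a single 0 or 1:

```j
NB. Sketch only: variant of aStart that skips the attribute lookups
NB. for <add> elements outside the compound of interest.
aStart=: 4 : 0
  WasInteresting=: Interesting
  if. Interesting do.
    Interesting=: (ver1 * ver2) Version
  end.
  x showStart y
)
```

When Interesting is 0 the original product is 0 whatever the test returns,
so the flag ends up the same either way. Whether this saves a noticeable
amount of time would need to be measured.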

I should say a few more things, about the details of the implementation.
But this message has gotten long enough already. So I will instead wait to
see if anyone feels the need for further explanation.

Meanwhile, where do I hope to take this?

The nif.xml is, in a sense, documentation. It is also, in a sense, code.
Like I mentioned in "part 1", there are already several code bases that use
the nif.xml file to guide the parsing of .nif files. Conceptually speaking,
parts of the structured text in nif.xml refer to hardcoded functionality
(and the rest is basically just documentation).

Anyways, it has occurred to me that working through an implementation of
.nif file parsing (something I have done before, for fun) and taking it
through to rendering of the represented 3d objects (something else I have
done before, for fun) and possibly even creating some small manipulation
tools (another thing I have enjoyed doing) could serve as examples of how
to tackle a wide variety of coding tasks.

Plus, handling of 3d representations of objects has significant use in
"maker communities" (basically: groups of people who focus on the design
and use of "machine" tools, and on other engineering and artistic projects).
I have not figured out how to tie electrical engineering into the mix (nor
some other important topics), but maybe those can wait until later, along
with handling of other media types (sound, maybe, for the electrical
engineering side of things).

That said, xml is just a bit awkward to represent and think about. It's a
mix of sequential material, nested structure and named structure, with
several forms of redundancy which make general purpose handling
ridiculously abstract. So practical uses of xml are obviously focused on
specializations of the general concepts. This issue - specialization of
overly general concepts - turns out to be a common theme in requirements
gathering, in system design and in program implementation. So it's worth
working through some different examples now and then.

Thanks,

-- 
Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
