Previously, in http://jsoftware.com/pipermail/programming/2014-February/034926.html I went through a basic illustration of sax/xml. There are more examples at http://www.jsoftware.com/jwiki/Addons/xml/sax
And, if you deploy something based on J in a corporate context, it would be a good idea to remind them that if need be they can hire the people at ISI to support these kinds of applications. Good managers tend to like redundancy - they are dealing with people and it's important to have somewhere to turn in the event of problems. (Not that people ever have problems or anything...) This suggests that error handling, error reporting, and strategies and tactics for isolating and coping with errors might be a useful subject of study.

Anyways, in "part 1" I hard coded rules expressing what I wanted to extract from the xml file. The code would only extract that one thing. This can be useful in some cases, but in other cases we might need something more data driven.

In the case of nif.xml we have a single huge xml file and we might want to extract different elements from it. Worse, the file is something of an ad-hoc collection of rules. In one sense, it is documentation on file formats for a large collection of games. In another sense, it's an extension of several different programs used by people modifying those games. One code base is in python (https://github.com/amorilia/niflib) and another code base is in C++ (http://niftools.sourceforge.net/wiki/NifSkope) and I guess I am starting here, building a code base in J (though I currently have no intention of supporting all of those games - I've other plans), and there could easily be additional uses of the file.

What this means is that I would like to parse this file a number of times, each time extracting a bit of information which is relevant to whatever I happen to be focusing on. I may never be interested in the file as a whole - that's someone else's "job".

Now I could of course edit the code from part 1, and then run it, each time I want to extract something different. That could be a very good coding exercise also. But there's another part of the xml/sax addon that I could also use: x2j.
So let's invent some requirements, so we can write some code.

I'm going to define a verb 'extract' whose left argument must match the name= attribute of a <compound> element in nif.xml, and whose right argument is the name of the xml file. Furthermore, I am going to define a noun 'Version' and I am going to strip from the result any <add> elements whose ver1= attribute has a larger version or whose ver2= attribute has a smaller version.

(Caution: there are multiple versions of the nif.xml file itself - for example, consider https://github.com/throttlekitty/nifxml/commits/master/nif.xml - but there are also versions of the .nif files which get documented by the underlying nif.xml. I am actually focusing on the .nif version. Distinctions which seem subtle in theory are rather blatant in practice.)

Nif versions are described using the same format as ip addresses. In other words, version 20.6.5.0 (Epic Mickey) will be represented in a .nif file as:

   256 #. 20 6 5 0
335938816

So.. code:

require 'xml/sax/x2j'
x2jclass 'nifxml'

extract=:4 :0
  Name=: x
  process fread y
)

NB. nif versioning:
vton=: 256 #. '.' 0&".;._1@, ]
Version=: vton '4.0.0.2' NB. Morrowind

'Items' x2jDefn NB. dispatch xml event handlers
  /        := Result : Result=: ''
  compound := cEnd y : x cStart y
  add      := aEnd y : x aStart y
  add      := aChr y
)

NB. --------- "low level" implementation --------

Interesting=: 0

ver1=: (>: vton)`1:@.(-:&_1@]) atr bind 'ver1'
ver2=: (<: vton)`1:@.(-:&_1@]) atr bind 'ver2'

showStart=:4 :0
  if. Interesting do.
    atrs=. ([,'="',],'"'"_)&.>/"1 }.attributes x
    Result=: Result,'<',(;:inv (<y),atrs),'>'
  end.
)

showEnd=:3 :0
  if. Interesting do.
    Result=: Result,'</',y,'>',LF
  end.
)

showCharacters=: 3 :0
  if. Interesting do.
    Result=: Result,y
  end.
)

NB. ------- "glue" ------------------------------

cStart=:4 :0 NB. event handlers for <compound>
  if. Name -: atr 'name' do.
    Interesting=: 1
  end.
  x showStart y
)

cEnd=:3 :0
  showEnd y
  Interesting=: 0
)

aStart=:4 :0 NB. event handlers for <add>
  WasInteresting=: Interesting
  Interesting=: Interesting * (ver1 * ver2) Version
  x showStart y
)

aEnd=:3 :0
  showEnd y
  Interesting=: WasInteresting
)

aChr=: showCharacters

-------------------------------------------------------------------------------------

Example use:

   'Header' extract_nifxml_ 'c:\users\rdmiller\desktop\furniture\nif.xml'
<compound><add type="HeaderString">'NetImmerse File Format x.x.x.x' (versions <= 10.0.1.2) or 'Gamebryo File Format x.x.x.x' (versions >= 10.1.0.0), with x.x.x.x the version written out. Ends with a newline character (0x0A).</add>
<add type="FileVersion" default="0x04000002" ver1="3.3.0.13">The NIF version, in hexadecimal notation: 0x04000002, 0x0401000C, 0x04020002, 0x04020100, 0x04020200, 0x0A000100, 0x0A010000, 0x0A020000, 0x14000004, ...</add>
<add type="ulittle32" ver1="3.3.0.13">Number of file objects.</add>
</compound>

There are several things worth noting here:

First, the result here is much simpler than the version I posted in "part 1". This file represents documentation for a lot of games, and a good part of that information is not relevant to this particular game.

Second, this style of xml processing is slow. It's side effect driven and results are recomputed frequently, which means there's a perceptible lag between starting to parse the file and when the results show up. (Almost a second, for me.) I imagine that I could do a lot better, but sometimes getting results is more important than computational speed.

Interestingly, about 20% of the time is spent on this line:

  Interesting=: Interesting * (ver1 * ver2) Version

Given how inefficient that is (it's extracting version attributes from every <add> element in the file, even if they are irrelevant), I am slightly surprised that it's not a bigger chunk of time.

I should say a few more things about the details of the implementation. But this message has gotten long enough already. So I will instead wait to see if anyone feels the need for further explanation.
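For anyone who wants to experiment with the version test without loading the xml/sax addon, here is a minimal sketch of the ver1/ver2 comparison on its own. The definition of vton is copied from the code above; the verb keep and its argument layout (a boxed pair of strings, with an empty string standing in for a missing attribute) are assumptions made for illustration, not part of the addon or of the code in this message.

```j
NB. Sketch only. vton is copied from the post; keep is a hypothetical
NB. helper that restates the ver1/ver2 test outside of the sax handlers.
vton=: 256 #. '.' 0&".;._1@, ]
Version=: vton '4.0.0.2'   NB. Morrowind

NB. x: numeric version being targeted
NB. y: ver1;ver2 as boxed strings, '' standing in for a missing attribute
keep=: 4 : 0
  'lo hi'=. y
  ok=. 1
  if. #lo do. ok=. ok *. x >: vton lo end.   NB. fail when ver1 is newer than x
  if. #hi do. ok=. ok *. x <: vton hi end.   NB. fail when ver2 is older than x
  ok
)

Version keep '3.3.0.13';''   NB. ver1 is older than 4.0.0.2, so this is kept
Version keep '10.1.0.0';''   NB. ver1 is newer than 4.0.0.2, so this is dropped
Version keep '';'3.3.0.13'   NB. ver2 is older than 4.0.0.2, so this is dropped
```

A missing attribute places no constraint, which is the same choice the 1: branch makes in ver1 and ver2 above.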
Meanwhile, where do I hope to take this?

The nif.xml is, in a sense, documentation. It is also, in a sense, code. Like I mentioned in "part 1", there are already several code bases that use the nif.xml file to guide the parsing of .nif files. Conceptually speaking, parts of the structured text in nif.xml refer to hardcoded functionality (and the rest is basically just documentation).

Anyways, it has occurred to me that working through an implementation of .nif file parsing (something I have done before, for fun) and taking it through to rendering of the represented 3d objects (something else I have done before, for fun) and possibly even creating some small manipulation tools (another thing I have enjoyed doing) could serve as examples of how to tackle a wide variety of coding tasks. Plus, handling of 3d representations of objects has significant use in "maker communities" (basically: groups of people who focus on the design and use of "machine" tools, and other engineering and artistic projects). I have not figured out how to tie electrical engineering into the mix (nor some other important topics) but maybe those can wait for later on and handling of other media types (sound, maybe, for the electrical engineering side of things).

That said, xml is just a bit awkward to represent and think about. It's a mix of sequential material, nested structure and named structure, with several forms of redundancy which make general purpose handling ridiculously abstract. So practical uses of xml are obviously focused on specializations of the general concepts.

This issue - specialization of overly general concepts - turns out to be a common theme in requirements gathering, in system design and in program implementation. So it's worth working through some different examples now and then.

Thanks,

--
Raul