I agree that the processing instruction tokens appear to be handled inconsistently with the other token types. For example, the XML declaration looks like just another processing instruction. I would speculate that it's handled differently for performance reasons. Maybe that applies to the other inconsistencies as well.
If you end up doing something with it, I would be interested in hearing how it turns out. I think this same concept could be applied to parsing JSON, which I have been coming across more frequently than XML as of late.

On Sat, Dec 6, 2014 at 1:15 AM, Raul Miller <[email protected]> wrote:
> This looks promising.
>
> I'd probably want to represent that sort of information differently in
> J (parallel lists for starting offset, length, nesting depth and token
> type), but other than that minor detail, it's very much in the
> direction of what I was trying to conceptualize.
>
> That said, I'm puzzling over the token type table
> http://vtd-xml.sourceforge.net/userGuide/6.html -- why, for example,
> do they not distinguish between the "element" name of a processing
> instruction and the "attribute" name of a processing instruction (or
> whatever those are called)? Also, I've an analogous question about the
> value of a namespace. But I can probably ignore those issues for my
> current project, since it uses neither (and perhaps the VTD developers
> also do not need those missing token types).
>
> Thanks!
>
> --
> Raul
>
>
> On Fri, Dec 5, 2014 at 8:44 PM, Joe Bogner <[email protected]> wrote:
> > I found VTD-xml while researching this. Looks like an interesting
> > alternative and reminds me of the work you did with segmented strings
> >
> > http://vtd-xml.sourceforge.net/VTD.html
> >
> > http://jsoftware.2058.n7.nabble.com/quot-Segmented-Strings-quot-td59863.html
> > On Dec 5, 2014 4:28 PM, "Raul Miller" <[email protected]> wrote:
> >
> >> I would like to revisit the idea of using J to parse xml.
> >>
> >> The xml/sax addon was a nice idea, but not very stable. It represented
> >> xml as a series of events (function calls), and left it up to the user
> >> how they would structure the result. Unfortunately, it also rather
> >> reliably crashes J.
> >>
> >> This can be mitigated in various ways. If what you are parsing is
> >> simple enough, and you can live with 32 bit j602, xml/sax can work
> >> great. But those are not always ideal constraints to work with.
> >>
> >> But... what's a good data structure in J, to represent xml?
> >>
> >> A problem is that xml is something of a living example of "the nice
> >> thing about standards is that there are so many to choose from". The
> >> standards documents describing xml are voluminous, and there are many
> >> alternatives which are physically different but logically similar to
> >> wade through.
> >>
> >> Still, at a basic level, xml is something of a nested sequence type of
> >> a thing. So one approach might leverage boxed character arrays. This
> >> will not be particularly efficient, but it's a start.
> >>
> >> For example, this xml snippet:
> >>
> >> <ab cd="ef" gh="ijk">lmnop</ab>
> >>
> >> Might be represented in J as:
> >>    'ab';<('cd';'ef'),('gh';'ijk'),:'';<<'lmnop'
> >>
> >> (The extra boxing on the text is because that might in the general
> >> case actually be a sequence of elements).
> >>
> >> Another approach might be:
> >>    'ab';(('cd';'ef'),:('gh';'ijk'));<<'lmnop'
> >>
> >> Here, the [textual, in this case] content of the element is stored in
> >> a separate box from the attributes, instead of treating it as a
> >> blank-named attribute.
> >>
> >> But perhaps there are good non-boxed ways of representing the structure?
> >>
> >> Has anyone else been working with xml in J?
> >>
> >> Thanks,
> >>
> >> --
> >> Raul
> >> ----------------------------------------------------------------------
> >> For information about J forums see http://www.jsoftware.com/forums.htm
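
For anyone who wants to experiment with the parallel-list layout mentioned in the thread (starting offset, length, nesting depth and token type), here is a rough sketch. It is written in Python rather than J, and it is not VTD-XML's actual encoding: the function name tokenize, the token type labels and the regex-based scanning are all assumptions made up for illustration. It only understands start tags with double-quoted attributes, end tags and text (no processing instructions, comments, namespaces or entities), and it will misparse attribute values that contain '>'.

```python
import re

# A tag (start, end or empty-element) or a run of text. The token type
# labels used below are invented for this sketch, not VTD-XML's codes.
TOKEN = re.compile(
    r'<(?P<close>/)?(?P<name>[A-Za-z_][\w.-]*)(?P<attrs>[^<>]*?)(?P<empty>/)?>'
    r'|(?P<text>[^<]+)'
)
# name="value" pairs inside a start tag (double quotes only).
ATTR = re.compile(r'([A-Za-z_][\w.-]*)\s*=\s*"([^"]*)"')


def tokenize(xml):
    """Return four parallel lists: start offset, length, depth, token type."""
    starts, lengths, depths, types = [], [], [], []

    def emit(start, length, depth, kind):
        starts.append(start)
        lengths.append(length)
        depths.append(depth)
        types.append(kind)

    depth = 0
    for m in TOKEN.finditer(xml):
        if m.group('text') is not None:
            # Text is recorded at the depth of its enclosing element's content.
            emit(m.start('text'), len(m.group('text')), depth, 'text')
        elif m.group('close'):
            # End tags produce no token; they only pop one nesting level.
            depth -= 1
        else:
            emit(m.start('name'), len(m.group('name')), depth, 'element')
            # Scan only the attribute region; offsets stay absolute.
            for a in ATTR.finditer(xml, m.start('attrs'), m.end('attrs')):
                emit(a.start(1), len(a.group(1)), depth, 'attr_name')
                emit(a.start(2), len(a.group(2)), depth, 'attr_value')
            if not m.group('empty'):
                depth += 1
    return starts, lengths, depths, types


if __name__ == '__main__':
    xml = '<ab cd="ef" gh="ijk">lmnop</ab>'
    starts, lengths, depths, types = tokenize(xml)
    # The document itself is never copied; tokens are recovered by slicing.
    for s, n, d, t in zip(starts, lengths, depths, types):
        print(s, n, d, t, xml[s:s + n])
```

Nothing is boxed or nested here: the four lists have one entry per token, and the nesting lives entirely in the depth list, which is what makes the layout a plausible fit for flat J arrays.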
