I agree that the processing instruction tokens appear to be handled
inconsistently with the other token types. For example, the XML
declaration looks like just another processing instruction. I would
speculate that it's handled differently for performance reasons, and maybe
the same applies to the other inconsistencies.

If you end up doing something with it, I would be interested in hearing how
it turns out. I think this same concept could be applied to parsing JSON,
which I have been coming across more frequently than XML lately.
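
To make the JSON idea a bit more concrete, here is a rough, untested sketch
in J of the parallel-list layout Raul describes below (starting offset,
length, nesting depth and token type). All of the names (json, chardepth,
start, len, depth, type, types, tokens) are made up for illustration, the
offsets are written out by hand rather than produced by a tokenizer, and the
depth calculation assumes that no brackets or braces occur inside string
values.

```j
NB. Hypothetical sketch: VTD-style parallel lists for a small JSON string.
json  =: '{"a":[1,2],"b":"cd"}'

NB. Per-character nesting depth: running sum of openers minus closers.
NB. Assumes no brackets or braces appear inside string values.
chardepth =: +/\ (json e. '{[') - json e. '}]'

NB. Hand-written token table (a real tokenizer would compute these).
start =: 2 6 8 12 16          NB. offset of each token in json
len   =: 1 1 1 1 2            NB. length of each token
types =: ;: 'key number string'
type  =: 0 1 1 0 2            NB. index into types
depth =: start { chardepth    NB. nesting depth of each token

NB. Recover the token text from the offsets instead of storing copies.
tokens =: (start <@(+ i.)"0 len) {&.> <json
```

If I have the offsets right, tokens should come out as the boxed strings
a, 1, 2, b and cd, with depth as 1 2 2 1 1. The point is only that the
original text stays in one piece and everything else is flat numeric lists.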


On Sat, Dec 6, 2014 at 1:15 AM, Raul Miller <[email protected]> wrote:

> This looks promising.
>
> I'd probably want to represent that sort of information differently in
> J (parallel lists for starting offset, length, nesting depth and token
> type), but other than that minor detail, it's very much in the
> direction of what I was trying to conceptualize.
>
> That said, I'm puzzling over the token type table
> http://vtd-xml.sourceforge.net/userGuide/6.html -- why, for example,
> do they not distinguish between the "element" name of a processing
> instruction and the "attribute" name of a processing instruction (or
> whatever those are called)? Also, I've an analogous question about the
> value of a namespace. But I can probably ignore those issues for my
> current project, since it uses neither (and perhaps the VTD developers
> also do not need those missing token types).
>
> Thanks!
>
> --
> Raul
>
>
> On Fri, Dec 5, 2014 at 8:44 PM, Joe Bogner <[email protected]> wrote:
> > I found VTD-xml while researching this. Looks like an interesting
> > alternative and reminds me of the work you did with segmented strings
> >
> > http://vtd-xml.sourceforge.net/VTD.html
> >
> >
> > http://jsoftware.2058.n7.nabble.com/quot-Segmented-Strings-quot-td59863.html
> >
> > On Dec 5, 2014 4:28 PM, "Raul Miller" <[email protected]> wrote:
> >
> >> I would like to revisit the idea of using J to parse xml.
> >>
> >> The xml/sax addon was a nice idea, but not very stable. It represented
> >> xml as a series of events (function calls), and left it up to the user
> >> how they would structure the result. Unfortunately, it also rather
> >> reliably crashes J.
> >>
> >> This can be mitigated in various ways. If what you are parsing is
> >> simple enough, and you can live with 32 bit j602, xml/sax can work
> >> great. But those are not always ideal constraints to work with.
> >>
> >> But... what's a good data structure in J, to represent xml?
> >>
> >> A problem is that xml is something of a living example of "the nice
> >> thing about standards is that there are so many to choose from". The
> >> standards documents describing xml are voluminous, and there are many
> >> alternatives which are physically different but logically similar to
> >> wade through.
> >>
> >> Still, at a basic level, xml is something of a nested sequence type of
> >> a thing. So one approach might leverage boxed character arrays. This
> >> will not be particularly efficient, but it's a start.
> >>
> >> For example, this xml snippet:
> >>
> >> <ab cd="ef" gh="ijk">lmnop</ab>
> >>
> >> Might be represented in J as:
> >>    'ab';<('cd';'ef'),('gh';'ijk'),:'';<<'lmnop'
> >>
> >> (The extra boxing on the text is because that might in the general
> >> case actually be a sequence of elements).
> >>
> >> Another approach might be:
> >>    'ab';(('cd';'ef'),:('gh';'ijk'));<<'lmnop'
> >>
> >> Here, the [textual, in this case] content of the element is stored in
> >> a separate box from the attributes, instead of treating it as a
> >> blank-named attribute.
> >>
> >> But perhaps there are good non-boxed ways of representing the structure?
> >>
> >> Has anyone else been working with xml in J?
> >>
> >> Thanks,
> >>
> >> --
> >> Raul
> >> ----------------------------------------------------------------------
> >> For information about J forums see http://www.jsoftware.com/forums.htm
