On Sat, Nov 28, 2009 at 12:47 AM, Dan Bron <[email protected]> wrote:
> I had played with xml/sax before, but I'd never seen xml/sax/x2j:
>
>    http://www.jsoftware.com/jwiki/Addons/xml/sax/x2j%20Examples

Thank you for sharing the information about x2j. This is a novel and
interesting approach.

However, I don't like the general approach in that solution. It isn't
very declarative: it explains "how" to walk the document to reach the data.

   html/body/div/div/div/div/div/ul/li   :=  langs =: langs ,^:(a:~:{.@[)~ lang ;  ' \((\d+) members?\)' rx y
   html/body/div/div/div/div/div/ul/li/a :=  lang  =: '^\s*((?:.(?!User|Tasks|Omit|attention|operations|by))+)\s*$' rx y

That code depends on the deep path structure of the document, so it
breaks whenever the document author changes the structure slightly on
the path to our target elements, e.g. by adding another intervening
div or by changing the unordered list (ul) to an ordered list (ol). My
usual approach is to find the essential/intentional features of the
document (e.g. the id or class name of the element) instead of its
accidental features (e.g. how many divs we have to go down in this
case). That said, I haven't looked at the HTML source of this
particular page.
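To make the contrast concrete, here is a minimal sketch in Python's standard xml.sax module. It is not J and not x2j, and it does not reproduce x2j's API. The handler tracks the element path as SAX reports each node. It then collects link text in two ways: by a hard-coded literal path, and by a class attribute on some ancestor. The markup, the member counts, and the class name `lang-list` are hypothetical and are not taken from the Rosetta Code page.

```python
import xml.sax


class LinkCollector(xml.sax.ContentHandler):
    """Collect <a> text with two selection rules, for comparison.

    by_path:  <a> whose full element path equals LITERAL_PATH exactly.
    by_class: <a> anywhere below an element whose class is ANCHOR_CLASS.
    """

    LITERAL_PATH = "html/body/div/div/ul/li/a"
    ANCHOR_CLASS = "lang-list"  # hypothetical class name, not from the real page

    def __init__(self):
        super().__init__()
        self.path = []      # names of the currently open elements
        self.inside = []    # parallel stack: is this element under ANCHOR_CLASS?
        self.text = None    # text pieces of the current <a>, or None
        self.hit_path = self.hit_class = False
        self.by_path, self.by_class = [], []

    def startElement(self, name, attrs):
        under = bool(self.inside and self.inside[-1]) or (
            attrs.get("class") == self.ANCHOR_CLASS
        )
        self.path.append(name)
        self.inside.append(under)
        if name == "a":
            self.text = []
            self.hit_path = "/".join(self.path) == self.LITERAL_PATH  # depth-sensitive
            self.hit_class = under                                    # depth-insensitive

    def characters(self, content):
        if self.text is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == "a" and self.text is not None:
            value = "".join(self.text).strip()
            if self.hit_path:
                self.by_path.append(value)
            if self.hit_class:
                self.by_class.append(value)
            self.text = None
        self.path.pop()
        self.inside.pop()


def collect(doc):
    """Parse doc and return (matches by literal path, matches by class)."""
    handler = LinkCollector()
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.by_path, handler.by_class


# Hypothetical, simplified markup; the real page is HTML and much larger.
ORIGINAL = (
    '<html><body><div><div>'
    '<ul class="lang-list">'
    '<li><a>J</a> (10 members)</li>'
    '<li><a>Python</a> (20 members)</li>'
    '</ul>'
    '</div></div></body></html>'
)

# The same page after the author wraps the list in one more <div>.
RESTRUCTURED = ORIGINAL.replace("<ul", "<div><ul").replace("</ul>", "</ul></div>")

print(collect(ORIGINAL))      # (['J', 'Python'], ['J', 'Python'])
print(collect(RESTRUCTURED))  # ([], ['J', 'Python'])
```

With one extra div, the literal-path rule finds nothing. The class-based rule still finds both links, because it keys on an intentional feature of the markup rather than on its depth.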


>
> For others who haven't seen it, it's a compact, declarative syntax for
> addressing and transforming XML paths.  Think XSLT, but more concise, so it
> preserves the joy of J programming while providing flexibility to work
> with XML (which is not an easy interface to design; J likes rectangles,
> not trees).
>
> In combination with web/gethttp (another gift from my wishlist) and
> regexes, J finally has the tools to express concise, precise web data
> transformation.  For an example, see the "Sort most popular programming
> languages" task on RC:
>
>    http://rosettacode.org/wiki/Sort_most_popular_programming_languages#J
>
> and the linked notes on the Talk page.
>
> In the past to accomplish this sort of task, I would go through a lengthy
> process using several external tools: wget (fetch) > tidy (force HTML into
> XHTML) > xsltproc (address interesting data) > J (transform data).  But I
> won't do that anymore.  In particular, I won't miss XSLT, which I never got
> the hang of, and which was always the longest and most laborious part of
> the process [1].
>
> -Dan
>
> [1]  Of course, XSLT still has some advantages over x2j, which is built on
> SAX.  In particular, XSLT has a full implementation of XPath, whereas x2j
> is limited to static, literal paths.  That is because SAX simply passes
> you every node and lets you decide whether you're interested in it, and
> x2j is a more productive syntax/interface over this mechanism: it
> compares the path of the current node to the paths you've registered
> interest in.
>
> However, I do see one potential improvement that is within x2j's control:
> the need for assignment in order to build tables (rectangles) of
> interest.  I understand why this is necessary in the general case (and
> why it's easy to implement, given SAX's callback nature).  But I'd be
> interested in discussing an extension.
>
> I envision syntactic sugar that bypasses the need for assignment and
> permits a functional description of transforming paths and sub-paths
> into tables (functional at the notational level anyway, which is all
> that's important).  Combined with the existing compact format, this
> extension would allow x2j to embody the best principles of J, seamlessly
> extending its reach to XML and other tree structures.
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
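The footnote's mechanism can be sketched together with the assignment-free extension proposed in the quoted message. The sketch again uses Python's standard xml.sax rather than x2j. The handler sees every element, compares its path with registered literal paths, and fills rows. The hypothetical `table` function then returns those rows as a value, so the caller never assigns into shared state. The names `PathTable` and `table` and the sample document are illustrative assumptions. They are not x2j's interface or the actual proposal.

```python
import xml.sax


class PathTable(xml.sax.ContentHandler):
    """SAX handler that turns registered literal paths into table rows.

    SAX reports every element; the handler only compares the current path
    with the paths it was given. row_path opens a new row, and each entry
    of column_paths fills one cell of the most recent row with its text.
    """

    def __init__(self, row_path, column_paths):
        super().__init__()
        self.row_path = row_path
        self.column_paths = list(column_paths)
        self.path = []      # names of the currently open elements
        self.buffers = {}   # open column path -> text pieces seen so far
        self.rows = []

    def startElement(self, name, attrs):
        self.path.append(name)
        current = "/".join(self.path)
        if current == self.row_path:
            self.rows.append([""] * len(self.column_paths))
        if current in self.column_paths:
            self.buffers[current] = []

    def characters(self, content):
        for pieces in self.buffers.values():
            pieces.append(content)

    def endElement(self, name):
        current = "/".join(self.path)
        if current in self.buffers:
            text = "".join(self.buffers.pop(current)).strip()
            if self.rows:  # ignore cells that appear before any row
                self.rows[-1][self.column_paths.index(current)] = text
        self.path.pop()


def table(doc, row_path, column_paths):
    """Hypothetical assignment-free interface: paths in, rows out."""
    handler = PathTable(row_path, column_paths)
    xml.sax.parseString(doc.encode("utf-8"), handler)
    return handler.rows


# Hypothetical sample data.
DOC = (
    "<langs>"
    "<lang><name>J</name><members>10</members></lang>"
    "<lang><name>Python</name><members>20</members></lang>"
    "</langs>"
)

rows = table(DOC, "langs/lang", ["langs/lang/name", "langs/lang/members"])
print(rows)  # [['J', '10'], ['Python', '20']]
```

The handler still mutates its own state internally, as any SAX callback must. At the call site, however, the description is functional: paths go in and a rectangle comes out.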