Yes, it was written many winters ago. I imagine it would need some updating. Still, fun, nonetheless.
In an ideal world, there would be a functional implementation of XPath in J (likely calling out to a prebuilt library). Even XML/SAX and x2j only had limited / simulated support for path selection. Though, given J’s bias for highly-structured data, I imagine a more specific tool to identify, extract, and represent specific kinds of tags (in particular <table>, <ol>, <ul>, and maybe a few more) would get the most bang for the buck.

-Dan

> On Jul 14, 2015, at 3:50 PM, Raul Miller <[email protected]> wrote:
>
> But that only works on 32 bit j602, because of limitations in our
> support for the xml/sax addon.
>
> I think it also only works for xhtml?
>
> Thanks,
>
> --
> Raul
>
> On Tue, Jul 14, 2015 at 2:00 PM, Dan Bron <[email protected]> wrote:
>> This RosettaCode solution is relevant; it demonstrates how to use J & the
>> JAL to scrape websites in a particularly functional/declarative/lazy way:
>>
>> http://rosettacode.org/wiki/Rosetta_Code/Rank_languages_by_popularity#J
>>
>> -Dan
>>
>>> On Jul 14, 2015, at 1:32 PM, David Lambert <[email protected]> wrote:
>>>
>>> I've used wget followed by processing in j making extensive use of member
>>> of interval E. to find what I need, steering clear of the nice tools that I
>>> mistrust such as regular expressions (as I recall j re core dumped the one
>>> time I tried it, I doubt I reported) or html parsers. I had good results
>>> treating the file as a text file ignoring the structure.
>>>> Date: Tue, 14 Jul 2015 22:19:32 +1000
>>>> From: "Ryan Eckbo"<[email protected]>
>>>> To: "Programming forum"<[email protected]>
>>>> Subject: [Jprogramming] html screen scraping
>>>> Message-ID:<[email protected]>
>>>>
>>>> Hi everyone,
>>>>
>>>> I want to scrape some web pages and I am wondering if anyone here uses
>>>> J to do so? Any tips? Otherwise I'll have to resort to python.
>>>>
>>>> Thanks for any suggestions,
>>>> Ryan
>>>
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm
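
As an illustration of the plain-text approach David describes in the thread (locating marker strings with E. and ignoring the document structure), here is a minimal sketch in J. It is not his code. The sample string, the marker strings, and the names html, otag, ctag, starts, lens, slice, and found are all made up for this example, and it assumes every opening marker is followed by its own closing marker, with no nesting and no attributes on the tag.

```j
NB. Hypothetical sample; in practice the text might be read from a file saved by wget.
html   =: '<ul><li>alpha</li><li>beta</li></ul>'
otag   =: '<li>'     NB. opening marker (assumed to carry no attributes)
ctag   =: '</li>'    NB. closing marker

NB. E. marks where a substring starts; I. turns the marks into indices.
starts =: (#otag) + I. otag E. html    NB. index of the first character after each opening marker
lens   =: (I. ctag E. html) - starts   NB. length of the text up to the paired closing marker

NB. x is a (start,length) pair and y is the text: drop start characters, keep length, box the result.
slice  =: 4 : '<({: x) {. ({. x) }. y'
found  =: (starts ,. lens) slice"1 _ html
```

With the sample above, found should hold the two boxed strings alpha and beta. Real pages would need more care than this: tags with attributes, nested lists, or an unmatched marker would all break the pairing of starts with closing markers.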
