Re: [Jprogramming] html screen scraping

Dan Bron Wed, 15 Jul 2015 07:00:18 -0700

Yes, it was written many winters ago, and I imagine it would need some updating.
Still fun, nonetheless.


In an ideal world, there would be a functional implementation of XPath in J
(likely calling out to a prebuilt library). Even xml/sax and x2j offered only
limited or simulated support for path selection.

Though, given J’s bias for highly-structured data, I imagine a more specific 
tool to identify, extract, and represent specific kinds of tags (in particular 
<table>, <ol>, <ul>, and maybe a few more) would get the most bang for the buck.

-Dan
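The following is a minimal sketch of the plain-text approach David Lambert describes in the quoted thread below, applied to the kind of tag-specific extraction suggested above. It uses E. (member of interval) and I. to locate tags, then slices out the text between them. It assumes lowercase `<td>` tags with no attributes, no nested tables, and properly paired open/close tags. The verb name `tdcells` is hypothetical, and this is not an existing addon.

```j
NB. Hypothetical sketch: return the text of each <td>...</td> cell, boxed.
NB. Assumes lowercase <td> tags without attributes, no nesting,
NB. and every '<td>' paired with a following '</td>'.
tdcells =: 3 : 0
  open  =. 4 + I. '<td>' E. y   NB. index just past each '<td>' (4 chars)
  close =. I. '</td>' E. y      NB. index where each '</td>' begins
  NB. drop up to each cell start, then keep (close - open) characters
  (close - open) {.&.> open <@}."0 _ y
)
NB. Usage with a page saved earlier by wget (file name is hypothetical):
NB.   tdcells fread 'page.html'
```

For example, `tdcells '<tr><td>a</td><td>bc</td></tr>'` yields the two boxed strings `a` and `bc`. If the open and close counts differ, the subtraction raises a length error rather than silently misaligning cells.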




> On Jul 14, 2015, at 3:50 PM, Raul Miller <[email protected]> wrote:
> 
> But that only works on 32-bit J602, because of limitations in our
> support for the xml/sax addon.
> 
> I think it also only works for XHTML?
> 
> Thanks,
> 
> -- 
> Raul
> 
> On Tue, Jul 14, 2015 at 2:00 PM, Dan Bron <[email protected]> wrote:
>> This RosettaCode solution is relevant; it demonstrates how to use J & the 
>> JAL to scrape websites in a particularly functional/declarative/lazy way:
>> 
>>   http://rosettacode.org/wiki/Rosetta_Code/Rank_languages_by_popularity#J
>> 
>> -Dan
>> 
>>> On Jul 14, 2015, at 1:32 PM, David Lambert <[email protected]> wrote:
>>> 
>>> I've used wget, followed by processing in J that makes extensive use of
>>> member of interval (E.) to find what I need. I steer clear of the nice tools
>>> that I mistrust, such as regular expressions (as I recall, J's regex support
>>> core-dumped the one time I tried it; I doubt I reported it) or HTML parsers.
>>> I had good results treating the file as plain text and ignoring its structure.
>>>> Date: Tue, 14 Jul 2015 22:19:32 +1000
>>>> From: "Ryan Eckbo"<[email protected]>
>>>> To: "Programming forum"<[email protected]>
>>>> Subject: [Jprogramming] html screen scraping
>>>> Message-ID:<[email protected]>
>>>> 
>>>> Hi everyone,
>>>> 
>>>> I want to scrape some web pages and I am wondering if anyone here uses
>>>> J to do so?  Any tips?  Otherwise I'll have to resort to python.
>>>> 
>>>> Thanks for any suggestions,
>>>> Ryan
>>> 
>>> ----------------------------------------------------------------------
>>> For information about J forums see http://www.jsoftware.com/forums.htm
>> 
