Hello,
To parse an HTML file and extract specific tags, my strategy has been the following:
1) Use ;: (the sequential machine) as a parser to generate the array of tags.
2) Filter the array to get the tags of interest.
For (1), I did the following:
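NB. st: state machine description, the left argument of ;: in the form  f ; s ; m
NB. f = 0: the words are returned boxed
NB. character classes (from m): 0 = '<', 1 = '>', 2 = any other character
NB. states: 0 = outside a tag, 1 = inside a tag, 2 = just after a tag's '>'
NB. each entry is  new-state, action  with action 0 = nothing, 1 = start a word,
NB. 2 = emit the word and start a new one, 3 = emit the word (no word in progress)
NB. in state 2, '<' uses action 2 so a tag directly followed by '<' is still emitted
s =. 3 3 2 $ 1 1 0 0 0 0  1 1 2 0 1 0  1 2 0 3 0 3
st =. 0 ; s ; <'<';'>'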
i =. freads h    NB. read the sample HTML file named h into i
j =. st ;: i     NB. j is the boxed list of tags
For (2) I have created a verb seltag as follows:
seltag =: 4 : 'y{~I.@:(a: &i.) @: (x&(I.@:E.) each)y'
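To see where the work (and the space) goes, seltag can be taken apart stage by stage on a small made-up token list. The names tok, hits and flag below are illustrative only, not part of the original session:

```j
NB. hypothetical sample tokens standing in for j
tok =: '<p>';'<a href="x">';'</a>';'<br>'
NB. stage 1: for each token, the positions where '<a' starts, one box per token
hits =: '<a'&(I.@:E.) each tok
NB. stage 2: a: i. hits is 0 where a box is empty (no match) and 1 otherwise
flag =: a: i. hits
NB. stage 3: I. turns the 0/1 list into indices, and {~ picks those tokens
tok {~ I. flag
```

So every token gets its own boxed index list in stage 1, even though only a yes/no answer is used afterwards.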
To find all the anchor tags, I do the following:
anc =. '<a'
k =. anc seltag j
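Note that a substring test on '<a' also selects tags such as <abbr>, <area> and <address>. If only anchors are wanted, a prefix test on '<a ' (with the trailing blank) is stricter. The sketch below assumes anchors carry at least one attribute (a bare <a> would be missed) and uses the standard-library verb tolower so that <A HREF=...> is caught as well; seltagp is a hypothetical name:

```j
NB. hypothetical variant of seltag: keep tokens that BEGIN with x, ignoring case
NB. for each opened token t this computes  x -: (#x) {. tolower t
NB. ({. pads a token shorter than x with blanks, so it simply fails to match)
seltagp =: 4 : 'y #~ x&([ -: #@[ {. tolower@])@> y'

tok =: '<p>';'<a href="x">';'<abbr>';'<A HREF="y">';'</a>'
'<a ' seltagp tok
```

Because the selection is taken from y, the kept tokens ('<a href="x">' and '<A HREF="y">' here) keep their original case.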
Now, for the sample file I looked at, running seltag needs 1000 times as much space as j itself occupies! That does not seem right.
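One way to put numbers on this, and to see which sub-expression is responsible, is to compare 7!:5 (bytes held by named nouns) with 7!:2 (bytes used while executing a sentence). A sketch on a made-up list of 10000 tag tokens standing in for j:

```j
NB. hypothetical sample: 10000 short tag tokens in place of  st ;: i
tok =: 10000 $ '<p>';'<a href="x">';'</a>';'<br>'
seltag =: 4 : 'y{~I.@:(a: &i.) @: (x&(I.@:E.) each)y'

7!:5 <'tok'                            NB. bytes held by the token array itself
7!:2 '''<a'' seltag tok'               NB. bytes used while running the whole selection
7!:2 '''<a''&(I.@:E.) each tok'        NB. bytes used by the per-token search alone
7!:2 'a: i. ''<a''&(I.@:E.) each tok'  NB. the search plus the empty-box comparison
```

If the third figure is close to the second, the boxed result of each (one box per token) is where most of the space goes.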
Any suggestions on how to speed up this selection of array items by substring match?
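One possible direction, sketched under the assumption that only "does this token contain x?" matters: ask for a single boolean per token instead of a boxed list of match positions, so no box is built per token. The name seltagb is hypothetical, and I have not measured it on a real file, so any saving should be confirmed with 7!:2:

```j
NB. hypothetical variant of seltag that keeps one boolean per token
NB. u@> opens each box and applies u to its contents;
NB. x +./@E. t is 1 if x occurs anywhere in token t
seltagb =: 4 : 'y #~ x&(+./@E.)@> y'

tok =: '<p>';'<a href="x">';'</a>';'<abbr>'
'<a' seltagb tok
```

It uses the same substring test as seltag, so it should select the same tokens (here '<a href="x">' and '<abbr>').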
Also, pointers to where the extra space is being consumed would help me learn.
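A different technique, swapped in here rather than a fix to seltag: when only one kind of tag is wanted, the token array can be skipped entirely and the raw text searched directly. E. marks where each anchor starts, <;.1 cuts the text at those marks, and each piece is truncated after its first '>'. The name anchors is hypothetical, and the sketch assumes every matched tag has a closing '>' and that '<a ' does not also occur inside comments or scripts:

```j
NB. hypothetical alternative: select anchor tags straight from the raw text
NB. (x E. y)         1 at every position where x starts
NB. ... <;.1 y       cut y into pieces starting at each 1 (text before the first 1 is dropped)
NB. {.~ >:@i.&'>'    keep each piece up to and including its first '>'
NB.                  (a piece with no '>' would come back padded by one blank)
anchors =: 4 : '({.~ >:@i.&''>'') each (x E. y) <;.1 y'

html =: '<p><a href="x">one</a> and <a href="y">two</a></p>'
'<a ' anchors html
```

Applied to the file, this would be '<a ' anchors i, with i read by freads as before.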
Thanks and Regards,
Yuva