It looks like XML is becoming ever more powerful:
http://www.informationweek.com/internet/showArticle.jhtml?articleID=197700815

Anyone tried J under XIOS?

2007/6/15, Oleg Kobchenko <[EMAIL PROTECTED]>:
A practical strategy for parsing HTML has two steps:
 - apply tidy to convert to XHTML
   http://tidy.sourceforge.net/
 - apply a custom SAX handler from the xml/sax addon
   to process or convert to J structures

--- Yuvaraj Athur Raghuvir <[EMAIL PROTECTED]> wrote:

> Hello,
>
> To parse an HTML file and get specific tags, my strategy has been the following:
> 1) Use ;: as a parser to generate the array of tags
> 2) Filter on the array to get tags of interest.
>
> For (1) I have done as follows:
>    st NB. state machine description
> +-+---+-----+
> |0|1 1|+-+-+|
> | |0 0||<|>||
> | |0 0|+-+-+|
> | |   |     |
> | |1 1|     |
> | |2 0|     |
> | |1 0|     |
> | |   |     |
> | |1 1|     |
> | |0 3|     |
> | |0 3|     |
> +-+---+-----+
>    i =. freads h NB. sample html file h read into i
>    j =. st ;: i
>
> For (2) I have created a verb seltag as follows:
>    seltag =: 4 : 'y{~I.@:(a: &i.) @: (x&(I.@:E.) each)y'
>
> To find all the anchor tags, I do the following:
>    anc =. '<a'
>    k =. anc seltag j
>
> Now, for the sample file I looked into, the space requirement for running
> seltag is 1000 times the size of j! I think this is not OK.
>
> Any suggestions on how to speed up the selection in the array based on
> substring match?
>
> Also, pointers on where I am consuming more space will help me learn.
>
> Thanks and Regards,
> Yuva
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
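
The second step of the tidy-then-SAX strategy can be sketched roughly as follows. This is only an illustration in Python using the standard library's xml.sax module, not the J xml/sax addon; the names AnchorCollector, collect_anchors and SAMPLE are hypothetical, and the input is assumed to be well-formed XHTML of the kind tidy would produce. Because a SAX handler sees one event at a time, it only keeps what it chooses to collect, which is the point with respect to the space concern raised in the quoted message.

```python
import xml.sax


class AnchorCollector(xml.sax.ContentHandler):
    """Hypothetical handler: collects (href, text) for every <a> element."""

    def __init__(self):
        super().__init__()
        self.anchors = []     # collected (href, text) pairs
        self._inside = False  # True while an <a> element is open
        self._href = None     # href attribute of the open <a>, if any
        self._text = []       # text chunks seen inside the open <a>

    def startElement(self, name, attrs):
        if name == "a":
            self._inside = True
            self._href = attrs.get("href")
            self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate and join later.
        if self._inside:
            self._text.append(content)

    def endElement(self, name):
        if name == "a" and self._inside:
            self.anchors.append((self._href, "".join(self._text)))
            self._inside = False


def collect_anchors(xhtml):
    """Parse an XHTML string and return the list of (href, text) pairs."""
    handler = AnchorCollector()
    xml.sax.parseString(xhtml.encode("utf-8"), handler)
    return handler.anchors


# Assumed sample input: already well-formed, as if tidy had been applied.
SAMPLE = """<html><body>
<p>See <a href="http://tidy.sourceforge.net/">tidy</a> and
<a href="http://www.jsoftware.com/forums.htm">the J forums</a>.</p>
</body></html>"""

if __name__ == "__main__":
    for href, text in collect_anchors(SAMPLE):
        print(href, text)
```

Only the anchors are retained here; everything else in the document is discarded as it streams past, so memory use grows with the number of matches rather than with the size of the file.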
--
Björn Helgason, Engineer
Fugl&Fiskur ehf, Þerneyjarsund 23, Box 127, 801 Grímsnes
Email: [EMAIL PROTECTED]  Skype: gosiminn  GSM: +3546985532
Landscaping and garden construction, excavator services
http://groups.google.com/group/J-Programming

Technical skill handles the complex; creativity is the master of simplicity.
A good teacher can step on toes without dulling the shine on the shoes.
With a light heart, today will be even better than yesterday.
