It looks like XML is becoming ever more powerful:

http://www.informationweek.com/internet/showArticle.jhtml?articleID=197700815

Anyone tried J under XIOS?

2007/6/15, Oleg Kobchenko <[EMAIL PROTECTED]>:

A practical strategy for parsing HTML has two steps:

- apply tidy to convert to XHTML
   http://tidy.sourceforge.net/

- apply a custom SAX handler from the xml/sax addon
   to process the XHTML or convert it to J structures


--- Yuvaraj Athur Raghuvir <[EMAIL PROTECTED]> wrote:

> Hello,
>
> To parse an HTML file and extract specific tags, my strategy has been the following:
> 1) Use ;: as a parser to generate the array of tags.
> 2) Filter the array to get the tags of interest.
>
> For (1) I have done as follows:
> st NB. state machine description
> +-+---+-----+
> |0|1 1|+-+-+|
> | |0 0||<|>||
> | |0 0|+-+-+|
> | |   |     |
> | |1 1|     |
> | |2 0|     |
> | |1 0|     |
> | |   |     |
> | |1 1|     |
> | |0 3|     |
> | |0 3|     |
> +-+---+-----+
> i =. freads h  NB. sample html file h read into i
> j =. st ;: i
>
> For (2) I have created a verb seltag as follows:
> seltag =: 4 : 'y{~I.@:(a: &i.) @: (x&(I.@:E.) each)y'
>
> To find all the anchor tags, I do the following:
> anc =. '<a'
> k =. anc seltag j
>
> Now, for the sample file I looked into, the space requirement for running
> seltag is 1000 times the size of j! I think this is not ok.
>
> Any suggestions on how to speed up the selection in the array based on
> substring match?
>
> Also, pointers on where I am consuming more space will help me learn.
>
> Thanks and Regards,
> Yuva
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
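
On the substring-selection question quoted above, here is a minimal sketch of one alternative. It is not taken from the thread: the name seltag2 and the sample list tags are hypothetical, and it assumes the tokens are a list of boxed character strings, as ;: produces with function code 0. It computes a single boolean per box and copies the matching boxes, rather than boxing an index list for every tag. Whether this lowers the space reported for the original file is an assumption to check with 7!:2.

```j
NB. seltag2 is a hypothetical name; x is the substring, y a list of boxed strings
NB. x&E. marks where x starts in one string, +./ asks "anywhere?", &> runs that per box
seltag2 =: 4 : 'y #~ +./@(x&E.) &> y'

NB. small made-up sample standing in for the result of st ;: i
tags =: '<a href="x">' ; '<p>' ; '<a name="y">' ; '</a>'
'<a' seltag2 tags

NB. space in bytes needed to run a sentence, for comparing against seltag
7!:2 '''<a'' seltag2 tags'
```

On the sample list this keeps the two boxes that contain '<a'. Note that a plain substring test would also keep tags such as '<abbr', so a stricter pattern may be needed for real pages.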









--
Björn Helgason, Engineer
Fugl&Fiskur ehf, Þerneyjarsund 23, Box 127
801 Grímsnes, email: [EMAIL PROTECTED]
Skype: gosiminn, gsm: +3546985532
Landscaping and ornamental gardening, excavator services
http://groups.google.com/group/J-Programming


Technical skill handles the complex, creativity is the master of simplicity

a good teacher can step on toes without taking the shine off the shoes
         /|_      .-----------------------------------.
        ,'  .\  /  | With a light heart,               |
    ,--'    _,'    | today will be                     |
   /       /       | even better than yesterday        |
  (   -.  |        `-----------------------------------'
  |     ) |        (\_ _/)
 (`-.  '--.)       (='.'=)
  `. )----'        (")_(")