Perfect! I am still studying what you have done, but it works like a
charm...
load 'c:\users\skip cave\j602-user\projects\stringfind.ijs'
This is where I loaded the script file with your function in it...
ftxt6 =: textfile1 getTagsContents 'tag1s';'tag1e';'tag2s';'tag2e'
ftxt6
┌────────────────────────────────┬─────────────────────────────┐
│ good stuff that I want to keep │ more good stuff that I need │
├────────────────────────────────┼─────────────────────────────┤
│ even more stuff I want │ │
├────────────────────────────────┼─────────────────────────────┤
│ stuff to keep │ really really good stuff │
└────────────────────────────────┴─────────────────────────────┘
That's right!
ftxt7 =: textfile2 getTagsContents 'tag1s';'tag1e';'tag2s';'tag2e'
ftxt7
┌────────┬─────────┐
│ stuff2 │ stuff4 │
├────────┼─────────┤
│ stuff6 │ │
├────────┼─────────┤
│ stuff8 │ stuff10 │
├────────┼─────────┤
│ │ stuff12 │
└────────┴─────────┘
That is right also. So what happens if I add another tag pair to the list?
ftxt8 =: textfile2 getTagsContents 'tag1s';'tag1e';'tag2s';'tag2e';'tag3s';'tag3e'
$ftxt8
4 3
ftxt8
┌────────┬─────────┬┐
│ stuff2 │ stuff4 ││
├────────┼─────────┼┤
│ stuff6 │ ││
├────────┼─────────┼┤
│ stuff8 │ stuff10 ││
├────────┼─────────┼┤
│ │ stuff12 ││
└────────┴─────────┴┘
Again perfect! The last column was all empty boxes, indicating that there
were no tag3 pairs in the text file.
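A quick way to confirm an all-empty column without reading the display is to count the non-empty boxes in each column. This is only a sketch on a made-up table (the name r and its contents are hypothetical, not taken from my text files):

```j
NB. hypothetical 2-by-3 result whose last column holds only empty boxes
r =. 2 3 $ 'stuff2';'stuff4';'';'stuff6';'';''
NB. #@> gives the length of each cell; 0 < marks the non-empty ones;
NB. +/ adds down the rows, leaving one count per column
counts =. +/ 0 < #@> r
counts
```

A zero in counts should mean that the corresponding tag pair never matched in that file.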
The next test will be to see how efficient this function is on thousands of
text files.
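For the timing, something along these lines might do. It is an untested sketch: it assumes getTagsContents is already loaded and that files is a boxed list of file paths (files, extract, results and secs are all names I made up for illustration):

```j
NB. assumes getTagsContents is loaded and 'files' is a boxed list of paths
tags    =: 'tag1s';'tag1e';'tag2s';'tag2e'
NB. read one file with the 1!:1 foreign, then extract its tagged contents
extract =: 3 : '(1!:1 < y) getTagsContents tags'
NB. 6!:2 runs the sentence and returns the elapsed time in seconds
secs    =: 6!:2 'results =: extract each files'
```

Replacing 6!:2 with 7!:2 should report the space used instead of the time.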
Thanks again Raul, for all the help.
Skip
On Mon, Nov 14, 2011 at 3:45 PM, Raul Miller <[email protected]> wrote:
> Ok... I hope I am not overlooking anything here:
>
> getTagsContents=: 4 :0
> 'n m'=. $tags=. > _2 <\ y
> locs=: tags [email protected]:0 }. txt=.(' ',;tags),x,;tags
> assert. -:&/:&;/ |:locs NB. tags must be balanced
> data=: _2 {:\ ((/:~ ; locs) I. i.#txt) </. txt
> expand=: ;(#~ 1&e.S:0) <@|./. |.> (e.L:0~ /:~@;) {."1 locs
> }: }.(#@>{."1 tags) }.&.>"1 (-n) ]\ expand #inv (+/expand){.data
> )
>
> If you can guarantee that tags are always balanced, you can get rid of
> the assert statement.
>
> How this works:
>
> 1. tagged data blocks are extracted from the text (in their original order)
> 2. expand is defined to be the compression vector on the ravel of the
> desired result, to get those blocks
> 3. expand #inv data gets the blocks we need
>
> Everything else is busywork to convert between data formats and
> representations.
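To convince myself of step 3, here is a small sketch on made-up data. The values of data and expand below are hypothetical and typed in by hand; inside getTagsContents they are computed from the text:

```j
NB. made-up data: six blocks in text order, destined for a 4-by-2 table
data   =. 'stuff2';'stuff4';'stuff6';'stuff8';'stuff10';'stuff12'
expand =. 1 1 1 0 1 1 0 1     NB. 1 = cell gets a block, 0 = cell stays empty
NB. #inv spreads the blocks out, putting empty boxes where expand is 0;
NB. _2 ]\ then cuts the list into rows of two
table  =. _2 ]\ expand #inv data
#@> table                     NB. lengths of the contents of each cell
```

The zero lengths should fall exactly where expand has a 0, which is how the empty boxes end up in the right cells.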
>
> To make my work easier, I make sure that:
>
> a. There is always text to be discarded before the first tag
> b. the full set of tags appear at the beginning of the text I am working
> with
> c. the full set of tags appear at the end of the text I am working with
>
> (these boundary guards are discarded from the final result).
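The boundary guards in a. to c. can be tried on their own. This is a sketch with one hypothetical tag pair and a made-up text; locating the tags with E. is my choice for the illustration:

```j
tags =. 'tag1s';'tag1e'                  NB. a single hypothetical tag pair
x    =. 'junk tag1s keep tag1e junk'     NB. made-up text
NB. a blank plus every tag in front, every tag again at the end
txt  =. (' ',;tags),x,;tags
starts =. 'tag1s' I.@E. txt              NB. positions of the start tag
ends   =. 'tag1e' I.@E. txt              NB. positions of the end tag
```

Each tag now occurs at least twice, once in the leading guard and once in the trailing guard, so the position lists are never empty even when a tag is missing from the text itself.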
>
> Note: I hope that this is readable -- for some reason gmail has recently
> taken to mutilating line-ends on plain text messages, so I do not know how
> I can send plain text code.
>
> FYI,
>
> -
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm