Raul,

Thanks for the help. However, there is still something missing in your
function:

   textfile1
some stuff
some more stuff
stuff  tag1s good stuff that I want to keep tag1e other stuff
more stuff
lots of stuff, more stuff, tag2s more good stuff that I need tag2e
string stuff
bad stuff stuff I don't care about tag1s even more stuff I want tag1e
strange stuff
more and more stuff
stuff tag1s stuff to keep tag1e
different stuff and new stuff
bad and unusual stuff tag2s really really good stuff tag2e bad stuff
more unneeded stuff
the end

   getTagsContents=: getTagContents~S:0 1    <\~&2


   ftxt =: textfile1 getTagsContents 'tag1s';'tag1e';'tag2s';'tag2e'
   $ftxt
3 3

Hmmmmmm... Shape should be 2 3

   ftxt
┌───────────────────────────────────────────────────┬─────────────────────────────────────────────────────────────┬─────────────────────────────────────────────────────┐
│ good stuff that I want to keep                    │ even more stuff I want                                      │ stuff to keep                                       │
├───────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────┤
│ other stuff more stuff lots of stuff, more stuff, │ strange stuff more and more stuff stuff tag1s stuff to keep │ different stuff and new stuff bad and unusual stuff │
├───────────────────────────────────────────────────┼─────────────────────────────────────────────────────────────┼─────────────────────────────────────────────────────┤
│ more good stuff that I need                       │ really really good stuff                                    │                                                     │
└───────────────────────────────────────────────────┴─────────────────────────────────────────────────────────────┴─────────────────────────────────────────────────────┘

I'm not sure where the middle row came from, but there should only be two
rows, as there are only two tag pairs. Also, the rows in this output should
actually be columns in the final result. The correct result should have
one column per tag pair type.
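
A possible source of the extra row, sketched under the assumption that
getTagContents is applied once per boxed pair of tags: with a left argument
of 2, the infix adverb \ produces overlapping pairs, so four tags yield
three pairs, and the middle pair is tag1e with tag2s. A left argument of _2
produces non-overlapping pairs. The name tags is used only for illustration.

   tags =: 'tag1s';'tag1e';'tag2s';'tag2e'
   # 2 <\ tags      NB. overlapping pairs: (0 1), (1 2), (2 3)
3
   # _2 <\ tags     NB. non-overlapping pairs: (0 1), (2 3)
2

If that is the cause, replacing <\~&2 with <\~&_2 in getTagsContents might
remove the middle row. I have not verified this against getTagContents.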

The result would be closer to what I need, if we remove the middle row, and
transpose the result:

   ftxt1 =: |: 1 0 1 # ftxt

   $ 1 0 1 # ftxt
2 3

Dropping the middle row gives the right shape, and after the transpose
ftxt1 has two columns, one for each tag pair type.

   ftxt1
┌────────────────────────────────┬─────────────────────────────┐
│ good stuff that I want to keep │ more good stuff that I need │
├────────────────────────────────┼─────────────────────────────┤
│ even more stuff I want         │ really really good stuff    │
├────────────────────────────────┼─────────────────────────────┤
│ stuff to keep                  │                             │
└────────────────────────────────┴─────────────────────────────┘

Now there is only one other problem. The second column is out of sequence.
The assumption is that the tagged strings will always be in groups, with
the tag1 string followed by the tag2 string, then the tag1 string again,
then tag2, etc. If a tag pair is missing from this 1,2,1,2,1,2 sequence,
the missing string should be indicated by an empty box in the sequence.

In the textfile1 data, the empty box needs to be in the second row,
instead of the third row, to keep the order of tagged strings in the same
sequence as in the original text. In my application, the order of
appearance of the tagged strings is critical. The missing tag2 string
belongs after the 'even more stuff I want' string, not after the
'stuff to keep' string.

A ravel of ftxt1 should provide a boxed list of all of the tagged strings,
in the order that they appear in the text, with empty boxes representing
strings missing from the 1,2,1,2 sequence.
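
The layout I am after can be sketched as follows. This assumes the tagged
strings and their tag types are already available in order of appearance,
which the current function does not provide. The names types, strs, rows,
slots, n and ftxt2 are illustrative only. A new row starts whenever the tag
type fails to increase, and each string goes into slot (2 * row) + type:

   NB. 0 marks a tag1 string, 1 marks a tag2 string, in order of appearance
   types =: 0 1 0 0 1
   strs  =: 'good stuff that I want to keep';'more good stuff that I need'
   strs  =: strs,'even more stuff I want';'stuff to keep';'really really good stuff'
   rows  =: +/\ 0 , (}. types) <: }: types   NB. row number of each string
   rows
0 0 1 2 2
   slots =: types + 2 * rows                 NB. position in the raveled table
   slots
0 1 2 4 5
   n     =: >: {: rows                       NB. number of rows needed
   ftxt2 =: (n , 2) $ strs slots} (2 * n) # a:
   $ ftxt2
3 2
   a: = , ftxt2                              NB. the empty box lands in slot 3
0 0 0 1 0 0

Slot 3 is the second row of the tag2 column, which is where the missing
tag2 string belongs for textfile1.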

So close, and yet so far....

Skip


On Mon, Nov 14, 2011 at 6:37 AM, Raul Miller wrote:

>   getTagsContents=: getTagContents~S:0 1    <\~&2
>   textfile1 getTagsContents 'tag1s';'tag1e';'tag2s';'tag2e'
>
> --
> Raul
>
>
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
