Raul Miller <[email protected]> wrote:
Sorry about that, I should stop posting so carelessly:

 getTagsContents=: getTagContents~S:0 1    <\~&_2

Skip replies:

OK, that is getting better.

 getTagsContents=: getTagContents~S:0 1    <\~&_2

    ftxt2 =: textfile1 getTagsContents 'tag1s';'tag1e';'tag2s';'tag2e'
   $ ftxt2
2 3
   ftxt2
┌────────────────────────────────┬──────────────────────────┬───────────────┐
│ good stuff that I want to keep │ even more stuff I want   │ stuff to keep │
├────────────────────────────────┼──────────────────────────┼───────────────┤
│ more good stuff that I need    │ really really good stuff │               │
└────────────────────────────────┴──────────────────────────┴───────────────┘

So the middle row of untagged text is gone, which is a big step in the
right direction. We have captured all the tag pairs, and there is an empty
box telling me that there is a missing tag pair. Now the transpose of ftxt2
is getting really close:

  |: ftxt2
┌────────────────────────────────┬─────────────────────────────┐
│ good stuff that I want to keep │ more good stuff that I need │
├────────────────────────────────┼─────────────────────────────┤
│ even more stuff I want         │ really really good stuff    │
├────────────────────────────────┼─────────────────────────────┤
│ stuff to keep                  │                             │
└────────────────────────────────┴─────────────────────────────┘

Each column represents one tag pair. However, the sequence of the extracted
strings is still out of order.

As I stated in my previous post, the ravel order of the boxes in the
result of getTagsContents must be in the order that the strings were in the
original text.

so

    ,ftxt2

┌────────────────────────────────┬────────────────────────┬───────────────┬─────────────────────────────┬──────────────────────────┬┐
│ good stuff that I want to keep │ even more stuff I want │ stuff to keep │ more good stuff that I need │ really really good stuff ││
└────────────────────────────────┴────────────────────────┴───────────────┴─────────────────────────────┴──────────────────────────┴┘
  This is way out of order from the original text.

we try

    , |: ftxt2

┌────────────────────────────────┬─────────────────────────────┬────────────────────────┬──────────────────────────┬───────────────┬┐
│ good stuff that I want to keep │ more good stuff that I need │ even more stuff I want │ really really good stuff │ stuff to keep ││
└────────────────────────────────┴─────────────────────────────┴────────────────────────┴──────────────────────────┴───────────────┴┘

This is much better, but the empty box should be the third-to-last box in
the raveled list, not the last box, to keep the strings in the same order
as they were in the original text. The missing tag2 pair comes just before
the last tag1 and tag2 pairs.

I tried several schemes to get the transpose in the getTagsContents
function, but none worked:

getTagsContents=:  |: getTagContents~S:0 1    <\~&_2
     ftxt3 =: textfile1 getTagsContents 'tag1s';'tag1e';'tag2s';'tag2e'
|domain error: getTagsContents
|   ftxt3=:textfile1     getTagsContents'tag1s';'tag1e';'tag2s';'tag2e'
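
As far as I can tell, the domain error comes from how the definition is
parsed rather than from transpose itself. The original definition is a train
of two verbs (a hook), so x getTagsContents y runs x getTagContents~S:0 1
on the result of _2 <\ y. Putting |: in front turns it into a train of three
verbs (a fork), which applies both outer verbs dyadically, as x |: y and
x (<\~&_2) y, and neither of those makes sense with text as the left
argument. A minimal sketch of one way around this, assuming the original hook
is kept unchanged (getTagsContentsT is a name made up for illustration):

```j
NB. Parenthesize the original hook so it stays a hook,
NB. then apply monadic |: to its result with @
getTagsContentsT=: |:@(getTagContents~S:0 1 <\~&_2)
```

This should only reproduce |: ftxt2 in one step, so it does not by itself
fix the ordering problem.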


In any case, the approach you are using doesn't keep the string sequence,
including the missing strings, in the same order as the original text. It
looks like your function finds all the tag1 strings first, and puts them in
a boxed array. Then it finds all the tag2 strings, and adds them to the
array, filling out mismatched quantities with empty boxes. Your method
finds all the tagged strings, but it loses the sequence order of the
strings.

In my text files, every tag1 string should be followed by a tag2 string, in
the order 1, 2, 1, 2, etc. If either a tag1 string or a tag2 string is
missing from that sequence, that missing string should be represented by an
empty box, placed in the exact sequence that it occurred in the original
text. It is very important to place the empty box in the same place in the
sequence where the missing string was, in the original text. This type of
function is common when parsing log files (which is what I am doing), as
missing lines in the log are important to know, but their sequential
position in relation to the other tagged strings is just as important.

Here's a simpler, but more thorough, test example:

      textfile2=: 0 : 0
stuff1
tag1s stuff2 tag1e
stuff3
tag2s stuff4 tag2e
stuff5
tag1s stuff6 tag1e
stuff7
tag1s stuff8 tag1e
stuff9
tag2s stuff10 tag2e
stuff11
tag2s stuff12 tag2e
)

Here is an example of how the function should work (this is just a mockup,
as I haven't gotten the function working as yet):

    ftxt4 =: textfile2 NewgetTagsContents 'tag1s';'tag1e';'tag2s';'tag2e'

   $ ftxt4
4 2

   ftxt4
┌──────┬───────┐
│stuff2│stuff4 │
├──────┼───────┤
│stuff6│       │
├──────┼───────┤
│stuff8│stuff10│
├──────┼───────┤
│      │stuff12│
└──────┴───────┘

The first column represents all the tag1 pairs, and the second column
represents all the tag2 pairs. The empty box in row two indicates that
there was no tag2 pair following the second tag1 pair. Also the empty box
in the first column of the last row indicates that there was a missing tag1
pair, before the last tag2 pair.

The ravel of ftxt4 shows all the strings in their original order,
1,2,1,2,1,2, with the empty boxes showing the missing pairs in sequence:

   , ftxt4
┌──────┬──────┬──────┬┬──────┬───────┬┬───────┐
│stuff2│stuff4│stuff6││stuff8│stuff10││stuff12│
└──────┴──────┴──────┴┴──────┴───────┴┴───────┘

This is the function I'm trying to develop. Ideally the function would
scale to N tag pairs, creating an N by 2 boxed array, with the tag pairs
always in the same sequence.
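
To make the placement rule concrete, here is a rough sketch of just the
alignment step. It assumes the tagged strings have already been extracted in
text order, each paired with the index of its tag pair (0 for tag1, 1 for
tag2, and so on), and that the result has one column per tag pair. The name
alignTags and this input layout are assumptions for illustration, not part
of Raul's code. A new row starts whenever the tag index fails to increase,
which is what leaves an empty box where a string is missing.

```j
NB. alignTags: hypothetical helper, shown only as a sketch
NB. x: number of tag pairs (columns of the result)
NB. y: 2-column boxed table in text order; column 0 holds the
NB.    tag pair index, column 1 holds the extracted string
alignTags=: 4 : 0
 kind=. > {."1 y
 NB. start a new row whenever the tag index does not increase
 row=. +/\ 0 , 2 >:/\ kind
 NB. build a table of empty boxes, then amend each string into place
 ({:"1 y) (<"1 row ,. kind)} ((>: {: row) , x) $ a:
)

NB. the tagged strings of textfile2, entered by hand in text order
seq=: _2 ]\ 0;'stuff2';1;'stuff4';0;'stuff6';0;'stuff8';1;'stuff10';1;'stuff12'
ftxt5=: 2 alignTags seq
```

If the sketch is right, ftxt5 should match the 4 by 2 mockup above, with
row numbers 0 0 1 2 2 3 assigned to the six strings. Extracting seq from the
text itself, in order, is the part that still has to be written.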

Skip

On Mon, Nov 14, 2011 at 11:56 AM, Raul Miller <[email protected]> wrote:

> Sorry about that, I should stop posting so carelessly:
>
>  getTagsContents=: getTagContents~S:0 1    <\~&_2
>
> --
> Raul
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
