Skip said:
> OK, I have figured out how to get a list of each of the paths to each of
> the log.gz files in my directory tree.
>
> lp =: (> 1 }. 1 dirpath 'c:\test4') ,"1 '/log.gz'
Raul asked:
Why are you using dirpath for this?
Skip says:
It's the only way I know to traverse the directories and get all the
directories. I would much rather use some kind of "findfiles" utility, where
'log.gz' findfiles 'c:\test4' would find the full paths to every file below
the c:\test4 directory. However, I was unable to find or invent such a
utility, so I went the route I knew.
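One possible shape for such a utility, as a sketch only: the J standard library's dirtree verb takes a file specification with optional wildcards, searches subdirectories recursively, and returns a boxed table whose first column holds the full file names. The name findfiles and its argument order are hypothetical, taken from the wish above. Whether the mixed '\' and '/' separators are accepted may depend on the J version and platform.

```j
NB. findfiles is a hypothetical helper, not a library verb.
NB. x: file name or wildcard pattern   y: top-level directory
NB. dirtree returns name, timestamp and size columns; keep only the names.
findfiles =: 4 : '{."1 dirtree y , ''/'' , x'

NB. Intended use (path from this thread), giving one boxed path per file:
NB.   lpb =: 'log.gz' findfiles 'c:\test4'
NB.   lp  =: > lpb      NB. character table, one path per row, as before
```

Keeping the boxed form lpb would also avoid padding the paths themselves to a common length.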
Raul said:
(To extract and expand all the log.gz files) you could use:
shell"1 zipth,"1 lp
But you might want to think about
3 :'<shell zipth,y'"1 lp
Skip says:
OK, let's try Raul's first proposal:
yy =: shell"1 zipth,"1 lp
# yy
10
$ yy
10 272126
Yep. It pads all the array rows out to the length of the longest text
string, which is 272,126 characters. Simple, but not very efficient in storage.
So we try Raul's second approach:
zz =: 3 :'<shell zipth,y'"1 lp
Now we look at the row sizes in each array:
# > 2 { yy
272126
# > 3 { yy
272126
# > 4 { yy
272126
All rows are padded to the same length. (How can you make this discovery
in a single line of J?)
# > 2 { zz
28630
# > 3 { zz
4814
# > 4 { zz
272126
Each boxed row is just the length of its own log file.
It looks like the second approach is much more efficient.
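The row-length check asked about above can be done in one line for each array. A minimal sketch, using three short made-up strings in place of the real logs (the sample data is an assumption; yy and zz are built here only to mimic the padded and boxed shapes from the session):

```j
logs =: 'abc' ; 'abcdefg' ; 'ab'   NB. made-up stand-ins for three unzipped logs
yy =: > logs     NB. character table: short rows are padded with blanks
zz =: logs       NB. boxed list: each box keeps its own length
#"1 yy           NB. length of every row of the table: 7 7 7
#&> zz           NB. length of the contents of every box: 3 7 2
```

The rank-1 tally counts each row of the table, while tally under open counts what is inside each box.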
I would like to know the overall size of yy and zz. I found mention of a
utility called "nounsizes" that looks like it would do the job, but I was
unable to find a listing for the script, so I couldn't use it.
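Without nounsizes, one option is the foreign conjunction 7!:5, which reports the space in bytes used by named nouns. A sketch with made-up data of the same lengths as three of the logs above (in the session the real yy and zz would be used instead). Exact byte counts depend on the J version, so none are shown:

```j
NB. made-up stand-ins with the lengths reported for boxes 2, 3 and 4
logs =: (28630 $ 'a') ; (4814 $ 'b') ; (272126 $ 'c')
yy =: > logs           NB. padded table, 3 rows of 272126 characters
zz =: logs             NB. boxed list, no padding
7!:5 'yy' ; 'zz'       NB. bytes used by yy, then by zz
```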
Now we have all the log files in a big boxed array "zz", one log file per
box. The next step is to apply the string extraction utility we developed
earlier, on each log file text string.
Well, actually, it is easier to just take zz, unbox it, and mush (ravel)
all the text strings from each log together into one long text string. Then
we can pick out the strings I need, and clean up the spaces and equals:
gg =: cleanString > ('RESULT[0]';crlf,'CONFIDENCE') getTagContents , > zz
gg
dtmf-9 dtmf-0 dtmf-4 dtmf-7 dtmf-2
dtmf-1
kenny
dtmf-1 dtmf-2 dtmf-5 dtmf-3 dtmf-0
dtmf-1
freeways
traffic
ninety one west
yes
sixty
seventy one
dtmf-4 dtmf-9 dtmf-5 dtmf-pound
dtmf-8 dtmf-7 dtmf-1 dtmf-2 dtmf-0
dtmf-1
u s air one ninety one
yes
continental fifty three sixty four
correct
what
goodbye
ninety five
madison-heights
traffic
main menu
main menu
ninety five
yes
yes
nah
Eureka! We have it!
My only worry is the step where we unbox zz. I have a feeling that the
intermediate array after unboxing and before ravel creates the big
temporary array Raul was worried about. However, I don't have a clue how to
get around that.
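One way around the intermediate table, offered as a sketch: raze (monadic ;) joins the contents of the boxes end to end, so no padded table is built and no padding blanks end up in the result. The sample strings are made up:

```j
zz =: 'abc' ; 'abcdefg' ; 'ab'   NB. made-up stand-ins for the boxed logs
# , > zz     NB. unbox, then ravel: 21 characters, 9 of them padding blanks
# ; zz       NB. raze: 12 characters, no padding
```

Assuming getTagContents does not depend on the padding blanks, the extraction step could then use ; zz in place of , > zz.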
Skip
On Wed, Nov 9, 2011 at 4:58 PM, Raul Miller wrote:
> On Wed, Nov 9, 2011 at 5:37 PM, Skip Cave wrote:
> > OK, I have figured out how to get a list of each of the paths to each of
> > the log.gz files in my directory tree.
> >
> > lp =: (> 1 }. 1 dirpath 'c:\test4') ,"1 '/log.gz'
>
> Why are you using dirpath for this?
>
> > I have predefined two path nouns:
> >
> > zipth
> > "c:\Gzip\gzip.exe -d -c "
> > filepth
> > c:\test3\log.gz
> >
> > So, I can pull out a file and unzip it, given the file path:
> >
> > $ shell zipth, filepth
> > 237567
> >
> > But that is just one log.gz file unzipped into a text array. Now I need to
> > iterate through all of the list of log file paths I extracted above, and
> > create an array of boxed text strings, one boxed string per expanded log.gz
> > file. "lp" contains the list of files, one path per row.
>
> You are probably going to want to think about this -- if you have a
> 100mb log file do you want all log file contents padded out to 100mb
> with spaces?
>
> If so, you could use:
>
> shell"1 zipth,"1 lp
>
> But you might want to think about
> 3 :'<shell zipth,y'"1 lp
>
> --
> Raul
>
>