After much experimentation, I finally found a way to expand a .gz
compressed file in J and put the result in a J variable.

The first issue I encountered was that I couldn't find the gunzip utility
anywhere. I did find gzip, which, with the right parameters in the command
line (-d for decompress), would unzip my files. However, gzip -d
*replaces* the compressed file with the expanded file, and in any case,
I couldn't figure out how to easily get the text in the expanded file into
a J variable.

Ric Sherlock mentioned a program called 7z or 7-Zip. After some more
searching on the web, I found the 7-Zip utility. It can be run from a
command line, and it doesn't replace the compressed file. In fact, it can
put the expanded file wherever you want, most importantly to stdout.

The next issue was how to execute a command-line utility like 7-Zip from J,
expand the .gz file, and then capture the expanded (text) file in a J
variable.

Several methods to run command-line tasks were mentioned previously in this
thread, including spawn, fork, shell, and spawn_jtask_. The documentation
was not clear on the differences between all these different verbs, so I
attacked the problem by trial and error. Is there a doc somewhere that
explains when to use each of these verbs, and the pros and cons of each one?

In any case, I eventually discovered that the "shell" verb would do what I
wanted.

First I built a couple of nouns:
The noun 'zipath' contains the path to the command-line 7-Zip utility, with
the utility path enclosed in double quotes, and the control parameters at
the end:
zipath =: '"c:\Program Files (x86)\7-Zip\7z" e -so '

The noun 'filepath' contains the path to the compressed file:
filepath =: 'c:\Gzip\log.gz'

Now this J sentence expands the log.gz file and places the text in a new
variable "log_txt":
log_txt =: shell zipath, filepath
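
If the path to the compressed file ever contains spaces, it probably needs
double quotes too, just like the utility path. A minimal sketch of that,
assuming the same zipath noun as above; the verb name zipcmd is hypothetical
and only builds the command string, it does not run anything:

```j
NB. Same noun as in the post: quoted utility path plus control parameters.
zipath =: '"c:\Program Files (x86)\7-Zip\7z" e -so '

NB. Hypothetical helper: append the archive path y, wrapped in double quotes.
zipcmd =: 3 : 'zipath , ''"'' , y , ''"'''

NB. Intended use (assumes shell is already defined, as in the post):
NB.   log_txt =: shell zipcmd 'c:\Gzip\log.gz'
```

Keeping the command construction in its own verb makes it easy to display
the exact string being handed to shell when something goes wrong.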

The only problem with this scheme is that the 7z.exe utility puts verbose
confirmation text in the standard output after the file contents. So
log_txt always ends with the following banner:

file text....

7-Zip 9.22 beta  Copyright (c) 1999-2011 Igor Pavlov  2011-04-18
Processing archive: c:\Gzip\log.gz
Extracting  log
Everything is Ok
Size:       272126
Compressed: 20713

I need to figure out how to either suppress the banner, throw it away
before putting the text in log_txt, or ignore it in the next phase.
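
One way to throw the banner away after the fact might be to cut the text at
the last place the banner marker appears. This is only a sketch: the verb
name stripbanner is hypothetical, and it assumes the banner always starts
with the characters '7-Zip ' and always comes after the file contents, as in
the output shown above.

```j
NB. Hypothetical helper: keep only the text before the last '7-Zip ' marker.
NB. If the marker is absent, i: returns the length of y, so y is unchanged.
stripbanner =: 3 : 0
  marker =. '7-Zip '
  y {.~ (marker E. y) i: 1
)

NB. Intended use:
NB.   log_txt =: stripbanner shell zipath, filepath
```

Using the last occurrence rather than the first means a log that happens to
mention 7-Zip in its own text is not truncated early, as long as the banner
really is the final thing in the output.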

My next challenge is to search the log_txt text for two different strings
- a "start string" and an "end string". I need to extract the text that
appears between these two terminator strings. To make matters worse, there
may be multiple start/end pairs, and I need to extract the text between
every pair.

Here's an example of a text file I will be parsing.

    textfile
some stuff
some more stuff
stuff  start string  good stuff that I want to keep end string other stuff
more stuff
lots of stuff, more stuff, start string more good stuff that I need end
string stuff
bad stuff stuff I don't care about start string even more stuff I want end
string strange stuff
the end

<<<>>>

The verb I want will take the textfile in on the right, and return all the
text strings between "start string" and "end string" in the text file.

So the result of the verb will be:

good stuff that I want to keep
more good stuff that I need
even more stuff I want
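
A rough sketch of such a verb, not a polished solution. The name between is
hypothetical; it takes the two terminator strings as a boxed pair on the
left and the text on the right, and returns a boxed list of the pieces
found. It assumes each terminator appears unbroken in the text (no line
break in the middle of "end string") and it trims leading and trailing
blanks with dltb from the standard library.

```j
NB. x: start ; end   (boxed pair of terminator strings)
NB. y: text to search
NB. Result: boxed list of the strings found between each start/end pair.
between =: 4 : 0
  's e' =. x
  r =. 0 $ a:
  while. 1 do.
    i =. (s E. y) i. 1              NB. index of the next start string
    if. i = #y do. break. end.      NB. no more start strings
    y =. (i + #s) }. y              NB. drop through the start string
    j =. (e E. y) i. 1              NB. index of the matching end string
    if. j = #y do. break. end.      NB. unterminated start: stop
    r =. r , < dltb j {. y          NB. keep the text in between
    y =. (j + #e) }. y              NB. drop through the end string
  end.
  r
)

NB. Intended use:
NB.   ('start string' ; 'end string') between textfile
```

A start string with no matching end string is simply ignored here; whether
that is the right behaviour depends on what the real log files look like.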

<<<>>>

Skip