Forum,
Just thought I'd share some code to efficiently convert fixed-width data to
delimited data. That is, to remove superfluous whitespace from every "cell" of
the data.
While fixed-width data is faster and easier to process with J, delimited data
is required for certain external applications, notably bulk inserts into
RDBMSes. If you leave the extraneous whitespace in your bcp file, it will be
inserted literally into the database table.
I recommend you use this method only when your data is so large that it's
either impossible ("out of memory") or unacceptably slow to do it the obvious
way. The obvious way makes for more maintainable code.
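For comparison, here is a minimal sketch of what "the obvious way" might look like: cut the data into lines, cut each line into cells, trim each cell, and glue everything back together. The name fw2dl_simple is hypothetical and not part of the original code. The sketch assumes x is a single character, that the data ends with LF, and that dltb (delete leading and trailing blanks) from the standard library is loaded.

```j
NB. Hypothetical straightforward version, for illustration only.
fw2dl_simple =: verb define
    TAB fw2dl_simple y
:
    NB. trimrow: split one line at x, trim every cell, rejoin with x
    trimrow =. [: }: [: ; <@(,&x)@dltb;._2@(,&x)
    NB. apply trimrow to every LF-terminated line and restore the LF
    ; <@(,&LF)@trimrow;._2 y
)

fw2dl_simple 'ab  ',TAB,'  c ',LF    NB. same as 'ab',TAB,'c',LF
```

It boxes and trims one cell at a time, which is easy to read but is the sort of thing that can get expensive on very large inputs.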
I stole the fundamental idea from Roger, who presented it in:
http://www.jsoftware.com/pipermail/general/2003-September/015596.html
The method exploits several constructs that are supported by special code in
the interpreter:
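fw2dl =: verb define
    TAB fw2dl y                     NB. monadic case: delimiter defaults to TAB
:
    s =. y ~: ' '                   NB. 1 wherever y is not a space
    d =. y e. LF,x                  NB. 1 at delimiters; x is the field delimiter
    l =. d ([: ; <@(+./\ );.2) s    NB. 0 on leading spaces. Assume data ends with LF
    d =. 1 (0)} d                   NB. mark the first item as a fret for ;.1
    t =. d ([: ; <@(+./\.);.1) s    NB. 0 on trailing spaces
    e =. l*.t                       NB. keep only what both masks allow
    y =. e#y
)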
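To make the two masks concrete, here is a small trace of the same steps on a made-up ten-character line with two padded cells. The sample data and the global names are only for illustration; the left argument of the ;.1 cut is the delimiter mask with its first item set to 1.

```j
sample =: 'ab  ',TAB,'  c ',LF

s =: sample ~: ' '                       NB. 1 1 0 0 1 0 0 1 0 1
d =: sample e. LF,TAB                    NB. 0 0 0 0 1 0 0 0 0 1

NB. Intervals end at each delimiter; the OR prefix scan stays 0
NB. until the first non-space of a cell, so leading spaces get 0.
l =: d ([: ; <@(+./\ );.2) s             NB. 1 1 1 1 1 0 0 1 1 1

NB. Intervals start at each delimiter; the OR suffix scan is 0
NB. after the last non-space of a cell, so trailing spaces get 0.
t =: (1 (0)} d) ([: ; <@(+./\.);.1) s    NB. 1 1 0 0 1 1 1 1 0 1

(l *. t) # sample                        NB. same as 'ab',TAB,'c',LF
```

The delimiters themselves are non-spaces, so both scans keep them, and a cell that is entirely blank collapses to nothing between its two delimiters.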
This verb takes about 11 seconds to process one of my 120 MB fixed-width files.
I considered coding an FSM-based solution, but 11 seconds is fast enough for
me.
Anyway, I hope this is useful to others.
-Dan