Forum,

Just thought I'd share some code to efficiently convert fixed-width data to 
delimited data.  That is, to remove superfluous whitespace from every "cell" of 
the data.  

While fixed-width data is faster and easier to process with J, delimited data 
is required for certain external applications, notably bulk inserts into 
RDBMSes.  If you leave the extraneous whitespace in your bcp file, it will be 
inserted literally into the database table.

I recommend you use this method only when your data is so large that the obvious way is 
either impossible ("out of memory") or unacceptably slow.  The obvious way makes for 
more maintainable code.
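For comparison, here is a minimal sketch of what "the obvious way" might look like. It assumes the same input as the verb further down: cells separated by a single-character delimiter x, and every record terminated by LF. The names `fw2dl_obvious` and `trimcell` are hypothetical. It boxes every record and every cell, and that per-cell boxing is what tends to make it expensive on very large files.

```j
NB. Hypothetical comparison code, not the method described in this post.
NB. trimcell: drop leading and trailing blanks from one cell
NB. (the same prefix/suffix OR idea, applied to a single string).
trimcell =: #~ [: (+./\ *. +./\.) ' '&~:

NB. Assumes x is a one-character field delimiter and y ends with LF.
fw2dl_obvious =: dyad define
  r =. ''
  for_line. <;._2 y do.                  NB. one box per record, LF removed
    cells =. <;._2 (>line) , x           NB. append x so every field ends with a fret
    cells =. trimcell&.> cells           NB. trim each field
    r =. r , (}: ; cells ,&.> <x) , LF   NB. rejoin with x, drop the extra x, add LF
  end.
  r
)

TAB fw2dl_obvious 'ab  ',TAB,' c d ',LF   NB. 'ab',TAB,'c d',LF
```

Each record is cut, trimmed and re-joined independently, which is easy to read and modify, at the cost of creating one box per cell.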

I stole the fundamental idea from Roger, who presented it in:

     http://www.jsoftware.com/pipermail/general/2003-September/015596.html

The method exploits several constructs that are supported by special code in the 
interpreter:

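        fw2dl =: verb define
          TAB fw2dl y                        NB.  monadic case: TAB is the default delimiter
        :
                s =. y ~: ' '                NB.  1 at every non-blank character
                d =. y e. LF,x               NB.  x is the field delimiter

                NB.  Prefix OR within each cell (cells end at a fret, ;.2):
                NB.  clears the leading blanks.  Assume data ends with LF,
                NB.  since ;.2 drops anything after the last fret.
                l =. d ([: ; <@(+./\ );.2) s

                NB.  Suffix OR within each cell (cells start at a fret, ;.1):
                NB.  clears the trailing blanks.  The fret forced at position 0
                NB.  keeps ;.1 from discarding the first cell.
                d =. 1 (0)} d
                t =. d ([: ; <@(+./\.);.1) s

                e =. l*.t                    NB.  keep delimiters and interior blanks
                y =. e#y                     NB.  result: the delimited text
        )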
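A trace of the masks may help show why this works. The sketch below recomputes them at the top level on a made-up one-record sample with TAB as the delimiter (the names `sample` and `f` are illustrative only); the first cell contains an interior blank that should survive. Both scans are cut on the delimiter mask: the prefix scan with trailing frets (;.2), the suffix scan with leading frets (;.1), where a fret is forced at position 0 so that ;.1 does not discard the first cell.

```j
NB. Trace of the masks on a made-up one-record sample.
sample =: ' a b  ',TAB,' c',LF
s =: sample ~: ' '               NB. non-blanks:  0 1 0 1 0 0 1 0 1 1
d =: sample e. LF,TAB            NB. delimiters:  0 0 0 0 0 0 1 0 0 1
l =: d ([: ; <@(+./\);.2) s      NB. prefix OR per cell (cells end at a fret)
f =: 1 (0)} d                    NB. leading frets, one forced at position 0
t =: f ([: ; <@(+./\.);.1) s     NB. suffix OR per cell (cells start at a fret)
l                                NB. 0 1 1 1 1 1 1 0 1 1
t                                NB. 1 1 1 1 0 0 1 1 1 1
(l *. t) # sample                NB. 'a b',TAB,'c',LF
```

Because both scans are cut on the same delimiter positions, l and t have the same length as the data, and their AND keeps the delimiters and the blank inside 'a b' while dropping only the padding.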

This verb takes about 11 seconds to process one of my 120 MB fixed-width files.  
I considered coding an FSM-based solution, but 11 seconds is fast enough for me.

Anyway, I hope this is useful to others.  

-Dan
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
