Jack Andrews wrote:
>  i've got free form csv -- newlines (0x0a) in many values.
>  surely this is something that j is good at!?

Sure, so long as you describe your data to it.  J doesn't know your data any 
better than I do.  

>  why does sequential machine (;:) only take array input?  

All verbs in J only take array input.  

> couldn't it take a verb 

Not without a major redefinition, which would break existing code.

> in a stream?  a socket, file, terminal...

J is block oriented.  It works really well when it has all the data at once.  
The downside is that it doesn't work so well on streams, where it has to work 
on small chunks and wait a lot.  Maybe that'll change if J ever supports lazy 
evaluation. 

Instead of dyad  ;:  (FSM),  have you looked at the dyadic verbs derived from  
;.  (cut)?  For simple patterns, the cut mask is often easier to specify than 
the corresponding state table, and is competitive in performance.

For example, here's how you could cut on newlines not enclosed in quotes:


           data  =. '"abc","def',LF,'ghi","jkl"',LF,'"mno","pqr',LF,'stu","vwx"',LF
           mask  =. =&LF > ~:/\@:=&'"'
           cut   =. <;._2~ mask
           
           cut data
        +---------------------+---------------------+
        |"abc","def ghi","jkl"|"mno","pqr stu","vwx"|
        +---------------------+---------------------+
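
If the mask looks opaque, it may help to build it up one step at a time.  This 
is only a sketch on a made-up 7-character string  s  (not your data), and it 
assumes the quotes are balanced:

           s =. 'a"b',LF,'c"',LF      NB. hypothetical sample
           s = '"'                    NB. 1 at each quote character
        0 1 0 0 0 1 0
           ~:/\ s = '"'               NB. running parity: 1 while inside quotes
        0 1 1 1 1 0 0
           s = LF                     NB. 1 at each newline
        0 0 0 1 0 0 1
           (s = LF) > ~:/\ s = '"'    NB. newlines outside quotes only
        0 0 0 0 0 0 1
           # cut s                    NB. cut as defined above: one record
        1

The running  ~:/\  is 1 from each opening quote up to, but not including, its 
closing quote, so the first LF is masked out and only the last one survives as 
a cut point.  A doubled quote ("") inside a quoted field flips the parity 
twice, so that escaping convention shouldn't disturb the mask.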
           
You might also want to look at the regex library in  open'regex' , which is 
general and powerful.  Or, if your data is in Excel format (xls), have a look 
at "Tara".  That'll handle more than just embedded newlines.

-Dan

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
