Jack Andrews wrote:
> i've got free form csv -- newlines (0x0a) in many values.
> surely this is something that j is good at!?
Sure, so long as you describe your data to it. J doesn't know your data any
better than I do.
> why does sequential machine (;:) only take array input?
Every verb in J takes only arrays (nouns) as arguments.
> couldn't it take a verb
Not without a major redefinition of it, which would break existing code.
> in a stream? a socket, file, terminal...
J is block oriented. It works really well when it has all the data at once.
The downside is that it doesn't work so well on streams, where it has to work
on small chunks and wait a lot. Maybe that'll change if J ever supports lazy
evaluation.
Instead of dyad ;: (FSM), have you looked at the dyadic verbs derived from
;. (cut)? For simple patterns, the cut mask is often easier to specify than
the corresponding state table, and is competitive in performance.
For example, here's how you could cut on newlines not enclosed in quotes:
   data =. '"abc","def',LF,'ghi","jkl"',LF,'"mno","pqr',LF,'stu","vwx"',LF
   mask =. =&LF > ~:/\@:=&'"'   NB. 1 at each LF that is not inside quotes
   cut =. <;._2~ mask           NB. hook: cut y at the 1s of mask y, dropping the LF
   cut data
+---------------------+---------------------+
|"abc","def ghi","jkl"|"mno","pqr stu","vwx"|
+---------------------+---------------------+
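The same quote-parity trick extends to splitting each record into fields. This is only a sketch: the names inq, rows and fields are made up here, and it assumes quotes are balanced and a comma is the field separator. The idea is to append a comma to each record so that every field is terminated, then cut on commas that are not inside quotes:

```j
   data =. '"abc","def',LF,'ghi","jkl"',LF,'"mno","pqr',LF,'stu","vwx"',LF
   inq =. ~:/\@:=&'"'                          NB. 1 from an opening quote up to (not including) its closing quote
   rows =. <;._2~ (=&LF > inq)                  NB. cut on LF outside quotes, as above
   fields =. (<;._2~ (=&',' > inq))@:(,&',')    NB. append a comma, then cut on commas outside quotes
   #&> fields&.> rows data                      NB. number of fields in each record
3 3
```

Each field keeps its surrounding quotes. A doubled quote ("") inside a quoted value flips the parity twice, so it should not disturb the mask.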
You might also want to look at the regex library (open 'regex'), which is
general and powerful. Or, if your data is in Excel format (xls), have a look
at "Tara". That'll handle more than just embedded newlines.
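For the regex route, here is a minimal sketch. It assumes the standard regex script loads with load 'regex', that every value is wrapped in double quotes, and that no value contains a doubled quote. rxall is the library verb that returns every match as a boxed list; the name vals is made up for the example:

```j
   load 'regex'
   data =. '"abc","def',LF,'ghi","jkl"',LF,'"mno","pqr',LF,'stu","vwx"',LF
   vals =. '"[^"]*"' rxall data     NB. each quoted value, embedded LF included
   # vals                           NB. how many values were found
```

This gives you the values as one flat list rather than grouped by record, so it suits the case where you know how many fields each record has.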
-Dan