> In both cases though I had a good gut feeling
> that what I wrote, once working, worked correctly
> in all situations... couldn't have said the
> same for C or C++.
>
> In defence of Haskell for this specific problem,
> it was relatively simple to set up the jump
> tables for a Boyer-Moore string match,
> preparatory to doing a split, using Haskell Maps,
> and I am just now tackling that in J.

The "split" problem is actually quite difficult to get
correct in all situations. The following are my solutions:

   split=: [EMAIL PROTECTED] }.&.> [ (E. <;.1 ]) ,

   'da' split 'adam dazlious daddy'
+-+--+-------+---+
|a|m |zlious |ddy|
+-+--+-------+---+
   'da' split 'does not contain the substring'
+------------------------------+
|does not contain the substring|
+------------------------------+
   'da' split 'da'
+++
|||
+++
   'da' split ''
++
||
++
   '' split 'dazlious'
+-+-+-+-+-+-+-+-+
|d|a|z|l|i|o|u|s|
+-+-+-+-+-+-+-+-+
   '' split ''

Some cases to watch out for are:
- the prefix before the first occurrence of the substring
- the suffix after the last occurrence
- the prefix is empty
- the suffix is empty
- the substring does not occur at all
- the substring and the right argument are the same
- the substring is the empty string
- the right argument is the empty string
- the substring and the right argument are both the empty string
- the right argument consists of catenations of the substring

There is one more case that you may care about: are
substring occurrences allowed to overlap? If they should
not overlap, then the solution above gives wrong answers
in such cases:

   'dada' split 'zero dadada one dadada two'
+-----++-----++----+
|zero || one || two|
+-----++-----++----+

A correct solution obtains by replacing E. in split
by nos from
http://www.jsoftware.com/jwiki/Essays/Non-Overlapping_Substrings

nos=: 4 : 0
 s=. x [EMAIL PROTECTED] y
 i=. s I. s+#x
 (i.#y) e. (s,_1) {~ {&(i,_1)^:a: 0
)

   split1=: [EMAIL PROTECTED] }.&.> [ (nos <;.1 ]) ,

   'dada' split1 'zero dadada one dadada two'
+-----+-------+------+
|zero |da one |da two|
+-----+-------+------+

I am curious about how this compares to a Haskell solution.
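For anyone who finds the tacit definitions hard to read, the same
steps can be sketched as explicit verbs. This is only an illustration:
the names splitx, nosx and splitx1 are mine and do not come from the
definitions above, and nosx ASSUMES that non-overlapping matches are
chosen greedily from left to right (which agrees with the split1
result shown above). It is a plain loop, not the nos verb from the
wiki essay.

```j
NB. Explicit sketch of split with overlapping matches allowed.
NB. splitx is an illustrative name, not from the original message.
splitx=: 4 : 0
 z=. x,y               NB. prepend x so the first piece also starts at a match
 m=. x E. z            NB. boolean mask: 1 wherever x begins in z
 (#x) }.&.> m <;.1 z   NB. cut at each 1, then drop the leading x from every piece
)

NB. Mask of non-overlapping matches, ASSUMING greedy left-to-right selection.
NB. nosx is an illustrative loop, not the nos verb from the wiki essay.
nosx=: 4 : 0
 s=. I. x E. y         NB. start indices of all matches, overlaps included
 k=. 0                 NB. smallest start index still allowed
 r=. i.0               NB. accepted start indices
 for_j. s do.
  if. j >: k do.
   r=. r,j
   k=. j+#x            NB. the next match must begin after this one ends
  end.
 end.
 (i.#y) e. r           NB. convert accepted indices back to a boolean mask
)

NB. Same cut as splitx, but using the non-overlapping mask.
splitx1=: 4 : 0
 z=. x,y
 (#x) }.&.> (x nosx z) <;.1 z
)
```

With these loaded, 'dada' splitx1 'zero dadada one dadada two' should
give the same three boxes as split1 above, and splitx should agree
with split on the edge cases listed earlier.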
As you can see, a J solution can be very short, even if
substrings are required to be non-overlapping.

----- Original Message -----
From: Arved Sandstrom <[EMAIL PROTECTED]>
Date: Monday, January 1, 2007 7:13 pm
Subject: [Jprogramming] Thanks & Comments to Roger Hui, Henry Rich, Michael Dykman

> I might add to the subject line: not excluding anyone else. This
> is a very useful mailing list, better than most.
>
> Per Roger's observation that a generic "split y on x" should
> result in empty boxes (assuming that we want a list of boxes as
> the returned substrings) where x is a prefix or suffix of y, or we
> have sequences of x like xxx, I had required in my specific
> problem that I wanted only non-empty substrings (I hadn't stated
> that in my question). Roger's comment that my example, in the
> general case, should have produced a leading empty box was quite
> true, and in fact at earlier stages my code produced it for that
> example.
>
> As an aside to that particular problem, the fix was to use " with
> more than one argument to the conjunction - I needed the rank
> applied to 'm' not to be the same as that applied to 'n', in
> "m verb n".
>
> Thanks to Henry for explaining how to set rank for a UDV - I am
> now using that for the above. Also, Ch. 35 of JfC I am looking at
> now; as it stands JfC is something like my Bible, but I just
> hadn't gotten to Ch. 35. :-)
>
> And a wry thank you to Michael. I simply didn't know about 'csv'.
> :-) At work it's AJAX, PHP, CSS, XHTML, XML, XSLT, SQL, C/C++,
> Java, .NET C#/F#, and UNIX shell. At home is where I investigate J
> (and Haskell and various other programs/languages that I'll never
> use at work), and I don't have much time to do so. :-(
>
> I should add: at this stage, although knowing about 'csv' is good,
> I don't necessarily want to use 'csv'. Currently I do need to
> re-invent the wheel.
>
> The hell of it is - and this is getting philosophical - J and
> Haskell (and some other languages) are so much better than the
> stuff I use most of the time. I wish I could sell people on it.
> I'm learning Haskell at the same time as I'm learning J, and it
> took me considerably more effort to write a "split y on x", where
> 'x' is a simple string, function in Haskell than it did for J; in
> fact the J version took me a few hours, and the Haskell version
> took me several nights. In both cases though I had a good gut
> feeling that what I wrote, once working, worked correctly in all
> situations... couldn't have said the same for C or C++.
>
> In defence of Haskell for this specific problem, it was relatively
> simple to set up the jump tables for a Boyer-Moore string match,
> preparatory to doing a split, using Haskell Maps, and I am just
> now tackling that in J.
>
> AHS
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
