> In both cases though I had a good gut feeling 
> that what I wrote, once working, worked correctly 
> in all situations... couldn't have said the 
> same for C or C++.

> In defence of Haskell for this specific problem, 
> it was relatively simple to set up the jump 
> tables for a Boyer-Moore string match, 
> preparatory to doing a split, using Haskell Maps,
> and I am just now tackling that in J.

The "split" problem is actually quite difficult
to get correct in all situations.  The following
are my solutions:

   split=: #@[ }.&.> [ (E. <;.1 ]) ,

   'da' split 'adam dazlious daddy'
+-+--+-------+---+
|a|m |zlious |ddy|
+-+--+-------+---+
   'da' split 'does not contain the substring'
+------------------------------+
|does not contain the substring|
+------------------------------+
   'da' split 'da'
+++
|||
+++
   'da' split ''
++
||
++
   '' split 'dazlious'
+-+-+-+-+-+-+-+-+
|d|a|z|l|i|o|u|s|
+-+-+-+-+-+-+-+-+
   '' split ''

   
Some cases to watch out for are:
- the prefix before the first occurrence of the substring
- the suffix after the last occurrence
- the prefix is empty
- the suffix is empty
- the substring does not occur at all
- the substring and the right argument are the same
- the substring is the empty string
- the right argument is the empty string
- the substring and the right argument are both
  the empty string
- the right argument consists of catenations
  of the substring
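
As a rough sketch, the cases above can be written down as checks 
against the results shown in the session.  The definition of split 
is repeated so that the block stands alone; its first tine is taken 
to be #@[, which is an assumption here.  assert comes from the J 
standard library, and -: (match) is strict about rank, hence the 
ravels on one-character pieces.

```j
NB. Assumed reading of split; the first tine is taken to be #@[
split=: #@[ }.&.> [ (E. <;.1 ]) ,

NB. non-empty prefix, interior pieces, non-empty suffix
assert ((,'a');'m ';'zlious ';'ddy') -: 'da' split 'adam dazlious daddy'

NB. substring does not occur: one box holding the whole right argument
assert (,<'does not contain') -: 'da' split 'does not contain'

NB. substring and right argument are the same: two empty boxes
assert ('';'') -: 'da' split 'da'

NB. right argument is catenations of the substring: empty boxes only
assert (4$<'') -: 'da' split 'dadada'

NB. right argument is empty: a single empty box
assert (,<'') -: 'da' split ''

NB. substring is empty: one box per character
assert (<@,"0 'dazlious') -: '' split 'dazlious'

NB. both are empty: no boxes at all
assert 0 = # '' split ''
```

None of these checks involves overlapping occurrences.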

There is one more case that you may care about: 
are substring occurrences allowed to overlap?  
If they should not overlap, then the solution 
above gives wrong answers in such cases:

   'dada' split 'zero dadada one dadada two' 
+-----++-----++----+
|zero || one || two|
+-----++-----++----+

A correct solution obtains by replacing E. in
split by nos from 
http://www.jsoftware.com/jwiki/Essays/Non-Overlapping_Substrings

nos=: 4 : 0
 s=. x I.@E. y
 i=. s I. s+#x
 (i.#y) e. (s,_1) {~ {&(i,_1)^:a: 0
)

   split1=: #@[ }.&.> [ (nos <;.1 ]) ,

   'dada' split1 'zero dadada one dadada two' 
+-----+-------+------+
|zero |da one |da two|
+-----+-------+------+
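
To see what nos changes, it may help to compare its result with 
that of E. on a short argument.  This is only a sketch: nos is 
repeated so that the block stands alone, its first line is taken 
to be s=. x I.@E. y (an assumption here), and the comments are 
one reading of what each line does.

```j
nos=: 4 : 0
 s=. x I.@E. y                        NB. assumed reading: start indices of all matches
 i=. s I. s+#x                        NB. per match: position in s of the first match clear of it
 (i.#y) e. (s,_1) {~ {&(i,_1)^:a: 0   NB. follow that chain from the first match; _1 ends it
)

NB. 'dada' occurs in 'dadada' at indices 0 and 2, and the two occurrences overlap
assert 1 0 1 0 0 0 -: 'dada' E.  'dadada'

NB. nos keeps the occurrence at 0 and drops the overlapping one at 2
assert 1 0 0 0 0 0 -: 'dada' nos 'dadada'
```

With E. both marks become cut points, which is where the empty 
boxes in the earlier result come from.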

I am curious about how this compares to a Haskell 
solution.  As you can see, a J solution can be very 
short, even if substrings are required to be 
non-overlapping.



----- Original Message -----
From: Arved Sandstrom <[EMAIL PROTECTED]>
Date: Monday, January 1, 2007 7:13 pm
Subject: [Jprogramming] Thanks & Comments to Roger Hui, Henry Rich, Michael Dykman

> I might add to the subject line: not excluding anyone else. This is a
> very useful mailing list, better than most.
> 
> Per Roger's observation that a generic "split y on x" should result in
> empty boxes (assuming that we want a list of boxes as the returned
> substrings) where x is a prefix or suffix of y, or we have sequences of
> x like xxx, I had required in my specific problem that I wanted only
> non-empty substrings (I hadn't stated that in my question). Roger's
> comment that my example, in the general case, should have produced a
> leading empty box was quite true, and in fact at earlier stages my code
> produced it for that example.
> 
> As an aside to that particular problem, the fix was to use " with more
> than one argument to the conjunction - I needed the rank applied to 'm'
> not to be the same as that applied to 'n', in "m verb n".
> 
> Thanks to Henry for explaining how to set rank for a UDV - I am now
> using that for the above. Also, Ch. 35 of JfC I am looking at now; as it
> stands JfC is something like my Bible, but I just hadn't gotten to
> Ch. 35. :-)
> 
> And a wry thank you to Michael. I simply didn't know about 'csv'. :-)
> At work it's AJAX, PHP, CSS, XHTML, XML, XSLT, SQL, C/C++, Java, .NET
> C#/F#, and UNIX shell. At home is where I investigate J (and Haskell and
> various other programs/languages that I'll never use at work), and I
> don't have much time to do so. :-(
> 
> I should add: at this stage, although knowing about 'csv' is good, I
> don't necessarily want to use 'csv'. Currently I do need to re-invent
> the wheel.
> 
> The hell of it is - and this is getting philosophical - J and Haskell
> (and some other languages) are so much better than the stuff I use most
> of the time. I wish I could sell people on it. I'm learning Haskell at
> the same time as I'm learning J, and it took me considerably more effort
> to write a "split y on x", where 'x' is a simple string, function in
> Haskell than it did for J; in fact the J version took me a few hours,
> and the Haskell version took me several nights. In both cases though I
> had a good gut feeling that what I wrote, once working, worked correctly
> in all situations... couldn't have said the same for C or C++.
> 
> In defence of Haskell for this specific problem, it was relatively
> simple to set up the jump tables for a Boyer-Moore string match,
> preparatory to doing a split, using Haskell Maps, and I am just now
> tackling that in J.
> 
> AHS


----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
