Note that there is an even faster version, using I.@:E. :

  10 timespacex '( I. ''CTAG'' E. DNA)' 
2.656e_6 6016 
  10 timespacex '(  ''CTAG'' I.@:E. DNA)' 
2.144e_6 6400 
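As a quick sanity check, the sketch below confirms that the two phrasings timed above return the same indices. It reuses the DNA string from the original message; nothing else is assumed.

```j
NB. DNA string copied from the original message
DNA =: 'CGATTGACTAGTCGATTGCTGATGCTCTAGTCGTGATGCTATACTAGTGCGTCGATGCTAGCGCTAGTCGCATTTGA'

NB. dyadic form: pattern on the left, text on the right
'CTAG' I.@:E. DNA                          NB. 7 26 43 57 63

NB. 1 if both phrasings give identical results
(I. 'CTAG' E. DNA) -: 'CTAG' I.@:E. DNA    NB. 1
```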


For Henry's example of
'CTAG*ACTA'

E. offers some extra flexibility:

('CTAG';'ACTA') I.@:E.each < DNA
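A small sketch of what that expression returns, again using the DNA string from the original message: one box of start indices per pattern. The name starts is only for illustration.

```j
DNA =: 'CGATTGACTAGTCGATTGCTGATGCTCTAGTCGTGATGCTATACTAGTGCGTCGATGCTAGCGCTAGTCGCATTTGA'

NB. each is &.> so every boxed pattern is searched in the single boxed DNA
starts =: ('CTAG';'ACTA') I.@:E.each < DNA

> {. starts    NB. starts of CTAG: 7 26 43 57 63
> {: starts    NB. starts of ACTA: 6 42
```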

One way to use the start indices is to get all of the "overlapping matches":

rangei =: [ + >:@] i.@- [
('CTAG';'ACTA') (] {~each a:-.~ [: ''"_`rangei@.</each [: ,@:(,.each/"0 1&>/) I.@:E.each) < DNA
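To make the helper easier to read in isolation, here is a minimal sketch of rangei on its own: x rangei y is the list of integers from x to y inclusive, which can then index into DNA. The sample arguments are chosen only for illustration.

```j
rangei =: [ + >:@] i.@- [

3 rangei 6             NB. 3 4 5 6

DNA =: 'CGATTGACTAGTCGATTGCTGATGCTCTAGTCGTGATGCTATACTAGTGCGTCGATGCTAGCGCTAGTCGCATTTGA'

NB. indices 7..10 select the first CTAG
(7 rangei 10) { DNA    NB. CTAG
```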

----- Original Message -----
From: Jon Hough <[email protected]>
To: "[email protected]" <[email protected]>
Cc: 
Sent: Sunday, August 16, 2015 2:09 AM
Subject: [Jprogramming] Regex vs I./E. for pattern matching

I recently went through the regex lab, and would like to know whether it is 
more idiomatic for J users to use regex when matching simple patterns in a 
string, or to use E. and similar verbs?
For example, if I have an (imaginary) DNA sequence string:
DNA=: 'CGATTGACTAGTCGATTGCTGATGCTCTAGTCGTGATGCTATACTAGTGCGTCGATGCTAGCGCTAGTCGCATTTGA'
I want to find where 'CTAG' sequences exist in this string. Using regex, 
'CTAG' rxmatches DNA
will give the 5 indices where the CTAG pattern is found.
But I could equally do,
I. 'CTAG' E. DNA
which will give me the same indices. And it seems the non-regex way is more 
efficient (in time and space):

timespacex '( I. ''CTAG'' E. DNA)'

gives 1.5e_5 3008

timespacex '( ''CTAG'' rxmatches DNA)'

gives 0.001103 6720

Granted, the regex expression is as simple as possible, and regex can do more 
complicated matching than E. can, and possibly rxmatches gains efficiency 
over E. for much longer DNA strings. But it seems that for simple matches E. is 
the better choice.




                          
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm