On Wed, Oct 12, 2011 at 10:34 AM, Zheng, Xin (NIH) [C] <zheng...@mail.nih.gov> wrote:
> Hello all,
>
> 'label' I.@E. 'label1label2label3' could give where the label starts in the
> string. If the string is very long or infinite, only limited sites, say the
> first 100 starting sites, are needed. How to do it?
If the statistical character of the right argument is well understood, you can sample it: choose a prefix length that is likely to contain enough matches, and search only that prefix. For example, given:

   L=: 'label'
   D=: ,'p<label>0'8!:2 i.1e6

   ({.~ 100 <. #) L I.@E. D
0 6 12 18 24 30 36 42 48 54 60 67 74 81 88 95 102 109 116 123 130 137 144 151 15...
   ({.~ 100 <. #) L I.@E. 1e4{.D
0 6 12 18 24 30 36 42 48 54 60 67 74 81 88 95 102 109 116 123 130 137 144 151 15...
   ({.~ 100 <. #) L I.@E. 100{.D
0 6 12 18 24 30 36 42 48 54 60 67 74 81 88 95

Here ({.~ 100 <. #) keeps at most the first 100 indices. The first 1e4 characters of D already contain the first 100 matches, so the second sentence gives the same answer as the first while searching only a small fraction of D. The third shows the risk of guessing too short: a 100-character sample yields only 16 matches.

Or, if you want a more general approach, search the target block by block and stop once m matches have been collected:

F=: 1 :0   NB. x: label, m: max labels to find, y: target
:
  b=. 2*m*#x   NB. block size to search (smaller for testing, larger for typical use)
  r=. i.0      NB. result
  NB. each row of the table below is (start, length) of one block of y
  for_block. |: b -~/\@(] <. [ * 0 1 +/i.@>.@%~) #y do.
    if. m-:#r do. return. end.
    'z b'=. block
    NB. widen the window by (#x)-1 characters so a label straddling the
    NB. block boundary is still found; matches can only start in [z, z+b)
    r=. r, ({.~ (m-#r) <. #) z + x I.@E. y {~ z + i. (b + <:#x) <. z -~ #y
  end.
)

   L 100 F D

That said, unless D is extremely large (perhaps gigabytes), the extra complexity of this approach is usually not worth the overhead of breaking the problem up into blocks.

Also, with a problem statement like this, it may well be worth taking a step back and redefining the problem: restructuring the data, or expressing the result differently, could be advantageous.

-- 
Raul
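A possible middle ground between guessing a sample length and searching in blocks is to start from a guessed prefix and double it until enough matches turn up or the whole target has been searched. The adverb G below is a hypothetical sketch of that idea, not part of the original reply; its argument conventions (x: label, m: max labels, y: target) are assumed to mirror F.

```j
NB. G is a hypothetical adverb, not from the original post.
NB. x: label, m: max labels to find, y: target
G=: 1 :0
:
  len=. 2*m*#x                     NB. first guess at a long-enough prefix
  r=. x I.@E. (len <. #y) {. y     NB. search only that prefix
  while. (m > #r) *. len < #y do.  NB. too few matches and data remains
    len=. 2*len                    NB. double the prefix and search again
    r=. x I.@E. (len <. #y) {. y
  end.
  (m <. #r) {. r                   NB. at most m starting positions
)
```

With L and D defined as above, L 100 G D should give the same result as ({.~ 100 <. #) L I.@E. D. Because every pass searches a prefix from position 0, no match can be split across pieces; the cost is rescanning, but since the length doubles, the total work stays within roughly twice the final prefix length.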