On Wed, Oct 12, 2011 at 10:34 AM, Zheng, Xin (NIH) [C]
<zheng...@mail.nih.gov> wrote:
> Hello all,
>
>  'label' I.@E. 'label1label2label3' could give where the label starts in the 
> string. If the string is very long or infinite, only limited sites, say the 
> first 100 starting sites, are needed. How to do it?

If the statistical character of the right argument is well understood,
you can pick a prefix length that is likely to contain enough matches
and search only that part.

For example, given:

L=: 'label'
D=: ,'p<label>0'8!:2 i.1e6

   ({.~ 100 <. #) L I.@E. D
0 6 12 18 24 30 36 42 48 54 60 67 74 81 88 95 102 109 116 123 130 137
144 151 15...
   ({.~ 100 <. #) L I.@E. 1e4{.D
0 6 12 18 24 30 36 42 48 54 60 67 74 81 88 95 102 109 116 123 130 137
144 151 15...
   ({.~ 100 <. #) L I.@E. 100{.D
0 6 12 18 24 30 36 42 48 54 60 67 74 81 88 95
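
If no single prefix length is a safe guess, one variation is to grow
the prefix until enough matches turn up. The following is only a
sketch of that idea: the name firstn and the initial guess 2*m*#x are
assumptions made for illustration, and each retry re-searches the
characters already covered.

```j
NB. firstn: hypothetical helper, not part of the original example.
NB. x: label, m: max labels to find, y: target
firstn=: 1 : 0
:
  n=. (#y) <. 1 >. 2*m*#x       NB. first guess at a sufficient prefix length
  while. 1 do.
    r=. x I.@E. n {. y          NB. match positions within the first n characters
    if. (m <: #r) +. n >: #y do. break. end.
    n=. (#y) <. 2*n             NB. too few matches: double the prefix and retry
  end.
  (m <. #r) {. r                NB. at most m positions, no zero padding
)
```

It is called the same way as the phrases above, for example
L 100 firstn D. Any match found in a prefix is also a match in the
full string, so the first m matches of a prefix that holds at least m
are the first m matches overall.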

Or, if you want a more general approach:

F=:1 :0 NB. x: label, m: max labels to find, y: target
:
  b=.  2*m*#x  NB. block size to search (smaller for testing, larger for typical use)
  r=. i.0      NB. result
  for_block. |: b -~/\@(] <. [ * 0 1 +/i.@>.@%~) #y do.
    if. m-:#r do. return. end.
    'z b'=. block
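    NB. search (#x)-1 characters past the end of the block, so that a
    NB. label straddling a block boundary is still found
    r=. r, ({.~ (m-#r) <. #)z+x I.@E. y{~z+i.(z-~#y)<.b+<:#x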
  end.
)

   L 100 F D

That said, unless D is extremely large (perhaps gigabytes), breaking
the problem up into blocks usually costs more in complexity and
overhead than it saves.
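
One way to judge this for your own data is to measure the plain
full-string search first. The sketch below uses made-up test data T
and a helper named full (both assumptions for illustration, not taken
from the example above); 6!:2 reports execution time in seconds and
7!:2 reports the space needed in bytes.

```j
NB. T is synthetic test data, not the D defined earlier.
T=: 1e7 $ 'xxlabel'
NB. full: search the whole string, keep at most the first 100 positions
full=: 3 : '({.~ 100 <. #) ''label'' I.@E. y'
10 (6!:2) 'full T'      NB. average seconds over 10 runs
7!:2 'full T'           NB. bytes needed to run the sentence
```

If these numbers are acceptable at your largest realistic input size,
the simple phrase is probably all you need.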

Also, with this kind of problem statement it is often worth taking a
step back and redefining the problem.  Restructuring the data or
expressing the result differently may well be advantageous.

-- 
Raul
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
