You might be able to speed this up even more by using:

           nos =. x I.@:E. file

in place of:

           nos =. I. x E. file

When optimizing J, it's good to keep the "special code" list handy:

   http://www.jsoftware.com/help/dictionary/special.htm

and if you can reformulate any of your constructs to match one of those listed, 
you should (as above).
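A quick way to check that the special-code form is a drop-in replacement is to compare its result with the original on some sample text, and time both.  This is a minimal sketch: the sample string and the names  txt ,  big ,  r1 ,  r2 ,  t1 ,  t2  are made up for illustration, and the timings will vary by machine and J version:

```j
NB. Made-up sample: a short log fragment repeated to 200000 characters
txt =: ' rhost='
big =: 200000 $ 'a rhost=1.2.3.4 b rhost=10.0.0.1 c '

NB. Special-code form (matches the I.@:E. pattern) vs. the plain form
t1 =: 6!:2 'r1 =: txt I.@:E. big'
t2 =: 6!:2 'r2 =: I. txt E. big'

NB. Same indices either way; only the speed should differ
r1 -: r2
t1 , t2
```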

Also, the expression  (n i."1 (' ')){."0 1 (n)  works too hard.  Since all J 
arrays are rectangular, in the end this expression will produce a rectangular 
array, whose width is the length of the longest IP.  If you're not going to box 
the IPs to retain their heterogeneous lengths, then it's better to calculate 
this directly, as in  n {.~ _ , >./ n i."1 ' '   .
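To see that the two expressions agree, here is a small made-up example: three IPs padded to the fixed 15-column width, the way the extraction step produces them.  The names  n ,  slow  and  fast  are hypothetical:

```j
NB. Three made-up IPs, each padded with blanks to 15 columns
n =: 15 {."1 > '1.2.3.4' ; '10.20.30.40' ; '192.168.1.1'

NB. Per-row take: each row cut at its own first blank, then re-padded to the widest
slow =: (n i."1 ' ') {."0 1 n

NB. Direct form: cut every row at the largest first-blank position, in one take
fast =: n {.~ _ , >./ n i."1 ' '

slow -: fast
$ fast
```

Both results have 3 rows and 11 columns (the length of the longest IP); the direct form just gets there with one take instead of one per row.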

Finally, it might be interesting to time the nub itself.  Taken all together, I 
might rewrite your code along these lines:

        require 'jmf'
        
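        NB.  Extract IPs
        NB.  x: marker text (e.g. ' rhost=');  y: file contents.
        NB.  Returns one row per occurrence of x: the 15 characters
        NB.  (the width of '255.255.255.255') that follow it.
        ip        =:  ] {~ I.@:E. +/ '255.255.255.255' (+i.)&#~ [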
        
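        NB.  Clean & nub IPs
        NB.  Cut every row at the widest IP (the largest first-blank
        NB.  position), then remove duplicate rows.
        nub       =:  ~.@:({.~ _ , >./@:(i."1&' '))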
        
        NB.  Fetch data, extract IPs
        fetch     =:  dyad define
                NB.  Mapped noun name
                mnn   =.  'file'
                
                JCHAR map_jmf_ mnn;y
                
                IPs   =.  x ip mnn~
                unmap_jmf_ mnn
                
                NB.  Could avoid assigning  IP  (which we don't use
                NB.  except to return a result):
                NB.  (unmap_jmf_ mnn) ] x ip mnn~
                IPs
        )
        
        test      =:  verb define
                
                NB.  [ selects the left-hand path; the right-hand one is ignored
                fn     =.  jpath '~temp\auth2.log' [ '/media/KINGSTON/logParse/messages.2'
                txt    =.  ' rhost='
                readt  =.  6!:2 'IPs =. txt fetch fn'
                nubt   =.  6!:2 'IPs =. nub IPs'
                
                smoutput ''
                smoutput 'Read file and extract IPs.....', 's',~6j2 ": readt
                smoutput 'Clean and nub list............', 's',~6j2 ": nubt
                smoutput ''
                smoutput 'Unique IPs:'
                smoutput '-----------'
                
                NB.  Assigned locally within timed expressions
                IPs
        )
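Because  ip  and  nub  are ordinary verbs, they can be tried on an in-memory string without touching the file system.  A minimal sketch: the sample text (and the name  sample ) is made up, and each occurrence of the marker ' rhost=' must be followed by at least 15 characters, otherwise  {  signals an index error:

```j
NB. Definitions copied from the code above
ip  =: ] {~ I.@:E. +/ '255.255.255.255' (+i.)&#~ [
nub =: ~.@:({.~ _ , >./@:(i."1&' '))

NB. Made-up log text; 1.2.3.4 appears twice
sample =: 'x rhost=1.2.3.4 y rhost=10.0.0.1 z rhost=1.2.3.4 end of line'

NB. ip gives one 15-character row per occurrence of ' rhost='
$ ' rhost=' ip sample

NB. nub trims to the widest IP and drops duplicate rows
nub ' rhost=' ip sample
```

The first result is 3 15; the second is a 2-row, 8-column table holding 1.2.3.4 (blank-padded) and 10.0.0.1.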
       

Raul wrote:
>  You can get some idea of the difference, on your own machine, by
>  timing 1!:1 on that file.

I tried this, and reading the file with 1!:1 (rather than mapping it) added 
between 0.03 and 0.05 seconds to the total time (on a 38MB file I generated 
from data I found via Google, searching on [ "rhost= " filename:.log ]).  
Granted, that represents between 33% and 45% of the total time, but in absolute 
terms it's not much.  If Robert's data doubled in size, the difference would 
still be less than a tenth of a second.

Given that, I would prefer plain old  fread  .  Mapped files introduce 
complexities and subtleties which aren't worth dealing with to gain a tenth of a 
second.  For example, because mapped files have side effects which require 
cleanup (i.e.  unmap_jmf_  ) you can't really use them in the functional 
data-passing manner which is so common and comfortable in J (and which makes 
verb composition so easy).  

The current exercise provides an illustration of this problem.  I could write 
the entire solution as  nub @: ip @: fread  .  But I can't do that with mapped 
files.  This was a stumbling block when I wanted to compare the timings of 
fread vs. mapped files.  That is, I wanted to time file reading, IP extraction, 
and scrubbing & nubbing independently.

With  nub @: ip @: fread  , this is very easy:

           6!:2 'IPs =.  fread filename'
           6!:2 'IPs =.  ip IPs'
           6!:2 'IPs =.  nub IPs'
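For completeness, here is one way to spell that pipeline so that it actually runs.  Since  ip  is dyadic, the marker has to be bound first ( ' rhost='&ip ).  The scratch file name is hypothetical, and the sketch assumes the standard-library verbs  fread  and  fwrite  and the  ~temp  folder are available, which should be the case in a standard J session:

```j
ip  =: ] {~ I.@:E. +/ '255.255.255.255' (+i.)&#~ [
nub =: ~.@:({.~ _ , >./@:(i."1&' '))

NB. Hypothetical scratch file holding a made-up log line
fn =: jpath '~temp/rhost_demo.log'
'x rhost=1.2.3.4 y rhost=10.0.0.1 z rhost=1.2.3.4 end of line' fwrite fn

NB. Read, extract, clean & nub; nothing to clean up afterwards
uniqueIPs =: nub @: (' rhost='&ip) @: fread
uniqueIPs fn

NB. Each stage can still be timed on its own
6!:2 'IPs =: fread fn'
6!:2 'IPs =: '' rhost='' ip IPs'
6!:2 'IPs =: nub IPs'
```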

But with  fetch  written the way it is (above), I can't do it.  File-reading 
and IP extraction are tangled up.  The only way to do it is to rewrite the code 
and put the  unmap_jmf_  "somewhere else" [1].  But logically, it belongs in  
fetch  packaged up with the rest of the code related to the file [2].  

This is not a knock against mapped files; it's just a characteristic of 
side-effects in general.  And I'm probably a bigger fan of J's functional 
capabilities than most (and many "real" applications will have state and 
side-effects which require clean up at "the end" anyway, so mapped files won't 
add too much overhead).

-Dan

[1]  So my options are:

   (A)  Leave the code entangled.  Unsavory.  Un-J-like.

   (B)  Disentangle the code, and put the  unmap  at the end
        of the data flow, e.g.  

        ([ unmap_jmf_ bind 'FILE') @: nub @: ip @: (3 : ('JCHAR map_jmf_ ''FILE'';y';'FILE'))

        But this is ugly and makes maintenance harder.  The verb 
        is harder to read, as the last operation is unrelated to
        the function of the verb (usually the last operation tells 
        you something important about the verb).   

        Plus the final reference to the file is disjoint from all
        the other references, so you have to keep more state in 
        your head while reading the verb (i.e. it interferes 
        with locality of reference).

   (C)  Disentangle the code, by putting the unmap somewhere it
        doesn't belong (e.g. at the bottom of  test  ).  As 
        above: ugly and hard to maintain.

   (D)  As part of the larger application, manually track all file 
        mappings, and clean them up at "the end".  This adds 
        complexity, but is the currently recommended option.

   (E)  As part of the larger application, blindly call  unmapall_jmf_  
        at "the end".  Bad idea; even JSoftware discourages it.

   (F)  Forget about the side effects, and hope J properly unmaps
        files when it shuts down.  Maybe the best current option.

   (G)  Use dangerous and unsavory hacks to emulate normal functional
        data flow:

        NB.!! Hope J unmaps the (anonymous) data when 
        NB.!! it finally goes out of scope.
        mfread    =:  verb define
                JCHAR map_jmf_ y;~mnn=.'FILE_base_'
                y =.  mnn~
                erase mnn
                mappings_jmf_=:}:mappings_jmf_
                y
        )

        IPs    =:  nub @: ip @: mfread    

... and I don't like any of them.

[2]  And even within  fetch,  using mapped files forced me to write superfluous 
code (i.e. the assignment) because I need to unmap the file before the verb 
returns, but the last executed line has to be  IPs  (i.e.  not  unmap_jmf_  ).  
The only workaround depends on order of execution, a no-no.