Yuri Burger wrote:
> Is it possible to improve my J implementation?

Here are some suggestions (note that I've not studied your
code very carefully, so please do not be insulted if the
code you've already posted does some of these):

(1) As a general rule, the longer a J program is, the
longer it takes to run.  In other words, don't try to
do everything at once -- over-optimizing can slow things
down.

(2) Rather than conditional execution, you might be
able to use conditional assignment.  Here, you give
global variables a default, and then only update them
if data is present.

For example:

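NB. states reads the table that follows its call site in the script (up to
NB. the closing paren) and splits each entry s.a into the pair s a.
states_z_=. 3 :0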
 0 10 #: <. 10 * > -.&a: <@".;._2] 0 :0
)

namesURI=:;:'httpPATH HOST PROTOCOL' NB. reversed
splitURI=:(0;(states'');'/:'i.a.)&;:
NB. /    :    other
    6.1  0.6  1.1   NB. 0 start
    0.6  2.2  1     NB. 1 protocol
    3    0.6  0.6   NB. 2 :
    4    0.6  0.6   NB. 3 /
    0.6  0.6  5.1   NB. 4 /
    6.2  5    5     NB. host part
    6    6    6     NB. local part
)

httpPATH=:''
HOST=: ;}.sdgethostname_jsocket_''[require'socket'
PROTOCOL=: 'http'
namesURI gets |.splitURI httpURI

where httpURI has the obvious meaning (the URI string to be split).

(2A) Rather than repeatedly assigning a bunch of variables to
give them their default values, it would probably be more
efficient to spin up a fresh locale for each record to be
parsed.
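
A minimal sketch of the fresh-locale idea in (2A), assuming the defaults are kept in a locale that each record's locale inherits from through its path. The names uridefaults, newrecord and rec are made up for illustration and are not from the posted code:

```j
NB. Defaults are assigned once, in a locale named uridefaults (hypothetical name).
httpPATH_uridefaults_ =: ''
PROTOCOL_uridefaults_ =: 'http'

NB. newrecord: create a numbered locale whose search path starts at the defaults.
newrecord =: 3 : 0
 loc =. cocreate ''
 ('uridefaults';'z') copath loc
 loc
)

rec =: newrecord ''
PROTOCOL__rec =: 'https'     NB. assign only the fields actually present
smoutput PROTOCOL__rec       NB. the value set for this record
smoutput # httpPATH__rec     NB. length of the default, found through the path
coerase rec                  NB. discard the record's locale when done
```

Whether this beats plain reassignment would need timing, since creating and erasing a locale has a cost of its own.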

(3) If there's a Host: header provided in the RFC822 headers,
include it in the output, as well as the required Host header.
This better meets the literal spec and should be simpler to
implement than some attempt to de-dup the headers.

(4) It might even be worth doing a separate parsing pass on the
headers to extract the value of the Host: header.  (This
might be faster than doing something special to pull it out
of the full parsed header set.)
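
One way the separate pass in (4) might look, as a sketch only. It assumes the header text arrives as one string with every line terminated by LF (any CR is dropped first), and the names hostof and hdrs are invented for this example:

```j
NB. hostof: return the value of the first Host: header in y, or '' if none.
NB. y is the raw header text; every line must end in LF.
hostof =: 3 : 0
 lines =. <;._2 y -. CR
 ishost =. ('host:' -: tolower@(5&{.)) &> lines
 if. +./ ishost do. dlb 5 }. > {. ishost # lines else. '' end.
)

hdrs =: 'GET / HTTP/1.1',LF,'Accept: */*',LF,'Host: example.org',LF,LF
smoutput hostof hdrs
```

This boxes every line, so it is not necessarily the fastest approach; it only shows that the Host: lookup can be kept apart from the full header parse.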

(5) I've not timed 1!:3 (append to file), but using 1!:3 on
intermediate results might be faster than assembling full results
for 1!:2 (write file).  Then again, it might not...

(6) You may already be doing this, but if you're treating the
headers literally, you should arrange for the trailing double 
newline to just fall out of your header parsing.  

(7) Also, rather than breaking them up into boxes, it might
be faster to compute a bit vector of the characters that 
need to be remapped to meet your requirements.  Then do a 
character lookup, along the lines of:

((x*256)+a.i.y) { lookuptable
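
To make that concrete, here is a small sketch that assumes lookuptable is 512 characters long: the first 256 are the identity mapping and the second 256 are whatever remapping you need. The upper-casing table and the names lower, upper, remap, text and mask are only stand-ins for illustration:

```j
NB. Row 0 of the table is the identity mapping; row 1 is a hypothetical
NB. remapping (here: lower case to upper case).
lower =: 'abcdefghijklmnopqrstuvwxyz'
upper =: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
lookuptable =: a. , upper (a. i. lower)} a.

NB. remap: x is a bit vector (1 = remap this character), y is the text.
remap =: 4 : '((x*256)+a.i.y) { lookuptable'

text =: 'host: example'
mask =: (# text) {. 1        NB. flag only the first character
smoutput mask remap text
```

With only the first character flagged, this displays Host: example. The real work is in computing the bit vector; the lookup itself is a single indexing operation over the whole string.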

(8) As Oleg suggested, regular expressions might be better
at getting fast performance out of this task than ;: is.  While
you can do anything with ;: that you can do with regular
expressions, you wind up having to use ;: multiple times
to do nested matching, while regular expressions are
designed to do some nested matching in one pass.  Just be
careful -- sometimes regular expressions are orders of
magnitude slower than ;: (when there are many "near matches"
involving wildcards).

Finally, a few notes about web serving:

A million hits a day on a web server corresponds to an average 
of about 12 hits per second, and assuming reasonable content 
size, that's enough to saturate a moderately fast connection 
(T1 or SDSL).
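
Spelling out that arithmetic, where the 16 KB average response size is purely an assumed figure for illustration (the names hits, bytes, bits and t1 are likewise just for this sketch):

```j
hits =: 1e6 % 24*60*60       NB. hits per second, averaged over one day
bytes =: 16384               NB. hypothetical average response size (an assumption)
bits =: 8 * bytes * 12       NB. bits per second at 12 hits per second
t1 =: 1.544e6                NB. nominal T1 line rate in bits per second
smoutput hits
smoutput bits % t1           NB. above 1 means the line is saturated
```

A different assumed response size moves the ratio proportionally, so treat the result as a rough order-of-magnitude check.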

However, even that is not the true bottleneck on a typical web
server.  The real bottleneck is slow clients.  Slow clients only
accept a few bytes at a time, requiring the server to service them
numerous times before the request can be finished.  Much of this
work happens in the kernel (in the TCP stack and the socket
queues).

FYI,

-- 
Raul

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
