Yuri Burger wrote:
> Is it possible to improve my J implementation?
Here are some suggestions (note, I've not studied your
code very carefully -- please do not be insulted if
you've already posted code which does some of these):
1) As a general rule, the longer a J program is, the
longer it takes to run. In other words, don't try to
do everything at once -- over-optimizing can slow things
down.
2) Rather than conditional execution, you might be
able to use conditional assignment. Here, you give
global variables a default, and then only update them
if data is present.
For example:
states_z_=. 3 :0
0 10 #: <. 10 * > -.&a: <@".;._2] 0 :0
)
namesURI=:;:'httpPATH HOST PROTOCOL' NB. reversed
splitURI=:(0;(states'');'/:'i.a.)&;:
NB. / : other
6.1 0.6 1.1 NB. 0 start
0.6 2.2 1 NB. 1 protocol
3 0.6 0.6 NB. 2 :
4 0.6 0.6 NB. 3 /
0.6 0.6 5.1 NB. 4 /
6.2 5 5 NB. host part
6 6 6 NB. local part
)
httpPATH=:''
HOST=: ;}.sdgethostname_jsocket_''[require'socket'
PROTOCOL=: 'http'
namesURI gets |.splitURI httpURI
where httpURI has the obvious meaning.
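The verb gets is not defined in the lines above. As a guess at what such a conditional assignment could look like, here is a minimal sketch: it pairs names with boxed values and assigns only the values that are non-empty, so names with no data keep their defaults. The definition and the sample values are my own illustration, not code from the original program.

```j
NB. hypothetical sketch of a conditional assignment verb
NB. x: boxed list of names, y: boxed list of values
gets=: 4 : 0
 for_i. i. x <.&# y do.              NB. only as many pairs as both sides have
  v=. > i { y
  if. # v do. (> i { x)=: v end.     NB. assign only when data is present
 end.
 i. 0 0
)

NB. defaults first, then update from whatever was parsed
httpPATH=: ''
HOST=: 'localhost'
PROTOCOL=: 'http'
(;: 'httpPATH HOST PROTOCOL') gets '/index.html' ; 'example.com' ; ''
NB. httpPATH and HOST are updated; PROTOCOL keeps its default 'http'
```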
2A) Rather than repeatedly assigning a bunch of variables to
give them their default values, it would probably be more
efficient to spin up a fresh locale for each record to be
parsed.
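One way to read this, sketched with the standard class utilities: keep the defaults in a class locale, and let each record get its own numbered locale whose path falls back to those defaults. The class name httpreq and the sample path are made up for illustration, and I have not timed this against plain reassignment.

```j
NB. defaults live in a (hypothetical) class locale
coclass 'httpreq'
httpPATH=: ''
PROTOCOL=: 'http'
cocurrent 'base'

NB. one fresh locale per record; only the parsed values are assigned
r=: conew 'httpreq'
httpPATH__r=: '/index.html'
PROTOCOL__r            NB. not assigned in r, so found via the path: http
codestroy__r ''        NB. discard the record's locale when done
```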
3) If there's a Host: header provided in the RFC822 headers,
include it in the output, as well as the required Host header.
This better meets the literal spec and should be simpler to
implement than some attempt to de-dup the headers.
4) It might even be worth doing a separate parsing pass on the
headers to extract the value of the Host: header. (This
might be faster than doing something special to pull it out
of the full parsed header set.)
5) I've not timed 1!:3 (file append), but using 1!:3 on
intermediate results might be faster than assembling full
results for 1!:2 (file write). Then again, it might not...
6) You may already be doing this, but if you're treating the
headers literally, you should arrange for the trailing double
newline to just fall out of your header parsing.
7) Also, rather than breaking them up into boxes, it might
be faster to compute a bit vector of the characters that
need to be remapped to meet your requirements. Then do a
character lookup, along the lines of:
((x*256)+a.i.y) { lookuptable
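To make that concrete, here is a small sketch under assumptions of my own: the table is two 256-character pages, page 0 being the identity and page 1 mapping lower case to upper case, and the bit vector marks the header name (everything before the first colon). The table contents and the sample header are illustrative only; your remapping requirements may differ.

```j
upper=: 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
lower=: 'abcdefghijklmnopqrstuvwxyz'

NB. page 0: characters unchanged; page 1: lower case mapped to upper case
lookuptable=: a. , upper (a. i. lower)} a.

NB. x: bit vector (1 where a character should be remapped), y: text
remap=: 4 : '((x*256)+a.i.y) { lookuptable'

text=: 'host: example.com'
mask=: *./\ text ~: ':'      NB. 1 for each character before the first colon
mask remap text              NB. HOST: example.com
```

The whole line is remapped with a single indexing operation, with no boxing and no per-character branching.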
8) As Oleg suggested, regular expressions might give better
performance on this task than ;: does. While you can do
anything with ;: that you can do with regular expressions,
you wind up having to use ;: multiple times to do nested
matching, while regular expressions are designed to do some
nested matching in one pass. Just be careful -- sometimes
regular expressions are orders of magnitude slower than ;:
(when there are many "near matches" involving wildcards).
Finally, a few notes about web serving:
A million hits a day on a web server corresponds to an average
of about 12 hits per second, and assuming reasonable content
size, that's enough to saturate a moderately fast connection
(T1 or SDSL).
However, even that is not the true bottleneck on a typical web
server. The real bottleneck is slow clients. Slow clients only
accept a few bytes at a time, requiring the server to service them
numerous times before the request can be finished. Much of this
work happens in the kernel (in the tcp stack, and the socket
queues).
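The hit-rate figure above is easy to check in J. The sketch below also works out how large the average response would have to be to fill a T1, assuming the nominal 1.544 Mbit/s line rate and ignoring protocol overhead (both assumptions are mine).

```j
hits=: 1e6 % 24*60*60     NB. average hits per second: 11.5741
t1=: 1.544e6 % 8          NB. nominal T1 rate in bytes per second: 193000
t1 % hits                 NB. average bytes per hit that fills the line: 16675.2
```

So an average response of roughly 16 to 17 KB is enough to keep a T1 busy at that hit rate.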
FYI,
--
Raul