I am comparing different languages and have chosen a simple task
(an HTTP request headers converter) as the benchmark. My J
implementation of this task, an example of the input and an example of
the output are attached.
My J implementation converts about 2000 headers per second (on my
3 GHz Pentium 4).
The Perl implementation converts about 6600 headers per second.
The C++ implementation converts about 10000 headers per second.
Is it possible to improve my J implementation?
PS:
Here is the task's description:
The parser shall take a file containing a stream of HTTP requests with empty
message bodies (as specified in RFC 2616). The output of the
parser shall be a file consisting of records separated by an empty line. The
actions of the parser in case of invalid input
are explicitly unspecified.
Each record shall be in RFC 822 headers format. That is, a record consists of
attribute/value pairs, stored one per line. Each line begins with the
attribute name, terminated by a colon and followed by whitespace.
Attribute names do not contain whitespace; a dash is substituted instead.
The attribute value is the entire remainder of the line, exclusive of
trailing whitespace and the newline. A physical line that begins with a tab
or space is interpreted as a continuation of the current logical line. A
blank line is a record terminator.
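
To make the record format above concrete, here is a minimal sketch in Python (not part of my J code) that reads and writes one record. The function names parse_record and format_record are made up for illustration. It assumes a continuation line is joined to the previous value with a single space, and that a whitespace-only line counts as blank; the description does not say either way.

```python
def parse_record(lines):
    """Fold continuation lines and split each logical line into (name, value)."""
    pairs = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            break  # a blank line terminates the record
        if line[0] in " \t" and pairs:
            # Physical line starting with space/tab: continuation of the
            # current logical line (assumption: joined with one space).
            name, value = pairs[-1]
            pairs[-1] = (name, (value + " " + line.strip()).strip())
            continue
        # Attribute name runs up to the first colon; the value is the rest
        # of the line without surrounding whitespace.
        name, _, value = line.partition(":")
        pairs.append((name, value.strip()))
    return pairs


def format_record(pairs):
    """Write one attribute per line, then an empty line as the terminator."""
    return "".join(f"{name}: {value}\n" for name, value in pairs) + "\n"
```

For example, format_record(parse_record(lines)) gives back the same record with any continuation lines unfolded.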
Each record shall contain attributes named METHOD, HTTP-PROTOCOL-VERSION,
PROTOCOL, HOST, PORT, RESOURCE and QUERY, with the obvious
meaning. In addition, all message headers of the original request shall be
present in the record. The order of the headers in the
output is unspecified.
Example: for the input file
------------------ BEGIN --------------------
GET http://somewhere:1023/fdsfsdf?fdsfd HTTP/1.1
X-TTTT: sdfsdfsdf

GET /12345 HTTP/1.1
Host: localhost

GET /x

------------------ END ----------------------
the result could be
------------------ BEGIN --------------------
METHOD: GET
HTTP-PROTOCOL-VERSION: HTTP/1.1
PROTOCOL: http
HOST: somewhere
PORT: 1023
RESOURCE: fdsfsdf
QUERY: fdsfd
X-TTTT: sdfsdfsdf

METHOD: GET
HTTP-PROTOCOL-VERSION: HTTP/1.1
PROTOCOL: http
HOST: localhost
PORT: 80
RESOURCE: 12345
QUERY:

METHOD: GET
HTTP-PROTOCOL-VERSION: HTTP/0.9
PROTOCOL: http
HOST: <your host name>
PORT: 80
RESOURCE: /x
QUERY:
------------------ END ----------------------
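
As a cross-check of the description and the example, here is a rough reference sketch of the conversion in Python. It is not one of the benchmarked implementations. The name convert_request and the local_host parameter are hypothetical. The sketch assumes that headers are not folded, that a missing version means HTTP/0.9, that the default port is always 80, that the leading slash is dropped from RESOURCE (the third example record above keeps it), and that an absolute URL in the request line takes precedence over a Host header.

```python
from urllib.parse import urlsplit


def convert_request(text, local_host="localhost"):
    """Convert one HTTP request (request line plus headers) to a record dict."""
    lines = text.splitlines()
    parts = lines[0].split()
    method, target = parts[0], parts[1]
    # Assumption: a request line without a version is treated as HTTP/0.9.
    version = parts[2] if len(parts) > 2 else "HTTP/0.9"
    record = {
        "METHOD": method,
        "HTTP-PROTOCOL-VERSION": version,
        "PROTOCOL": "http",
        "HOST": local_host,
        "PORT": "80",
        "RESOURCE": "",
        "QUERY": "",
    }
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the headers
        name, _, value = line.partition(":")
        key, value = name.strip().upper(), value.strip()
        if key == "HOST":
            # "Host: ws:81" carries both the host name and the port.
            host, _, port = value.partition(":")
            record["HOST"] = host or local_host
            record["PORT"] = port or "80"
        else:
            record[key] = value
    url = urlsplit(target)
    if url.scheme:
        record["PROTOCOL"] = url.scheme
    if url.hostname:
        # Assumption: an absolute URL overrides the Host header.
        record["HOST"] = url.hostname
        if url.port is not None:
            record["PORT"] = str(url.port)
    record["RESOURCE"] = url.path.lstrip("/")
    record["QUERY"] = url.query
    return record
```

Printing each key and value of the returned dict as "KEY: value" lines, followed by an empty line, gives a record in the output format above.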
GET http://somewhere:1023/fdsfsdf?fdsfd HTTP/1.1
X-TTTT: sdfsdfsdf
GET https://www.site.com/asdf/qwer?query=asdf HTTP/1.1
GET /asdf/qwer?query=asdf HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
application/x-shockwave-flash, application/vnd.ms-excel, application/msword,
application/vnd.ms-powerpoint, application/x-icq, */*
Accept-Language: ru
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR
1.1.4322)
Host: ws:81
Connection: Keep-Alive
GET /asdf/qwer?query=asdf HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
application/x-shockwave-flash, application/vnd.ms-excel, application/msword,
application/vnd.ms-powerpoint, application/x-icq, */*
Accept-Language: ru
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR
1.1.4322)
Host: ws:81
Connection: Keep-Alive
#!/bin/j
require 'jpm socket strings'
IFNAME =: 'headers'
OFNAME =: 'records'
HCNT =: 0
LHN =: , 1{::sdgethostname_jsocket_''
NB. ================== parse ====================
PPs =: 50 3 $ 'pp0'
PPs =: ('pp1')(7 8 9 27 28 29)}PPs
PPs =: ('pp2')(13 14)}PPs
PPs =: ('pp3')(15 18 19)}PPs
PPs =: ('pp4')(17)}PPs
PPs =: ('pp5')(38 39)}PPs
PPs =: ('pp6')(49)}PPs
pp0 =: monad : ''''''
pp1 =: monad : '''''[AVs =: (<y.)(3)}AVs'
pp2 =: monad : 'pp1 _1}.y.'
pp3 =: monad : 'pp1 _2}.y.'
pp4 =: monad : '''''[AVs =: (<_2}.y.)(2)}AVs'
pp5 =: monad : '''''[AVs =: (<y.)(5)}AVs'
pp6 =: monad : '''''[AVs =: (<y.)(6)}AVs'
pusm =: 10 5 2 $ 1 1 5 1 6 0 8 0 0 0 1 0 2 0 6 3 8 3 0 3 5 0 0 6 3 0 8 3 0 3 7 2 0 6 4 3 8 3 0 3 5 1 5 1 6 1 8 0 0 0 5 0 5 0 6 3 8 3 0 3 7 1 0 6 6 0 8 0 0 0 7 0 0 6 7 0 8 3 0 3 9 1 9 1 9 1 9 1 0 0 9 0 9 0 9 0 9 0 0 3
process_url =: monad define
y. =. y.,LF
t =. (4;pusm;<(a.-.':/? ',TAB,LF);':';'/';'?';' ',TAB,LF) ;: y.
for_j. t do.
'i l f' =. j
(f{PPs)128!:2(l{.i}.y.)
end.
''return.
)
process_method =: monad define
ANs =: 'METHOD:';'HTTP-PROTOCOL-VERSION:';'PROTOCOL:';'HOST:';'PORT:';'RESOURCE:';'QUERY:'
AVs =: '';'HTTP/0.9';'http';LHN;'80';'';''
t =. (<;._2)y.,' '
if. 2=#t do. t =. t,<'HTTP/0.9' end.
if. 3=#t do.
'm u v' =. t
AVs =: (m;v)(0 1)}AVs
process_url u
end.
AVs =: (' '&,@:,&LF)each AVs
''return.
)
pasm =: 5 4 2 $ 1 1 0 6 0 6 0 6 1 0 2 3 0 6 0 6 3 1 3 1 3 1 3 1 3 0 3 0 4 0 3 0 1 2 0 3 0 6 3 0
process_attrs =: monad define
if. #y. do.
t =. (0;pasm;<(a.-.': ',LF,TAB);':';LF;' ',TAB);:y.
t =. |:((2%~#t),2)$t
n =. (toupper@,&':') each 0{t
v =. 1{t
i =. ANs i.n
u =. i=#ANs
k =. -.u
AVs =: (k#v)(k#i)}AVs
ANs =: ANs,u#n
AVs =: AVs,u#v
end.
''return.
)
postprocess_host =: monad define
v =. 3{::AVs
i =. v i.':'
if. i<#v do.
h =. deb i{.v
if. 0=#h do. h =. LHN end.
p =. ((>:i)}.v)-.LF
if. 0=#p do. p =. '80' end.
AVs =: ((' ',h,LF);(' ',p,LF))(3 4)}AVs
end.
''return.
)
save_result =: monad : 'TEXT =: TEXT,(;ANs,.AVs),LF'
process_header =: monad define
y. =. 2}.y.,LF
HCNT =: >:HCNT
i =. y.i.LF
process_method i{.y.
process_attrs (>:i)}.y.
postprocess_host''
save_result''
''return.
)
NB. ================== read =====================
main =: monad define
TEXT =: ''
fn =. IFNAME
fs =. 1!:4<fn
fp =. 0
fss =. 1e7
buf =. LF,LF
'' 1!:2 <OFNAME
while. 1 do.
if. fp<fs do.
NB. read chunk
n =. fss<.fs-fp
t =. (buf, 1!:11 fn;fp,n)-.CR
fp =. fp+n
NB. cut headers
m =. (LF,LF,LF)E.t
m process_header;._2 t
NB. member tail
i =. m i:1
buf =. (>:i)}.t
else.
if. 3<#buf do. process_header <buf end.
break.
end.
end.
(TEXT,LF) 1!:3 <OFNAME
''return.
)
NB.start_jpm_''
main''
NB.echo (0 0 100 showtotal_jpm_'')
NB.echo showdetail_jpm_ 'process_url'
echo HCNT
exit''
METHOD: GET
HTTP-PROTOCOL-VERSION: HTTP/1.1
PROTOCOL: http
HOST: somewhere
PORT: 1023
RESOURCE: fdsfsdf
QUERY: fdsfd
X-TTTT: sdfsdfsdf
METHOD: GET
HTTP-PROTOCOL-VERSION: HTTP/1.1
PROTOCOL: https
HOST: www.site.com
PORT: 80
RESOURCE: asdf/qwer
QUERY: query=asdf
METHOD: GET
HTTP-PROTOCOL-VERSION: HTTP/1.1
PROTOCOL: http
HOST: ws
PORT: 81
RESOURCE: asdf/qwer
QUERY: query=asdf
ACCEPT: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
application/x-shockwave-flash, application/vnd.ms-excel, application/msword,
application/vnd.ms-powerpoint, application/x-icq, */*
ACCEPT-LANGUAGE: ru
ACCEPT-ENCODING: gzip, deflate
USER-AGENT: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR
1.1.4322)
CONNECTION: Keep-Alive
METHOD: GET
HTTP-PROTOCOL-VERSION: HTTP/1.1
PROTOCOL: http
HOST: ws
PORT: 81
RESOURCE: asdf/qwer
QUERY: query=asdf
ACCEPT: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg,
application/x-shockwave-flash, application/vnd.ms-excel, application/msword,
application/vnd.ms-powerpoint, application/x-icq, */*
ACCEPT-LANGUAGE: ru
ACCEPT-ENCODING: gzip, deflate
USER-AGENT: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR
1.1.4322)
CONNECTION: Keep-Alive
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm