hi Werner,

i think there is plenty of room to optimize and get better performance, but much more detailed profiling is required to know exactly what percentage of time is spent in crypto libs and what is xml-sec/c14n/DOM overhead.

i think the best way to argue that is to compare to what can be done in C - if Java calls C libs (such as openssl) then the overhead is only in xml-sec/c14n/DOM, and we see that overhead at the order-of-magnitude level (10x or so) when comparing Java and C for XML signature verification of SOAP messages.

your performance results are consistent with what we see in XSUL as well: you need no more than 10-20 signed messages / second and you have made a successful DoS attack against a "secure" web service.
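
as a rough sanity check on that 10-20 figure, here is a small Java sketch that turns per-message server time into messages per second. the inputs are assumptions: they reuse the handler timings from Werner's message quoted below (42-47ms to receive, 34-37ms to send), assume the response send costs about the same as the request send, and assume a single thread on one CPU. `ThroughputEstimate` and `messagesPerSecond` are names made up for illustration.

```java
// Back-of-the-envelope throughput estimate. All inputs are assumptions
// taken from the handler timings quoted later in this thread.
final class ThroughputEstimate {

    // Messages per second that one thread can process if each message
    // costs `millisPerMessage` of security processing.
    static double messagesPerSecond(double millisPerMessage) {
        return 1000.0 / millisPerMessage;
    }

    public static void main(String[] args) {
        // Server-side cost per exchange: receive the request (42-47 ms)
        // plus send the response, assumed to cost the same as the
        // client-side request send (34-37 ms).
        double bestCaseMs = 42.0 + 34.0;
        double worstCaseMs = 47.0 + 37.0;

        System.out.printf("%.1f to %.1f signed exchanges per second%n",
                messagesPerSecond(worstCaseMs),
                messagesPerSecond(bestCaseMs));
    }
}
```

with those inputs it works out to roughly 12-13 signed exchanges per second per CPU, which is inside the 10-20 range above.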

the key observation based on C/C++ tests and libraries [1] is that DOM does not scale when signing/verifying large XML docs (or even small ones), both in raw performance (it was not designed for it) and in memory footprint (too easy to get out-of-memory errors)

even more importantly, there is a lot to squeeze performance-wise out of time that is *not* spent in cryptographic functions: see Figure 9 in [1], where *total* signature verification takes less than a few ms in C. so assuming Java uses a highly optimized crypto lib (such as openssl in C), there is a lot that can be done to go from 10 message verifications per second to 100 message verifications per second.

so i do think there is a need to dig deeper into the xml-sec lib and make sure we understand all the costs and where to optimize - we have some work on that in the java domain [2] (especially check Table 1, where C14n was the biggest culprit at the time of that testing) but that was just a start and a complete analysis is still needed.
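
one way to start separating the costs is to time the XML side and the crypto side independently on the same bytes. the sketch below is only an illustration, not how xml-sec works internally: it compares a plain JAXP DOM parse with a SHA-1 digest over a synthetic payload. the payload, the class name `OverheadProbe` and the round counts are all assumptions, and a loop like this is not a rigorous benchmark (it covers neither c14n nor RSA).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical micro-probe: DOM parse cost vs. digest cost on the same bytes.
final class OverheadProbe {

    // Builds a synthetic XML payload; the element names are made up.
    static byte[] samplePayload(int items) {
        StringBuilder sb = new StringBuilder("<Envelope><Body>");
        for (int i = 0; i < items; i++) {
            sb.append("<item id=\"").append(i).append("\">value</item>");
        }
        sb.append("</Body></Envelope>");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Average nanoseconds per DOM parse of the payload.
    static long nanosPerParse(byte[] xml, int rounds) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            long start = System.nanoTime();
            for (int i = 0; i < rounds; i++) {
                builder.parse(new ByteArrayInputStream(xml));
            }
            return (System.nanoTime() - start) / rounds;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Average nanoseconds per SHA-1 digest of the payload.
    static long nanosPerDigest(byte[] xml, int rounds) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            long start = System.nanoTime();
            for (int i = 0; i < rounds; i++) {
                sha1.digest(xml);
            }
            return (System.nanoTime() - start) / rounds;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = samplePayload(1000);
        nanosPerParse(xml, 200);   // warm-up, result discarded
        nanosPerDigest(xml, 200);  // warm-up, result discarded
        System.out.println("DOM parse ns/op:    " + nanosPerParse(xml, 1000));
        System.out.println("SHA-1 digest ns/op: " + nanosPerDigest(xml, 1000));
    }
}
```

the same pattern could be extended with c14n and signature verification steps to get a per-stage breakdown like Table 1 in [2].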

and of course doing WS-SecureConv is the best solution if you have clients that actually send more than one message to a service ...

thanks,

alek

[1] http://www.extreme.indiana.edu/labpubs.html#welu:streaming:2005 - Wei Lu, Kenneth Chiu, Aleksander Slominski, and Dennis Gannon. A streaming validation model for soap digital signature. In The 14th IEEE International Symposium on High Performance Distributed Computing (HPDC-14), 2005.

[2] http://www.extreme.indiana.edu/labpubs.html#sshirasu:sec-perf-full and http://www.extreme.indiana.edu/labpubs.html#sshirasu:sec-perf:grid2004 - Satoshi Shirasuna, Aleksander Slominski, Liang Fang, and Dennis Gannon. Performance Comparison of Security Mechanisms for Grid Services. In 5th IEEE/ACM International Workshop on Grid Computing, November 2004.



Werner Dittmann wrote:
All,

because ideas about using better parsers, faster memory algorithms,
etc. triggered quite some mail, I did some performance tests. To do
this I used "scenario3" of the interop tests. You may find the
deployment files in the interop directory and its subdirectories.

Test case: Scenario 3
Request flow: Sign body, encrypt body after signing, add a timestamp
- Signature certificate in the request (DirectReference), Signature
  algorithm is RSA

- Certificate used for encryption identified using SKI, symmetric
  encryption algorithm is triple DES, encryption key generated using a
  random generator and encrypted using the public key of the
  certificate, algorithm used here is RSA

Response flow: same actions and attributes as for request

System:
Hardware: AMD Athlon64 3000, 1GB Main mem
Software: Linux (SuSe 9.2), Java J2SE JDK 1.4.2_07

Test setup:
Standalone Axis client using WSDoAllSender / WSDoAllReceiver
SimpleAxisServer as server using WSDoAllSender / WSDoAllReceiver

During the test the system was not empty, but no heavy active tasks
were running. However, some time spikes arise from other system
activities.

Test run:
One "Warmup run" request/response, then 20 timed request/response pairs,
several test runs using this test setup.

Several timing points are used to get information about each step of
the request and response processing.

No real time clock available, thus using the standard system clock.
This may result in time deviations.
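
The run structure described above (one untimed warm-up, then 20 timed pairs) can be sketched roughly as follows. This is not the actual test code: `PairTimer` and the placeholder workload are hypothetical, and the sketch uses `System.nanoTime()`, which is an assumption about a newer JDK than the 1.4.2 used here.

```java
// Hypothetical timing harness: one warm-up run, then N timed runs.
final class PairTimer {

    // Runs `exchange` once untimed, then `timedRuns` times, and returns
    // the average wall-clock milliseconds per timed run.
    static double averageMillis(Runnable exchange, int timedRuns) {
        exchange.run(); // warm-up run, not timed
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            exchange.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return elapsedNanos / 1_000_000.0 / timedRuns;
    }

    public static void main(String[] args) {
        // Placeholder workload standing in for one request/response pair.
        Runnable fakeExchange = () -> {
            try {
                Thread.sleep(5);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        System.out.println("avg ms per pair: " + averageMillis(fakeExchange, 20));
    }
}
```

Averaging over the whole loop smooths out clock granularity, but spikes from other system activity still end up in the average.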

Findings:
The total time to execute the 20 request/response pairs was about
4100-4200ms, resulting in about 205-210ms per request/response.

Request send (client side):
Total time from entering WSDoAllSender up to exit: typically about 34-37ms
 - time spent to sign:                 about 27-28ms (24-24 ms XML-SEC
                                       processing in Signature class)
 - time spent to encrypt (sym/asym):   about 5-6ms

 - the rest of 1-3ms was spent in Axis processing (get message, get
   SOAP Envelope as document, c14n the resulting SOAP request, etc.)

Request receive (server side):
Total time from entering WSDoAllReceiver up to exit: typically about
42-47ms

 - time spent to verify:               about 5-6ms

 - time spent to decrypt:              about 24-26ms (about 60-65% of
                                       that time is spent decrypting
                                       the symmetric key using the
                                       private key, the rest in 3DES)

 - time spent to check cert trust, check timestamp, header processing:
   about 4-5ms

 - the rest of 6-10ms was spent in Axis processing (get message, get
   SOAP Envelope as document, c14n the resulting SOAP request, etc.).
   Most of that is spent in Axis deserialization (about 4-6ms), because
   the received data is much larger (includes all the security headers
   etc.).

The timing for the response flow is very similar to the request flow
with one notable difference, but this could not be verified in all cases
(maybe this is due to the "not empty" system).

Conclusion:
- The security processing for one direction takes ~75-85ms, for a
  request / response pair about 150-170ms

- Most of that time is spent in Signature/Encryption/Decryption
  processing (120-140ms), thus accounting for about 80-90% of the
  time for security processing

- the rest is consumed by Axis, DOM, c14n, certificate lookup, and other
  security handler activities, operating system to send/receive the
  data, etc.
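
As a cross-check, the share can be recomputed from the per-step findings. The sketch below only redoes the arithmetic with the request-flow numbers listed above; `CryptoShare` is a made-up name, and applying the same numbers to the response flow is an assumption.

```java
// Recomputes the crypto share of security processing for one direction
// from the per-step timings reported in the findings.
final class CryptoShare {

    // Fraction (0..1) of the total time taken by the crypto steps.
    static double share(double cryptoMs, double totalMs) {
        return cryptoMs / totalMs;
    }

    public static void main(String[] args) {
        // sign + encrypt + verify + decrypt, low and high ends of each range
        double cryptoLowMs = 27 + 5 + 5 + 24;   // 61 ms
        double cryptoHighMs = 28 + 6 + 6 + 26;  // 66 ms

        // WSDoAllSender + WSDoAllReceiver totals for one direction
        double totalLowMs = 34 + 42;   // 76 ms
        double totalHighMs = 37 + 47;  // 84 ms

        System.out.println("share (low ends):  " + share(cryptoLowMs, totalLowMs));
        System.out.println("share (high ends): " + share(cryptoHighMs, totalHighMs));
    }
}
```

With these inputs the crypto steps add up to about 61-66ms per direction (122-132ms per pair), which is roughly 80% of the 76-84ms handler time.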

Question: would such a result justify the efforts to speed up DOM,
parser, etc. processing? Just to speed up the remaining 10-20%?
  

The real time consuming things in security are always the encryption,
digest computations, public/private key computations etc.

I did a similar test with scenario #1 (UsernameToken). With this
scenario a request/response pair takes about 22-25ms on my system (same
test setup as described above). Thus Axis and its engine are also not
the bottleneck compared to the "real" security. If someone makes
Axis or parsing faster - very good for normal usage, but it doesn't help
much for security unless someone finds a superfast implementation
of the digest, encryption, decryption algorithms.

Regards,
Werner

-- 
The best way to predict the future is to invent it - Alan Kay
