Re: Tag reuse performance [Was: Re: DO NOT REPLY [Bug 16001] - Tag.release() not invoked]
Costin Manolache wrote:
> Hans Bergsten wrote:
>> Costin Manolache wrote:
>>> [...]
>>> In an ideal world, all core tags would be recyclable and garbage-free -
>>> that may allow them to run at comparable speed with a hard-coded page.
>>
>> I think it's more important to implement open coding of JSTL, i.e.
>> generate if and for statements instead of using c:if and c:forEach tag
>> handlers. That would really make a difference for all apps that use
>> JSTL, while the potential gains from tag handler reuse depend on a lot
>> of factors that vary between applications and the runtime environment.
>
> +1 - open coding is far better ( for performance ).
>
> Is the API/model for portable open coding defined and stable ? That
> would be by far the biggest improvement in JSP performance.

Unfortunately, Kin-Man told me it wasn't done yet (he mentioned the current
methods should stay, though). From what he said, he needed to write more tag
plugins to see if he was able to implement JSTL tags with the current API.

I agree it's definitely a good way to get around the problems with tags
without adding too much complexity. In the end, I decided to talk about it a
bit in a chapter of my book.

What's really funny is that you can get rid of the tag handler, and write
only the tag plugin. That is, of course, if you don't care about portability
and your tag plugin is able to handle all forms of your tag.

> But for people who use regular tag handlers - I think we need to fix the
> tag pool, and a fixed tag pool will improve the performance significantly.
> And if regular tags are used, for whatever reason, recycling should be
> taken into account - if people care a bit about performance.

+1.

Remy

--
To unsubscribe, e-mail: mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]
Re: Tag reuse performance [Was: Re: DO NOT REPLY [Bug 16001] - Tag.release() not invoked]
I haven't read all the posts in this discussion, but here are some facts from
personal observation. For pages with only a few tags, i.e. fewer than 30, tag
pooling doesn't help. On the other hand, if your page has 100+ tags, it
improves performance. Some of the pages I benchmarked had about 135 tags. In
those situations, I saw a 20-50% improvement.

I would argue that sites that don't have a lot of load should simply turn off
tag pooling. Sites that use tags extensively and get 1 million page views a
day will gain significantly from tag pooling.

peter lin
Re: Tag reuse performance [Was: Re: DO NOT REPLY [Bug 16001] - Tag.release() not invoked]
Peter Lin wrote:
> I haven't read all the posts in this discussion, but here are some facts
> from personal observation. For pages with only a few tags, i.e. fewer than
> 30, tag pooling doesn't help. On the other hand, if your page has 100+
> tags, it improves performance. Some of the pages I benchmarked had about
> 135 tags. In those situations, I saw a 20-50% improvement.
>
> I would argue that sites that don't have a lot of load should simply turn
> off tag pooling. Sites that use tags extensively and get 1 million page
> views a day will gain significantly from tag pooling.

Is this based on the current tag pool implementation in jasper2 ? Because it
is pretty clear that the tag pool has a few problems.

I would say the nature of the tags will also have a big impact. If your tag
is very simple - you'll probably get some small benefits under load
( 20..30% ?). If the tag uses internal data structures, buffers, etc - it's
very likely you'll see more ( since creating each tag instance will also
create the additional hashtable, StringBuffers, etc ).

I would bet that with complex tags that are specifically written to take
advantage of the recycling you would see at least 2x better performance
( with a good sync-free and large enough tag pool ). If your tag is using any
buffers or complex/expensive data structures that can be recycled - you'll
save a lot.

I don't think the number of tags in a page is too important - even if you
have 1 complex tag - with 100 concurrent users - you should see a difference.

In an ideal world, all core tags would be recyclable and garbage-free - that
may allow them to run at comparable speed with a hard-coded page.

Costin
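Costin's point about tags that own buffers can be illustrated with a minimal sketch. The class below is hypothetical and deliberately avoids the real javax.servlet.jsp.tagext API; it only assumes that a handler holds a buffer worth keeping between uses and that a pool calls a recycle method before handing the instance out again.

```java
/**
 * Hypothetical stand-in for a tag handler with internal state; it is not the
 * javax.servlet.jsp.tagext API, only an illustration of the recycling idea.
 */
final class RecyclableTagSketch {
    // Assumption: the handler owns a buffer that is costly to allocate repeatedly.
    private final StringBuilder buffer = new StringBuilder(1024);
    private String value;

    void setValue(String value) {
        this.value = value;
    }

    /** Renders into the retained buffer instead of allocating a new one. */
    String render() {
        buffer.setLength(0); // keep the backing array, drop the old content
        buffer.append("<b>").append(value).append("</b>");
        return buffer.toString();
    }

    /** Clears per-use state so a pool can hand the instance out again. */
    void recycle() {
        value = null;
        buffer.setLength(0);
    }

    /** Exposed only so the demo can show that the buffer survives recycling. */
    int bufferId() {
        return System.identityHashCode(buffer);
    }

    public static void main(String[] args) {
        RecyclableTagSketch tag = new RecyclableTagSketch();
        int before = tag.bufferId();
        tag.setValue("first");
        System.out.println(tag.render());
        tag.recycle();
        tag.setValue("second");
        System.out.println(tag.render());
        System.out.println("same buffer reused: " + (before == tag.bufferId()));
    }
}
```

Whether this pays off in practice depends on how expensive the retained state really is, which is exactly the point under debate in the thread.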
Re: Tag reuse performance [Was: Re: DO NOT REPLY [Bug 16001] - Tag.release() not invoked]
These were all JSTL tags. Back when I ran the tests, I posted some of the
results. Some tests were synthetic, i.e. 100 JSTL out tags in one page.
Others were based on an actual page layout with lots of markup logic that
uses JSTL c:choose in conjunction with the JSTL XML tags.

The tests were with Tomcat 4.1's jasper2 and with 4.0.x jasper1. Obviously
the tag pooling was only with jasper2. I didn't have time to test Tomcat 3.x
tag pooling.

peter lin
Re: Tag reuse performance [Was: Re: DO NOT REPLY [Bug 16001] - Tag.release() not invoked]
Costin Manolache wrote:
> [...]
> In an ideal world, all core tags would be recyclable and garbage-free -
> that may allow them to run at comparable speed with a hard-coded page.

I think it's more important to implement open coding of JSTL, i.e. generate
if and for statements instead of using c:if and c:forEach tag handlers. That
would really make a difference for all apps that use JSTL, while the
potential gains from tag handler reuse depend on a lot of factors that vary
between applications and the runtime environment.

Hans
--
Hans Bergsten [EMAIL PROTECTED]
Gefion Software http://www.gefionsoftware.com/
Author of O'Reilly's "JavaServer Pages", covering JSP 1.2 and JSTL 1.0
Details at http://TheJSPBook.com/
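To make the open-coding idea concrete, here is a minimal sketch under stated assumptions. LoopHandlerSketch is a hypothetical stand-in for a c:forEach-style handler, not the real JSTL class, and the generated code of an actual JSP compiler will differ. The sketch only contrasts a handler-driven nested loop with the plain for statements an open-coding compiler could emit for the same page.

```java
final class OpenCodingSketch {
    /** Roughly what handler-driven translation does: one object per loop invocation. */
    static String viaHandlers(int outerEnd, int innerEnd) {
        StringBuilder out = new StringBuilder();
        LoopHandlerSketch outer = new LoopHandlerSketch(1, outerEnd);
        if (outer.start()) {
            do {
                LoopHandlerSketch inner = new LoopHandlerSketch(1, innerEnd);
                if (inner.start()) {
                    do {
                        out.append('x'); // stands in for the c:out body
                    } while (inner.afterBody());
                }
            } while (outer.afterBody());
        }
        return out.toString();
    }

    /** The open-coded equivalent: plain for statements, no handler objects. */
    static String openCoded(int outerEnd, int innerEnd) {
        StringBuilder out = new StringBuilder();
        for (int i = 1; i <= outerEnd; i++) {
            for (int j = 1; j <= innerEnd; j++) {
                out.append('x');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String a = viaHandlers(100, 10);
        String b = openCoded(100, 10);
        System.out.println("same output: " + a.equals(b));
        System.out.println("handler objects created: " + LoopHandlerSketch.created);
    }
}

/** Hypothetical stand-in for a c:forEach-style handler; not the real JSTL class. */
final class LoopHandlerSketch {
    static int created; // allocation counter for the demo (single-threaded use only)

    private int current;
    private final int end;

    LoopHandlerSketch(int begin, int end) {
        created++;
        this.current = begin;
        this.end = end;
    }

    /** True if the body should be evaluated at least once. */
    boolean start() {
        return current <= end;
    }

    /** Advances the loop; true if the body should be evaluated again. */
    boolean afterBody() {
        return ++current <= end;
    }
}
```

The open-coded version produces the same output with no handler allocations at all, which is why it sidesteps the pooling question for the tags it covers.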
Re: Tag reuse performance [Was: Re: DO NOT REPLY [Bug 16001] - Tag.release() not invoked]
Hans Bergsten wrote:
> Costin Manolache wrote:
>> [...]
>> In an ideal world, all core tags would be recyclable and garbage-free -
>> that may allow them to run at comparable speed with a hard-coded page.
>
> I think it's more important to implement open coding of JSTL, i.e.
> generate if and for statements instead of using c:if and c:forEach tag
> handlers. That would really make a difference for all apps that use JSTL,
> while the potential gains from tag handler reuse depend on a lot of
> factors that vary between applications and the runtime environment.

+1 - open coding is far better ( for performance ).

Is the API/model for portable open coding defined and stable ? That would be
by far the biggest improvement in JSP performance.

But for people who use regular tag handlers - I think we need to fix the tag
pool, and a fixed tag pool will improve the performance significantly. And if
regular tags are used, for whatever reason, recycling should be taken into
account - if people care a bit about performance.

Costin
Re: Tag reuse performance [Was: Re: DO NOT REPLY [Bug 16001] - Tag.release() not invoked]
Hans Bergsten wrote:
>              Without pooling   With pooling   Reuse w/o overhead
> 5 threads
>   Avg.:      330 ms            349 ms         N/A
>   Rate:      15.2/sec          13.6/sec       N/A
> 20 threads
>   Avg.:      1,752 ms          1,446 ms       1,265 ms
>   Rate:      12.1/sec          13.6/sec       14.7/sec
>
> To me, this indicates that if you can avoid _all_ reuse overhead, there's
> some performance to be gained from reuse but not much. With the

From 1.2s to 1.7s there is about 35% difference. I would call this quite
significant. Even between 1.4 and 1.7 - you have 20%.

Try to increase the thread count to 100 - and you'll see this going up.

The difference ( 0.5s ) is probably 2-3 times the response time of apache for
a static page. And most users will feel it.

> current implementation, however, the overhead seems to kill all gains from
> creating fewer instances. I doubt increasing MAX_POOL_SIZE makes much of a
> difference.

Increasing it from the current 5 - it would make a difference.

I agree - the ideal no overhead is harder to achieve, but I think the
thread-local, no-sync case is close enough. I'll try to reproduce the test.

BTW, how many requests did you make, and what was the max response time
( max is very affected by GC ) ? I usually do 5,000 to warm up and 10,000 to
run the test.

This is a very good start, thanks for bringing this up.

Costin
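The "thread-local, no-sync" pool Costin mentions could look roughly like the sketch below. It is an assumption-laden illustration in modern Java (lambdas, ThreadLocal.withInitial), not the Jasper tag pool and not Costin's implementation; the class name and methods are hypothetical. Each thread keeps its own idle stack, so borrowing and returning need no locking.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical per-thread pool: no shared state, hence no synchronization. */
final class ThreadLocalPoolSketch<T> {
    private final Supplier<T> factory;
    // Each thread lazily gets its own stack of idle instances.
    private final ThreadLocal<Deque<T>> idle =
            ThreadLocal.withInitial(() -> new ArrayDeque<T>());

    ThreadLocalPoolSketch(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns an idle instance owned by the current thread, or creates one. */
    T borrow() {
        T instance = idle.get().poll();
        return instance != null ? instance : factory.get();
    }

    /** Makes the instance available again, but only to the current thread. */
    void giveBack(T instance) {
        idle.get().push(instance);
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadLocalPoolSketch<StringBuilder> pool =
                new ThreadLocalPoolSketch<>(StringBuilder::new);
        StringBuilder first = pool.borrow();
        pool.giveBack(first);
        System.out.println("same thread reuses: " + (pool.borrow() == first));
        pool.giveBack(first);
        Thread other = new Thread(() ->
                System.out.println("other thread reuses: " + (pool.borrow() == first)));
        other.start();
        other.join();
    }
}
```

The trade-off in this sketch is memory: idle instances are never shared between threads and the per-thread stack is unbounded, so a real pool would probably want a cap per thread.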
Re: Tag reuse performance [Was: Re: DO NOT REPLY [Bug 16001] - Tag.release() not invoked]
Interesting. Your test JSP page looks like a valid test.

There is no data about GC in your tests; of course GC can happen at any time.
I would be interested in seeing the tests run with -Xincgc and -Xverbose:gc.
Then run a high enough volume of tests that a Full GC gets triggered a dozen
times or so. I would think the GC data would be very different between the
5 thread and 20 thread tests with tag pooling enabled. The metrics to use
might be the time spent doing GC, the number of incremental GC's, and the
number of Full GC's.

There is also no data about system CPU load. The tests show performance from
a request latency and requests per second viewpoint, but do not necessarily
show the difference in scaling. Showing CPU load might indicate whether one
solution scales better than another.

Regards,

Glenn

Hans Bergsten wrote:
> Hans Bergsten wrote:
>> Costin Manolache wrote:
>>> [...]
>>> Wow. I would be _very_ curious to see those benchmarks and the modern
>>> JVM that was used.
>>>
>>> All my tests ( including JDK1.4, IBM vms, GCJ ) show that reusing is
>>> well worth the trouble - at least if you have 100s of requests per
>>> second ( it is not worth the trouble for very low loads ).
>>>
>>> But I'm happy to hear that I'm wrong.
>>
>> I'll try to find the figures we looked at and post them, or run a new
>> benchmark against TC 4.1 with and without tag handler pooling enabled.
>> But it may take some time, because right now I'm busy with other stuff.
>>
>> If you disagree with the decision, you may want to send your feedback to
>> the EG: [EMAIL PROTECTED] JSP 2.0 is still just PFD.
>
> Okay, I ran a few test cases with Tomcat 4.1.18. Benchmarks are of course
> never perfect, but it should be good enough to evaluate the difference
> with and without tag handler reuse.
>
> My test server is a 1 GHz Pentium with 256 MB, with Sun's Linux JDK 1.4.1.
> I ran all tests with Apache JMeter, once with 5 threads (so the
> MAX_POOL_SIZE is not exceeded) and once with 20 threads, with and without
> pooling enabled.
>
> I also hacked the servlet class generated with pooling enabled so that
> there's no overhead from the reuse itself. I simply create one instance of
> each tag handler at the beginning of the _jspService() method and use this
> instance for all invocations. This is as efficient as it can be, with no
> extra cost for synchronization or Map lookups. I ran this test once with
> 20 threads.
>
> I used this test page:
>
>   <%@ page contentType="text/plain" %>
>   <%@ taglib prefix="c" uri="http://java.sun.com/jstl/core" %>
>   <c:forEach begin="1" end="100">
>     <c:forEach begin="1" end="10">
>       <c:out value="..." />
>     </c:forEach>
>   </c:forEach>
>
> While it's simple, it should show the impact of tag handler reuse. With
> pooling disabled, one tag handler instance is created for the outer
> c:forEach, a new one is created for the inner c:forEach for each pass
> through the loop (i.e. 100 instances), and a new instance for c:out is
> created for each invocation (i.e. 1000 instances).
>
> With pooling enabled, the total number of instances depends on the number
> of concurrent requests. For the 5 threads tests, it should stay close to 5
> instances (although non-pooled instances may occasionally be created and
> released immediately). For the 20 threads test, a lot more instances are
> created (since the pool is currently limited to 5 instances), but it
> should still be less than when pooling is disabled.
>
> Okay, here are the results:
>
>              Without pooling   With pooling   Reuse w/o overhead
> 5 threads
>   Avg.:      330 ms            349 ms         N/A
>   Rate:      15.2/sec          13.6/sec       N/A
> 20 threads
>   Avg.:      1,752 ms          1,446 ms       1,265 ms
>   Rate:      12.1/sec          13.6/sec       14.7/sec
>
> To me, this indicates that if you can avoid _all_ reuse overhead, there's
> some performance to be gained from reuse but not much. With the current
> implementation, however, the overhead seems to kill all gains from
> creating fewer instances. I doubt increasing MAX_POOL_SIZE makes much of a
> difference.
>
> Feel free to run the test on your platform. It could be interesting to see
> some more results. Also, if you think my test page is flawed, I'd
> appreciate ideas for how to improve it.
>
> Hans
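The effect of a 5-instance cap under 20 concurrent requests can be simulated with a small sketch. This is not Jasper's tag pool: the class, its methods, and the sequential "round" model (all requests borrow at once, then all return) are assumptions made for illustration. Only the cap of 5 and the thread counts come from Hans's description.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical synchronized pool that keeps at most maxIdle instances. */
final class BoundedPoolSketch {
    static final class Handler { }

    private final Deque<Handler> idle = new ArrayDeque<>();
    private final int maxIdle;
    private int created;

    BoundedPoolSketch(int maxIdle) {
        this.maxIdle = maxIdle;
    }

    synchronized Handler borrow() {
        Handler handler = idle.poll();
        if (handler == null) {
            handler = new Handler(); // pool empty: allocate a fresh instance
            created++;
        }
        return handler;
    }

    synchronized void giveBack(Handler handler) {
        if (idle.size() < maxIdle) {
            idle.push(handler);
        }
        // Otherwise the instance is dropped and left to the garbage collector.
    }

    synchronized int created() {
        return created;
    }

    /** Each round: 'concurrent' requests borrow one handler, then all return it. */
    static int simulate(int maxIdle, int concurrent, int rounds) {
        BoundedPoolSketch pool = new BoundedPoolSketch(maxIdle);
        for (int r = 0; r < rounds; r++) {
            List<Handler> inUse = new ArrayList<>();
            for (int i = 0; i < concurrent; i++) {
                inUse.add(pool.borrow());
            }
            for (Handler handler : inUse) {
                pool.giveBack(handler);
            }
        }
        return pool.created();
    }

    public static void main(String[] args) {
        System.out.println("cap 5,  5 concurrent: " + simulate(5, 5, 10));
        System.out.println("cap 5, 20 concurrent: " + simulate(5, 20, 10));
        System.out.println("cap 20, 20 concurrent: " + simulate(20, 20, 10));
    }
}
```

In this model a pool sized for the concurrency level stops allocating after the first round, while the capped pool keeps allocating most of its handlers every round and still pays for synchronization.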
Re: Tag reuse performance [Was: Re: DO NOT REPLY [Bug 16001] - Tag.release() not invoked]
Glenn Nielsen wrote:
> Interesting. Your test JSP page looks like a valid test.

Good.

> There is no data about GC in your tests; of course GC can happen at any
> time. I would be interested in seeing the tests run with -Xincgc and
> -Xverbose:gc. Then run a high enough volume of tests that a Full GC gets
> triggered a dozen times or so. I would think the GC data would be very
> different between the 5 thread and 20 thread tests with tag pooling
> enabled. The metrics to use might be the time spent doing GC, the number
> of incremental GC's, and the number of Full GC's.

I can rerun the tests with these options and include a sample of the
verbose:gc output. Does that help? I'm afraid I don't have the time to
summarize the GC data as you suggest, but you're welcome to do so ;-)

> There is also no data about system CPU load. The tests show performance
> from a request latency and requests per second viewpoint, but do not
> necessarily show the difference in scaling. Showing CPU load might
> indicate whether one solution scales better than another.

Any tips about the best way to measure CPU in a meaningful way on Linux? I
know about top/gtop, vmstat, uptime etc. but they show system values (or
snapshots of the current CPU for a thread). I vaguely recall using a command
on Solaris many years ago that takes the command you want to measure as an
argument and gives you min, max and average CPU when it completes, but I've
forgotten what it's called. Does it ring any bells? If not, what do you
recommend to measure CPU?

Hans
Re: Tag reuse performance [Was: Re: DO NOT REPLY [Bug 16001] - Tag.release() not invoked]
Costin Manolache wrote:
> Hans Bergsten wrote:
>>              Without pooling   With pooling   Reuse w/o overhead
>> 5 threads
>>   Avg.:      330 ms            349 ms         N/A
>>   Rate:      15.2/sec          13.6/sec       N/A
>> 20 threads
>>   Avg.:      1,752 ms          1,446 ms       1,265 ms
>>   Rate:      12.1/sec          13.6/sec       14.7/sec
>>
>> To me, this indicates that if you can avoid _all_ reuse overhead, there's
>> some performance to be gained from reuse but not much. With the
>
> From 1.2s to 1.7s there is about 35% difference. I would call this quite
> significant. Even between 1.4 and 1.7 - you have 20%.
>
> Try to increase the thread count to 100 - and you'll see this going up.
>
> The difference ( 0.5s ) is probably 2-3 times the response time of apache
> for a static page. And most users will feel it.

I agree that in percentage, the difference is somewhat significant, but
don't make too much out of the real value. My test server is not
representative of the type of hardware you would use for a site with this
type of load. On hardware suitable for the task, the difference in the real
values will likely be a lot smaller, and IMHO, insignificant.

But please, let's not start a long debate about what's significant or not
(that depends on too many factors). All I'm trying to show with these simple
tests is that for pooling to really make a difference at all, you need to
avoid all overhead, which may be very hard, and that the overhead with
current pooling seems to eat all potential gain.

>> current implementation, however, the overhead seems to kill all gains
>> from creating fewer instances. I doubt increasing MAX_POOL_SIZE makes
>> much of a difference.
>
> Increasing it from the current 5 - it would make a difference.
>
> I agree - the ideal no overhead is harder to achieve, but I think the
> thread-local, no-sync case is close enough. I'll try to reproduce the
> test.
>
> BTW, how many requests did you make, and what was the max response time
> ( max is very affected by GC ) ? I usually do 5,000 to warm up and 10,000
> to run the test.

I ran 10,000 requests for each test case after a manual warm up (just a few
requests to give the JIT a chance to kick in). If I rerun the tests to
capture GC data (as Glenn was asking for), I can run a longer warm-up as
well. I didn't record the max values, but IIRC they were around 100 sec in
all cases.

> This is a very good start, thanks for bringing this up.

I hope it at least gives us a better idea about what types of gains we can
realistically expect from tag handler reuse.

Hans
Re: Tag reuse performance [Was: Re: DO NOT REPLY [Bug 16001] - Tag.release() not invoked]
Hans Bergsten wrote:
> Costin Manolache wrote:
>> quite significant. Even between 1.4 and 1.7 - you have 20%.
>>
>> Try to increase the thread count to 100 - and you'll see this going up.
>>
>> The difference ( 0.5s ) is probably 2-3 times the response time of apache
>> for a static page. And most users will feel it.
>
> I agree that in percentage, the difference is somewhat significant, but
> don't make too much out of the real value. My test server is not
> representative of the type of hardware you would use for a site with this
> type of load. On hardware suitable for the task, the difference in the
> real values will likely be a lot smaller, and IMHO, insignificant.
>
> But please, let's not start a long debate about what's significant or not
> (that depends on too many factors). All I'm trying to show with these
> simple tests is that for pooling to really make a difference at all, you
> need to avoid all overhead, which may be very hard, and that the overhead
> with current pooling seems to eat all potential gain.

Personally, I like to think in terms of percentages, as it gives something
to measure improvement against, and anything which gives 10% of free
performance is good. (Of course, if you don't need the extra performance,
then good for you.)

There were a lot of changes from 4.0 to 4.1 which individually gave only a
small performance gain, but put everything together and it ends up being
much faster. I also think it's great that there's no longer any huge hotspot
left in Tomcat accounting for that much (wasted) processing time.

5.0 will be no different, and will feature a lot of incremental performance
improvements, which hopefully will be significant once all put together.

Remy
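Since the disagreement is partly about percentages, the figures can be recomputed from the 20-thread averages Hans reported (1,752 ms, 1,446 ms and 1,265 ms). The sketch below is only arithmetic on those published numbers; the class and method names are made up for illustration. It shows that the result depends on which value is taken as the baseline, and that Costin's rounded "about 35%" and "20%" fall between the two conventions.

```java
/** Recomputes relative differences from the 20-thread averages in the thread. */
final class PercentSketch {
    /** Time saved, as a fraction of the slower time. */
    static double saving(double slowerMs, double fasterMs) {
        return (slowerMs - fasterMs) / slowerMs;
    }

    /** Extra time spent, as a fraction of the faster time. */
    static double slowdown(double slowerMs, double fasterMs) {
        return (slowerMs - fasterMs) / fasterMs;
    }

    public static void main(String[] args) {
        double noPooling = 1752;  // ms, without pooling
        double pooling = 1446;    // ms, with pooling
        double noOverhead = 1265; // ms, reuse without overhead

        System.out.printf("no pooling vs. reuse w/o overhead: %.1f%% saved, %.1f%% slower%n",
                100 * saving(noPooling, noOverhead), 100 * slowdown(noPooling, noOverhead));
        System.out.printf("no pooling vs. pooling: %.1f%% saved, %.1f%% slower%n",
                100 * saving(noPooling, pooling), 100 * slowdown(noPooling, pooling));
    }
}
```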
Re: Tag reuse performance [Was: Re: DO NOT REPLY [Bug 16001] - Tag.release() not invoked]
Hans Bergsten wrote:
>>>              Without pooling   With pooling   Reuse w/o overhead
>>> 5 threads
>>>   Avg.:      330 ms            349 ms         N/A
>>>   Rate:      15.2/sec          13.6/sec       N/A
>>> 20 threads
>>>   Avg.:      1,752 ms          1,446 ms       1,265 ms
>>>   Rate:      12.1/sec          13.6/sec       14.7/sec
>>>
>>> To me, this indicates that if you can avoid _all_ reuse overhead,
>>> there's some performance to be gained from reuse but not much. With the
>>
>> From 1.2s to 1.7s there is about 35% difference. I would call this quite
>> significant. Even between 1.4 and 1.7 - you have 20%.
>>
>> Try to increase the thread count to 100 - and you'll see this going up.
>>
>> The difference ( 0.5s ) is probably 2-3 times the response time of apache
>> for a static page. And most users will feel it.
>
> I agree that in percentage, the difference is somewhat significant, but
> don't make too much out of the real value. My test server is not
> representative of the type of hardware you would use for a site with this
> type of load. On hardware suitable for the task, the difference in

And the test page is not representative of the type of pages that will run
on a real site. I know that.

All we can measure with relative accuracy is the overhead of the
container/jsp implementation - at least in relative terms.

Take as the reference the time ( or RPS ) for Apache to serve the same
output as a static page. Or the time a servlet will take to generate the
same output.

Run your tests with 5, 20, 100 RPS ( and ab may be a better driver ).
Compare the results - and most likely a production server will see similar
ratios.

I'll try to find some time ( next week - I hope ) and run the same tests
with the no sync pool.

> the real values will likely be a lot smaller, and IMHO, insignificant.
>
> But please, let's not start a long debate about what's significant or not
> (that depends on too many factors). All I'm trying to show with these
> simple tests is that for pooling to really make a difference at all, you
> need to avoid all overhead, which may be very hard, and that the overhead
> with current pooling seems to eat all potential gain.

Well - it shows pretty clearly that the _current_ implementation of the tag
pool is broken. Even if we don't take sync into account, the pool has a
5 object limit - what else could you expect ??

> I ran 10,000 requests for each test case after a manual warm up (just a
> few requests to give the JIT a chance to kick in). If I rerun the tests to
> capture GC data (as Glenn was asking for), I can run a longer warm-up as
> well. I didn't record the max values, but IIRC they were around 100 sec in
> all cases.

The 1.4 JIT takes some time to kick in; if you run batches of 1,000 requests
you'll see the time keeps improving. I would do at least 5,000 requests to
warm up the JIT.

>> This is a very good start, thanks for bringing this up.
>
> I hope it at least gives us a better idea about what types of gains we can
> realistically expect from tag handler reuse.

Most of the improvements in coyote ( or in 3.3 over 3.2 ) are due to object
reuse. It is possible that tag handlers are different and the other
overheads will obscure any benefit ( at least under low load ), but I can
bet that under heavy load recycling will be very significant, if done
correctly.

Costin