Hi, Dave.

Sorry for the messed-up text. I am re-sending my last mail below.

Still, this is about the weblog view data access.

The URL patterns listed in the Roller property rendering.weblogMapper.rollerProtectedUrls are all for the user account console, and they are not going to appear in user-created websites. They are not of any concern. What concerns us are the requests with the URI pattern ‘/roller-ui/rendering/resources’, which are specified in theme.xml as <resource/> elements.

WeblogRequestMapper validates the handle of an incoming request for a web page (text/html content), and then validates the handle of each request the browser client sends while following the URL links in that page. The validating function is WeblogRequestMapper.isWeblog(String potentialHandle). For example, for a web page that has ten links to css, js and image files, we are going to have one request for the page and then ten more, eleven requests in total.

For each request Roller will do the following things:
1. Retrieve a connection instance from the connection pool, or create a new JDBC connection
2. Retrieve the prepared statement from the server statement cache, or create a prepared statement for the named query
3. Set the parameter ‘handle’ and execute the SQL query. Get all the data for the specified weblog; this includes instances of the root category and categories
4. Recycle the connection, or close and discard it for GC
5. Create a new weblog object and populate data into this object

So in this example, for one web page request Roller consumes eleven JDBC connection instances and creates eleven weblog objects just to check whether the object exists or not. If some websites on Roller take a high volume of http requests, the Roller database could easily be overwhelmed and end up deadlocked. With all the later incoming requests waiting in line, memory usage will hit the ceiling. And now the database is the single point of failure.

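To make the cost concrete, here is a minimal sketch of a handle check that goes to the backing store once per handle instead of once per request. It is not Roller code: the class HandleCheckCache and its invalidate method are hypothetical names, and the Predicate passed in only stands in for the named JPA query behind WeblogRequestMapper.isWeblog.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Hypothetical helper, not part of Roller: remembers the result of handle checks. */
final class HandleCheckCache {
    // handle -> "is this a weblog?", filled lazily on first lookup
    private final ConcurrentHashMap<String, Boolean> known = new ConcurrentHashMap<>();
    // Assumption: stands in for the database query that checks a handle
    private final Predicate<String> backingLookup;

    HandleCheckCache(Predicate<String> backingLookup) {
        this.backingLookup = backingLookup;
    }

    /** Runs the backing lookup only the first time a given handle is seen. */
    boolean isWeblog(String potentialHandle) {
        return known.computeIfAbsent(potentialHandle, backingLookup::test);
    }

    /** To be called when a weblog is created, renamed or removed. */
    void invalidate(String handle) {
        known.remove(handle);
    }
}
```

With this in place, the eleven requests of the example would cause one backing lookup rather than eleven. The map here is local to one JVM and grows without bound for unknown handles, so it only illustrates the idea and says nothing about eviction or clustering.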
Without the database standing there to validate the web handle for each request and Last-Modified for each text/html request, we are going to see a blank white page that goes nowhere. I believe this is highly possible. Looking at the technical parameters and typical usage of database servers, it is obvious that they are not designed for the kind of task Roller is doing now in validating each http request.

I would suggest that a cache be used for weblog page views. Put simply, Roller should have a cache for weblogs and weblog entries. Roller users manage their accounts, persist changes to the database and update the cache with those changes. Roller users' passwords are not cached, for security reasons. Roller viewers retrieve web content; everything they see comes from the cache, and they should never touch the database. Something like referrer addresses or hit counts will be cached and persisted to the database at server shutdown, or at an administrator's command.

The current caching system does not fit the task I described. Current Roller caches are just local hash maps or hash tables; they are not distributed. There is no synchronization of weblog content, especially the ‘Last-Modified’ value, across multiple server threads, while nowadays most production environments are clustered, composed of multiple JVMs and application server runtimes. I learned that Ehcache supports a distributed map. I know that the WebSphere cache instance implements the IBM distributed map. The best solution for Roller is an interface to a third-party distributed cache accessed with a JNDI lookup; otherwise, Roller bundled with Ehcache is also very good.

Thank you.
David

--- On Wed, 5/26/10, (David) Ming Xia <david.ming....@ibol.biz> wrote:

From: (David) Ming Xia <david.ming....@ibol.biz>
Subject: About weblog view data access
To: user@roller.apache.org, "Mailing List Apache Roller Developer" <d...@roller.apache.org>
Date: Wednesday, May 26, 2010, 8:30 PM

Hi, Dave.

Still, this is about the weblog view data access.

The URL patterns listed in the Roller property rendering.weblogMapper.rollerProtectedUrls are all for the user account console, and they are not going to appear in user-created websites. They are not of any concern. What concerns us are the requests with the URI pattern ‘/roller-ui/rendering/resources’, which are specified in theme.xml as <resource/> elements.

WeblogRequestMapper validates the handle of an incoming request for a web page (text/html content), and then validates the handle of each request the browser client sends while following the URL links in that page. The validating function is WeblogRequestMapper.isWeblog(String potentialHandle). For example, for a web page that has ten links to css, js and image files, we are going to have one request for the page and then ten more, eleven requests in total.

For each request Roller will do the following things:
1. Retrieve a connection instance from the connection pool, or create a new JDBC connection
2. Retrieve the prepared statement from the server statement cache, or create a prepared statement for the named query
3. Set the parameter ‘handle’ and execute the SQL query. Get all the data for the specified weblog; this includes instances of the root category and categories
4. Recycle the connection, or close and discard it for GC
5. Create a new weblog object and populate data into this object

So in this example, for one web page request Roller consumes eleven JDBC connection instances and creates eleven weblog objects just to check whether the object exists or not. If some websites on Roller take a high volume of http requests, the Roller database could easily be overwhelmed and end up deadlocked. With all the later incoming requests waiting in line, memory usage will hit the ceiling. And now the database is the single point of failure. Without the database standing there to validate the web handle for each request and Last-Modified for each text/html request, we are going to see a blank white page that goes nowhere. I believe this is highly possible.

Looking at the technical parameters and typical usage of database servers, it is obvious that they are not designed for the kind of task Roller is doing now in validating each http request.

I would suggest that a cache be used for weblog page views. Put simply, Roller should have a cache for weblogs and weblog entries. Roller users manage their accounts, persist changes to the database and update the cache with those changes. Roller users' passwords are not cached, for security reasons. Roller viewers retrieve web content; everything they see comes from the cache, and they should never touch the database. Something like referrer addresses or hit counts will be cached and persisted to the database at server shutdown, or at an administrator's command.

The current caching system does not fit the task I described. Current Roller caches are just local hash maps or hash tables; they are not distributed. There is no synchronization of weblog content, especially the ‘Last-Modified’ value, across multiple server threads, while nowadays most production environments are clustered, composed of multiple JVMs and application server runtimes. I learned that Ehcache supports a distributed map. I know that the WebSphere cache instance implements the IBM distributed map. The best solution for Roller is an interface to a third-party distributed cache accessed with a JNDI lookup; otherwise, Roller bundled with Ehcache is also very good.

Thank you.
David

--- On Wed, 5/26/10, Dave <snoopd...@gmail.com> wrote:

From: Dave <snoopd...@gmail.com>
Subject: Re: Roller's implementation on conditional Get
To: user@roller.apache.org, david.ming....@ibol.biz
Date: Wednesday, May 26, 2010, 7:59 AM

On Wed, May 26, 2010 at 12:11 AM, (David) Ming Xia <david.ming....@ibol.biz> wrote:
> I took a look into it and I found another place that has very intensive
> database queries.
>
> RequestMappingFilter.doFilter() --> WeblogRequestMapper.handleRequest().
>
> RequestMappingFilter's URL mapping is /*, so it checks every http request.
>
> WeblogRequestMapper.handleRequest() verifies ALL requests, I mean,
> including those css, js and image files with named JPA queries.
>
>
> Actually, both PageServlet and RequestMappingFilter query weblog with
> handle. It looks like database is used as hashtable in these two functions.
> While database is usually used for account data transaction, relational data
> management.
>
> Now for each web page request there are at least 'eleven' database queries,
> one for the text/html content in PageServlet and ten requests in mapping
> filter for everything including the text/html.
>
> I feel that there could be even more database wires. Since many people
> work on Roller and everyone tends to add some more wires.
>
> It seems that there should be a top-down design solution for this issue.
>
> Like to hear something from you.

Hi David,

You are correct, WeblogRequestMapper is invoked on every request, but does nothing when it encounters URLs that begin with these patterns:

rendering.weblogMapper.rollerProtectedUrls=\
    roller-ui,images,theme,themes,CommentAuthenticatorServlet,\
    index.jsp,favicon.ico,robots.txt,\
    page,flavor,rss,atom,language,search,comments,rsd,resource,xmlrpc,planetrss

It ignores static theme resources (images, CSS, JS, etc.) and everything else that is not dynamically generated by a weblog page template.

Perhaps the problem is not quite as bad as you think. There have not been that many people working on Roller and the ones that have worked on the code have been pretty disciplined about when database calls are made. But of course, even disciplined developers make mistakes.

I'm sure there is much room for improvement and I encourage you to continue your research into performance bottlenecks. If you have a proposal for a top-down solution, or some patches to improve things -- I'd be happy to review them or even commit them for you if they look good.

- Dave
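
The skip rule described in the reply above can be sketched roughly as follows. This is an illustration only, not Roller's WeblogRequestMapper: the class ProtectedUrlCheck and its methods are hypothetical, and it assumes that "begins with" means the first path segment of the request equals one of the configured entries.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustrative only; not Roller's WeblogRequestMapper. */
final class ProtectedUrlCheck {
    private final Set<String> protectedPrefixes = new HashSet<>();

    /** Takes the comma-separated property value, e.g. "roller-ui,images,theme". */
    ProtectedUrlCheck(String commaSeparated) {
        for (String entry : commaSeparated.split(",")) {
            String trimmed = entry.trim();
            if (!trimmed.isEmpty()) {
                protectedPrefixes.add(trimmed);
            }
        }
    }

    /** First path segment of a servlet path such as "/roller-ui/login.rol". */
    static String firstSegment(String servletPath) {
        String path = servletPath.startsWith("/") ? servletPath.substring(1) : servletPath;
        int slash = path.indexOf('/');
        return slash < 0 ? path : path.substring(0, slash);
    }

    /** True only when the first segment could be a weblog handle (assumed rule). */
    boolean needsHandleLookup(String servletPath) {
        String first = firstSegment(servletPath);
        return !first.isEmpty() && !protectedPrefixes.contains(first);
    }
}
```

Under that assumption, a path such as /roller-ui/rendering/resources/style.css is skipped because its first segment is roller-ui, while /myblog/entry/hello (with a made-up handle "myblog") would still go on to the handle lookup.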