Part of a web application I'm working on requires a large number of queries joining several tables, each containing millions of rows. Accessing the database (FrontBase) through either the JDBC drivers or the Rev drivers provided by FrontBase, single queries would take anywhere between 300-900ms. Since hundreds of such queries can be needed to produce one response, I had to look at ways to speed this up. I don't believe it is a question of writing better SQL: the queries have to spider across the data, keeping track of the route they take to reach each node in the graph.
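The kind of graph-spidering query described above can be sketched as a breadth-first traversal over an in-memory adjacency map that records the route taken to each node. This is only an illustration in Python (the actual Rev code isn't shown here), with made-up node names:

```python
from collections import deque

def paths_to_all_nodes(edges, start):
    """Spider outward from `start`, recording the route taken
    to reach every reachable node (hypothetical illustration)."""
    paths = {start: [start]}          # node -> route used to reach it
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in edges.get(node, ()):
            if neighbour not in paths:
                # Extend the route that reached `node` by one hop.
                paths[neighbour] = paths[node] + [neighbour]
                queue.append(neighbour)
    return paths

# Hypothetical sample graph standing in for the joined tables.
edges = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
print(paths_to_all_nodes(edges, "a"))
```

With the whole adjacency map in memory, each hop is a hash lookup rather than an SQL join, which is where the speed-up in the approach below comes from.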
I tuned the database, making sure that all the 'where' columns were indexed, and checked the query plan to ensure that precisely those indexes were being used. I gave the database 100 MB of RAM cache, and even tried loading the core tables 100% into RAM, but still could not get much better performance. From my testing, the data did not seem to be found in RAM until after the first time it was retrieved from disk, despite my setting the tables to pre-load into RAM. Perhaps over time the RAM-cached data would have performed much better than it did at this stage.

I then decided to try HSQLDB, an RDBMS written in Java which can hold all its tables in RAM, so I could be sure the data was in RAM. After exporting the core tables to CSV format (and transforming them into SQL inserts using Rev), the database took about an hour to start and, once started, occupied 1.5 GB of RAM. But the queries were still slow (my guess is that HSQLDB is simply not designed to take that much data).

So I took the CSV data and imported it into a stack, turning each table into a custom property set. I could now search the data in a tiny fraction of a second -- literally. The hundreds of queries involved in making a single response complete in 200ms. That's right: I can now run the entire series of queries in less time than it took to do a single query against an RDBMS. Obviously, custom properties used in this way are not appropriate if one needs referential integrity, locking, concurrent access, etc. But for read-only queries, Rev has provided a very satisfactory solution. Moreover, whereas HSQLDB took an hour and 1.5 GB of RAM, Rev can load the 100 MB of CSV data in 2-4 seconds, making frequent updates of the data and subsequent restarts of the application feasible.
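The technique above -- loading each CSV table once into memory, indexed by the column you query on -- can be sketched in Python (the original uses Rev custom property sets; the table and column names here are hypothetical):

```python
import csv
from collections import defaultdict
from io import StringIO

def load_index(csv_text, key_column):
    """Read a CSV table once and index its rows by one column,
    mimicking a table loaded into a custom property set."""
    index = defaultdict(list)
    for row in csv.DictReader(StringIO(csv_text)):
        index[row[key_column]].append(row)
    return index

# Hypothetical sample data standing in for an exported table.
nodes_csv = "id,name\n1,alpha\n2,beta\n"
nodes_by_id = load_index(nodes_csv, "id")
print(nodes_by_id["2"][0]["name"])   # each query is now a dict lookup
```

The one-time cost of building the index is paid at startup (the 2-4 second load mentioned above); after that, every query is an in-memory hash access with no driver round trip.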
After following Pierre Sahores' tutorial on creating persistent Rev CGI processes, I now have these queries returning a web page in 200ms, instead of the 30 seconds they would have taken using Java and a database. I know all of this flies in the face of many people's expectations, and it is certainly not the position I saw myself in two weeks ago. The rest of the application is written in Java, but this core feature has been made possible by Rev. I expect that associative arrays in e.g. PHP might be just as fast in this regard.

Bernard

_______________________________________________
use-revolution mailing list
[email protected]
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-revolution
