I re-implemented indexing to try for better results, and the results were very positive. In the worst case (retrieving all XML from all files), indexing and the collection function perform about equally well. In the other cases I tested, indexing outperformed collection. Here are some of the results:
Retrieve all data from 42000 files:
  Collection: 292 seconds
  Index: 290 seconds

Retrieve 1 line of data from 42000 files:
  Collection: 50 seconds
  Index: 12 seconds

I also wanted to see how well indexing does when the relevant data is found in only a small subset of the files. I ran a query to retrieve all data from only the files matching an equality search (600 out of 42000 files). With indexing, the results came back in about 8 seconds. Unfortunately, my machine was not powerful enough to run the collection version with the same equality search on the same data set (a large enough frame size would not fit in my 4 GB of memory).

On the positive side, this seems to bring out another advantage of the indexing version: it uses a smaller frame size to perform the same task, meaning it can operate on larger files.

I would love any feedback, and let me know if there are other comparisons you would like to see.

Steven
