I have a modest dataset (17M triples) with a fairly flat graph. I run queries that select nodes with anywhere from 12 to 20 different property values.
Result set counts range from 10,000 to 30,000 nodes. Total execution time, measured at the client, is in the 30-40 second range. The web request begins streaming results immediately, but seems to take longer than it should given the number of results and the size of the data transfer.

I also notice that the time is roughly linear in the size of the dataset: halving the dataset halves both the result set and the execution time. I wouldn't have expected this behavior if all of the time were due to an indexed search.

My question is: is total query time limited by search execution speed, or by marshaling and serialization of the search results? I have tried different query patterns and believe I have the best queries possible for the use case, so I'm looking for other suggestions to reduce overall execution time.

Performance does not improve drastically going from 4 GB to 8 or 16 GB of RAM. My test platforms are 64-bit Windows, ranging from a small server (16 GB RAM, 4 CPUs) to laptops with 4 GB RAM.

Thanks,
--Paul
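
One rough way to narrow down the search-versus-serialization question from the client side is to record when the first chunk of the response arrives and compare it with the total time and the bytes transferred. The sketch below is only an illustration: `time_stream`, `StreamTiming`, and the fake stream in the demo are hypothetical names, not part of any triple store's API. Note also that if the server evaluates the query lazily while streaming, a short time to first chunk does not by itself prove the search is cheap.

```python
import time
from typing import Iterable, NamedTuple


class StreamTiming(NamedTuple):
    first_chunk_s: float  # seconds until the first chunk arrived
    total_s: float        # seconds until the stream was exhausted
    n_bytes: int          # total payload size in bytes


def time_stream(chunks: Iterable[bytes]) -> StreamTiming:
    """Consume a stream of byte chunks and record basic timings.

    `chunks` is any iterable of bytes (hypothetical interface); for a real
    HTTP response it would be whatever chunked reader the client offers.
    """
    start = time.perf_counter()
    first = None
    n_bytes = 0
    for chunk in chunks:
        if first is None:
            first = time.perf_counter() - start
        n_bytes += len(chunk)
    total = time.perf_counter() - start
    # An empty stream has no first chunk; report the total time instead.
    return StreamTiming(first if first is not None else total, total, n_bytes)


if __name__ == "__main__":
    def fake_stream():
        # Stand-in for a query response: short initial delay, then slow chunks.
        time.sleep(0.05)
        for _ in range(5):
            yield b"x" * 1024
            time.sleep(0.02)

    t = time_stream(fake_stream())
    rate = t.n_bytes / t.total_s if t.total_s > 0 else float("inf")
    print(f"first chunk: {t.first_chunk_s:.3f}s  total: {t.total_s:.3f}s  "
          f"bytes: {t.n_bytes}  throughput: {rate:.0f} B/s")
```

Against a real endpoint, the chunks could come from the standard library, for example `iter(lambda: resp.read(8192), b"")` on a response opened with `urllib.request.urlopen`. If the measured throughput is far below what the network can carry, that would suggest the time is going into producing or serializing results on the server rather than into the transfer itself.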
