On Sat, Jun 19, 2010 at 12:58:45PM -0400, Eric Smith scratched on the wall: > Jay A. Kreibich wrote: > > I think the use case will usually be (only) writes followed by (only) > reads. There may be incremental writes later, but they will hopefully > be small compared to the initial seed. I intend to create a set of > maybe 7 indices to serve the particular queries I think the user intends > to run. Let's all hope we can update the db with those indices at a > higher rate than the user can generate data. :p
Indexes do, unfortunately, take some time. The PK values you were inserting in-order, which helps, but if you need that many indexes you're going to need to shuffle a lot of data. > Insert rates are still a rather crappy 10k records/second (when we were > in RAM we were doing ~65k recs/sec, which I think was bound by my app's > speed). As others have said, a 6.5x difference between disk and memory is darn good, even with the most expensive arrays. > I think I'm at the stupid raid5's limit -- not really sure what > I should expect there, or how to find out what I should expect. Try adjusting the page size. If possible, match the page size and the RAID strip size. RAID 4 and 5 suffer from a "read on write" condition to gather the parity data. If you can write chunks of data that match the stripe size, the read can be skipped, which boosts performance. This is especially true for writing data that was not previously read (e.g. growing a file) and might still be in the RAID cache. The max page size is 32K, which might not be big enough to cover the strip size. Depends a lot on how many disks you have and how the RAID is configured. Regardless, even if you can't match the strip size, I would try increasing the page size. With a database like yours, you can likely take the possible storage hit if it improves performance by reducing the number of I/O transactions. Just be sure you adjust your page cache. The cache is defined in pages, so if you bump up the page size you might need to reduce the cache size to avoid using too much memory. > Anyway, I'll build the indices after the seed. I'll go out on a limb > and assume that'll be a lot faster than it would've been under my > initial approach. Not always. Building after insert is usually a bit faster, but only slightly. Such is life. If you want the performance, you need to pay the price. -j -- Jay A. Kreibich < J A Y @ K R E I B I.C H > "Intelligence is like underwear: it is important that you have it, but showing it to the wrong people has the tendency to make them feel uncomfortable." -- Angela Johnson _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users