On Sat, Jun 19, 2010 at 12:58:45PM -0400, Eric Smith scratched on the wall:
> Jay A.  Kreibich wrote: 
> 
> I think the use case will usually be (only) writes followed by (only) 
> reads.  There may be incremental writes later, but they will hopefully 
> be small compared to the initial seed.  I intend to create a set of 
> maybe 7 indices to serve the particular queries I think the user intends 
> to run.  Let's all hope we can update the db with those indices at a 
> higher rate than the user can generate data.  :p 

  Indexes do, unfortunately, take some time.  The PK values you were
  inserting in-order, which helps, but if you need that many indexes
  you're going to need to shuffle a lot of data.

> Insert rates are still a rather crappy 10k records/second (when we were 
> in RAM we were doing ~65k recs/sec, which I think was bound by my app's 
> speed). 

  As others have said, a 6.5x difference between disk and memory is
  darn good, even with the most expensive arrays.

> I think I'm at the stupid raid5's limit -- not really sure what 
> I should expect there, or how to find out what I should expect.  

  Try adjusting the page size.  If possible, match the page size and
  the RAID strip size.  RAID 4 and 5 suffer from a "read on write"
  condition to gather the parity data.  If you can write chunks of data
  that match the stripe size, the read can be skipped, which boosts
  performance.  This is especially true for writing data that was not
  previously read (e.g. growing a file) and might still be in the
  RAID cache.

  The max page size is 32K, which might not be big enough to cover the
  strip size.  Depends a lot on how many disks you have and how the
  RAID is configured.
  
  Regardless, even if you can't match the strip size, I would try
  increasing the page size.  With a database like yours, you can likely
  take the possible storage hit if it improves performance by reducing
  the number of I/O transactions.

  Just be sure you adjust your page cache.  The cache is defined in
  pages, so if you bump up the page size you might need to reduce the
  cache size to avoid using too much memory.

> Anyway, I'll build the indices after the seed.  I'll go out on a limb 
> and assume that'll be a lot faster than it would've been under my 
> initial approach.  

  Not always.  Building after insert is usually a bit faster, but only
  slightly.  Such is life.  If you want the performance, you need to
  pay the price.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Intelligence is like underwear: it is important that you have it,
 but showing it to the wrong people has the tendency to make them
 feel uncomfortable." -- Angela Johnson
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to