On Fri, Jun 26, 2009 at 10:06:48AM -0700, Kosenko Max scratched on the wall:
> 
> 
> Doug Fajardo wrote:
> > No, I admit I haven't tried this under SQLITE.
> > 
> > Whether this approach will help for the specific application will depend
> > on data usage patterns, which we haven't delved into for this application.
> > Call me simple: since the main issue is degraded performance with larger
> > groupings of data, it seemed to make sense that breaking the data into
> > smaller groupings would help. 
> > 
> > Of course it's very possible that the size of the database in question may
> > mean that the number of hash buckets needed to reap significant benefits
> > makes this approach counter-productive. That's why it is only a suggestion
> > :-)
> 
> I think your initial assumptions are wrong. I just can't imagine a
> scenario where your proposal would give any benefit, short of a poor
> B-Tree implementation, which is not the case with SQLite.
> 
> As I posted in my answer to the thread starter, the performance
> degradation happens because the cache hit ratio drops as the amount of
> data grows. Your proposal may work if the keys being fed in are not
> random and hit only a few of the tables, since that gives a higher
> cache hit ratio. But if that's the case, the same thing will happen
> with a B-Tree. And if that's the case, why not feed the data in
> sorted? Then, by the same logic, and as has been demonstrated, SQLite
> will insert the data without any performance degradation.
> 
> Could you describe a situation in which your proposal would help, and why?


  Assuming we have a huge number of data points and that our operations
  are on random rows, it would be possible to quickly develop the
  situation you describe: the cache hit-ratio crashes, and each and
  every B-tree traversal requires us to pull a page off disk.  This
  makes a deep tree traversal very expensive, as moving across each
  level of the tree requires I/O.

  Now consider the hash system.  We set up 100 tables in a database and
  use a hash of the key to figure out which table to access.  From
  there, it is more or less the same thing.
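  To make the scheme concrete, here is a minimal sketch of the routing
  layer in Python's sqlite3 module.  The table names, the bucket count,
  and the choice of CRC32 as the hash are all my own illustrative
  assumptions, not anything from the proposal itself:

```python
import sqlite3
import zlib

NUM_TABLES = 100  # number of hash buckets; an arbitrary illustrative choice

def bucket_table(key: str) -> str:
    """Map a key to one of NUM_TABLES sub-tables via a stable hash."""
    # zlib.crc32 is stable across runs, unlike Python's built-in hash()
    return "data_%02d" % (zlib.crc32(key.encode()) % NUM_TABLES)

conn = sqlite3.connect(":memory:")
for i in range(NUM_TABLES):
    conn.execute("CREATE TABLE data_%02d (key TEXT PRIMARY KEY, value TEXT)" % i)

def put(key, value):
    # The hash picks the sub-table; from there it is an ordinary B-tree lookup
    conn.execute("INSERT OR REPLACE INTO %s VALUES (?, ?)" % bucket_table(key),
                 (key, value))

def get(key):
    row = conn.execute("SELECT value FROM %s WHERE key = ?" % bucket_table(key),
                       (key,)).fetchone()
    return row[0] if row else None

put("alpha", "1")
print(get("alpha"))  # prints: 1
```

  Each lookup computes the hash in constant time and then descends a
  B-tree that is roughly 1/100th the size of the single big table.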

  What we're doing is cutting the top of the B-tree off and reducing it
  to 100 (or whatever) sub-trees.  The whole "super-tree" that we've cut
  off is replaced by the hash algorithm, allowing us to jump right to
  the correct sub-tree and start our tree traversal there.  This skips
  traversal of the layers that make up the "super-tree" structure,
  saving on the I/O.

  At least in theory.

  The problem is twofold.  First, this is dependent on the
  cache-replacement algorithm used by SQLite (about which I know
  nothing), but in theory the nodes that make up the "super-tree" are
  exactly the nodes you would expect to remain in the cache.  They're
  used by every lookup on the dataset, after all.  Even if the
  replacement algorithm is random and they're written over, they're
  going to be pulled back in soon enough.

  Second, if I understand the B-tree node structure used in SQLite, a
  single node can hold a fairly hefty number of children.  This is not a
  binary tree, that's for sure.  This means you're not cutting off all
  that many layers, even with 100 or more tables, so you're not actually
  going to see much in the way of savings.
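  The back-of-the-envelope version: a tree with interior fanout F needs
  about log_F(N) levels to reach N leaves, so pre-splitting into T
  tables removes roughly log_F(T) levels.  The fanout of ~100 keys per
  interior page below is an assumption for illustration, not a measured
  SQLite figure:

```python
import math

def levels_saved(fanout: int, num_tables: int) -> float:
    """Approximate B-tree levels removed by pre-splitting into num_tables trees.

    Carving the top "super-tree" off and replacing it with a hash over
    num_tables sub-trees skips about log_fanout(num_tables) levels.
    """
    return math.log(num_tables, fanout)

# With a high-fanout tree, 100 tables buy only about one level:
print(round(levels_saved(100, 100), 2))  # prints: 1.0
# A binary tree is where the trick would actually pay off:
print(round(levels_saved(2, 100), 2))    # prints: 6.64
```

  So with a realistic fanout, the hash scheme saves on the order of one
  page read per lookup, which is why the expected win is small.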



  Overall, I agree that the OP will likely see a noticeable improvement 
  if they boost the cache size.  Bumping it up 10x or even 100x on a
  modern workstation is not that big of a deal (100x ~= 300MB if you
  start with the default 1K page and 2000 page cache).  We see the same
  thing when building a big index (which accesses semi-random data and
  does a lot of tree-node shuffling).  The best way to
  increase index build performance is to boost the cache size so that
  as much of the Btree as possible is always in RAM.  I suspect this is
  much the same case.

   -j

-- 
Jay A. Kreibich < J A Y  @  K R E I B I.C H >

"Our opponent is an alien starship packed with atomic bombs.  We have
 a protractor."   "I'll go home and see if I can scrounge up a ruler
 and a piece of string."  --from Anathem by Neal Stephenson
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
