Well, I understand the idea in general and how it works. But, as you describe in the second part of your letter, this won't help. Even if you create 100 tables, that saves you only about one step out of 5-7 I/O steps and won't make the cache hit ratio significantly higher. And I'm pretty sure that even if SQLite used a most-recently-used cache instead of a most-often-used one, the underlying OS would cache that step (the root page) for you anyway. Meanwhile, having 100 extra tables adds a lot of overhead, so in the end you get no benefit in theory, and I'm sure none in practice.
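To put rough numbers on that "one step" claim, here is a back-of-the-envelope sketch (Python). The fanout of 100 children per interior page and the billion-row table are illustrative assumptions, not SQLite's actual page layout:

```python
import math

def btree_depth(n_rows: int, fanout: int) -> int:
    """Approximate level count of a B-tree over n_rows entries,
    assuming each interior page holds `fanout` children (illustrative only)."""
    return max(1, math.ceil(math.log(n_rows, fanout)))

FANOUT = 100  # assumed; the real fanout depends on page size and key width

one_table = btree_depth(10**9, FANOUT)           # a single billion-row table
partitioned = btree_depth(10**9 // 100, FANOUT)  # 100 hash-partitioned tables

print(one_table, partitioned)  # 5 4 -- hashing buys roughly one level
```

With a fanout that wide, splitting the data across 100 tables removes only about one level of the tree, which is the point being made above.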
BTW, the default page size is now no smaller than the filesystem cluster size, which is > 4K most of the time.

Thanks,
Max

Jay A. Kreibich-2 wrote:
>
> Assuming we have a huge number of data points and that our operations
> are on random rows, it would be possible to quickly develop the
> situation you describe: the cache hit ratio crashes, and each and
> every B-tree traversal requires us to pull a page off disk. This
> makes a deep tree traversal very expensive, as moving across
> each level of the tree requires I/O.
>
> Now consider the hash system. We set up 100 tables in a database and
> use a hash of the key to figure out which table to access. From
> there, it is more or less the same thing.
>
> What we're doing is cutting the top of the B-tree off and reducing it
> to 100 (or whatever) sub-trees. The whole "super-tree" that we've cut
> off is replaced by the hash algorithm, allowing us to jump right to
> the correct sub-tree and start our tree traversal there. This skips
> traversal of the layers that make up the "super-tree" structure,
> saving on the I/O.
>
> At least in theory.
>
> The problem is twofold. This is dependent on the cache-replacement
> algorithm used by SQLite (of which I know nothing), but in
> theory the nodes that make up the "super-tree" are exactly the nodes
> you would expect to remain in the cache. They're used by every
> lookup on the dataset, after all. Even if the replacement algorithm
> is random and they're written over, they're going to be pulled back
> in soon enough.
>
> Second, if I understand the B-tree node structure used in SQLite, a
> single node can hold a fairly hefty number of children. This is not a
> binary tree, that's for sure. This means you're not cutting off all
> that many layers, even with 100 or more tables, which means you're
> not actually going to see a ton of savings.
>
> Overall, I agree that the OP will likely see a noticeable improvement
> if they boost the cache size.
> Bumping it up 10x or even 100x on a
> modern workstation is not that big of a deal (100x ~= 300MB if you
> start with the default 1K page and 2000-page cache). We see the same
> thing when you build a big index (which accesses semi-random
> data and does a lot of tree-node shuffling). The best way to
> increase index-build performance is to boost the cache size so that
> as much of the B-tree as possible is always in RAM. I suspect this is
> much the same case.

--
View this message in context: http://www.nabble.com/very-large-SQLite-tables-tp24201098p24224634.html
Sent from the SQLite mailing list archive at Nabble.com.
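For completeness, the hash-table scheme Jay describes can be sketched in a few lines of Python with the standard sqlite3 module. The table names (`data_0` ... `data_7`), the bucket count of 8, and the cache figure are all illustrative assumptions, not anything specified in the thread:

```python
import sqlite3
import zlib

N_TABLES = 8  # the thread discusses 100; a smaller number keeps the sketch short

conn = sqlite3.connect(":memory:")

# The cache-size advice from the post: raise the page cache well above the
# default 2000 pages (the value here is in pages and is chosen arbitrarily).
conn.execute("PRAGMA cache_size = 200000")

for i in range(N_TABLES):
    conn.execute(f"CREATE TABLE data_{i} (key TEXT PRIMARY KEY, value TEXT)")

def table_for(key: str) -> str:
    # crc32 gives a stable hash; Python's built-in hash() is randomized
    # per process, so it would scatter keys differently on each run.
    return f"data_{zlib.crc32(key.encode()) % N_TABLES}"

def put(key: str, value: str) -> None:
    conn.execute(f"INSERT OR REPLACE INTO {table_for(key)} VALUES (?, ?)",
                 (key, value))

def get(key: str):
    row = conn.execute(f"SELECT value FROM {table_for(key)} WHERE key = ?",
                       (key,)).fetchone()
    return row[0] if row else None

put("alpha", "1")
print(get("alpha"))  # prints 1
```

Each lookup hashes straight to its sub-tree's root, which is the "cut the super-tree off" idea; whether that beats one big table plus a larger cache is exactly what the replies above doubt.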

