On 8/20/10, Cory Nelson <phro...@gmail.com> wrote:
> +1 for this feature request.  They've got a very specific and fairly
> rare use case, but when opportunity strikes partial indexes are much
> more convenient, straightforward, and efficient than the alternative.
>
> -   If a table has 100,000,000 rows and each index page for it holds
> 100 rows, a full index will need 1,000,000 pages of storage and 4 page
> reads to search.
> -   With a partial index covering only 10,000 rows of the same table,
> it will only need 100 pages of storage and 2 page reads to search.
>
> Big improvement while keeping query syntax and results exactly the same!
>
>> Are there any known hacks to implement something similar?
>
> There's not really any good solution.  You can create a separate table
> with the subset in it, that's pretty much it.  But that has some
> overhead too, and can complicate your queries.

A while back I needed something like a partial index.  The backup
program I'm working on does block dedup, so I made an index on each
block's SHA1.  But then I wanted to only dedup some blocks, not all.
I needed to say "insert this block row and also index the sha in the
sha index", or "insert this block row but don't index the sha in the
sha index".  Then I wanted to be able to say "is X in the sha index?",
ie, only look in the index.  SQL doesn't really have a way to do that.
 It's a simple concept, but it doesn't seem to fit within the SQL
framework.

The best I could come up with is a separate table.  The problem is,
indexing the SHA1 normally means there is a copy in the row and a copy
in the index.  Using a separate table, which still has to be indexed,
means there is a copy in the row of the main table, a copy in the
separate table, and a copy in the separate table's index.

I guess one way to do this using the SQL partial index feature being
discussed would be to have an extra column in the row called
"isindexed", setting that to 1 or 0 depending on whether the SHA
should be included in the index, and using create index ... where
isindexed=1.  Then on the queries, say "select blockid from blocks
where isindexed=1 and sha=X".  And any query that didn't have
isindexed=1 wouldn't be able to use the index at all.

Is that how it would work?

Jim
--
HashBackup: easy onsite and offsite Unix backup
http://sites.google.com/site/hashbackup
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to