Hello, Jérôme

Nice to hear you finally joined us in this really interesting discussion :)


>
> To Max Vlasov:
>
> > in sorted order to sqlite base other 5 minutes, so about 10 minutes in
> > total. First 5 minutes was possible since we exchange only offsets,
> > not data, and other 5 minutes since inserting sorted data into B-tree
> > is really a fast operation.
>
> Nice solution (of the type I already fiddled around, actually, as you
> can imagine).
> This variant still poses 2 problems for me:
>
> 1) Its workability is RAM-limited, and therefore not necessarily robust
> to an increase in dataset size beyond a certain limit I am already close
> to (Win32-based processes are still bound to max. 2GB/process,
> unfortunately).
>
> 2) I need to create 2 indices on 2 different columns whose contents is
> totally uncorrelated with respect to sort order. Your solution would
> nicely reduce indexing time of the 1st column but what about the 2nd
> one ?...
>
>
>
You addressed real problems, and when I run my test on a system with less
RAM the results confirm these observations. But at least we found a way to
increase the speed for some datasets and some hardware systems. Maybe some
other approaches can improve the solution. The suggestion from Ibrahim
about using a RAM drive, for example, was interesting. I'd also mention
using different hard drives together with merge sort, but all these
solutions break the beauty of sqlite imho, and as a consequence its
flexibility.
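
To make the merge sort idea concrete, here is a rough Python sketch, not
the code I actually ran. It sorts the keys in RAM-sized runs, spills each
run to a temporary file, merges the runs with heapq.merge and feeds the
merged stream to sqlite, so the index B-tree always receives keys in
sorted order. The table name items, the column fp, the run_size default
and the directory parameter (which could point at a second drive) are all
hypothetical, and the keys are assumed to be plain hex strings without
newlines.

```python
import heapq
import os
import sqlite3
import tempfile


def write_sorted_runs(keys, run_size, directory):
    """Split keys into runs of at most run_size, sort each in RAM, spill to disk."""
    paths, run = [], []

    def flush():
        run.sort()
        fd, path = tempfile.mkstemp(dir=directory, suffix=".run", text=True)
        with os.fdopen(fd, "w") as f:
            f.writelines(k + "\n" for k in run)
        paths.append(path)
        run.clear()

    for key in keys:
        run.append(key)
        if len(run) >= run_size:
            flush()
    if run:
        flush()
    return paths


def merged(paths):
    """Yield keys from all run files in globally sorted order."""
    files = [open(p) for p in paths]
    try:
        # Each file is already sorted, so a k-way merge needs only one
        # pending line per run in memory.
        for line in heapq.merge(*files):
            yield line.rstrip("\n")
    finally:
        for f in files:
            f.close()


def load_sorted(db_path, keys, run_size=100_000, directory=None):
    """Insert keys into a (hypothetical) indexed table in sorted order."""
    paths = write_sorted_runs(keys, run_size, directory)
    con = sqlite3.connect(db_path)
    try:
        con.execute("CREATE TABLE IF NOT EXISTS items (fp TEXT)")
        con.execute("CREATE INDEX IF NOT EXISTS items_fp ON items (fp)")
        with con:  # one transaction for the whole load
            con.executemany(
                "INSERT INTO items (fp) VALUES (?)",
                ((k,) for k in merged(paths)),
            )
    finally:
        con.close()
        for p in paths:
            os.remove(p)
```

Memory use is bounded by run_size rather than by the dataset size, which
is the point with respect to problem 1), but it does nothing for an index
on a second, uncorrelated column.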

But the second one is really hard to solve. That's where sqlite internally
could take advantage of low-level data access, but I doubt this is an easy
task. I suppose doing any special sorting with direct file access could
even break the beauty of vdbe, not to mention the danger of changing the
code significantly.

By the way, you didn't mention the size of your "fingerprints". So can you
calculate the average index record size or the total index size in the
case of your 20M records?
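
If it helps, one rough way to get that number is to compare the database
page count before and after creating the index. The Python sketch below
does this on a small throwaway table. The 16-byte random blobs are only a
stand-in for your fingerprints, and the table, column and index names are
made up. It also assumes a fresh database with no free pages that the
index could reuse.

```python
import os
import sqlite3
import tempfile


def index_size_bytes(con, create_index_sql):
    """Return how many bytes the database file grows by when the index is built."""
    page_size = con.execute("PRAGMA page_size").fetchone()[0]
    before = con.execute("PRAGMA page_count").fetchone()[0]
    con.execute(create_index_sql)
    con.commit()
    after = con.execute("PRAGMA page_count").fetchone()[0]
    return (after - before) * page_size


def average_index_entry_size(n_rows, fp_bytes=16):
    """Build a scratch table of random blobs and measure bytes of index per row."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    con = sqlite3.connect(path)
    try:
        con.execute("CREATE TABLE items (fp BLOB)")
        con.executemany(
            "INSERT INTO items (fp) VALUES (?)",
            ((os.urandom(fp_bytes),) for _ in range(n_rows)),
        )
        con.commit()  # make sure the table pages are counted in "before"
        total = index_size_bytes(con, "CREATE INDEX items_fp ON items (fp)")
        return total, total / n_rows
    finally:
        con.close()
        os.remove(path)


if __name__ == "__main__":
    total, per_row = average_index_entry_size(100_000)
    print("index bytes:", total, "bytes per record:", round(per_row, 1))
```

Multiplying the bytes-per-record figure by 20M gives a first estimate of
the total index size, to compare against the RAM and cache available.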

Max
