Hello, Jérôme

Nice to hear you finally joined us in this really interesting discussion :)
> To Max Vlasov:
>
> > in sorted order to sqlite base other 5 minutes, so about 10 minutes in
> > total. First 5 minutes was possible since we exchange only offsets,
> > not data
> > and other 5 minutes since inserting sorted data into B-tree is really a
> > fast operation.
>
> Nice solution (of the type I already fiddled around, actually, as you
> can imagine).
> This variant still poses 2 problems for me:
>
> 1) Its workability is RAM-limited, and therefore not necessarily robust
> to an increase in dataset size beyond a certain limit I am already close to
> (Win32-based processes are still bound to max. 2GB/process, unfortunately).
>
> 2) I need to create 2 indices on 2 different columns whose contents is
> totally uncorrelated with respect to sort order. Your solution would nicely
> reduce indexing time of the 1st column but what about the 2nd one ?...

You have raised real problems, and when I run my test on a system with less RAM, the results confirm these observations. But at least we found a way to increase the speed for some datasets and some hardware. Maybe other approaches can improve the solution. Ibrahim's suggestion of using a RAM drive, for example, was interesting; I'd also mention using different hard drives together with merge sort. But all these solutions break the beauty of sqlite imho, and as a consequence its flexibility.

The second problem is really hard to solve. That's where sqlite could internally take advantage of low-level data access, but I doubt this is an easy task. I suppose doing any special sorting with direct file access could even break the beauty of the vdbe, not to mention the danger of changing the code significantly.

By the way, you didn't mention the size of your "fingerprints". Can you calculate the average index record size, or the total index size, for your 20M records?
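To make the sorted-insert idea concrete, here is a minimal sketch in Python. It is only an illustration, not the code I actually ran: the table `fp`, the column `val`, the index `fp_val` and the function `load_sorted` are all names invented for this example. It sorts positions (offsets) instead of moving the data itself, then inserts the rows in key order so the index B-tree is always appended at its right edge.

```python
import sqlite3


def load_sorted(conn, values):
    """Insert `values` in key order into an indexed table.

    All names here (fp, val, fp_val) are hypothetical, chosen for the sketch.
    Returns the list of offsets in the order they were inserted.
    """
    conn.execute("CREATE TABLE fp (val INTEGER)")
    conn.execute("CREATE INDEX fp_val ON fp (val)")

    # Sort offsets into `values`, not the values themselves.
    order = sorted(range(len(values)), key=values.__getitem__)

    # One transaction for the whole batch; rows arrive in index order.
    with conn:
        conn.executemany(
            "INSERT INTO fp (val) VALUES (?)",
            ((values[i],) for i in order),
        )
    return order


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(load_sorted(conn, [5, 3, 9, 1]))
```

Note that this helps only the index whose key matches the sort order, and the offset array still has to fit in RAM, which are exactly the two problems raised above.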
Max