--- [EMAIL PROTECTED] wrote: > Gerry Snyder <[EMAIL PROTECTED]> wrote: > > Chris Jones wrote: > > > Hi all, > > > > > > I have a very simple schema. I need to assign a unique identifier to a > > > large collection of strings, each at most 80-bytes, although typically > > > shorter. > > > > > > The problem is I have 112 million of them. > > > > Maybe you could start by breaking the data into 8 equal groups and make > > a table of each group. Then merge the original groups pairwise, then > > merge those 4 groups, and finally the 2 semifinal groups (kinda like > > March Madness, come to think of it). Since each merging will be of > > already sorted/indexed data, it might save a lot of time. > > > > This is the right idea. > > The problem is that your working set is bigger than your cache > which is causing thrashing. I suggest a solution like this: > > Add entries to table ONE until the table and its unique index get > so big that they no longer fit in cache all at once. Then > transfer ONE into TWO like this: > > INSERT INTO two SELECT * FROM one ORDER BY unique_column; > > The ORDER BY is important here. > > Do the above a time or two until TWO is signficantly larger than ONE. > Then do the same into TWO_B. Later combine TWO and TWO_B into THREE: > > INSERT INTO three SELECT * FROM two ORDER BY unique_column; > INSERT INTO three SELECT * FROM two_b ORDER BY unique_column; > > Repeat as necessary for FOUR, FIVE, SIX and so forth.
I've tried something like the algorithm you've proposed and it's much slower than pre-sorting all the data prior to bulk insert. It may have something to do with the cost of multiple repeated inserts for each original row. Here's a different attempt at speeding up bulk insert following your suggestion from Ticket 2075: http://www.mail-archive.com/sqlite-users%40sqlite.org/msg22143.html Any suggestions to speed it up are welcome. Sample insert speed test included. ____________________________________________________________________________________ Don't get soaked. Take a quick peek at the forecast with the Yahoo! Search weather shortcut. http://tools.search.yahoo.com/shortcuts/#loc_weather ----------------------------------------------------------------------------- To unsubscribe, send email to [EMAIL PROTECTED] -----------------------------------------------------------------------------