Has anyone here attempted to run a delta of any sort on a table of 1m+ records?
I ask because, about a year ago, I tried date-based delta indexing on one of our tables with roughly 3 million records. That table normally takes TS about an hour to index. Using datetime deltas improved performance by a factor of about 2x, so indexing dropped to about half an hour instead of an hour. But across the numerous tables we have to index, the datetime delta runs still totaled more than an hour. I suspect it was slow because MySQL just isn't good at selecting records by a datetime range in an enormous table, but I'm not sure exactly why the performance was what it was.

At any rate, I've investigated all three of the delta mechanisms TS offers, and from what I can tell, all of them rely on calling Sphinx's indexer against a database table. If my earlier experience is any indication, that won't be fast enough for regularly updating records in our huge tables.

So I'm asking: has anyone else tried delta indexing with 0.9.9 on tables that are millions of records and GBs of data, and was your experience similar to or different from mine?

If delta performance still looks slow-ish, I'll probably end up hand-building something that uses Sphinx's xmlpipe2 data source and generating the index data myself from an ongoing task, similar to the way delayed_job works, but without calling Sphinx's indexer against our DB table.

Thanks for any insights,
Bill
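For reference, the datetime delta I was using is the one you enable per-model in Thinking Sphinx 0.9.9 with `set_property`. This is only a sketch from memory (model and column names are illustrative, not from our actual app), but it shows the shape of the setup:

```ruby
# Hypothetical example: a 0.9.9-style datetime delta.
# The delta pass re-runs indexer over rows whose updated_at falls
# within the threshold window, so updated_at needs a DB index or
# MySQL ends up scanning a huge range on a big table.
class Article < ActiveRecord::Base
  define_index do
    indexes title, body
    has updated_at

    set_property :delta => :datetime, :threshold => 1.hour
  end
end
```

With this, the delta cron/rake task still invokes Sphinx's indexer against the table, which is exactly the part that was slow for us.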
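To make the xmlpipe2 idea concrete, here's a minimal sketch of the kind of generator I have in mind: a background task collects the changed records itself (from a queue, not from an indexer SQL pass) and prints a Sphinx xmlpipe2 stream for indexer to consume. The field/attribute names and the `xmlpipe2_stream` helper are my own illustration, not anything TS provides:

```ruby
require "cgi"

# Hypothetical helper: turn a batch of changed records (hashes here,
# for illustration) into a Sphinx xmlpipe2 document stream.
def xmlpipe2_stream(records)
  out = +""
  out << %(<?xml version="1.0" encoding="utf-8"?>\n)
  out << %(<sphinx:docset>\n)
  out << %(<sphinx:schema>\n)
  out << %(  <sphinx:field name="title"/>\n)
  out << %(  <sphinx:field name="body"/>\n)
  out << %(  <sphinx:attr name="updated_at" type="timestamp"/>\n)
  out << %(</sphinx:schema>\n)
  records.each do |r|
    out << %(<sphinx:document id="#{r[:id]}">\n)
    # Escape content so markup characters in record text stay valid XML.
    out << %(  <title>#{CGI.escapeHTML(r[:title])}</title>\n)
    out << %(  <body>#{CGI.escapeHTML(r[:body])}</body>\n)
    out << %(  <updated_at>#{r[:updated_at].to_i}</updated_at>\n)
    out << %(</sphinx:document>\n)
  end
  out << %(</sphinx:docset>\n)
end
```

In sphinx.conf, a source with `type = xmlpipe2` and `xmlpipe_command` pointing at a script that prints this stream would let indexer build the delta index without ever touching the big MySQL table directly.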
--
You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to [email protected].
For more options, visit this group at http://groups.google.com/group/thinking-sphinx?hl=en.
