Has anyone here attempted to run a delta of any sort on a table of 1m+
records?

I ask because, about a year ago, I tried the date-based delta
indexing on a table of ours with about 3 million records.  This
table normally takes TS about an hour to index.  Using datetime
deltas improved performance by a factor of about 2x, so indexing
dropped to roughly half an hour.  But across the numerous tables we
have to index, the datetime delta runs still totaled more than an
hour.
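For reference, this is the kind of setup I mean -- a sketch using a hypothetical Product model (the :delta => :datetime and :threshold options are the ones the TS docs describe for datetime deltas):

```ruby
class Product < ActiveRecord::Base
  define_index do
    indexes name, description

    # Datetime delta: each delta pass re-indexes only rows whose
    # timestamp falls within the threshold window, instead of
    # rebuilding the whole core index.
    set_property :delta => :datetime, :threshold => 1.hour
  end
end
```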

I suspect it was slow because MySQL just isn't good at finding records
by a datestamp in an enormous table, but I'm not sure exactly why
performance was what it was.
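One thing worth checking on that front: whether the timestamp column the delta filters on is indexed at all, since without an index each delta pass is a full table scan over millions of rows. A sketch as a Rails migration (assuming the column is updated_at on a hypothetical products table):

```ruby
class AddUpdatedAtIndexToProducts < ActiveRecord::Migration
  def self.up
    # Lets MySQL find "recently changed" rows via a range scan on the
    # index rather than a full scan of the whole table.
    add_index :products, :updated_at
  end

  def self.down
    remove_index :products, :updated_at
  end
end
```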

At any rate, I've investigated all three of the delta mechanisms
offered by TS, and from what I can tell, all of them rely on calling
Sphinx's indexer against a database table.  If my previous experience
is any indication, that won't be fast enough for regularly updating
records in our huge tables.

So I'm asking: has anyone else tried delta indexing with 0.9.9 on
tables of millions of records and GBs of data, and was your experience
similar to or different from mine?

If it looks like delta performance is still going to be slow-ish, I'll
probably end up hand-building something that uses Sphinx's xmlpipe2:
an ongoing task (similar to the way delayed_job seems to work) would
generate the index data itself, without calling Sphinx's indexer on
our DB table.
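For anyone curious, here's a minimal sketch of the xmlpipe2 output such a task would emit.  The rows and column names here are made up for illustration; the sphinx:docset / sphinx:schema / sphinx:document structure is Sphinx's documented xmlpipe2 format:

```ruby
# Hypothetical rows the ongoing task would pull itself -- in reality a
# DB query keyed off a "last indexed at" watermark, not Sphinx's indexer.
rows = [
  { :id => 1, :name => "Widget", :description => "A fine widget",
    :updated_at => Time.utc(2010, 1, 1) }
]

def xml_escape(s)
  s.to_s.gsub("&", "&amp;").gsub("<", "&lt;").gsub(">", "&gt;")
end

lines = []
lines << %(<?xml version="1.0" encoding="utf-8"?>)
lines << %(<sphinx:docset>)
lines << %(<sphinx:schema>)
lines << %(  <sphinx:field name="name"/>)
lines << %(  <sphinx:field name="description"/>)
lines << %(  <sphinx:attr name="updated_at" type="timestamp"/>)
lines << %(</sphinx:schema>)
rows.each do |row|
  lines << %(<sphinx:document id="#{row[:id]}">)
  lines << %(  <name>#{xml_escape(row[:name])}</name>)
  lines << %(  <description>#{xml_escape(row[:description])}</description>)
  # xmlpipe2 timestamp attrs are plain unix epoch integers.
  lines << %(  <updated_at>#{row[:updated_at].to_i}</updated_at>)
  lines << %(</sphinx:document>)
end
lines << %(</sphinx:docset>)

xml = lines.join("\n")
puts xml
```

Point indexer at an xmlpipe2 source whose xmlpipe_command runs this script and it never has to touch the table directly.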

Thanks for any insights,
Bill

-- 
You received this message because you are subscribed to the Google Groups 
"Thinking Sphinx" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/thinking-sphinx?hl=en.

