We evaluated index merging with datetime deltas and, in the end, rewrote the ts:delta:index tasks so that they do not use index merging. (We also avoided loading the whole environment every X minutes; a shell script performs the delta indexing instead.)
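For readers who want a picture of what "delta indexing without merging and without loading the environment" might look like, here is a minimal sketch. It is not the script mentioned above. It assumes that Sphinx's indexer binary is on the PATH, and the config path and the index name article_delta are hypothetical placeholders. It only builds and runs a plain "indexer --config ... --rotate <delta index>" call, with no --merge step.

```python
import subprocess


def delta_index_command(config_path, delta_indexes, rotate=True):
    """Build the argv for reindexing only the given delta indexes.

    No --merge is involved: the delta indexes are simply rebuilt and the
    main index is left untouched until the next full reindex.
    """
    cmd = ["indexer", "--config", config_path]
    if rotate:
        # --rotate asks a running searchd to switch to the new index files
        cmd.append("--rotate")
    cmd.extend(delta_indexes)
    return cmd


def run_delta_index(config_path, delta_indexes):
    """Run indexer directly, without booting the application environment."""
    return subprocess.run(
        delta_index_command(config_path, delta_indexes), check=True
    )


if __name__ == "__main__":
    # Hypothetical path and index name, for illustration only.
    print(" ".join(delta_index_command("config/production.sphinx.conf",
                                       ["article_delta"])))
```

Something like run_delta_index could be called from cron every few minutes, with the full reindex scheduled separately about once a day.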

The biggest problem with delta index merging is that in-memory attributes are lost during the merge. Currently TS updates attributes right away via the API, which is great. However, Sphinx only stores these updates in memory and doesn't persist them to disk until the searchd process is shut down gracefully. During a merge, Sphinx uses the disk version of the attributes instead of the in-memory version, so some updated attributes are lost and revert to the original disk values.

Normal attributes should be fine, because the correct values will appear in the delta index and override the main index. However, this isn't the case for deleted records: they magically reappear after a merge. Here is a quick example:

1. Record A exists in the main index.
2. Record A gets deleted, and Sphinx sets the in-memory copy of its 'deleted' attribute to true. (The disk version is still false.)
3. Delta indexing occurs, and the delta index does not contain Record A because it was deleted.
4. Merge occurs. Attributes are reloaded from disk, and Record A appears in the index with a 'deleted' attribute of false.

For us, not doing the merge is not a big deal because our complete index is still relatively quick. About once a day we run a full index. The only downside is that a record whose text changed matches searches for both its old and its new value until the complete index runs. Not really that big of a deal for us.

We ran into a couple of issues with incorrect facet counts with datetime deltas but resolved those using a kill-list [1][2]. It may also be of some use for index merging [3].

-------
[1] http://www.sphinxsearch.com/docs/current.html#conf-sql-query-killlist
[2] We forked TS and added kill-list integration. We have been running it in production for over 6 months with no problems.
http://github.com/adamcooper/thinking-sphinx/tree/killlist_integration
http://github.com/adamcooper/ts-datetime-delta/tree/killlist_integration
[3] http://www.sphinxsearch.com/forum/view.html?id=3501 - Not exactly the best link, but you can also find this option in the command-line help for the CLI. I really don't have a clue as to what this option does, though. ;)

On Jul 27, 8:34 pm, Pat Allan <[email protected]> wrote:
> At this point, only the datetime deltas use merging... when I tried to get
> merging working with standard deltas, I found it unreliable. That was over a
> year ago, though, so maybe things have changed.
>
> However, with real-time indexes now appearing in the latest Sphinx beta,
> hopefully we can get that into Thinking Sphinx over the coming months.
>
> --
> Pat
>
> On 28/07/2010, at 1:28 PM, nnn wrote:
>
> > =quote from http://www.sphinxsearch.com/docs/current.html
> > 3.12. Index merging
> >
> > Merging two existing indexes can be more efficient than indexing the
> > data from scratch, and desired in some cases (such as merging 'main'
> > and 'delta' indexes instead of simply reindexing 'main' in 'main
> > +delta' partitioning scheme). So indexer has an option to do that.
> > Merging the indexes is normally faster than reindexing but still not
> > instant on huge indexes. Basically, it will need to read the contents
> > of both indexes once and write the result once. Merging 100 GB and 1
> > GB index, for example, will result in 202 GB of IO (but that's still
> > likely less than the indexing from scratch requires).
> >
> > The basic command syntax is as follows:
> >
> > indexer --merge DSTINDEX SRCINDEX [--rotate]
> > =
> >
> > I think if I can merge delta data into main data, then we don't have
> > to reindex everyday, that would be great. but I saw the post[1], Pat
> > said something like that[2].
> >
> > it seems delta doesn't merge into main index? we need reindex everyday
> > or every few hours?
> >
> > --------
> > 1:(http://groups.google.com/group/thinking-sphinx/browse_thread/thread/cdf0c676177d336a/83d756b221ba3a16?lnk=raot)
> >
> > 2:(If you were running the rake task every two hours, then changes
> > made in that first hour after you run it will not be caught, until you
> > do a full reindex (which should still happen regularly - once a day,
> > perhaps?)
> >
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Thinking Sphinx" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to
> > [email protected].
> > For more options, visit this group at
> > http://groups.google.com/group/thinking-sphinx?hl=en.
