We evaluated index merging with datetime deltas and in the end rewrote
the ts:delta:index tasks so that they do not use index merging.  (We
also avoid loading the whole environment every X minutes and just have
a shell script perform the delta indexing.)

The biggest problem with delta index merging is that in-memory
attribute updates are lost during the merge.  TS updates attributes
right away via the API, which is great, but Sphinx only stores these
updates in memory and doesn't persist them to disk until the searchd
process is shut down gracefully.  During a merge, Sphinx uses the disk
version of the attributes instead of the in-memory version, so some
updated attributes are lost and revert to the original disk values.
Normal attributes should be fine, because the correct values will
appear in the delta index and override the main index.  That isn't the
case for deleted records, though: they magically reappear after a
merge.

Here is a quick example:

1. Record A exists in the main index.
2. Record A gets deleted and Sphinx sets the in-memory copy of the
'deleted' attribute to true.  (The disk version is still false.)
3. Delta indexing occurs, and the delta does not contain Record A
because it was deleted.
4. Merge occurs.  Attributes are reloaded from disk, and Record A
appears in the index with a 'deleted' attribute of false.
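
The sequence above can be sketched as a toy model in Python.  This is
only an illustration of the failure mode, not Sphinx's actual code: the
dictionaries standing in for the on-disk and in-memory attribute
copies, and the helper names searchable_ids and merge, are all made up
for the example.

```python
# Toy model of the lost-delete problem.  Nothing here is Sphinx code;
# the dicts and helper names are hypothetical stand-ins.

def searchable_ids(index):
    """Return the ids of records whose 'deleted' attribute is false."""
    return sorted(doc_id for doc_id, attrs in index.items()
                  if not attrs["deleted"])


def merge(main_on_disk, delta_on_disk):
    """Merge the on-disk copies; delta rows override main rows."""
    merged = {doc_id: dict(attrs) for doc_id, attrs in main_on_disk.items()}
    merged.update({doc_id: dict(attrs)
                   for doc_id, attrs in delta_on_disk.items()})
    return merged


# Record A (id 1) exists in the main index, on disk and in memory.
main_disk = {1: {"deleted": False}, 2: {"deleted": False}}
main_memory = {doc_id: dict(attrs) for doc_id, attrs in main_disk.items()}

# Record A gets deleted: only the in-memory attribute is updated.
main_memory[1]["deleted"] = True
before_merge = searchable_ids(main_memory)

# The delta index is built and does not contain the deleted record.
delta_disk = {3: {"deleted": False}}

# The merge reads the disk copies, so the in-memory delete is lost.
after_merge = searchable_ids(merge(main_disk, delta_disk))

print(before_merge)  # [2]
print(after_merge)   # [1, 2, 3]
```

Record 1 is hidden before the merge and visible again afterwards,
which is the "magically reappears" behaviour described above.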

For us, not doing the merge is not a big deal because a complete
reindex is still relatively quick, and we run one about once a day.
The only downside is that, after a text change, a record matches on
both the old and the new value until the complete index runs.  That's
not really a big deal for us.

We ran into a couple of issues with incorrect facet counts with
datetime deltas, but resolved those using a kill-list [1][2].
Kill-lists may also be of some use for index merging [3].
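
To make the kill-list idea concrete, here is a rough Python sketch of
how I understand it from the Sphinx docs: ids listed in the delta's
kill-list suppress matches for those ids from the main index.  The
search function, the sample records and the whitespace tokenising are
all invented for illustration; this is not how Sphinx or our fork
implements it.

```python
# Toy model of a kill-list.  The function and data are hypothetical.

def search(term, main, delta, killlist=frozenset()):
    """Return ids matching term, skipping main rows that are killed."""
    main_hits = {doc_id for doc_id, text in main.items()
                 if term in text.split() and doc_id not in killlist}
    delta_hits = {doc_id for doc_id, text in delta.items()
                  if term in text.split()}
    return sorted(main_hits | delta_hits)


main = {1: "red bicycle", 2: "green car"}
delta = {1: "blue bicycle"}  # record 1 was edited after the full index

# Without a kill-list, the stale text in main still matches.
stale = search("red", main, delta)

# With record 1 in the kill-list, only the delta copy is considered.
fixed_old = search("red", main, delta, killlist={1})
fixed_new = search("blue", main, delta, killlist={1})

print(stale)      # [1]
print(fixed_old)  # []
print(fixed_new)  # [1]
```

The same double counting of old and new rows is what threw off our
facet counts.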

-------
[1] http://www.sphinxsearch.com/docs/current.html#conf-sql-query-killlist

[2] We forked TS and added kill-list integration.  We have been running
it in production for over 6 months with no problems.
http://github.com/adamcooper/thinking-sphinx/tree/killlist_integration
and http://github.com/adamcooper/ts-datetime-delta/tree/killlist_integration

[3] http://www.sphinxsearch.com/forum/view.html?id=3501 - Not exactly
the best link, but you can also find this option in the command-line
(CLI) options.  I really don't have a clue as to what this option
does, though.  ;)

On Jul 27, 8:34 pm, Pat Allan <[email protected]> wrote:
> At this point, only the datetime deltas use merging... when I tried to get 
> merging working with standard deltas, I found it unreliable. That was over a 
> year ago, though, so maybe things have changed.
>
> However, with real-time indexes now appearing in the latest Sphinx beta, 
> hopefully we can get that into Thinking Sphinx over the coming months.
>
> --
> Pat
>
> On 28/07/2010, at 1:28 PM, nnn wrote:
>
>
>
> > =quote from http://www.sphinxsearch.com/docs/current.html
> > 3.12. Index merging
>
> > Merging two existing indexes can be more efficient that indexing the
> > data from scratch, and desired in some cases (such as merging 'main'
> > and 'delta' indexes instead of simply reindexing 'main' in 'main
> > +delta' partitioning scheme). So indexer has an option to do that.
> > Merging the indexes is normally faster than reindexing but still not
> > instant on huge indexes. Basically, it will need to read the contents
> > of both indexes once and write the result once. Merging 100 GB and 1
> > GB index, for example, will result in 202 GB of IO (but that's still
> > likely less than the indexing from scratch requires).
>
> > The basic command syntax is as follows:
>
> > indexer --merge DSTINDEX SRCINDEX [--rotate]
> > =
>
> > I think if I can merge delta data into main data, then we don't have
> > to reindex everyday, that would be great. but I saw the post[1], Pat
> > said something like that[2].)
>
> > it seems delta doesn't merge into main index? we need reindex everyday
> > or every few hours?
>
> > --------
> > 1:(http://groups.google.com/group/thinking-sphinx/browse_thread/thread/
> > cdf0c676177d336a/83d756b221ba3a16?lnk=raot)
>
> > 2:(If you were running the rake task every two hours, then changes
> > made in that first hour after you run it will not be caught, until you
> > do a full reindex (which should still happen regularly - once a day,
> > perhaps?)
>
> > --
> > You received this message because you are subscribed to the Google Groups 
> > "Thinking Sphinx" group.
> > To post to this group, send email to [email protected].
> > To unsubscribe from this group, send email to 
> > [email protected].
> > For more options, visit this group 
> > at http://groups.google.com/group/thinking-sphinx?hl=en.
